ROCR-Runtime-rocm-5.0.0 (git commit 635157ea802c95b5f9de740eeef2e92fb296515c)

ROCR-Runtime-rocm-5.0.0/LICENSE.txt

The University of Illinois/NCSA Open Source License (NCSA)

Copyright (c) 2014-2017, Advanced Micro Devices, Inc. All rights reserved.

Developed by:

                 AMD Research and AMD HSA Software Development

                 Advanced Micro Devices, Inc.

                 www.amd.com

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal with the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

 - Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimers.
 - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimers in
   the documentation and/or other materials provided with the distribution.
 - Neither the names of Advanced Micro Devices, Inc,
   nor the names of its contributors may be used to endorse or promote
   products derived from this Software without specific prior written
   permission.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL
THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS WITH THE SOFTWARE.

ROCR-Runtime-rocm-5.0.0/README.md

### HSA Runtime API and runtime for ROCm

This repository includes the user-mode API interfaces and libraries necessary for host applications to launch compute kernels on available HSA ROCm kernel agents. Reference source code for the core runtime is also available.

#### Initial target platform requirements

* CPU: Intel Haswell or newer, Core i5, Core i7, Xeon E3 v4 & v5; Xeon E5 v3
* GPU: Fiji ASIC (AMD R9 Nano, R9 Fury and R9 Fury X)
* GPU: Polaris ASIC (AMD RX480)

#### Source code

The HSA core runtime source code for the ROCR runtime is located in the src subdirectory. Please consult the associated README.md file for its contents and build instructions.

#### Binaries for Ubuntu & Fedora and installation instructions

Pre-built binaries are available for installation from the ROCm package repository. For ROCR, they include:

Core runtime package:

* HSA include files to support application development on the HSA runtime for the ROCR runtime
* A 64-bit version of AMD's HSA core runtime for the ROCR runtime

Runtime extension package:

* A 64-bit version of AMD's runtime tools library
* A 64-bit version of AMD's runtime image library

The contents of these packages are installed in /opt/rocm/hsa and /opt/rocm by default.
The core runtime package depends on the hsakmt-roct-dev package.

Installation instructions can be found in the ROCm manifest repository README.md:

https://github.com/RadeonOpenCompute/ROCm

#### Infrastructure

The HSA runtime is a thin, user-mode API that exposes the necessary interfaces to access and interact with graphics hardware driven by the AMDGPU driver set and the ROCK kernel driver. Together they enable programmers to directly harness the power of AMD discrete graphics devices by allowing host applications to launch compute kernels directly to the graphics hardware.

The capabilities expressed by the HSA Runtime API are:

* Error handling
* Runtime initialization and shutdown
* System and agent information
* Signals and synchronization
* Architected dispatch
* Memory management

The HSA runtime fits into a typical software architecture stack.

The HSA runtime provides direct access to the graphics hardware to give the programmer more control over execution. One example of this low-level hardware access is support for one or more user-mode queues, which provide programmers with a low-latency kernel dispatch interface and allow them to develop customized dispatch algorithms specific to their application.

The HSA Architected Queuing Language (AQL) is an open standard, defined by the HSA Foundation, that specifies the packet syntax used to control supported AMD/ATI Radeon (c) graphics devices. AQL supports several packet types, including packets that command the hardware to resolve inter-packet dependencies automatically (barrier-AND and barrier-OR packets), kernel dispatch packets, and agent dispatch packets.

In addition to user-mode queues and AQL, the HSA runtime exposes various virtual address ranges that can be accessed by one or more of the system's graphics devices, and possibly the host. Each exposed range supports either fine-grained or coarse-grained access.
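
The two access models can be told apart by a region's global-segment flags. The following is a minimal C sketch, not runtime code: it assumes the `hsa_region_global_flag_t` values from the HSA Runtime Programmer's Reference Manual (KERNARG = 1, FINE_GRAINED = 2, COARSE_GRAINED = 4) and mirrors them locally so it compiles without ROCm installed. All helper names are hypothetical. In a real program the `uint32_t` flags word would come from `hsa_region_get_info()` with `HSA_REGION_INFO_GLOBAL_FLAGS`; check the constants against the `hsa.h` you build with.

```c
#include <stdint.h>
#include <stdio.h>

/* Local mirror of hsa_region_global_flag_t (assumed values; verify them
 * against the hsa.h shipped with your ROCm installation). */
enum region_global_flag {
  REGION_GLOBAL_FLAG_KERNARG = 1,
  REGION_GLOBAL_FLAG_FINE_GRAINED = 2,
  REGION_GLOBAL_FLAG_COARSE_GRAINED = 4
};

/* Fine-grained: updates are visible to every device that can access the region. */
int region_is_fine_grained(uint32_t flags) {
  return (flags & REGION_GLOBAL_FLAG_FINE_GRAINED) != 0;
}

/* Coarse-grained: only one device may access an allocation at a time, so the
 * host application must transfer ownership explicitly before another device
 * uses it. */
int region_needs_ownership_transfer(uint32_t flags) {
  return (flags & REGION_GLOBAL_FLAG_COARSE_GRAINED) != 0;
}

/* Regions carrying the KERNARG flag can hold kernel arguments for dispatches. */
int region_supports_kernargs(uint32_t flags) {
  return (flags & REGION_GLOBAL_FLAG_KERNARG) != 0;
}

/* Hypothetical helper: short label for the access model of a flags word. */
const char *region_access_model(uint32_t flags) {
  if (region_is_fine_grained(flags)) return "fine-grained";
  if (region_needs_ownership_transfer(flags)) return "coarse-grained";
  return "unspecified";
}

/* Hypothetical helper: print a one-line summary of a flags word. */
void print_region(uint32_t flags) {
  printf("flags=0x%x access=%s kernarg=%s\n", (unsigned)flags,
         region_access_model(flags),
         region_supports_kernargs(flags) ? "yes" : "no");
}
```

The sketch only reports the access model. It does not call any runtime API, so a coarse-grained result simply tells the caller that an explicit ownership transfer is needed before another device touches the allocation.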

Updates to memory in a fine-grained region are immediately visible to all devices that can access it, but only one device can access a coarse-grained allocation at a time. Ownership of a coarse-grained region can be changed using the HSA runtime memory APIs, but the host application must perform this transfer of ownership explicitly.

Programmers should consult the HSA Runtime Programmer's Reference Manual for a full description of the HSA Runtime APIs, AQL and the HSA memory policy.

#### Known issues

* Each HSA process creates an internal DMA queue, but there is a system-wide limit of four DMA queues. When the limit is reached, HSA processes use internal kernels for copies.

#### Disclaimer

The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale.

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Copyright (c) 2014-2021 Advanced Micro Devices, Inc. All rights reserved.

ROCR-Runtime-rocm-5.0.0/src/CMakeLists.txt

################################################################################
##
## The University of Illinois/NCSA
## Open Source License (NCSA)
##
## Copyright (c) 2014-2021, Advanced Micro Devices, Inc. All rights reserved.
##
## Developed by:
##
##                 AMD Research and AMD HSA Software Development
##
##                 Advanced Micro Devices, Inc.
##
##                 www.amd.com
##
## Permission is hereby granted, free of charge, to any person obtaining a copy
## of this software and associated documentation files (the "Software"), to
## deal with the Software without restriction, including without limitation
## the rights to use, copy, modify, merge, publish, distribute, sublicense,
## and/or sell copies of the Software, and to permit persons to whom the
## Software is furnished to do so, subject to the following conditions:
##
##  - Redistributions of source code must retain the above copyright notice,
##    this list of conditions and the following disclaimers.
##  - Redistributions in binary form must reproduce the above copyright
##    notice, this list of conditions and the following disclaimers in
##    the documentation and/or other materials provided with the distribution.
##  - Neither the names of Advanced Micro Devices, Inc,
##    nor the names of its contributors may be used to endorse or promote
##    products derived from this Software without specific prior written
##    permission.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
## IN NO EVENT SHALL
## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
## DEALINGS WITH THE SOFTWARE.
##
################################################################################

cmake_minimum_required ( VERSION 3.7 )

## Clear target dependency data.
## Needed to allow UI transitions between static and dynamic builds.
## Need an update to CMake 3.12 to remove this hack.  See CMake policy change CMP0073.
unset ( hsa-runtime64_LIB_DEPENDS CACHE )

## Set core runtime module name and project name.
set ( CORE_RUNTIME_NAME "hsa-runtime64" )
set ( CORE_RUNTIME_TARGET "${CORE_RUNTIME_NAME}" )
set ( CORE_RUNTIME_LIBRARY "lib${CORE_RUNTIME_TARGET}" )

## Set project name
project( ${CORE_RUNTIME_TARGET} )

## Utility functions
list ( APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake_modules" )
include ( utils )
include ( hsa_common )
include ( GNUInstallDirs )

## Expose static library option
if ( NOT DEFINED BUILD_SHARED_LIBS )
  set ( BUILD_SHARED_LIBS ON )
endif()
set ( BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS} CACHE BOOL "Build shared library (.so) or not.")

## Adjust target name for static builds
## Original name will be an interface target that adds --whole-archive linker options around the target.
if( NOT ${BUILD_SHARED_LIBS} )
  set ( CORE_RUNTIME_TARGET "${CORE_RUNTIME_TARGET}_static" )
endif()

# Optionally, build HSA Runtime with ccache.
set(ROCM_CCACHE_BUILD OFF CACHE BOOL "Set to ON for a ccache enabled build")
if (ROCM_CCACHE_BUILD)
  find_program(CCACHE_PROGRAM ccache)
  if (CCACHE_PROGRAM)
    set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE ${CCACHE_PROGRAM})
  else()
    message(WARNING "Unable to find ccache. Falling back to real compiler")
  endif() # if (CCACHE_PROGRAM)
endif() # if (ROCM_CCACHE_BUILD)

## Get version strings
get_version ( "1.5.0" )
if ( ${ROCM_PATCH_VERSION} )
  set ( VERSION_PATCH ${ROCM_PATCH_VERSION})
endif()
set ( SO_VERSION_STRING "${VERSION_MAJOR}.${VERSION_MINOR}.${VERSION_PATCH}" )
set ( PACKAGE_VERSION_STRING "${VERSION_MAJOR}.${VERSION_MINOR}.${VERSION_COMMIT_COUNT}" )

## Find external dependencies.
find_package(LibElf REQUIRED)
find_package(hsakmt 1.0 REQUIRED HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)

## Create the rocr target.
add_library( ${CORE_RUNTIME_TARGET} "" )

## Enforce uniform output file naming.
set_property(TARGET ${CORE_RUNTIME_TARGET} PROPERTY OUTPUT_NAME ${CORE_RUNTIME_NAME} )

## Compiler preproc definitions.
target_compile_definitions(${CORE_RUNTIME_TARGET} PRIVATE
  "${HSA_COMMON_DEFS}"
  __linux__
  HSA_EXPORT=1
  HSA_EXPORT_FINALIZER=1
  HSA_EXPORT_IMAGES=1
  HSA_DEPRECATED=
  ROCR_BUILD_ID="${PACKAGE_VERSION_STRING}-${VERSION_JOB}-${VERSION_HASH}" )

## Check for memfd_create syscall
include(CheckSymbolExists)
CHECK_SYMBOL_EXISTS ( "__NR_memfd_create" "sys/syscall.h" HAVE_MEMFD_CREATE )
if ( HAVE_MEMFD_CREATE )
  target_compile_definitions(${CORE_RUNTIME_TARGET} PRIVATE HAVE_MEMFD_CREATE )
endif()

## Set include directories for ROCr runtime
target_include_directories( ${CORE_RUNTIME_TARGET}
  PUBLIC
  $
  $
  PRIVATE
  ${CMAKE_CURRENT_SOURCE_DIR}
  ${CMAKE_CURRENT_SOURCE_DIR}/libamdhsacode )

## Set RUNPATH - ../../lib covers use of the legacy symlink in /hsa/lib/
set_property(TARGET ${CORE_RUNTIME_TARGET} PROPERTY INSTALL_RPATH "$ORIGIN;$ORIGIN/../../lib;$ORIGIN/../../lib64;$ORIGIN/../lib64" )

## ------------------------- Linux Compiler and Linker options -------------------------
set ( HSA_CXX_FLAGS
  ${HSA_COMMON_CXX_FLAGS}
  -fexceptions
  -fno-rtti
  -fvisibility=hidden
  -Wno-error=missing-braces
  -Wno-error=sign-compare
  -Wno-sign-compare
  -Wno-write-strings
  -Wno-conversion-null
  -fno-math-errno
  -fno-threadsafe-statics
  -fmerge-all-constants
  -fms-extensions
  -Wno-error=comment
  -Wno-comment
  -Wno-error=pointer-arith
  -Wno-pointer-arith
  -Wno-error=unused-variable
  -Wno-error=unused-function )

## Extra image settings - audit!
set ( HSA_CXX_FLAGS ${HSA_CXX_FLAGS} -Wno-deprecated-declarations )

if ( CMAKE_COMPILER_IS_GNUCXX )
  set ( HSA_CXX_FLAGS ${HSA_CXX_FLAGS} -Wno-error=maybe-uninitialized -Wno-error=unused-but-set-variable)
endif ()

if ( CMAKE_CXX_COMPILER_ID MATCHES "Clang")
  set ( HSA_CXX_FLAGS ${HSA_CXX_FLAGS} -Wno-error=self-assign)
  if( ${CMAKE_CXX_COMPILER_VERSION} VERSION_GREATER_EQUAL 13)
    set ( HSA_CXX_FLAGS ${HSA_CXX_FLAGS} -Wno-error=unused-but-set-variable)
  endif()
endif()

set ( DRVDEF "${CMAKE_CURRENT_SOURCE_DIR}/hsacore.so.def" )

set ( LNKSCR "hsacore.so.link" )

set ( HSA_SHARED_LINK_FLAGS "-Wl,-Bdynamic -Wl,-z,noexecstack -Wl,${CMAKE_CURRENT_SOURCE_DIR}/${LNKSCR} -Wl,--version-script=${DRVDEF} -Wl,--enable-new-dtags" )

target_compile_options(${CORE_RUNTIME_TARGET} PRIVATE ${HSA_CXX_FLAGS})

# target_link_options() is not available prior to CMake 3.13
set_property(TARGET ${CORE_RUNTIME_TARGET} PROPERTY LINK_FLAGS ${HSA_SHARED_LINK_FLAGS})

## ------------------------- End Compiler and Linker options ----------------------------

## Source files.
set ( SRCS core/util/lnx/os_linux.cpp
           core/util/small_heap.cpp
           core/util/timer.cpp
           core/util/flag.cpp
           core/runtime/amd_blit_kernel.cpp
           core/runtime/amd_blit_sdma.cpp
           core/runtime/amd_cpu_agent.cpp
           core/runtime/amd_gpu_agent.cpp
           core/runtime/amd_hsa_loader.cpp
           core/runtime/amd_aql_queue.cpp
           core/runtime/amd_loader_context.cpp
           core/runtime/hsa_ven_amd_loader.cpp
           core/runtime/amd_memory_region.cpp
           core/runtime/amd_filter_device.cpp
           core/runtime/amd_topology.cpp
           core/runtime/default_signal.cpp
           core/runtime/host_queue.cpp
           core/runtime/hsa.cpp
           core/runtime/hsa_api_trace.cpp
           core/runtime/hsa_ext_amd.cpp
           core/runtime/hsa_ext_interface.cpp
           core/runtime/interrupt_signal.cpp
           core/runtime/intercept_queue.cpp
           core/runtime/ipc_signal.cpp
           core/runtime/isa.cpp
           core/runtime/runtime.cpp
           core/runtime/signal.cpp
           core/runtime/queue.cpp
           core/runtime/cache.cpp
           core/common/shared.cpp
           core/common/hsa_table_interface.cpp
           loader/executable.cpp
           libamdhsacode/amd_elf_image.cpp
           libamdhsacode/amd_hsa_code_util.cpp
           libamdhsacode/amd_hsa_locks.cpp
           libamdhsacode/amd_options.cpp
           libamdhsacode/amd_hsa_code.cpp
)

target_sources( ${CORE_RUNTIME_TARGET} PRIVATE ${SRCS} )

if ( NOT DEFINED IMAGE_SUPPORT )
  set ( IMAGE_SUPPORT ON )
endif()
set ( IMAGE_SUPPORT ${IMAGE_SUPPORT} CACHE BOOL "Build with image support (default: ON)." )

## Optional image module definitions.
if(${IMAGE_SUPPORT})
  ## Image definitions - audit!
  target_compile_definitions(${CORE_RUNTIME_TARGET} PRIVATE
    HSA_IMAGE_SUPPORT
    UNIX_OS
    LINUX
    __AMD64__
    __x86_64__
    AMD_INTERNAL_BUILD
    BRAHMA_BUILD=1 )

  set ( IMAGE_SRCS
    image/addrlib/src/addrinterface.cpp
    image/addrlib/src/core/coord.cpp
    image/addrlib/src/core/addrlib.cpp
    image/addrlib/src/core/addrlib1.cpp
    image/addrlib/src/core/addrlib2.cpp
    image/addrlib/src/core/addrobject.cpp
    image/addrlib/src/core/addrelemlib.cpp
    image/addrlib/src/r800/ciaddrlib.cpp
    image/addrlib/src/r800/egbaddrlib.cpp
    image/addrlib/src/r800/siaddrlib.cpp
    image/addrlib/src/gfx9/gfx9addrlib.cpp
    image/addrlib/src/gfx10/gfx10addrlib.cpp
    image/device_info.cpp
    image/hsa_ext_image.cpp
    image/image_runtime.cpp
    image/image_manager.cpp
    image/image_manager_kv.cpp
    image/image_manager_ai.cpp
    image/image_manager_nv.cpp
    image/image_lut_kv.cpp
    image/blit_object_gfx7xx.cpp
    image/blit_object_gfx8xx.cpp
    image/blit_object_gfx9xx.cpp
    image/blit_kernel.cpp
    ${CMAKE_CURRENT_BINARY_DIR}/image/blit_src/opencl_blit_objects.cpp
  )

  set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/image/blit_src/opencl_blit_objects.cpp PROPERTIES GENERATED TRUE)

  target_include_directories( ${CORE_RUNTIME_TARGET}
    PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/image
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/inc
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/src
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/src/core
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/src/r800
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/src/gfx9
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/src/gfx10
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/src/chip/r800
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/src/chip/gfx9
    ${CMAKE_CURRENT_SOURCE_DIR}/image/addrlib/src/chip/gfx10 )

  target_sources( ${CORE_RUNTIME_TARGET} PRIVATE ${IMAGE_SRCS} )

  ## Depend on blit kernel target.
  add_subdirectory( ${CMAKE_CURRENT_SOURCE_DIR}/image/blit_src )
  add_dependencies( ${CORE_RUNTIME_TARGET} opencl_blit_objects )

endif()

## Link dependencies.
target_link_libraries ( ${CORE_RUNTIME_TARGET} PRIVATE hsakmt::hsakmt )
target_link_libraries ( ${CORE_RUNTIME_TARGET} PRIVATE elf::elf dl pthread rt )

## Set the VERSION and SOVERSION values
set_property ( TARGET ${CORE_RUNTIME_TARGET} PROPERTY VERSION "${SO_VERSION_STRING}" )
set_property ( TARGET ${CORE_RUNTIME_TARGET} PROPERTY SOVERSION "${VERSION_MAJOR}" )

## Add the public interface export target if doing a static build.
## Bind ROCr dependencies to the interface target rather than to the source build
## target so that -Wl,--whole-archive is tightly applied.  Requires binding
## indirectly to the source build target.
if( NOT ${BUILD_SHARED_LIBS} )
  add_library(${CORE_RUNTIME_NAME} INTERFACE)

  ## Bind to source build target interface but not its link requirements.
  target_include_directories( ${CORE_RUNTIME_NAME} INTERFACE $ )
  target_link_libraries ( ${CORE_RUNTIME_NAME} INTERFACE
    -Wl,$/lib/cmake/${CORE_RUNTIME_NAME}/${LNKSCR}
    -Wl,--whole-archive $ -Wl,--no-whole-archive)
  add_dependencies( ${CORE_RUNTIME_NAME} ${CORE_RUNTIME_TARGET} )

  ## Add external link requirements.
  target_link_libraries ( ${CORE_RUNTIME_NAME} INTERFACE hsakmt::hsakmt )
  target_link_libraries ( ${CORE_RUNTIME_NAME} INTERFACE elf::elf dl pthread rt )

  install ( TARGETS ${CORE_RUNTIME_NAME} EXPORT ${CORE_RUNTIME_NAME}Targets )
endif()

## Create symlinks for legacy packaging and install
add_custom_target ( hsa_include_link ALL
  WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
  COMMAND ${CMAKE_COMMAND} -E create_symlink ../../include/hsa hsa_include_link )
if ( ${BUILD_SHARED_LIBS} )
  add_custom_target ( hsa_lib_link ALL
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
    COMMAND ${CMAKE_COMMAND} -E create_symlink ../../lib/${CORE_RUNTIME_LIBRARY}.so ${CORE_RUNTIME_LIBRARY}-link.so )
  add_custom_target ( hsa_lib_link2 ALL
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
    COMMAND ${CMAKE_COMMAND} -E create_symlink ../../lib/${CORE_RUNTIME_LIBRARY}.so.${VERSION_MAJOR} ${CORE_RUNTIME_LIBRARY}-link.so.${VERSION_MAJOR} )
endif()

## Set install information
# Installs binaries and exports the library usage data to ${CORE_RUNTIME_NAME}Targets
# TODO: Fix me for flat directory layout.  Should be ${CMAKE_INSTALL_LIBDIR}
install ( TARGETS ${CORE_RUNTIME_TARGET} EXPORT ${CORE_RUNTIME_NAME}Targets
  ARCHIVE DESTINATION lib COMPONENT binary
  LIBRARY DESTINATION lib COMPONENT binary )

# Install license
install ( FILES ${CMAKE_CURRENT_SOURCE_DIR}/LICENSE.md DESTINATION ${CMAKE_INSTALL_DOCDIR} COMPONENT binary )

# Install public headers
# TODO: Fix me for flat directory layout.  Should be ${CMAKE_INSTALL_INCLUDEDIR}
install ( DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/inc/ DESTINATION include/hsa COMPONENT dev )

# Legacy symlink - not packaged (CPack is bugged until ~3.18, see https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4637)
install ( FILES ${CMAKE_CURRENT_BINARY_DIR}/hsa_include_link DESTINATION hsa/include PERMISSIONS OWNER_WRITE OWNER_READ RENAME hsa )

# Legacy symlinks.
if ( ${BUILD_SHARED_LIBS} )
  install ( FILES ${CMAKE_CURRENT_BINARY_DIR}/${CORE_RUNTIME_LIBRARY}-link.so
            DESTINATION hsa/lib PERMISSIONS OWNER_WRITE OWNER_READ
            RENAME ${CORE_RUNTIME_LIBRARY}.so COMPONENT binary)
  install ( FILES ${CMAKE_CURRENT_BINARY_DIR}/${CORE_RUNTIME_LIBRARY}-link.so.${VERSION_MAJOR}
            DESTINATION hsa/lib PERMISSIONS OWNER_WRITE OWNER_READ
            RENAME ${CORE_RUNTIME_LIBRARY}.so.${VERSION_MAJOR} COMPONENT binary)
endif ()

## Configure and install package config file
# Record our usage data for clients' find_package calls.
# TODO: Fix me for flat directory layout.  Should be ${CMAKE_INSTALL_LIBDIR}
install ( EXPORT ${CORE_RUNTIME_NAME}Targets
  FILE ${CORE_RUNTIME_NAME}Targets.cmake
  NAMESPACE ${CORE_RUNTIME_NAME}::
  DESTINATION lib/cmake/${CORE_RUNTIME_NAME}
  COMPONENT dev)

# Adds the target alias hsa-runtime64::hsa-runtime64 to the local cmake cache.
# This isn't necessary today.  It's harmless preparation for some
# hypothetical future in which we might be included by add_subdirectory()
# in some other project's cmake file.  It allows uniform use of find_package
# and target_link_libraries() without regard to whether a target is external or
# a subdirectory of the current build.
add_library( ${CORE_RUNTIME_NAME}::${CORE_RUNTIME_NAME} ALIAS ${CORE_RUNTIME_NAME} )

# Create cmake configuration files
include(CMakePackageConfigHelpers)

# TODO: Fix me for flat directory layout.  Should be ${CMAKE_INSTALL_LIBDIR}
configure_package_config_file(${CORE_RUNTIME_NAME}-config.cmake.in
  ${CORE_RUNTIME_NAME}-config.cmake
  INSTALL_DESTINATION lib/cmake/${CORE_RUNTIME_NAME} )

write_basic_package_version_file(${CORE_RUNTIME_NAME}-config-version.cmake
  VERSION ${SO_VERSION_STRING}
  COMPATIBILITY AnyNewerVersion )

# TODO: Fix me for flat directory layout.
# Should be ${CMAKE_INSTALL_LIBDIR}
install(FILES
  ${CMAKE_CURRENT_BINARY_DIR}/${CORE_RUNTIME_NAME}-config.cmake
  ${CMAKE_CURRENT_BINARY_DIR}/${CORE_RUNTIME_NAME}-config-version.cmake
  DESTINATION lib/cmake/${CORE_RUNTIME_NAME}
  COMPONENT dev)

# Install build files needed only when using a static build.
if( NOT ${BUILD_SHARED_LIBS} )
  # libelf find package module
  install(FILES
    ${CMAKE_CURRENT_SOURCE_DIR}/cmake_modules/FindLibElf.cmake
    ${CMAKE_CURRENT_SOURCE_DIR}/cmake_modules/COPYING-CMAKE-SCRIPTS
    DESTINATION lib/cmake/${CORE_RUNTIME_NAME}
    COMPONENT dev)
  # Linker script (defines function aliases)
  install(FILES
    ${CMAKE_CURRENT_SOURCE_DIR}/${LNKSCR}
    DESTINATION lib/cmake/${CORE_RUNTIME_NAME}
    COMPONENT dev)
endif()

# Optionally record the package's find module in the user's package cache.
if ( NOT DEFINED EXPORT_TO_USER_PACKAGE_REGISTRY )
  set ( EXPORT_TO_USER_PACKAGE_REGISTRY "off" )
endif()
set ( EXPORT_TO_USER_PACKAGE_REGISTRY ${EXPORT_TO_USER_PACKAGE_REGISTRY} CACHE BOOL "Add cmake package config location to the user's cmake package registry.")
if(${EXPORT_TO_USER_PACKAGE_REGISTRY})
  # Enable writing to the registry
  set(CMAKE_EXPORT_PACKAGE_REGISTRY ON)
  # Generate a target file for the build
  export(TARGETS ${CORE_RUNTIME_NAME} NAMESPACE ${CORE_RUNTIME_NAME}:: FILE ${CORE_RUNTIME_NAME}Targets.cmake)
  # Record the package in the user's cache.
  export(PACKAGE ${CORE_RUNTIME_NAME})
endif()

## Packaging directives
set ( CPACK_GENERATOR "DEB;RPM" CACHE STRING "Package types to build")
set ( ENABLE_LDCONFIG ON CACHE BOOL "Set library links and caches using ldconfig.")

## Only pack the "binary" and "dev" components, post install script will add the directory link.
set ( CPACK_COMPONENTS_ALL binary dev )
set ( CPACK_DEB_COMPONENT_INSTALL ON)
set ( CPACK_RPM_COMPONENT_INSTALL ON)

set ( CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc." )
set ( CPACK_PACKAGE_VERSION ${PACKAGE_VERSION_STRING} )
set ( CPACK_PACKAGE_CONTACT "TODO Advanced Micro Devices, Inc."
)
set ( CPACK_COMPONENT_BINARY_DESCRIPTION "AMD Heterogeneous System Architecture HSA - Linux HSA Runtime for Boltzmann (ROCm) platforms" )
set ( CPACK_COMPONENT_DEV_DESCRIPTION "AMD Heterogeneous System Architecture HSA development package.\n This package contains the headers and cmake files for the hsa-rocr package." )
set ( CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE.md" )

if ( DEFINED ENV{ROCM_LIBPATCH_VERSION} )
  set ( CPACK_PACKAGE_VERSION "${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}" )
  message ( "Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}" )
endif()

# Debian package specific variables
set ( CPACK_DEBIAN_BINARY_PACKAGE_NAME "hsa-rocr")
set ( CPACK_DEBIAN_DEV_PACKAGE_NAME "hsa-rocr-dev")
if ( DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE} )
  set ( CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE} )
else()
  set ( CPACK_DEBIAN_PACKAGE_RELEASE "local" )
endif()
message ( "Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}" )
set ( CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT" )

set ( CPACK_DEBIAN_PACKAGE_HOMEPAGE "https://github.com/RadeonOpenCompute/ROCR-Runtime" )

## Process the Debian install/remove scripts to update the CPACK variables
configure_file ( ${CMAKE_CURRENT_SOURCE_DIR}/DEBIAN/Binary/postinst.in DEBIAN/Binary/postinst @ONLY )
configure_file ( ${CMAKE_CURRENT_SOURCE_DIR}/DEBIAN/Binary/prerm.in DEBIAN/Binary/prerm @ONLY )
configure_file ( ${CMAKE_CURRENT_SOURCE_DIR}/DEBIAN/Dev/postinst.in DEBIAN/Dev/postinst @ONLY )
configure_file ( ${CMAKE_CURRENT_SOURCE_DIR}/DEBIAN/Dev/prerm.in DEBIAN/Dev/prerm @ONLY )

set ( CPACK_DEBIAN_BINARY_PACKAGE_CONTROL_EXTRA "DEBIAN/Binary/postinst;DEBIAN/Binary/prerm" )
set ( CPACK_DEBIAN_DEV_PACKAGE_CONTROL_EXTRA "DEBIAN/Dev/postinst;DEBIAN/Dev/prerm" )

# Declare package relationships (hsa-ext-rocr-dev is a legacy package that we subsume)
set ( CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS "hsakmt-roct" )
set ( CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "hsa-rocr" )
if ( ROCM_DEP_ROCMCORE )
  string ( APPEND CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS ", rocm-core" )
  string ( APPEND CPACK_DEBIAN_DEV_PACKAGE_DEPENDS ", rocm-core" )
endif()
set ( CPACK_DEBIAN_PACKAGE_BREAKS "hsa-ext-rocr-dev" )
set ( CPACK_DEBIAN_PACKAGE_REPLACES "hsa-ext-rocr-dev" )

# RPM package specific variables
set ( CPACK_RPM_BINARY_PACKAGE_NAME "hsa-rocr" )
set ( CPACK_RPM_DEV_PACKAGE_NAME "hsa-rocr-devel" )
if ( DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE} )
  set ( CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE} )
else()
  set ( CPACK_RPM_PACKAGE_RELEASE "local" )
endif()
string ( APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}" )
set ( CPACK_RPM_FILE_NAME "RPM-DEFAULT" )
message("CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}")
set( CPACK_RPM_PACKAGE_LICENSE "NCSA" )

## Process the Rpm install/remove scripts to update the CPACK variables
configure_file ( "${CMAKE_CURRENT_SOURCE_DIR}/RPM/Binary/post.in" RPM/Binary/post @ONLY )
configure_file ( "${CMAKE_CURRENT_SOURCE_DIR}/RPM/Binary/postun.in" RPM/Binary/postun @ONLY )
configure_file ( "${CMAKE_CURRENT_SOURCE_DIR}/RPM/Dev/post.in" RPM/Dev/post @ONLY )
configure_file ( "${CMAKE_CURRENT_SOURCE_DIR}/RPM/Dev/postun.in" RPM/Dev/postun @ONLY )

set ( CPACK_RPM_BINARY_POST_INSTALL_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/RPM/Binary/post" )
set ( CPACK_RPM_BINARY_POST_UNINSTALL_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/RPM/Binary/postun" )
set ( CPACK_RPM_DEV_POST_INSTALL_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/RPM/Dev/post" )
set ( CPACK_RPM_DEV_POST_UNINSTALL_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/RPM/Dev/postun" )

# Declare package relationships (hsa-ext-rocr-dev is a legacy package that we subsume)
set ( CPACK_RPM_BINARY_PACKAGE_REQUIRES "hsakmt-roct" )
set ( CPACK_RPM_DEV_PACKAGE_REQUIRES "hsa-rocr" )
if ( ROCM_DEP_ROCMCORE )
  string ( APPEND CPACK_RPM_BINARY_PACKAGE_REQUIRES " rocm-core" )
  string ( APPEND CPACK_RPM_DEV_PACKAGE_REQUIRES " rocm-core" )
endif()
set ( CPACK_RPM_PACKAGE_PROVIDES "hsa-ext-rocr-dev hsa-rocr-dev" )
set (
CPACK_RPM_PACKAGE_OBSOLETES "hsa-ext-rocr-dev" )

## Include packaging
include ( CPack )

ROCR-Runtime-rocm-5.0.0/src/DEBIAN/Binary/postinst.in

#!/bin/bash
################################################################################
##
## The University of Illinois/NCSA
## Open Source License (NCSA)
##
## Copyright (c) 2020-2021, Advanced Micro Devices, Inc. All rights reserved.
##
## Developed by:
##
##                 AMD Research and AMD HSA Software Development
##
##                 Advanced Micro Devices, Inc.
##
##                 www.amd.com
##
## Permission is hereby granted, free of charge, to any person obtaining a copy
## of this software and associated documentation files (the "Software"), to
## deal with the Software without restriction, including without limitation
## the rights to use, copy, modify, merge, publish, distribute, sublicense,
## and/or sell copies of the Software, and to permit persons to whom the
## Software is furnished to do so, subject to the following conditions:
##
##  - Redistributions of source code must retain the above copyright notice,
##    this list of conditions and the following disclaimers.
##  - Redistributions in binary form must reproduce the above copyright
##    notice, this list of conditions and the following disclaimers in
##    the documentation and/or other materials provided with the distribution.
##  - Neither the names of Advanced Micro Devices, Inc,
##    nor the names of its contributors may be used to endorse or promote
##    products derived from this Software without specific prior written
##    permission.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
## DEALINGS WITH THE SOFTWARE.
##
################################################################################

set -e

# left-hand term originates from @ENABLE_LDCONFIG@ = ON/OFF at package build
do_ldconfig() {
  if [ "@ENABLE_LDCONFIG@" == "ON" ]; then
    echo @CPACK_PACKAGING_INSTALL_PREFIX@/lib > /etc/ld.so.conf.d/hsa-rocr.conf
    ldconfig
  fi
}

case "$1" in
  ( configure )
    do_ldconfig
  ;;
  ( * )
    exit 0
  ;;
esac

ROCR-Runtime-rocm-5.0.0/src/DEBIAN/Binary/prerm.in

#!/bin/bash
################################################################################
##
## The University of Illinois/NCSA
## Open Source License (NCSA)
##
## Copyright (c) 2020-2021, Advanced Micro Devices, Inc. All rights reserved.
##
## Developed by:
##
##                 AMD Research and AMD HSA Software Development
##
##                 Advanced Micro Devices, Inc.
##
##                 www.amd.com
##
## Permission is hereby granted, free of charge, to any person obtaining a copy
## of this software and associated documentation files (the "Software"), to
## deal with the Software without restriction, including without limitation
## the rights to use, copy, modify, merge, publish, distribute, sublicense,
## and/or sell copies of the Software, and to permit persons to whom the
## Software is furnished to do so, subject to the following conditions:
##
##  - Redistributions of source code must retain the above copyright notice,
##    this list of conditions and the following disclaimers.
## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. ## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. ## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. ## ################################################################################ set -e # left-hand term originates from @ENABLE_LDCONFIG@ = ON/OFF at package build rm_ldconfig() { if [ "@ENABLE_LDCONFIG@" == "ON" ]; then rm -f /etc/ld.so.conf.d/hsa-rocr.conf ldconfig fi } case "$1" in ( remove ) rm_ldconfig ;; ( * ) exit 0 ;; esac ROCR-Runtime-rocm-5.0.0/src/DEBIAN/Dev/000077500000000000000000000000001420110115200170525ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/DEBIAN/Dev/postinst.in000066400000000000000000000044151420110115200212710ustar00rootroot00000000000000#!/bin/bash ################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2020-2021, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc. 
## ## www.amd.com ## ## Permission is hereby granted, free of charge, to any person obtaining a copy ## of this software and associated documentation files (the "Software"), to ## deal with the Software without restriction, including without limitation ## the rights to use, copy, modify, merge, publish, distribute, sublicense, ## and/or sell copies of the Software, and to permit persons to whom the ## Software is furnished to do so, subject to the following conditions: ## ## - Redistributions of source code must retain the above copyright notice, ## this list of conditions and the following disclaimers. ## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. ## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. ## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. ## ################################################################################ set -e case "$1" in ( configure ) # Workaround for CPACK directory symlink handling error. 
mkdir -p @CPACK_PACKAGING_INSTALL_PREFIX@/hsa/include ln -sf ../../include/hsa @CPACK_PACKAGING_INSTALL_PREFIX@/hsa/include/hsa ;; ( * ) exit 0 ;; esac ROCR-Runtime-rocm-5.0.0/src/DEBIAN/Dev/prerm.in000066400000000000000000000044731420110115200205370ustar00rootroot00000000000000#!/bin/bash ################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2020-2021, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc. ## ## www.amd.com ## ## Permission is hereby granted, free of charge, to any person obtaining a copy ## of this software and associated documentation files (the "Software"), to ## deal with the Software without restriction, including without limitation ## the rights to use, copy, modify, merge, publish, distribute, sublicense, ## and/or sell copies of the Software, and to permit persons to whom the ## Software is furnished to do so, subject to the following conditions: ## ## - Redistributions of source code must retain the above copyright notice, ## this list of conditions and the following disclaimers. ## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. ## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. ## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. ## ################################################################################ set -e case "$1" in ( remove | upgrade ) # Workaround for CPACK directory symlink handling error. # Needed for remove and upgrade scenarios since # upgrade installs to new folder and old folders need to be cleaned rm -rf @CPACK_PACKAGING_INSTALL_PREFIX@/hsa ;; ( * ) exit 0 ;; esac ROCR-Runtime-rocm-5.0.0/src/LICENSE.md000066400000000000000000000033521420110115200170210ustar00rootroot00000000000000ROCR-Runtime LICENSE The University of Illinois/NCSA Open Source License (NCSA) Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. Developed by: AMD Research and AMD HSA Software Development Advanced Micro Devices, Inc. www.amd.com Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal with the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: - Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimers. - Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimers in the documentation and/or other materials provided with the distribution. - Neither the names of Advanced Micro Devices, Inc, nor the names of its contributors may be used to endorse or promote products derived from this Software without specific prior written permission. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE SOFTWARE. ROCR-Runtime-rocm-5.0.0/src/README.md000066400000000000000000000172321420110115200166760ustar00rootroot00000000000000### Package Contents This directory contains the ROC Runtime source code, which is based on the HSA Runtime but modified to support AMD/ATI discrete GPUs. #### Source & Include Directories core - Contains the source code for AMD's implementation of the core HSA Runtime APIs. cmake_modules - CMake support modules and files. inc - Contains the public and AMD-specific header files exposing the HSA Runtime's interfaces. libamdhsacode - Code object definitions and interface. loader - Used to load code objects. utils - Utilities required to build the core runtime. #### Build Environment The CMake build framework is used to build the ROC runtime. The minimum version is 3.7. Obtain cmake infrastructure: http://www.cmake.org/download/ Add the cmake bin directory to your PATH. #### Package Dependencies The following support packages are required to successfully build the runtime: * libelf-dev * g++ #### Building the Runtime To build the runtime, a compatible version of the libhsakmt library and the hsakmt.h header file must be available. The latest version of these files can be obtained from the ROCT-Thunk-Interface repository, available here: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface As of ROCm release 3.7, libhsakmt development packages include a cmake package config file. The runtime will locate libhsakmt via find_package if libhsakmt is installed to a standard location.
For installations that do not use ROCm standard paths, set the cmake variables CMAKE_PREFIX_PATH or hsakmt_DIR to override find_package search paths. As of ROCm release 3.7, the runtime includes an optional image support module (previously hsa-ext-rocr-dev). By default, this module is included in builds of the runtime. The image module may be excluded from the runtime by setting the cmake variable IMAGE_SUPPORT to OFF. When building the optional image module, additional build dependencies are required: an amdgcn-compatible clang and device library must be installed. The latest version of these requirements can be obtained from the ROCm package repository (see: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html). The latest source for these projects may be found here: https://github.com/RadeonOpenCompute/llvm-project https://github.com/RadeonOpenCompute/ROCm-Device-Libs Additionally, xxd must be installed. The runtime optionally supports use of the cmake user package registry. By default, the registry is not modified. Set the cmake variable EXPORT_TO_USER_PACKAGE_REGISTRY to ON to enable updating the package registry. For example, to build, install, and produce packages on a system with standard ROCm packages installed, execute the following from src/:

    mkdir build
    cd build
    cmake -DCMAKE_INSTALL_PATH=/opt/rocm ..
    make
    make install
    make package

Example with a custom installation path, build dependency path, and options:

    cmake -DIMAGE_SUPPORT=OFF \
          -DEXPORT_TO_USER_PACKAGE_REGISTRY=ON \
          -DCMAKE_VERBOSE_MAKEFILE=1 \
          -DCMAKE_PREFIX_PATH= \
          -DCMAKE_INSTALL_PATH= \
          ..

Alternatively, ccmake and cmake-gui are supported:

    mkdir build
    cd build
    ccmake ..
    press c to configure
    populate variables as desired
    press c again
    press g to generate and exit
    make

#### Building Against the Runtime

The runtime provides a cmake package config file, installed by default to /opt/rocm/lib/cmake/hsa-runtime64.
The runtime exports cmake target hsa-runtime64 in namespace hsa-runtime64. A cmake project (Foo) using the runtime may locate, include, and link the runtime with the following template:

    Add /opt/rocm to CMAKE_PREFIX_PATH.

    find_package(hsa-runtime64 1.0 REQUIRED)
    ...
    add_library(Foo ...)
    ...
    target_link_libraries(Foo PRIVATE hsa-runtime64::hsa-runtime64)

#### Specs http://www.hsafoundation.com/standards/ HSA Runtime Specification 1.1 HSA Programmer Reference Manual Specification 1.1 HSA Platform System Architecture Specification 1.1 #### Runtime Design Overview The AMD ROC runtime consists of three primary layers: * C interface adaptors * C++ interface classes and common functions * AMD device-specific implementations Additionally, the runtime depends on a small utility library that provides simple common functions, limited operating system and compiler abstraction, and atomic operation interfaces. #### C Interface Adaptors Files: * hsa.h(cpp) * hsa_ext_interface.h(cpp) The C interface layer provides C99 APIs as defined in the HSA Runtime Specification 1.1. The interfaces and default definitions for the standard extensions are also provided. The interface functions simply forward to a function pointer table defined here. The table is initialized to point to default definitions, which simply return an appropriate error code. If available, the extension library is loaded as part of runtime initialization, and the table is updated to point into the extension library. #### C++ Interface Classes & Common Functions Files: * runtime.h(cpp) * agent.h * queue.h * signal.h * memory_region.h(cpp) * checked.h * memory_database.h(cpp) * default_signal.h(cpp) The C++ interface layer provides abstract interface classes encapsulating commands to HSA Signals, Agents, and Queues. This layer also contains the implementation of device-independent commands, such as hsa_init and hsa_system_get_info, and a default signal and queue implementation.
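The forwarding scheme described under C Interface Adaptors can be illustrated with a small C sketch. All names here are hypothetical and are not part of the ROCR or HSA API: `demo_ext_op`, `demo_install_extension`, `demo_loaded_ext_op`, and the `DEMO_STATUS_*` codes. The sketch assumes only the pattern stated above. A public C function forwards through a function pointer table. That table starts out pointing at default definitions that return an error code, and is repointed once an extension library has been loaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for hsa_status_t values. */
typedef enum {
  DEMO_STATUS_SUCCESS = 0,
  DEMO_STATUS_ERROR_NOT_INITIALIZED = 1
} demo_status_t;

/* Signature of one hypothetical extension entry point. */
typedef demo_status_t (*demo_ext_fn_t)(int arg, int *out);

/* Function pointer table with one slot per extension function. */
typedef struct {
  demo_ext_fn_t ext_op;
} demo_ext_table_t;

/* Default definition: no extension library is loaded, so report an error. */
static demo_status_t default_ext_op(int arg, int *out) {
  (void)arg;
  (void)out;
  return DEMO_STATUS_ERROR_NOT_INITIALIZED;
}

/* The table is initialized to point at the default definitions. */
static demo_ext_table_t g_ext_table = { default_ext_op };

/* Public C entry point: simply forwards through the table. */
demo_status_t demo_ext_op(int arg, int *out) {
  return g_ext_table.ext_op(arg, out);
}

/* Called during initialization once an extension library has been loaded.
   A real loader would resolve the symbol (e.g. with dlsym); here the
   resolved function pointer is passed in directly. */
void demo_install_extension(demo_ext_fn_t fn) {
  assert(fn != NULL);
  g_ext_table.ext_op = fn;
}

/* Hypothetical stand-in for a function an extension library would export. */
demo_status_t demo_loaded_ext_op(int arg, int *out) {
  *out = arg * 2;
  return DEMO_STATUS_SUCCESS;
}
```

Usage: before `demo_install_extension(demo_loaded_ext_op)` is called, `demo_ext_op(21, &out)` returns `DEMO_STATUS_ERROR_NOT_INITIALIZED`. Afterwards it returns `DEMO_STATUS_SUCCESS` with `out == 42`. Callers always go through the same C symbol, so whether an extension is present changes only which function the table slot holds.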
#### Device Specific Implementations Files: * amd_cpu_agent.h(cpp) * amd_gpu_agent.h(cpp) * amd_hw_aql_command_processor.h(cpp) * amd_memory_region.h(cpp) * amd_memory_registration.h(cpp) * amd_topology.h(cpp) * host_queue.h(cpp) * interrupt_signal.h(cpp) * hsa_ext_private_amd.h(cpp) The device specific layer contains implementations of the C++ interface classes which implement HSA functionality for ROCm supported devices. #### Implemented Functionality * The following queries are not implemented: * hsa_code_symbol_get_info: * HSA_CODE_SYMBOL_INFO_INDIRECT_FUNCTION_CALL_CONVENTION * hsa_executable_symbol_get_info: * HSA_EXECUTABLE_SYMBOL_INFO_INDIRECT_FUNCTION_OBJECT * HSA_EXECUTABLE_SYMBOL_INFO_INDIRECT_FUNCTION_CALL_CONVENTION #### Known Issues * hsa_agent_get_exception_policies is not implemented. * hsa_system_get_extension_table is not implemented for HSA_EXTENSION_AMD_PROFILER. #### Disclaimer The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. 
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. Copyright (c) 2014-2021 Advanced Micro Devices, Inc. All rights reserved. ROCR-Runtime-rocm-5.0.0/src/RPM/000077500000000000000000000000001420110115200160505ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/RPM/Binary/000077500000000000000000000000001420110115200172745ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/RPM/Binary/post.in000066400000000000000000000042721420110115200206160ustar00rootroot00000000000000################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2016-2021, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc. ## ## www.amd.com ## ## Permission is hereby granted, free of charge, to any person obtaining a copy ## of this software and associated documentation files (the "Software"), to ## deal with the Software without restriction, including without limitation ## the rights to use, copy, modify, merge, publish, distribute, sublicense, ## and/or sell copies of the Software, and to permit persons to whom the ## Software is furnished to do so, subject to the following conditions: ## ## - Redistributions of source code must retain the above copyright notice, ## this list of conditions and the following disclaimers. ## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. 
## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. ## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. ## ################################################################################ # left-hand term originates from @ENABLE_LDCONFIG@ = ON/OFF at package build if [ "@ENABLE_LDCONFIG@" == "ON" ]; then echo @CPACK_PACKAGING_INSTALL_PREFIX@/hsa/lib > /etc/ld.so.conf.d/hsa-rocr.conf ldconfig fi ROCR-Runtime-rocm-5.0.0/src/RPM/Binary/postun.in000066400000000000000000000042441420110115200211600ustar00rootroot00000000000000################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2016-2021, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc. 
## ## www.amd.com ## ## Permission is hereby granted, free of charge, to any person obtaining a copy ## of this software and associated documentation files (the "Software"), to ## deal with the Software without restriction, including without limitation ## the rights to use, copy, modify, merge, publish, distribute, sublicense, ## and/or sell copies of the Software, and to permit persons to whom the ## Software is furnished to do so, subject to the following conditions: ## ## - Redistributions of source code must retain the above copyright notice, ## this list of conditions and the following disclaimers. ## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. ## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. ## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. 
## ################################################################################ # left-hand term originates from @ENABLE_LDCONFIG@ = ON/OFF at package build if [ $1 -eq 0 ] && [ "@ENABLE_LDCONFIG@" == "ON" ]; then rm -f /etc/ld.so.conf.d/hsa-rocr.conf ldconfig fi ROCR-Runtime-rocm-5.0.0/src/RPM/Dev/000077500000000000000000000000001420110115200165665ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/RPM/Dev/post.in000066400000000000000000000042351420110115200201070ustar00rootroot00000000000000################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2016-2021, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc. ## ## www.amd.com ## ## Permission is hereby granted, free of charge, to any person obtaining a copy ## of this software and associated documentation files (the "Software"), to ## deal with the Software without restriction, including without limitation ## the rights to use, copy, modify, merge, publish, distribute, sublicense, ## and/or sell copies of the Software, and to permit persons to whom the ## Software is furnished to do so, subject to the following conditions: ## ## - Redistributions of source code must retain the above copyright notice, ## this list of conditions and the following disclaimers. ## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. ## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. 
## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. ## ################################################################################ # Workaround for CPACK directory symlink handling error. mkdir -p @CPACK_PACKAGING_INSTALL_PREFIX@/hsa/include ln -sf ../../include/hsa @CPACK_PACKAGING_INSTALL_PREFIX@/hsa/include/hsa ROCR-Runtime-rocm-5.0.0/src/RPM/Dev/postun.in000066400000000000000000000043401420110115200204470ustar00rootroot00000000000000################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2016-2021, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc. ## ## www.amd.com ## ## Permission is hereby granted, free of charge, to any person obtaining a copy ## of this software and associated documentation files (the "Software"), to ## deal with the Software without restriction, including without limitation ## the rights to use, copy, modify, merge, publish, distribute, sublicense, ## and/or sell copies of the Software, and to permit persons to whom the ## Software is furnished to do so, subject to the following conditions: ## ## - Redistributions of source code must retain the above copyright notice, ## this list of conditions and the following disclaimers. 
## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. ## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. ## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. ## ################################################################################
if [ $1 -le 1 ]; then
  # Workaround for CPACK directory symlink handling error.
  # Needed for uninstall and upgrade scenarios since an
  # upgrade installs to a new folder and the old folders need to be cleaned
  rm -rf @CPACK_PACKAGING_INSTALL_PREFIX@/hsa
fi
ROCR-Runtime-rocm-5.0.0/src/cmake_modules/000077500000000000000000000000001420110115200202225ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/cmake_modules/COPYING-CMAKE-SCRIPTS000066400000000000000000000024571420110115200232300ustar00rootroot00000000000000Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3.
The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ROCR-Runtime-rocm-5.0.0/src/cmake_modules/FindLibElf.cmake000066400000000000000000000033101420110115200231570ustar00rootroot00000000000000# - Try to find libelf # Once done this will define # # LIBELF_FOUND - system has libelf # LIBELF_INCLUDE_DIRS - the libelf include directory # LIBELF_LIBRARIES - Link these to use libelf # LIBELF_DEFINITIONS - Compiler switches required for using libelf # # Copyright (c) 2008 Bernhard Walle # # Redistribution and use is allowed according to the terms of the New # BSD license. # For details see the accompanying COPYING-CMAKE-SCRIPTS file. 
#
if (LIBELF_FOUND)
  return()
endif (LIBELF_FOUND)

find_path (LIBELF_INCLUDE_DIRS
  NAMES
    libelf.h
  PATHS
    /usr/include
    /usr/include/libelf
    /usr/local/include
    /usr/local/include/libelf
    /opt/local/include
    /opt/local/include/libelf
    /sw/include
    /sw/include/libelf
    ENV CPATH)

find_library (LIBELF_LIBRARIES
  NAMES
    elf
  PATHS
    /usr/lib
    /usr/local/lib
    /opt/local/lib
    /sw/lib
    ENV LIBRARY_PATH
    ENV LD_LIBRARY_PATH)

include (FindPackageHandleStandardArgs)

# handle the QUIETLY and REQUIRED arguments and set LIBELF_FOUND to TRUE if all listed variables are TRUE
FIND_PACKAGE_HANDLE_STANDARD_ARGS(LibElf DEFAULT_MSG
  LIBELF_LIBRARIES
  LIBELF_INCLUDE_DIRS)

SET(CMAKE_REQUIRED_LIBRARIES elf)
INCLUDE(CheckCXXSourceCompiles)
CHECK_CXX_SOURCE_COMPILES("#include <libelf.h>
int main() {
  Elf *e = (Elf*)0;
  size_t sz;
  elf_getshdrstrndx(e, &sz);
  return 0;
}" ELF_GETSHDRSTRNDX)

mark_as_advanced(LIBELF_INCLUDE_DIRS LIBELF_LIBRARIES ELF_GETSHDRSTRNDX)

if(LIBELF_FOUND)
  add_library(elf::elf UNKNOWN IMPORTED)
  set_property(TARGET elf::elf PROPERTY IMPORTED_LOCATION ${LIBELF_LIBRARIES})
  set_property(TARGET elf::elf PROPERTY INTERFACE_INCLUDE_DIRECTORIES ${LIBELF_INCLUDE_DIRS})
endif()
ROCR-Runtime-rocm-5.0.0/src/cmake_modules/hsa_common.cmake000066400000000000000000000060571420110115200233570ustar00rootroot00000000000000################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc.
## ## www.amd.com ## ## Permission is hereby granted, free of charge, to any person obtaining a copy ## of this software and associated documentation files (the "Software"), to ## deal with the Software without restriction, including without limitation ## the rights to use, copy, modify, merge, publish, distribute, sublicense, ## and/or sell copies of the Software, and to permit persons to whom the ## Software is furnished to do so, subject to the following conditions: ## ## - Redistributions of source code must retain the above copyright notice, ## this list of conditions and the following disclaimers. ## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. ## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. ## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. ## ################################################################################ # # HSA Build compiler definitions common between components. 
# set(IS64BIT 0) set(ONLY64STR "32") if(CMAKE_SIZEOF_VOID_P EQUAL 8) set(IS64BIT 1) set(ONLY64STR "64") endif() set(HSA_COMMON_CXX_FLAGS "-Wall" "-std=c++11") set(HSA_COMMON_CXX_FLAGS ${HSA_COMMON_CXX_FLAGS} "-fPIC") if (CMAKE_COMPILER_IS_GNUCXX) set(HSA_COMMON_CXX_FLAGS ${HSA_COMMON_CXX_FLAGS} "-Wl,--unresolved-symbols=ignore-in-shared-libs") endif () set(HSA_COMMON_CXX_FLAGS ${HSA_COMMON_CXX_FLAGS} "-fno-strict-aliasing") if ( CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64") set( HSA_COMMON_CXX_FLAGS ${HSA_COMMON_CXX_FLAGS} "-m64" "-msse" "-msse2") elseif ( CMAKE_SYSTEM_PROCESSOR STREQUAL "x86" ) set ( HSA_COMMON_CXX_FLAGS ${HSA_COMMON_CXX_FLAGS} "-m32") endif () if ( "${CMAKE_BUILD_TYPE}" STREQUAL Debug ) set ( HSA_COMMON_CXX_FLAGS ${HSA_COMMON_CXX_FLAGS} "-O0" "-ggdb") endif () set( HSA_COMMON_DEFS "__STDC_LIMIT_MACROS") set( HSA_COMMON_DEFS ${HSA_COMMON_DEFS} "__STDC_CONSTANT_MACROS") set( HSA_COMMON_DEFS ${HSA_COMMON_DEFS} "__STDC_FORMAT_MACROS") set( HSA_COMMON_DEFS ${HSA_COMMON_DEFS} "LITTLEENDIAN_CPU=1") ROCR-Runtime-rocm-5.0.0/src/cmake_modules/utils.cmake000066400000000000000000000205121420110115200223640ustar00rootroot00000000000000################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc. 
## ## www.amd.com ## ## Permission is hereby granted, free of charge, to any person obtaining a copy ## of this software and associated documentation files (the "Software"), to ## deal with the Software without restriction, including without limitation ## the rights to use, copy, modify, merge, publish, distribute, sublicense, ## and/or sell copies of the Software, and to permit persons to whom the ## Software is furnished to do so, subject to the following conditions: ## ## - Redistributions of source code must retain the above copyright notice, ## this list of conditions and the following disclaimers. ## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. ## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. ## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. ## ################################################################################ function( get_path LIB CACHED_PATH HELP ) set( options "") set( oneValueArgs RESULT ) set( multiValueArgs HINTS NAMES ) cmake_parse_arguments(ARGS "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN} ) # Search for canary file. 
    if( ${LIB} )
        find_library( FULLPATH NAMES ${ARGS_NAMES} HINTS ${${CACHED_PATH}} ${ARGS_HINTS} )
    else()
        find_file( FULLPATH NAMES ${ARGS_NAMES} HINTS ${${CACHED_PATH}} ${ARGS_HINTS} )
    endif()

    # FULLPATH evaluates to false when the search failed (its value ends in -NOTFOUND).
    if( FULLPATH )
        set( RESULT TRUE )
    else()
        set( RESULT FALSE )
    endif()

    # Extract path
    get_filename_component ( DIRPATH "${FULLPATH}" DIRECTORY )

    # Check path against cache
    if( NOT "${${CACHED_PATH}}" STREQUAL "" )
        if ( NOT "${${CACHED_PATH}}" STREQUAL "${DIRPATH}" )
            message(WARNING "${CACHED_PATH} may be incorrect." )
            set( DIRPATH ${${CACHED_PATH}} )
        endif()
    elseif( NOT RESULT )
        message(WARNING "${CACHED_PATH} not located during path search.")
    endif()

    # Set cache variable and help text
    set( ${CACHED_PATH} "${DIRPATH}" CACHE PATH "${HELP}" FORCE )
    unset( FULLPATH CACHE )

    # Return success flag (quoted so an omitted RESULT argument does not break the test)
    if( NOT "${ARGS_RESULT}" STREQUAL "" )
        set( ${ARGS_RESULT} ${RESULT} PARENT_SCOPE)
    endif()

endfunction()

## Searches for a file using include paths and stores the path to that file in the cache
## using the cached value if set.  Search paths are optional.  Returns success in RESULT.
## get_include_path( NAMES name1 [name2...] [HINTS path1 [path2 ... ENV var]] [RESULT ]
macro( get_include_path CACHED_PATH HELP )
    get_path( 0 ${ARGV} )
endmacro()

## Searches for a file using library paths and stores the path to that file in the cache
## using the cached value if set.  Search paths are optional.  Returns success in RESULT.
## get_library_path( NAMES name1 [name2...] [HINTS path1 [path2 ... ENV var]] [RESULT ]
macro( get_library_path CACHED_PATH HELP )
    get_path( 1 ${ARGV} )
endmacro()

## Parses the VERSION_STRING variable and places
## the first, second and third number values in
## the major, minor and patch variables.
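##
## Example (illustrative): parse_version( "1.2.3-45" ) matches the numeric
## fields 1, 2, 3 and 45, then sets VERSION_MAJOR to 1, VERSION_MINOR to 2
## and VERSION_PATCH to 3 in the caller's scope. The text after the first "-"
## ("45") is stored in VERSION_BUILD, which is not exported and stays local
## to the function.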
function( parse_version VERSION_STRING )

    string ( FIND ${VERSION_STRING} "-" STRING_INDEX )

    if ( ${STRING_INDEX} GREATER -1 )
        math ( EXPR STRING_INDEX "${STRING_INDEX} + 1" )
        string ( SUBSTRING ${VERSION_STRING} ${STRING_INDEX} -1 VERSION_BUILD )
    endif ()

    string ( REGEX MATCHALL "[0123456789]+" VERSIONS ${VERSION_STRING} )
    list ( LENGTH VERSIONS VERSION_COUNT )

    if ( ${VERSION_COUNT} GREATER 0)
        list ( GET VERSIONS 0 MAJOR )
        set ( VERSION_MAJOR ${MAJOR} PARENT_SCOPE )
    endif ()

    if ( ${VERSION_COUNT} GREATER 1 )
        list ( GET VERSIONS 1 MINOR )
        set ( VERSION_MINOR ${MINOR} PARENT_SCOPE )
    endif ()

    if ( ${VERSION_COUNT} GREATER 2 )
        list ( GET VERSIONS 2 PATCH )
        set ( VERSION_PATCH ${PATCH} PARENT_SCOPE )
    endif ()

endfunction ()

## Gets the current version of the repository.
## The major, minor and patch numbers come from DEFAULT_VERSION_STRING; git
## supplies the number of commits since the merge base with origin/HEAD and
## the short commit hash (suffixed with "-dirty" when the working tree has
## unstaged changes).
## Passes back VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH,
## VERSION_COMMIT_COUNT, VERSION_HASH and VERSION_JOB in the parent scope.
function ( get_version DEFAULT_VERSION_STRING )

    set( VERSION_JOB "local-build" )
    set( VERSION_COMMIT_COUNT 0 )
    set( VERSION_HASH "unknown" )

    find_program( GIT NAMES git )

    if( GIT )

        #execute_process ( COMMAND git describe --tags --dirty --long
        #                  WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
        #                  OUTPUT_VARIABLE GIT_TAG_STRING
        #                  OUTPUT_STRIP_TRAILING_WHITESPACE
        #                  RESULT_VARIABLE RESULT )

        # Get the branch point (common ancestor) of the current branch and the
        # remote default branch (origin/HEAD).
        execute_process(COMMAND git merge-base HEAD origin/HEAD
                        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
                        OUTPUT_VARIABLE GIT_MERGE_BASE
                        OUTPUT_STRIP_TRAILING_WHITESPACE
                        RESULT_VARIABLE RESULT )

        if( ${RESULT} EQUAL 0 )
            # Count commits from branch point.
            execute_process(COMMAND git rev-list --count ${GIT_MERGE_BASE}..HEAD
                            WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
                            OUTPUT_VARIABLE VERSION_COMMIT_COUNT
                            OUTPUT_STRIP_TRAILING_WHITESPACE
                            RESULT_VARIABLE RESULT )
            if(NOT ${RESULT} EQUAL 0 )
                set( VERSION_COMMIT_COUNT 0 )
            endif()
        endif()

        # Get current short hash.
        execute_process(COMMAND git rev-parse --short HEAD
                        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
                        OUTPUT_VARIABLE VERSION_HASH
                        OUTPUT_STRIP_TRAILING_WHITESPACE
                        RESULT_VARIABLE RESULT )
        if( ${RESULT} EQUAL 0 )
            # Check for dirty workspace.
            execute_process(COMMAND git diff --quiet
                            WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
                            RESULT_VARIABLE RESULT )
            if(${RESULT} EQUAL 1)
                set(VERSION_HASH "${VERSION_HASH}-dirty")
            endif()
        else()
            set( VERSION_HASH "unknown" )
        endif()
    endif()

    # Build automation IDs
    if(DEFINED ENV{ROCM_BUILD_ID})
        set( VERSION_JOB $ENV{ROCM_BUILD_ID} )
    endif()

    parse_version(${DEFAULT_VERSION_STRING})

    set( VERSION_MAJOR "${VERSION_MAJOR}" PARENT_SCOPE )
    set( VERSION_MINOR "${VERSION_MINOR}" PARENT_SCOPE )
    set( VERSION_PATCH "${VERSION_PATCH}" PARENT_SCOPE )
    set( VERSION_COMMIT_COUNT "${VERSION_COMMIT_COUNT}" PARENT_SCOPE )
    set( VERSION_HASH "${VERSION_HASH}" PARENT_SCOPE )
    set( VERSION_JOB "${VERSION_JOB}" PARENT_SCOPE )

    #message("${VERSION_MAJOR}" )
    #message("${VERSION_MINOR}" )
    #message("${VERSION_PATCH}" )
    #message("${VERSION_COMMIT_COUNT}")
    #message("${VERSION_HASH}")
    #message("${VERSION_JOB}")

endfunction()

## Collects subdirectory names and returns them in a list
function ( listsubdirs DIRPATH SUBDIRECTORIES )
    file( GLOB CONTENTS RELATIVE ${DIRPATH} "${DIRPATH}/*" )
    # Start from an empty list so a FOLDERS variable in the caller's scope is not inherited.
    set ( FOLDERS "" )
    foreach( ITEM IN LISTS CONTENTS)
        if( IS_DIRECTORY "${DIRPATH}/${ITEM}" )
            list( APPEND FOLDERS ${ITEM} )
        endif()
    endforeach()
    set (${SUBDIRECTORIES} "${FOLDERS}" PARENT_SCOPE)
endfunction()
ROCR-Runtime-rocm-5.0.0/src/core/000077500000000000000000000000001420110115200163425ustar00rootroot00000000000000
ROCR-Runtime-rocm-5.0.0/src/core/common/000077500000000000000000000000001420110115200176325ustar00rootroot00000000000000
ROCR-Runtime-rocm-5.0.0/src/core/common/hsa_table_interface.cpp000066400000000000000000001402671420110115200243120ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "inc/hsa_api_trace.h"
#include "core/inc/hsa_api_trace_int.h"

static const HsaApiTable* hsaApiTable;
static const CoreApiTable* coreApiTable;
static const AmdExtTable* amdExtTable;

void hsa_table_interface_init(const HsaApiTable* apiTable) {
  hsaApiTable = apiTable;
  coreApiTable = apiTable->core_;
  amdExtTable = apiTable->amd_ext_;
}

const HsaApiTable* hsa_table_interface_get_table() { return hsaApiTable; }

class Init {
 public:
  Init() { rocr::core::LoadInitialHsaApiTable(); }
};
static Init LinkAtLoadOrFirstTranslationUnitAccess;

// Pass through stub functions
hsa_status_t HSA_API hsa_init() { return coreApiTable->hsa_init_fn(); }

hsa_status_t HSA_API hsa_shut_down() { return coreApiTable->hsa_shut_down_fn(); }

hsa_status_t HSA_API hsa_system_get_info(hsa_system_info_t attribute, void* value) {
  return coreApiTable->hsa_system_get_info_fn(attribute, value);
}

hsa_status_t HSA_API hsa_extension_get_name(uint16_t extension, const char** name) {
  return coreApiTable->hsa_extension_get_name_fn(extension, name);
}

hsa_status_t HSA_API hsa_system_extension_supported(uint16_t extension, uint16_t version_major,
                                                    uint16_t version_minor, bool* result) {
  return coreApiTable->hsa_system_extension_supported_fn(
      extension, version_major, version_minor, result);
}

hsa_status_t HSA_API hsa_system_major_extension_supported(uint16_t extension,
                                                          uint16_t version_major,
                                                          uint16_t* version_minor, bool* result) {
  return coreApiTable->hsa_system_major_extension_supported_fn(extension, version_major,
                                                               version_minor, result);
}

hsa_status_t HSA_API hsa_system_get_extension_table(uint16_t extension, uint16_t version_major,
                                                    uint16_t version_minor, void* table) {
  return coreApiTable->hsa_system_get_extension_table_fn(
      extension, version_major, version_minor, table);
}

hsa_status_t HSA_API hsa_system_get_major_extension_table(uint16_t extension,
                                                          uint16_t version_major,
                                                          size_t table_length, void* table) {
return coreApiTable->hsa_system_get_major_extension_table_fn(extension, version_major, table_length, table); } hsa_status_t HSA_API hsa_iterate_agents(hsa_status_t (*callback)(hsa_agent_t agent, void* data), void* data) { return coreApiTable->hsa_iterate_agents_fn(callback, data); } hsa_status_t HSA_API hsa_agent_get_info(hsa_agent_t agent, hsa_agent_info_t attribute, void* value) { return coreApiTable->hsa_agent_get_info_fn(agent, attribute, value); } hsa_status_t HSA_API hsa_agent_get_exception_policies(hsa_agent_t agent, hsa_profile_t profile, uint16_t* mask) { return coreApiTable->hsa_agent_get_exception_policies_fn(agent, profile, mask); } hsa_status_t HSA_API hsa_cache_get_info(hsa_cache_t cache, hsa_cache_info_t attribute, void* value) { return coreApiTable->hsa_cache_get_info_fn(cache, attribute, value); } hsa_status_t HSA_API hsa_agent_iterate_caches( hsa_agent_t agent, hsa_status_t (*callback)(hsa_cache_t cache, void* data), void* value) { return coreApiTable->hsa_agent_iterate_caches_fn(agent, callback, value); } hsa_status_t HSA_API hsa_agent_extension_supported(uint16_t extension, hsa_agent_t agent, uint16_t version_major, uint16_t version_minor, bool* result) { return coreApiTable->hsa_agent_extension_supported_fn( extension, agent, version_major, version_minor, result); } hsa_status_t HSA_API hsa_agent_major_extension_supported(uint16_t extension, hsa_agent_t agent, uint16_t version_major, uint16_t* version_minor, bool* result) { return coreApiTable->hsa_agent_major_extension_supported_fn(extension, agent, version_major, version_minor, result); } hsa_status_t HSA_API hsa_queue_create(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type, void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data, uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue) { return coreApiTable->hsa_queue_create_fn(agent, size, type, callback, data, private_segment_size, group_segment_size, queue); } hsa_status_t 
HSA_API hsa_soft_queue_create(hsa_region_t region, uint32_t size, hsa_queue_type32_t type, uint32_t features, hsa_signal_t completion_signal, hsa_queue_t** queue) { return coreApiTable->hsa_soft_queue_create_fn(region, size, type, features, completion_signal, queue); } hsa_status_t HSA_API hsa_queue_destroy(hsa_queue_t* queue) { return coreApiTable->hsa_queue_destroy_fn(queue); } hsa_status_t HSA_API hsa_queue_inactivate(hsa_queue_t* queue) { return coreApiTable->hsa_queue_inactivate_fn(queue); } uint64_t HSA_API hsa_queue_load_read_index_scacquire(const hsa_queue_t* queue) { return coreApiTable->hsa_queue_load_read_index_scacquire_fn(queue); } uint64_t HSA_API hsa_queue_load_read_index_relaxed(const hsa_queue_t* queue) { return coreApiTable->hsa_queue_load_read_index_relaxed_fn(queue); } uint64_t HSA_API hsa_queue_load_write_index_scacquire(const hsa_queue_t* queue) { return coreApiTable->hsa_queue_load_write_index_scacquire_fn(queue); } uint64_t HSA_API hsa_queue_load_write_index_relaxed(const hsa_queue_t* queue) { return coreApiTable->hsa_queue_load_write_index_relaxed_fn(queue); } void HSA_API hsa_queue_store_write_index_relaxed(const hsa_queue_t* queue, uint64_t value) { return coreApiTable->hsa_queue_store_write_index_relaxed_fn(queue, value); } void HSA_API hsa_queue_store_write_index_screlease(const hsa_queue_t* queue, uint64_t value) { return coreApiTable->hsa_queue_store_write_index_screlease_fn(queue, value); } uint64_t HSA_API hsa_queue_cas_write_index_scacq_screl(const hsa_queue_t* queue, uint64_t expected, uint64_t value) { return coreApiTable->hsa_queue_cas_write_index_scacq_screl_fn(queue, expected, value); } uint64_t HSA_API hsa_queue_cas_write_index_scacquire(const hsa_queue_t* queue, uint64_t expected, uint64_t value) { return coreApiTable->hsa_queue_cas_write_index_scacquire_fn(queue, expected, value); } uint64_t HSA_API hsa_queue_cas_write_index_relaxed(const hsa_queue_t* queue, uint64_t expected, uint64_t value) { return 
coreApiTable->hsa_queue_cas_write_index_relaxed_fn(queue, expected, value); } uint64_t HSA_API hsa_queue_cas_write_index_screlease(const hsa_queue_t* queue, uint64_t expected, uint64_t value) { return coreApiTable->hsa_queue_cas_write_index_screlease_fn(queue, expected, value); } uint64_t HSA_API hsa_queue_add_write_index_scacq_screl(const hsa_queue_t* queue, uint64_t value) { return coreApiTable->hsa_queue_add_write_index_scacq_screl_fn(queue, value); } uint64_t HSA_API hsa_queue_add_write_index_scacquire(const hsa_queue_t* queue, uint64_t value) { return coreApiTable->hsa_queue_add_write_index_scacquire_fn(queue, value); } uint64_t HSA_API hsa_queue_add_write_index_relaxed(const hsa_queue_t* queue, uint64_t value) { return coreApiTable->hsa_queue_add_write_index_relaxed_fn(queue, value); } uint64_t HSA_API hsa_queue_add_write_index_screlease(const hsa_queue_t* queue, uint64_t value) { return coreApiTable->hsa_queue_add_write_index_screlease_fn(queue, value); } void HSA_API hsa_queue_store_read_index_relaxed(const hsa_queue_t* queue, uint64_t value) { return coreApiTable->hsa_queue_store_read_index_relaxed_fn(queue, value); } void HSA_API hsa_queue_store_read_index_screlease(const hsa_queue_t* queue, uint64_t value) { return coreApiTable->hsa_queue_store_read_index_screlease_fn(queue, value); } hsa_status_t HSA_API hsa_agent_iterate_regions( hsa_agent_t agent, hsa_status_t (*callback)(hsa_region_t region, void* data), void* data) { return coreApiTable->hsa_agent_iterate_regions_fn(agent, callback, data); } hsa_status_t HSA_API hsa_region_get_info(hsa_region_t region, hsa_region_info_t attribute, void* value) { return coreApiTable->hsa_region_get_info_fn(region, attribute, value); } hsa_status_t HSA_API hsa_memory_register(void* address, size_t size) { return coreApiTable->hsa_memory_register_fn(address, size); } hsa_status_t HSA_API hsa_memory_deregister(void* address, size_t size) { return coreApiTable->hsa_memory_deregister_fn(address, size); } hsa_status_t 
HSA_API hsa_memory_allocate(hsa_region_t region, size_t size, void** ptr) { return coreApiTable->hsa_memory_allocate_fn(region, size, ptr); } hsa_status_t HSA_API hsa_memory_free(void* ptr) { return coreApiTable->hsa_memory_free_fn(ptr); } hsa_status_t HSA_API hsa_memory_copy(void* dst, const void* src, size_t size) { return coreApiTable->hsa_memory_copy_fn(dst, src, size); } hsa_status_t HSA_API hsa_memory_assign_agent(void* ptr, hsa_agent_t agent, hsa_access_permission_t access) { return coreApiTable->hsa_memory_assign_agent_fn(ptr, agent, access); } hsa_status_t HSA_API hsa_signal_create(hsa_signal_value_t initial_value, uint32_t num_consumers, const hsa_agent_t* consumers, hsa_signal_t* signal) { return coreApiTable->hsa_signal_create_fn(initial_value, num_consumers, consumers, signal); } hsa_status_t HSA_API hsa_signal_destroy(hsa_signal_t signal) { return coreApiTable->hsa_signal_destroy_fn(signal); } hsa_signal_value_t HSA_API hsa_signal_load_relaxed(hsa_signal_t signal) { return coreApiTable->hsa_signal_load_relaxed_fn(signal); } hsa_signal_value_t HSA_API hsa_signal_load_scacquire(hsa_signal_t signal) { return coreApiTable->hsa_signal_load_scacquire_fn(signal); } void HSA_API hsa_signal_store_relaxed(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_store_relaxed_fn(signal, value); } void HSA_API hsa_signal_store_screlease(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_store_screlease_fn(signal, value); } void HSA_API hsa_signal_silent_store_relaxed(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_silent_store_relaxed_fn(signal, value); } void HSA_API hsa_signal_silent_store_screlease(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_silent_store_screlease_fn(signal, value); } hsa_signal_value_t HSA_API hsa_signal_wait_relaxed(hsa_signal_t signal, hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t 
timeout_hint, hsa_wait_state_t wait_expectancy_hint) { return coreApiTable->hsa_signal_wait_relaxed_fn( signal, condition, compare_value, timeout_hint, wait_expectancy_hint); } hsa_signal_value_t HSA_API hsa_signal_wait_scacquire(hsa_signal_t signal, hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout_hint, hsa_wait_state_t wait_expectancy_hint) { return coreApiTable->hsa_signal_wait_scacquire_fn(signal, condition, compare_value, timeout_hint, wait_expectancy_hint); } hsa_status_t HSA_API hsa_signal_group_create(uint32_t num_signals, const hsa_signal_t* signals, uint32_t num_consumers, const hsa_agent_t* consumers, hsa_signal_group_t* signal_group) { return coreApiTable->hsa_signal_group_create_fn(num_signals, signals, num_consumers, consumers, signal_group); } hsa_status_t HSA_API hsa_signal_group_destroy(hsa_signal_group_t signal_group) { return coreApiTable->hsa_signal_group_destroy_fn(signal_group); } hsa_status_t HSA_API hsa_signal_group_wait_any_relaxed(hsa_signal_group_t signal_group, const hsa_signal_condition_t* conditions, const hsa_signal_value_t* compare_values, hsa_wait_state_t wait_state_hint, hsa_signal_t* signal, hsa_signal_value_t* value) { return coreApiTable->hsa_signal_group_wait_any_relaxed_fn( signal_group, conditions, compare_values, wait_state_hint, signal, value); } hsa_status_t HSA_API hsa_signal_group_wait_any_scacquire(hsa_signal_group_t signal_group, const hsa_signal_condition_t* conditions, const hsa_signal_value_t* compare_values, hsa_wait_state_t wait_state_hint, hsa_signal_t* signal, hsa_signal_value_t* value) { return coreApiTable->hsa_signal_group_wait_any_scacquire_fn( signal_group, conditions, compare_values, wait_state_hint, signal, value); } void HSA_API hsa_signal_and_relaxed(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_and_relaxed_fn(signal, value); } void HSA_API hsa_signal_and_scacquire(hsa_signal_t signal, hsa_signal_value_t value) { return 
coreApiTable->hsa_signal_and_scacquire_fn(signal, value); } void HSA_API hsa_signal_and_screlease(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_and_screlease_fn(signal, value); } void HSA_API hsa_signal_and_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_and_scacq_screl_fn(signal, value); } void HSA_API hsa_signal_or_relaxed(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_or_relaxed_fn(signal, value); } void HSA_API hsa_signal_or_scacquire(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_or_scacquire_fn(signal, value); } void HSA_API hsa_signal_or_screlease(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_or_screlease_fn(signal, value); } void HSA_API hsa_signal_or_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_or_scacq_screl_fn(signal, value); } void HSA_API hsa_signal_xor_relaxed(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_xor_relaxed_fn(signal, value); } void HSA_API hsa_signal_xor_scacquire(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_xor_scacquire_fn(signal, value); } void HSA_API hsa_signal_xor_screlease(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_xor_screlease_fn(signal, value); } void HSA_API hsa_signal_xor_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_xor_scacq_screl_fn(signal, value); } void HSA_API hsa_signal_add_relaxed(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_add_relaxed_fn(signal, value); } void HSA_API hsa_signal_add_scacquire(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_add_scacquire_fn(signal, value); } void HSA_API hsa_signal_add_screlease(hsa_signal_t signal, hsa_signal_value_t value) { return 
coreApiTable->hsa_signal_add_screlease_fn(signal, value); } void HSA_API hsa_signal_add_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_add_scacq_screl_fn(signal, value); } void HSA_API hsa_signal_subtract_relaxed(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_subtract_relaxed_fn(signal, value); } void HSA_API hsa_signal_subtract_scacquire(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_subtract_scacquire_fn(signal, value); } void HSA_API hsa_signal_subtract_screlease(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_subtract_screlease_fn(signal, value); } void HSA_API hsa_signal_subtract_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_subtract_scacq_screl_fn(signal, value); } hsa_signal_value_t HSA_API hsa_signal_exchange_relaxed(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_exchange_relaxed_fn(signal, value); } hsa_signal_value_t HSA_API hsa_signal_exchange_scacquire(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_exchange_scacquire_fn(signal, value); } hsa_signal_value_t HSA_API hsa_signal_exchange_screlease(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_exchange_screlease_fn(signal, value); } hsa_signal_value_t HSA_API hsa_signal_exchange_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value) { return coreApiTable->hsa_signal_exchange_scacq_screl_fn(signal, value); } hsa_signal_value_t HSA_API hsa_signal_cas_relaxed(hsa_signal_t signal, hsa_signal_value_t expected, hsa_signal_value_t value) { return coreApiTable->hsa_signal_cas_relaxed_fn(signal, expected, value); } hsa_signal_value_t HSA_API hsa_signal_cas_scacquire(hsa_signal_t signal, hsa_signal_value_t expected, hsa_signal_value_t value) { return coreApiTable->hsa_signal_cas_scacquire_fn(signal, expected, value); } 
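
// Usage sketch (illustrative only; `sig` is a hypothetical hsa_signal_t obtained
// from hsa_signal_create). Each hsa_signal_cas_* variant returns the value the
// signal held before the operation, whether or not the swap took place, so a
// caller detects success by comparing that result with `expected`:
//
//   hsa_signal_value_t observed = hsa_signal_cas_scacq_screl(sig, 0, 1);
//   bool swapped = (observed == 0);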
hsa_signal_value_t HSA_API hsa_signal_cas_screlease(hsa_signal_t signal,
                                                    hsa_signal_value_t expected,
                                                    hsa_signal_value_t value) {
  return coreApiTable->hsa_signal_cas_screlease_fn(signal, expected, value);
}

hsa_signal_value_t HSA_API hsa_signal_cas_scacq_screl(hsa_signal_t signal,
                                                      hsa_signal_value_t expected,
                                                      hsa_signal_value_t value) {
  return coreApiTable->hsa_signal_cas_scacq_screl_fn(signal, expected, value);
}

//===--- Instruction Set Architecture -------------------------------------===//

hsa_status_t HSA_API hsa_isa_from_name(
    const char *name,
    hsa_isa_t *isa) {
  return coreApiTable->hsa_isa_from_name_fn(name, isa);
}

hsa_status_t HSA_API hsa_agent_iterate_isas(
    hsa_agent_t agent,
    hsa_status_t (*callback)(hsa_isa_t isa, void *data),
    void *data) {
  return coreApiTable->hsa_agent_iterate_isas_fn(agent, callback, data);
}

/* deprecated */ hsa_status_t HSA_API hsa_isa_get_info(
    hsa_isa_t isa,
    hsa_isa_info_t attribute,
    uint32_t index,
    void *value) {
  return coreApiTable->hsa_isa_get_info_fn(isa, attribute, index, value);
}

hsa_status_t HSA_API hsa_isa_get_info_alt(
    hsa_isa_t isa,
    hsa_isa_info_t attribute,
    void *value) {
  return coreApiTable->hsa_isa_get_info_alt_fn(isa, attribute, value);
}

hsa_status_t HSA_API hsa_isa_get_exception_policies(
    hsa_isa_t isa,
    hsa_profile_t profile,
    uint16_t *mask) {
  return coreApiTable->hsa_isa_get_exception_policies_fn(isa, profile, mask);
}

hsa_status_t HSA_API hsa_isa_get_round_method(
    hsa_isa_t isa,
    hsa_fp_type_t fp_type,
    hsa_flush_mode_t flush_mode,
    hsa_round_method_t *round_method) {
  return coreApiTable->hsa_isa_get_round_method_fn(
      isa, fp_type, flush_mode, round_method);
}

hsa_status_t HSA_API hsa_wavefront_get_info(
    hsa_wavefront_t wavefront,
    hsa_wavefront_info_t attribute,
    void *value) {
  return coreApiTable->hsa_wavefront_get_info_fn(wavefront, attribute, value);
}

hsa_status_t HSA_API hsa_isa_iterate_wavefronts(
    hsa_isa_t isa,
    hsa_status_t (*callback)(hsa_wavefront_t wavefront, void *data),
    void *data) {
  return coreApiTable->hsa_isa_iterate_wavefronts_fn(isa, callback, data);
}

/* deprecated */ hsa_status_t HSA_API hsa_isa_compatible(
    hsa_isa_t code_object_isa,
    hsa_isa_t agent_isa,
    bool *result) {
  return coreApiTable->hsa_isa_compatible_fn(
      code_object_isa, agent_isa, result);
}

//===--- Code Objects (deprecated) ----------------------------------------===//

/* deprecated */ hsa_status_t HSA_API hsa_code_object_serialize(
    hsa_code_object_t code_object,
    hsa_status_t (*alloc_callback)(size_t size,
                                   hsa_callback_data_t data,
                                   void **address),
    hsa_callback_data_t callback_data,
    const char *options,
    void **serialized_code_object,
    size_t *serialized_code_object_size) {
  return coreApiTable->hsa_code_object_serialize_fn(
      code_object, alloc_callback, callback_data, options,
      serialized_code_object, serialized_code_object_size);
}

/* deprecated */ hsa_status_t HSA_API hsa_code_object_deserialize(
    void *serialized_code_object,
    size_t serialized_code_object_size,
    const char *options,
    hsa_code_object_t *code_object) {
  return coreApiTable->hsa_code_object_deserialize_fn(
      serialized_code_object, serialized_code_object_size, options,
      code_object);
}

/* deprecated */ hsa_status_t HSA_API hsa_code_object_destroy(
    hsa_code_object_t code_object) {
  return coreApiTable->hsa_code_object_destroy_fn(code_object);
}

/* deprecated */ hsa_status_t HSA_API hsa_code_object_get_info(
    hsa_code_object_t code_object,
    hsa_code_object_info_t attribute,
    void *value) {
  return coreApiTable->hsa_code_object_get_info_fn(
      code_object, attribute, value);
}

/* deprecated */ hsa_status_t HSA_API hsa_code_object_get_symbol(
    hsa_code_object_t code_object,
    const char *symbol_name,
    hsa_code_symbol_t *symbol) {
  return coreApiTable->hsa_code_object_get_symbol_fn(
      code_object, symbol_name, symbol);
}

/* deprecated */ hsa_status_t HSA_API hsa_code_object_get_symbol_from_name(
    hsa_code_object_t code_object,
    const char *module_name,
    const char *symbol_name,
    hsa_code_symbol_t *symbol) {
  return coreApiTable->hsa_code_object_get_symbol_from_name_fn(
      code_object, module_name, symbol_name, symbol);
}

/* deprecated */ hsa_status_t HSA_API hsa_code_symbol_get_info(
    hsa_code_symbol_t code_symbol,
    hsa_code_symbol_info_t attribute,
    void *value) {
  return coreApiTable->hsa_code_symbol_get_info_fn(
      code_symbol, attribute, value);
}

/* deprecated */ hsa_status_t HSA_API hsa_code_object_iterate_symbols(
    hsa_code_object_t code_object,
    hsa_status_t (*callback)(hsa_code_object_t code_object,
                             hsa_code_symbol_t symbol,
                             void *data),
    void *data) {
  return coreApiTable->hsa_code_object_iterate_symbols_fn(
      code_object, callback, data);
}

//===--- Executable -------------------------------------------------------===//

hsa_status_t HSA_API hsa_code_object_reader_create_from_file(
    hsa_file_t file,
    hsa_code_object_reader_t *code_object_reader) {
  return coreApiTable->hsa_code_object_reader_create_from_file_fn(
      file, code_object_reader);
}

hsa_status_t HSA_API hsa_code_object_reader_create_from_memory(
    const void *code_object,
    size_t size,
    hsa_code_object_reader_t *code_object_reader) {
  return coreApiTable->hsa_code_object_reader_create_from_memory_fn(
      code_object, size, code_object_reader);
}

hsa_status_t HSA_API hsa_code_object_reader_destroy(
    hsa_code_object_reader_t code_object_reader) {
  return coreApiTable->hsa_code_object_reader_destroy_fn(code_object_reader);
}

/* deprecated */ hsa_status_t HSA_API hsa_executable_create(
    hsa_profile_t profile,
    hsa_executable_state_t executable_state,
    const char *options,
    hsa_executable_t *executable) {
  return coreApiTable->hsa_executable_create_fn(
      profile, executable_state, options, executable);
}

hsa_status_t HSA_API hsa_executable_create_alt(
    hsa_profile_t profile,
    hsa_default_float_rounding_mode_t default_float_rounding_mode,
    const char *options,
    hsa_executable_t *executable) {
  return coreApiTable->hsa_executable_create_alt_fn(
      profile, default_float_rounding_mode, options, executable);
}

hsa_status_t HSA_API hsa_executable_destroy(
hsa_executable_t executable) { return coreApiTable->hsa_executable_destroy_fn(executable); } /* deprecated */ hsa_status_t HSA_API hsa_executable_load_code_object( hsa_executable_t executable, hsa_agent_t agent, hsa_code_object_t code_object, const char *options) { return coreApiTable->hsa_executable_load_code_object_fn( executable, agent, code_object, options); } hsa_status_t HSA_API hsa_executable_load_program_code_object( hsa_executable_t executable, hsa_code_object_reader_t code_object_reader, const char *options, hsa_loaded_code_object_t *loaded_code_object) { return coreApiTable->hsa_executable_load_program_code_object_fn( executable, code_object_reader, options, loaded_code_object); } hsa_status_t HSA_API hsa_executable_load_agent_code_object( hsa_executable_t executable, hsa_agent_t agent, hsa_code_object_reader_t code_object_reader, const char *options, hsa_loaded_code_object_t *loaded_code_object) { return coreApiTable->hsa_executable_load_agent_code_object_fn( executable, agent, code_object_reader, options, loaded_code_object); } hsa_status_t HSA_API hsa_executable_freeze( hsa_executable_t executable, const char *options) { return coreApiTable->hsa_executable_freeze_fn(executable, options); } hsa_status_t HSA_API hsa_executable_get_info( hsa_executable_t executable, hsa_executable_info_t attribute, void *value) { return coreApiTable->hsa_executable_get_info_fn(executable, attribute, value); } hsa_status_t HSA_API hsa_executable_global_variable_define( hsa_executable_t executable, const char *variable_name, void *address) { return coreApiTable->hsa_executable_global_variable_define_fn( executable, variable_name, address); } hsa_status_t HSA_API hsa_executable_agent_global_variable_define( hsa_executable_t executable, hsa_agent_t agent, const char *variable_name, void *address) { return coreApiTable->hsa_executable_agent_global_variable_define_fn( executable, agent, variable_name, address); } hsa_status_t HSA_API hsa_executable_readonly_variable_define( 
hsa_executable_t executable, hsa_agent_t agent, const char *variable_name, void *address) { return coreApiTable->hsa_executable_readonly_variable_define_fn( executable, agent, variable_name, address); } hsa_status_t HSA_API hsa_executable_validate( hsa_executable_t executable, uint32_t *result) { return coreApiTable->hsa_executable_validate_fn(executable, result); } hsa_status_t HSA_API hsa_executable_validate_alt( hsa_executable_t executable, const char *options, uint32_t *result) { return coreApiTable->hsa_executable_validate_alt_fn( executable, options, result); } /* deprecated */ hsa_status_t HSA_API hsa_executable_get_symbol( hsa_executable_t executable, const char *module_name, const char *symbol_name, hsa_agent_t agent, int32_t call_convention, hsa_executable_symbol_t *symbol) { return coreApiTable->hsa_executable_get_symbol_fn( executable, module_name, symbol_name, agent, call_convention, symbol); } hsa_status_t HSA_API hsa_executable_get_symbol_by_name( hsa_executable_t executable, const char *symbol_name, const hsa_agent_t *agent, hsa_executable_symbol_t *symbol) { return coreApiTable->hsa_executable_get_symbol_by_name_fn( executable, symbol_name, agent, symbol); } hsa_status_t HSA_API hsa_executable_symbol_get_info( hsa_executable_symbol_t executable_symbol, hsa_executable_symbol_info_t attribute, void *value) { return coreApiTable->hsa_executable_symbol_get_info_fn( executable_symbol, attribute, value); } /* deprecated */ hsa_status_t HSA_API hsa_executable_iterate_symbols( hsa_executable_t executable, hsa_status_t (*callback)(hsa_executable_t executable, hsa_executable_symbol_t symbol, void *data), void *data) { return coreApiTable->hsa_executable_iterate_symbols_fn( executable, callback, data); } hsa_status_t HSA_API hsa_executable_iterate_agent_symbols( hsa_executable_t executable, hsa_agent_t agent, hsa_status_t (*callback)(hsa_executable_t exec, hsa_agent_t agent, hsa_executable_symbol_t symbol, void *data), void *data) { return 
coreApiTable->hsa_executable_iterate_agent_symbols_fn( executable, agent, callback, data); } hsa_status_t HSA_API hsa_executable_iterate_program_symbols( hsa_executable_t executable, hsa_status_t (*callback)(hsa_executable_t exec, hsa_executable_symbol_t symbol, void *data), void *data) { return coreApiTable->hsa_executable_iterate_program_symbols_fn( executable, callback, data); } //===--- Runtime Notifications --------------------------------------------===// hsa_status_t HSA_API hsa_status_string( hsa_status_t status, const char **status_string) { return coreApiTable->hsa_status_string_fn(status, status_string); } /* * Following set of functions are bundled as AMD Extension Apis */ // Pass through stub functions hsa_status_t HSA_API hsa_amd_coherency_get_type(hsa_agent_t agent, hsa_amd_coherency_type_t* type) { return amdExtTable->hsa_amd_coherency_get_type_fn(agent, type); } // Pass through stub functions hsa_status_t HSA_API hsa_amd_coherency_set_type(hsa_agent_t agent, hsa_amd_coherency_type_t type) { return amdExtTable->hsa_amd_coherency_set_type_fn(agent, type); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_profiling_set_profiler_enabled(hsa_queue_t* queue, int enable) { return amdExtTable->hsa_amd_profiling_set_profiler_enabled_fn( queue, enable); } hsa_status_t HSA_API hsa_amd_profiling_async_copy_enable(bool enable) { return amdExtTable->hsa_amd_profiling_async_copy_enable_fn(enable); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_profiling_get_dispatch_time( hsa_agent_t agent, hsa_signal_t signal, hsa_amd_profiling_dispatch_time_t* time) { return amdExtTable->hsa_amd_profiling_get_dispatch_time_fn( agent, signal, time); } hsa_status_t HSA_API hsa_amd_profiling_get_async_copy_time( hsa_signal_t hsa_signal, hsa_amd_profiling_async_copy_time_t* time) { return amdExtTable->hsa_amd_profiling_get_async_copy_time_fn(hsa_signal, time); } // Mirrors Amd Extension Apis hsa_status_t HSA_API 
hsa_amd_profiling_convert_tick_to_system_domain(hsa_agent_t agent, uint64_t agent_tick, uint64_t* system_tick) { return amdExtTable->hsa_amd_profiling_convert_tick_to_system_domain_fn( agent, agent_tick, system_tick); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_signal_async_handler(hsa_signal_t signal, hsa_signal_condition_t cond, hsa_signal_value_t value, hsa_amd_signal_handler handler, void* arg) { return amdExtTable->hsa_amd_signal_async_handler_fn( signal, cond, value, handler, arg); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_async_function(void (*callback)(void* arg), void* arg) { return amdExtTable->hsa_amd_async_function_fn(callback, arg); } // Mirrors Amd Extension Apis uint32_t HSA_API hsa_amd_signal_wait_any(uint32_t signal_count, hsa_signal_t* signals, hsa_signal_condition_t* conds, hsa_signal_value_t* values, uint64_t timeout_hint, hsa_wait_state_t wait_hint, hsa_signal_value_t* satisfying_value) { return amdExtTable->hsa_amd_signal_wait_any_fn( signal_count, signals, conds, values, timeout_hint, wait_hint, satisfying_value); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_queue_cu_set_mask(const hsa_queue_t* queue, uint32_t num_cu_mask_count, const uint32_t* cu_mask) { return amdExtTable->hsa_amd_queue_cu_set_mask_fn( queue, num_cu_mask_count, cu_mask); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_queue_cu_get_mask(const hsa_queue_t* queue, uint32_t num_cu_mask_count, uint32_t* cu_mask) { return amdExtTable->hsa_amd_queue_cu_get_mask_fn(queue, num_cu_mask_count, cu_mask); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_pool_get_info(hsa_amd_memory_pool_t memory_pool, hsa_amd_memory_pool_info_t attribute, void* value) { return amdExtTable->hsa_amd_memory_pool_get_info_fn( memory_pool, attribute, value); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_agent_iterate_memory_pools( hsa_agent_t agent, hsa_status_t (*callback)(hsa_amd_memory_pool_t memory_pool, 
void* data), void* data) { return amdExtTable->hsa_amd_agent_iterate_memory_pools_fn( agent, callback, data); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_pool_allocate(hsa_amd_memory_pool_t memory_pool, size_t size, uint32_t flags, void** ptr) { return amdExtTable->hsa_amd_memory_pool_allocate_fn( memory_pool, size, flags, ptr); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_pool_free(void* ptr) { return amdExtTable->hsa_amd_memory_pool_free_fn(ptr); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_async_copy(void* dst, hsa_agent_t dst_agent, const void* src, hsa_agent_t src_agent, size_t size, uint32_t num_dep_signals, const hsa_signal_t* dep_signals, hsa_signal_t completion_signal) { return amdExtTable->hsa_amd_memory_async_copy_fn( dst, dst_agent, src, src_agent, size, num_dep_signals, dep_signals, completion_signal); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_async_copy_rect( const hsa_pitched_ptr_t* dst, const hsa_dim3_t* dst_offset, const hsa_pitched_ptr_t* src, const hsa_dim3_t* src_offset, const hsa_dim3_t* range, hsa_agent_t copy_agent, hsa_amd_copy_direction_t dir, uint32_t num_dep_signals, const hsa_signal_t* dep_signals, hsa_signal_t completion_signal) { return amdExtTable->hsa_amd_memory_async_copy_rect_fn(dst, dst_offset, src, src_offset, range, copy_agent, dir, num_dep_signals, dep_signals, completion_signal); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_agent_memory_pool_get_info( hsa_agent_t agent, hsa_amd_memory_pool_t memory_pool, hsa_amd_agent_memory_pool_info_t attribute, void* value) { return amdExtTable->hsa_amd_agent_memory_pool_get_info_fn( agent, memory_pool, attribute, value); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_agents_allow_access(uint32_t num_agents, const hsa_agent_t* agents, const uint32_t* flags, const void* ptr) { return amdExtTable->hsa_amd_agents_allow_access_fn( num_agents, agents, flags, ptr); } // 
Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_pool_can_migrate(hsa_amd_memory_pool_t src_memory_pool, hsa_amd_memory_pool_t dst_memory_pool, bool* result) { return amdExtTable->hsa_amd_memory_pool_can_migrate_fn( src_memory_pool, dst_memory_pool, result); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_migrate(const void* ptr, hsa_amd_memory_pool_t memory_pool, uint32_t flags) { return amdExtTable->hsa_amd_memory_migrate_fn( ptr, memory_pool, flags); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_lock(void* host_ptr, size_t size, hsa_agent_t* agents, int num_agent, void** agent_ptr) { return amdExtTable->hsa_amd_memory_lock_fn( host_ptr, size, agents, num_agent, agent_ptr); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_lock_to_pool(void* host_ptr, size_t size, hsa_agent_t* agents, int num_agent, hsa_amd_memory_pool_t pool, uint32_t flags, void** agent_ptr) { return amdExtTable->hsa_amd_memory_lock_to_pool_fn(host_ptr, size, agents, num_agent, pool, flags, agent_ptr); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_unlock(void* host_ptr) { return amdExtTable->hsa_amd_memory_unlock_fn(host_ptr); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_memory_fill(void* ptr, uint32_t value, size_t count) { return amdExtTable->hsa_amd_memory_fill_fn(ptr, value, count); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_interop_map_buffer(uint32_t num_agents, hsa_agent_t* agents, int interop_handle, uint32_t flags, size_t* size, void** ptr, size_t* metadata_size, const void** metadata) { return amdExtTable->hsa_amd_interop_map_buffer_fn( num_agents, agents, interop_handle, flags, size, ptr, metadata_size, metadata); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_interop_unmap_buffer(void* ptr) { return amdExtTable->hsa_amd_interop_unmap_buffer_fn(ptr); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_image_create( hsa_agent_t agent, 
const hsa_ext_image_descriptor_t *image_descriptor, const hsa_amd_image_descriptor_t *image_layout, const void *image_data, hsa_access_permission_t access_permission, hsa_ext_image_t *image) { return amdExtTable->hsa_amd_image_create_fn(agent, image_descriptor, image_layout, image_data, access_permission, image); } // Mirrors Amd Extension Apis hsa_status_t hsa_amd_pointer_info(const void* ptr, hsa_amd_pointer_info_t* info, void* (*alloc)(size_t), uint32_t* num_agents_accessible, hsa_agent_t** accessible) { return amdExtTable->hsa_amd_pointer_info_fn(ptr, info, alloc, num_agents_accessible, accessible); } // Mirrors Amd Extension Apis hsa_status_t hsa_amd_pointer_info_set_userdata(const void* ptr, void* userptr) { return amdExtTable->hsa_amd_pointer_info_set_userdata_fn(ptr, userptr); } // Mirrors Amd Extension Apis hsa_status_t hsa_amd_ipc_memory_create(void* ptr, size_t len, hsa_amd_ipc_memory_t* handle) { return amdExtTable->hsa_amd_ipc_memory_create_fn(ptr, len, handle); } // Mirrors Amd Extension Apis hsa_status_t hsa_amd_ipc_memory_attach(const hsa_amd_ipc_memory_t* ipc, size_t len, uint32_t num_agents, const hsa_agent_t* mapping_agents, void** mapped_ptr) { return amdExtTable->hsa_amd_ipc_memory_attach_fn(ipc, len, num_agents, mapping_agents, mapped_ptr); } // Mirrors Amd Extension Apis hsa_status_t hsa_amd_ipc_memory_detach(void* mapped_ptr) { return amdExtTable->hsa_amd_ipc_memory_detach_fn(mapped_ptr); } // Mirrors Amd Extension Apis hsa_status_t hsa_amd_signal_create(hsa_signal_value_t initial_value, uint32_t num_consumers, const hsa_agent_t* consumers, uint64_t attributes, hsa_signal_t* signal) { return amdExtTable->hsa_amd_signal_create_fn(initial_value, num_consumers, consumers, attributes, signal); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_ipc_signal_create(hsa_signal_t signal, hsa_amd_ipc_signal_t* handle) { return amdExtTable->hsa_amd_ipc_signal_create_fn(signal, handle); } // Mirrors Amd Extension Apis hsa_status_t HSA_API 
hsa_amd_ipc_signal_attach(const hsa_amd_ipc_signal_t* handle, hsa_signal_t* signal) { return amdExtTable->hsa_amd_ipc_signal_attach_fn(handle, signal); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_register_system_event_handler( hsa_amd_system_event_callback_t callback, void* data) { return amdExtTable->hsa_amd_register_system_event_handler_fn(callback, data); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_queue_set_priority(hsa_queue_t* queue, hsa_amd_queue_priority_t priority) { return amdExtTable->hsa_amd_queue_set_priority_fn(queue, priority); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_register_deallocation_callback(void* ptr, hsa_amd_deallocation_callback_t callback, void* user_data) { return amdExtTable->hsa_amd_register_deallocation_callback_fn(ptr, callback, user_data); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_deregister_deallocation_callback(void* ptr, hsa_amd_deallocation_callback_t callback) { return amdExtTable->hsa_amd_deregister_deallocation_callback_fn(ptr, callback); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_signal_value_pointer(hsa_signal_t signal, volatile hsa_signal_value_t** value_ptr) { return amdExtTable->hsa_amd_signal_value_pointer_fn(signal, value_ptr); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_svm_attributes_set(void* ptr, size_t size, hsa_amd_svm_attribute_pair_t* attribute_list, size_t attribute_count) { return amdExtTable->hsa_amd_svm_attributes_set_fn(ptr, size, attribute_list, attribute_count); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_svm_attributes_get(void* ptr, size_t size, hsa_amd_svm_attribute_pair_t* attribute_list, size_t attribute_count) { return amdExtTable->hsa_amd_svm_attributes_get_fn(ptr, size, attribute_list, attribute_count); } // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_svm_prefetch_async(void* ptr, size_t size, hsa_agent_t agent, uint32_t num_dep_signals, const hsa_signal_t* 
dep_signals, hsa_signal_t completion_signal) { return amdExtTable->hsa_amd_svm_prefetch_async_fn(ptr, size, agent, num_dep_signals, dep_signals, completion_signal); } // Tools only table interfaces. namespace rocr { // Mirrors Amd Extension Apis hsa_status_t hsa_amd_queue_intercept_create( hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type, void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data, uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue) { return amdExtTable->hsa_amd_queue_intercept_create_fn( agent_handle, size, type, callback, data, private_segment_size, group_segment_size, queue); } // Mirrors Amd Extension Apis hsa_status_t hsa_amd_queue_intercept_register(hsa_queue_t* queue, hsa_amd_queue_intercept_handler callback, void* user_data) { return amdExtTable->hsa_amd_queue_intercept_register_fn(queue, callback, user_data); } } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/common/shared.cpp000066400000000000000000000043331420110115200216070ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #include "core/common/shared.h" namespace rocr { namespace core { std::function<void*(size_t, size_t, uint32_t)> BaseShared::allocate_ = nullptr; std::function<void(void*)> BaseShared::free_ = nullptr; } // namespace core } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/common/shared.h000066400000000000000000000162761420110115200212650ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTME_CORE_INC_SHARED_H_ #define HSA_RUNTME_CORE_INC_SHARED_H_ #include <cassert> #include <cstdint> #include <functional> #include <new> #include "core/util/utils.h" namespace rocr { namespace core { /// @brief Base class encapsulating the allocator and deallocator for /// shared objects. As used, this allocates GPU-visible host /// memory mapped to all GPUs. class BaseShared { public: static void SetAllocateAndFree( const std::function<void*(size_t, size_t, uint32_t)>& allocate, const std::function<void(void*)>& free) { allocate_ = allocate; free_ = free; } protected: static std::function<void*(size_t, size_t, uint32_t)> allocate_; static std::function<void(void*)> free_; }; /// @brief Default Allocator for Shared. Ensures allocations are whole pages. template <typename T> class PageAllocator : private BaseShared { public: __forceinline static T* alloc(int flags = 0) { T* ret = reinterpret_cast<T*>(allocate_(AlignUp(sizeof(T), 4096), 4096, flags)); if (ret == nullptr) throw std::bad_alloc(); MAKE_NAMED_SCOPE_GUARD(throwGuard, [&]() { free_(ret); }); new (ret) T; throwGuard.Dismiss(); return ret; } __forceinline static void free(T* ptr) { if (ptr != nullptr) { ptr->~T(); free_(ptr); } } }; /// @brief Container for an object located in GPU-visible host memory. /// If a custom allocator is not given, the data is placed in dedicated pages.
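/// Illustrative usage sketch only (`Counter` is a hypothetical payload type;
/// assumes the allocator pair was installed with BaseShared::SetAllocateAndFree
/// beforehand, since the constructors assert on it):
///   struct Counter { uint64_t value; };
///   Shared<Counter> counter;             // default PageAllocator<Counter>: whole pages
///   counter.shared_object()->value = 0;  // T* into the shared allocation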
template <typename T, typename Allocator = PageAllocator<T>> class Shared final : private BaseShared { public: explicit Shared(Allocator* pool = nullptr, int flags = 0) : pool_(pool) { assert(allocate_ != nullptr && free_ != nullptr && "Shared object allocator is not set"); if (pool_) shared_object_ = pool_->alloc(); else shared_object_ = PageAllocator<T>::alloc(flags); } ~Shared() { assert(allocate_ != nullptr && free_ != nullptr && "Shared object allocator is not set"); if (pool_) pool_->free(shared_object_); else PageAllocator<T>::free(shared_object_); } Shared(Shared&& rhs) : shared_object_(rhs.shared_object_), pool_(rhs.pool_) { rhs.shared_object_ = nullptr; rhs.pool_ = nullptr; } Shared& operator=(Shared&& rhs) { if (this != &rhs) { this->~Shared(); shared_object_ = rhs.shared_object_; rhs.shared_object_ = nullptr; pool_ = rhs.pool_; rhs.pool_ = nullptr; } return *this; } T* shared_object() const { return shared_object_; } private: T* shared_object_; Allocator* pool_; }; template <typename T> class Shared<T, PageAllocator<T>> final : private BaseShared { public: Shared() { assert(allocate_ != nullptr && free_ != nullptr && "Shared object allocator is not set"); shared_object_ = PageAllocator<T>::alloc(); } ~Shared() { assert(allocate_ != nullptr && free_ != nullptr && "Shared object allocator is not set"); PageAllocator<T>::free(shared_object_); } Shared(Shared&& rhs) : shared_object_(rhs.shared_object_) { rhs.shared_object_ = nullptr; } Shared& operator=(Shared&& rhs) { if (this != &rhs) { this->~Shared(); shared_object_ = rhs.shared_object_; rhs.shared_object_ = nullptr; } return *this; } T* shared_object() const { return shared_object_; } private: T* shared_object_; }; /// @brief Container for an array located in GPU-visible host memory. /// Alignment defaults to __alignof(T) but may be increased.
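/// Illustrative usage sketch only (element type and sizes are arbitrary;
/// assumes BaseShared::SetAllocateAndFree was called first):
///   SharedArray<uint32_t, 0> a(64);    // Align == 0 selects __alignof(uint32_t)
///   SharedArray<uint32_t, 256> b(64);  // requests 256-byte alignment from allocate_
///   a[0] = 1;                          // operator[] asserts index < length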
template <typename T, size_t Align> class SharedArray final : private BaseShared { public: SharedArray() : shared_object_(nullptr), len(0) {} explicit SharedArray(size_t length) : shared_object_(nullptr), len(length) { assert(allocate_ != nullptr && free_ != nullptr && "Shared object allocator is not set"); static_assert((__alignof(T) <= Align) || (Align == 0), "Align is less than alignof(T)"); shared_object_ = reinterpret_cast<T*>(allocate_(sizeof(T) * length, Max(__alignof(T), Align), 0)); if (shared_object_ == nullptr) throw std::bad_alloc(); size_t i = 0; MAKE_NAMED_SCOPE_GUARD(loopGuard, [&]() { /* Elements [0, i) were constructed before element i threw. */ for (size_t t = 0; t < i; t++) shared_object_[t].~T(); free_(shared_object_); }); for (; i < length; i++) new (&shared_object_[i]) T; loopGuard.Dismiss(); } ~SharedArray() { assert(allocate_ != nullptr && free_ != nullptr && "Shared object allocator is not set"); if (shared_object_ != nullptr) { for (size_t i = 0; i < len; i++) shared_object_[i].~T(); free_(shared_object_); } } SharedArray(SharedArray&& rhs) : shared_object_(rhs.shared_object_), len(rhs.len) { rhs.shared_object_ = nullptr; } SharedArray& operator=(SharedArray&& rhs) { if (this != &rhs) { this->~SharedArray(); shared_object_ = rhs.shared_object_; rhs.shared_object_ = nullptr; len = rhs.len; } return *this; } T& operator[](size_t index) { assert(index < len && "Index out of bounds."); return shared_object_[index]; } const T& operator[](size_t index) const { assert(index < len && "Index out of bounds."); return shared_object_[index]; } private: T* shared_object_; size_t len; }; } // namespace core } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/000077500000000000000000000000001420110115200171135ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/core/inc/agent.h000066400000000000000000000275441420110115200203760ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright
(c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA runtime C++ interface file. 
#ifndef HSA_RUNTME_CORE_INC_AGENT_H_ #define HSA_RUNTME_CORE_INC_AGENT_H_ #include <cstdint> #include <vector> #include "core/inc/checked.h" #include "core/inc/isa.h" #include "core/inc/queue.h" #include "core/inc/memory_region.h" #include "core/util/utils.h" #include "core/util/locks.h" namespace rocr { // Forward declare AMD::MemoryRegion namespace AMD { class MemoryRegion; } namespace core { class Signal; typedef void (*HsaEventCallback)(hsa_status_t status, hsa_queue_t* source, void* data); // Agent is intended to be a pure interface class and may be wrapped or // replaced by tools libraries. All functions other than Convert, node_id, // device_type, and public_handle must be virtual. class Agent : public Checked<0xF6BC25EB17E6F917> { friend class rocr::AMD::MemoryRegion; public: // @brief Convert agent object into hsa_agent_t. // // @param [in] agent Pointer to an agent. // // @retval hsa_agent_t static __forceinline hsa_agent_t Convert(Agent* agent) { const hsa_agent_t agent_handle = { static_cast<uint64_t>(reinterpret_cast<uintptr_t>(agent))}; return agent_handle; } // @brief Convert agent object into const hsa_agent_t. // // @param [in] agent Pointer to an agent. // // @retval const hsa_agent_t static __forceinline const hsa_agent_t Convert(const Agent* agent) { const hsa_agent_t agent_handle = { static_cast<uint64_t>(reinterpret_cast<uintptr_t>(agent))}; return agent_handle; } // @brief Convert hsa_agent_t handle into Agent*. // // @param [in] agent An hsa_agent_t handle. // // @retval Agent* static __forceinline Agent* Convert(hsa_agent_t agent) { return reinterpret_cast<Agent*>(agent.handle); } // Lightweight RTTI for vendor specific implementations. enum DeviceType { kAmdGpuDevice = 0, kAmdCpuDevice = 1, kUnknownDevice = 2 }; // @brief Agent class constructor. // // @param [in] type CPU or GPU or other. explicit Agent(uint32_t node_id, DeviceType type) : node_id_(node_id), device_type_(uint32_t(type)), profiling_enabled_(false) { public_handle_ = Convert(this); } // @brief Agent class constructor.
// // @param [in] type CPU or GPU or other. explicit Agent(uint32_t node_id, uint32_t type) : node_id_(node_id), device_type_(type), profiling_enabled_(false) { public_handle_ = Convert(this); } // @brief Agent class destructor. virtual ~Agent() {} // @brief Submit DMA copy command to move data from src to dst and wait // until it is finished. // // @details The agent must be able to access @p dst and @p src. // // @param [in] dst Memory address of the destination. // @param [in] src Memory address of the source. // @param [in] size Copy size in bytes. // // @retval HSA_STATUS_SUCCESS The memory copy is finished and successful. virtual hsa_status_t DmaCopy(void* dst, const void* src, size_t size) { return HSA_STATUS_ERROR; } // @brief Submit DMA copy command to move data from src to dst. This call // does not wait until the copy is finished. // // @details The agent must be able to access @p dst and @p src. The copy // is performed once every signal in @p dep_signals has the value 0. // On copy completion, the value of @p out_signal is decremented. // // @param [in] dst Memory address of the destination. // @param [in] dst_agent Agent that owns the memory pool associated with @p // dst. // @param [in] src Memory address of the source. // @param [in] src_agent Agent that owns the memory pool associated with @p // src. // @param [in] size Copy size in bytes. // @param [in] dep_signals Signals the copy depends on. // @param [in] out_signal Completion signal. // // @retval HSA_STATUS_SUCCESS The memory copy command was submitted successfully. virtual hsa_status_t DmaCopy(void* dst, core::Agent& dst_agent, const void* src, core::Agent& src_agent, size_t size, std::vector<core::Signal*>& dep_signals, core::Signal& out_signal) { return HSA_STATUS_ERROR; } // @brief Submit DMA command to set the content of a pointer and wait // until it is finished. // // @details The agent must be able to access @p ptr. // // @param [in] ptr Address of the memory to be set.
// @param [in] value The value/pattern that will be used to set @p ptr. // @param [in] count Number of uint32_t elements to be set. // // @retval HSA_STATUS_SUCCESS The memory fill is finished and successful. virtual hsa_status_t DmaFill(void* ptr, uint32_t value, size_t count) { return HSA_STATUS_ERROR; } // @brief Invoke the user-provided callback for each region accessible by // this agent. // // @param [in] callback User-provided callback function. // @param [in] data User-provided pointer as input for @p callback. // // @retval ::HSA_STATUS_SUCCESS if the callback function for each traversed // region returns ::HSA_STATUS_SUCCESS. virtual hsa_status_t IterateRegion( hsa_status_t (*callback)(hsa_region_t region, void* data), void* data) const = 0; // @brief Invoke the callback for each cache usable by this agent. virtual hsa_status_t IterateCache(hsa_status_t (*callback)(hsa_cache_t cache, void* data), void* data) const = 0; // @brief Create queue. // // @param [in] size Number of packets the queue is expected to hold. Must be a // power of 2 greater than 0. // @param [in] queue_type Queue type. // @param [in] event_callback Callback invoked for every // asynchronous event related to the newly created queue. May be NULL. The HSA // runtime passes three arguments to the callback: a code identifying the // event that triggered the invocation, a pointer to the queue where the event // originated, and the application data. // @param [in] data Application data that is passed to @p event_callback. // @param [in] private_segment_size A hint to indicate the maximum expected // private segment usage per work-item, in bytes. // @param [in] group_segment_size A hint to indicate the maximum expected // group segment usage per work-group, in bytes. // @param[out] queue Memory location where the HSA runtime stores a pointer // to the newly created queue. // // @retval HSA_STATUS_SUCCESS The queue has been created successfully.
virtual hsa_status_t QueueCreate(size_t size, hsa_queue_type32_t queue_type, HsaEventCallback event_callback, void* data, uint32_t private_segment_size, uint32_t group_segment_size, Queue** queue) = 0; // @brief Query the value of an attribute. // // @param [in] attribute Attribute to query. // @param [out] value Pointer to store the value of the attribute. // // @retval HSA_STATUS_SUCCESS @p value has been filled with the value of the // attribute. virtual hsa_status_t GetInfo(hsa_agent_info_t attribute, void* value) const = 0; // @brief Returns an array of regions owned by the agent. virtual const std::vector<const core::MemoryRegion*>& regions() const = 0; // @brief Returns the agent's instruction set architecture. virtual const Isa* isa() const = 0; virtual uint64_t HiveId() const { return 0; } // @brief Returns the device type (CPU/GPU/Others). __forceinline uint32_t device_type() const { return device_type_; } // @brief Returns hsa_agent_t handle exposed to end user. // // @details Only matters when a tools library needs to intercept HSA calls. __forceinline hsa_agent_t public_handle() const { return public_handle_; } // @brief Returns node id associated with this agent. __forceinline uint32_t node_id() const { return node_id_; } // @brief Getter for profiling_enabled_. __forceinline bool profiling_enabled() const { return profiling_enabled_; } // @brief Setter for profiling_enabled_. virtual hsa_status_t profiling_enabled(bool enable) { const hsa_status_t stat = EnableDmaProfiling(enable); if (HSA_STATUS_SUCCESS == stat) { profiling_enabled_ = enable; } return stat; } virtual void Trim() { for (auto region : regions()) region->Trim(); } protected: // Intention here is to have a polymorphic update procedure for public_handle_ // which is callable on any Agent* but only from some class derived from // Agent*. do_set_public_handle should remain protected or private in all // derived types.
static __forceinline void set_public_handle(Agent* agent, hsa_agent_t handle) { agent->do_set_public_handle(handle); } virtual void do_set_public_handle(hsa_agent_t handle) { public_handle_ = handle; } // @brief Enable profiling of the asynchronous DMA copy. The timestamp // of each copy request will be stored in the completion signal structure. // // @param enable True to enable profiling. False to disable profiling. // // @retval HSA_STATUS_SUCCESS The profiling is enabled and the // timing of subsequent async copy will be measured. virtual hsa_status_t EnableDmaProfiling(bool enable) { return HSA_STATUS_SUCCESS; } hsa_agent_t public_handle_; private: // @brief Node id. const uint32_t node_id_; const uint32_t device_type_; bool profiling_enabled_; // Used by an Agent's MemoryRegions to ensure serial memory operation on the device. // Serial memory operations are needed to ensure, among other things, that allocation failures are // due to true OOM conditions and per region caching (Trim and Allocate must be serial and // exclusive to ensure this). KernelMutex agent_memory_lock_; // Forbid copying and moving of this object DISALLOW_COPY_AND_ASSIGN(Agent); }; } // namespace core } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_aql_queue.h000066400000000000000000000247511420110115200220770ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_AMD_HW_AQL_COMMAND_PROCESSOR_H_
#define HSA_RUNTIME_CORE_INC_AMD_HW_AQL_COMMAND_PROCESSOR_H_

#include "core/inc/runtime.h"
#include "core/inc/signal.h"
#include "core/inc/queue.h"
#include "core/inc/amd_gpu_agent.h"
#include "core/util/locks.h"

namespace rocr {
namespace AMD {

/// @brief Encapsulates HW Aql Command Processor functionality.
/// It provides the interface for things such as the Doorbell register, the
/// read and write pointers, and a buffer.
class AqlQueue : public core::Queue, private core::LocalSignal, public core::DoorbellSignal {
 public:
  static __forceinline bool IsType(core::Signal* signal) {
    return signal->IsType(&rtti_id_);
  }

  static __forceinline bool IsType(core::Queue* queue) { return queue->IsType(&rtti_id_); }

  // Acquires/releases queue resources and requests HW schedule/deschedule.
  AqlQueue(GpuAgent* agent, size_t req_size_pkts, HSAuint32 node_id, ScratchInfo& scratch,
           core::HsaEventCallback callback, void* err_data, bool is_kv = false);

  ~AqlQueue();

  /// @brief Queue interfaces
  hsa_status_t Inactivate() override;

  /// @brief Change the scheduling priority of the queue
  hsa_status_t SetPriority(HSA_QUEUE_PRIORITY priority) override;

  /// @brief Destroy ref counted queue
  void Destroy() override;

  /// @brief Atomically reads the Read index of the queue with Acquire semantics
  ///
  /// @return uint64_t Value of read index
  uint64_t LoadReadIndexAcquire() override;

  /// @brief Atomically reads the Read index of the queue with Relaxed semantics
  ///
  /// @return uint64_t Value of read index
  uint64_t LoadReadIndexRelaxed() override;

  /// @brief Atomically reads the Write index of the queue with Acquire semantics
  ///
  /// @return uint64_t Value of write index
  uint64_t LoadWriteIndexAcquire() override;

  /// @brief Atomically reads the Write index of the queue with Relaxed semantics
  ///
  /// @return uint64_t Value of write index
  uint64_t LoadWriteIndexRelaxed() override;

  /// @brief This operation is illegal
  void StoreReadIndexRelaxed(uint64_t value) override { assert(false); }

  /// @brief This operation is illegal
  void StoreReadIndexRelease(uint64_t value) override { assert(false); }

  /// @brief Atomically writes the Write index of the queue with Relaxed semantics
  ///
  /// @param value New value of write index to update with
  void StoreWriteIndexRelaxed(uint64_t value) override;

  /// @brief Atomically writes the Write index of the queue with Release semantics
  ///
  /// @param value New value of write index to update with
  void StoreWriteIndexRelease(uint64_t value) override;

  /// @brief Compares and swaps Write index using Acquire and Release semantics
  ///
  /// @param expected Expected current value of write index
  ///
  /// @param value Value of new write index
  ///
  /// @return uint64_t Value of write index before the update
  uint64_t CasWriteIndexAcqRel(uint64_t expected, uint64_t value) override;

  /// @brief Compares and swaps Write index using Acquire semantics
  ///
  /// @param expected Expected current value of write index
  ///
  /// @param value Value of new write index
  ///
  /// @return uint64_t Value of write index before the update
  uint64_t CasWriteIndexAcquire(uint64_t expected, uint64_t value) override;

  /// @brief Compares and swaps Write index using Relaxed semantics
  ///
  /// @param expected Expected current value of write index
  ///
  /// @param value Value of new write index
  ///
  /// @return uint64_t Value of write index before the update
  uint64_t CasWriteIndexRelaxed(uint64_t expected, uint64_t value) override;

  /// @brief Compares and swaps Write index using Release semantics
  ///
  /// @param expected Expected current value of write index
  ///
  /// @param value Value of new write index
  ///
  /// @return uint64_t Value of write index before the update
  uint64_t CasWriteIndexRelease(uint64_t expected, uint64_t value) override;

  /// @brief Adds to the Write index using Acquire and Release semantics
  ///
  /// @param value Value to add to the write index
  ///
  /// @return uint64_t Value of write index before the update
  uint64_t AddWriteIndexAcqRel(uint64_t value) override;

  /// @brief Adds to the Write index using Acquire semantics
  ///
  /// @param value Value to add to the write index
  ///
  /// @return uint64_t Value of write index before the update
  uint64_t AddWriteIndexAcquire(uint64_t value) override;

  /// @brief Adds to the Write index using Relaxed semantics
  ///
  /// @param value Value to add to the write index
  ///
  /// @return uint64_t Value of write index before the update
  uint64_t AddWriteIndexRelaxed(uint64_t value) override;

  /// @brief Adds to the Write index using Release semantics
  ///
  /// @param value Value to add to the write index
  ///
  /// @return uint64_t Value of write index before the update
  uint64_t AddWriteIndexRelease(uint64_t value) override;

  /// @brief Set CU Masking
  ///
  /// @param num_cu_mask_count size of mask bit array
  ///
  /// @param cu_mask pointer to cu mask
  ///
  /// @return hsa_status_t
  hsa_status_t SetCUMasking(uint32_t num_cu_mask_count, const uint32_t* cu_mask) override;

  /// @brief Get CU Masking
  ///
  /// @param num_cu_mask_count size of mask bit array
  ///
  /// @param cu_mask pointer to cu mask
  ///
  /// @return hsa_status_t
  hsa_status_t GetCUMasking(uint32_t num_cu_mask_count, uint32_t* cu_mask) override;

  // @brief Submits a block of PM4 commands and waits until it has been executed.
  void ExecutePM4(uint32_t* cmd_data, size_t cmd_size_b) override;

  /// @brief Update signal value using Relaxed semantics
  void StoreRelaxed(hsa_signal_value_t value) override;

  /// @brief Update signal value using Release semantics
  void StoreRelease(hsa_signal_value_t value) override;

  /// @brief Enable use of GWS from this queue.
  hsa_status_t EnableGWS(int gws_slot_count);

 protected:
  bool _IsA(Queue::rtti_t id) const override { return id == &rtti_id_; }

 private:
  uint32_t ComputeRingBufferMinPkts();
  uint32_t ComputeRingBufferMaxPkts();

  // (De)allocates and (de)registers ring_buf_.
  void AllocRegisteredRingBuffer(uint32_t queue_size_pkts);
  void FreeRegisteredRingBuffer();

  /// @brief Abstracts the file handle used for double mapping queues.
  void CloseRingBufferFD(const char* ring_buf_shm_path, int fd) const;
  int CreateRingBufferFD(const char* ring_buf_shm_path, uint32_t ring_buf_phys_size_bytes) const;

  /// @brief Define the Scratch Buffer Descriptor and related parameters
  /// that enable kernels to access scratch memory
  void InitScratchSRD();

  /// @brief Halt the queue without destroying it or fencing memory.
  void Suspend();

  /// @brief Handler for hardware queue events.
  template
  static bool DynamicScratchHandler(hsa_signal_value_t error_code, void* arg);

  /// @brief Handler for KFD exceptions.
  static bool ExceptionHandler(hsa_signal_value_t error_code, void* arg);

  // AQL packet ring buffer
  void* ring_buf_;

  // Size of ring_buf_ allocation.
  // This may be larger than (amd_queue_.hsa_queue.size * sizeof(AqlPacket)).
  uint32_t ring_buf_alloc_bytes_;

  // Id of the Queue used in communication with thunk
  HSA_QUEUEID queue_id_;

  // Indicates if queue is active
  std::atomic active_;

  // Cached value of HsaNodeProperties.HSA_CAPABILITY.DoorbellType
  int doorbell_type_;

  // Agent to which the queue is attached
  GpuAgent* agent_;

  uint32_t queue_full_workaround_;

  // Handle of scratch memory descriptor
  ScratchInfo queue_scratch_;

  AMD::callback_t errors_callback_;

  void* errors_data_;

  // Is KV device queue
  bool is_kv_queue_;

  // GPU-visible indirect buffer holding PM4 commands.
  void* pm4_ib_buf_;
  uint32_t pm4_ib_size_b_;
  KernelMutex pm4_ib_mutex_;

  // Error handler control variable.
  std::atomic dynamicScratchState, exceptionState;
  enum {
    ERROR_HANDLER_DONE = 1,
    ERROR_HANDLER_TERMINATE = 2,
    ERROR_HANDLER_SCRATCH_RETRY = 4
  };

  // Queue currently suspended or scheduled
  bool suspended_;

  // Thunk dispatch and wavefront scheduling priority
  HSA_QUEUE_PRIORITY priority_;

  // Exception notification signal
  Signal* exception_signal_;

  // CU mask lock
  KernelMutex mask_lock_;

  // Current CU mask
  std::vector cu_mask_;

  // Shared event used for queue errors
  static HsaEvent* queue_event_;

  // Queue count - used to ref count queue_event_
  static std::atomic queue_count_;

  // Mutex for queue_event_ manipulation
  static KernelMutex queue_lock_;

  static int rtti_id_;

  // Forbid copying and moving of this object
  DISALLOW_COPY_AND_ASSIGN(AqlQueue);
};

}  // namespace AMD
}  // namespace rocr

#endif  // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_blit_kernel.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_AMD_BLIT_KERNEL_H_
#define HSA_RUNTIME_CORE_INC_AMD_BLIT_KERNEL_H_

#include
#include
#include

#include "core/inc/blit.h"

namespace rocr {
namespace AMD {
class BlitKernel : public core::Blit {
 public:
  explicit BlitKernel(core::Queue* queue);
  virtual ~BlitKernel() override;

  /// @brief Initialize a blit kernel object.
  ///
  /// @param agent Agent that will execute the AQL packets.
  ///
  /// @return hsa_status_t
  hsa_status_t Initialize(const core::Agent& agent);

  /// @brief Marks the blit kernel object as invalid and uncouples its link with
  /// the underlying AQL kernel queue. Use of the blit object
  /// once it has been released is illegal and any behavior is indeterminate
  ///
  /// @note: The call will block until all AQL packets have been executed.
  ///
  /// @param agent Agent passed to Initialize.
  ///
  /// @return hsa_status_t
  virtual hsa_status_t Destroy(const core::Agent& agent) override;

  /// @brief Submit an AQL packet to perform vector copy.
  /// The call is blocking until the command execution is finished.
  ///
  /// @param dst Memory address of the copy destination.
  /// @param src Memory address of the copy source.
  /// @param size Size of the data to be copied, in bytes.
  virtual hsa_status_t SubmitLinearCopyCommand(void* dst, const void* src, size_t size) override;

  /// @brief Submit a linear copy command to the underlying compute device's
  /// control block. The call is non-blocking. The memory transfer will start
  /// after all dependent signals are satisfied. After the transfer is
  /// completed, the out signal will be decremented.
  ///
  /// @param dst Memory address of the copy destination.
  /// @param src Memory address of the copy source.
  /// @param size Size of the data to be copied, in bytes.
  /// @param dep_signals Array of dependent signals.
  /// @param out_signal Output signal.
  virtual hsa_status_t SubmitLinearCopyCommand(void* dst, const void* src, size_t size,
                                               std::vector& dep_signals,
                                               core::Signal& out_signal) override;

  /// @brief Submit an AQL packet to perform memory fill. The call is blocking
  /// until the command execution is finished.
  ///
  /// @param ptr Memory address of the fill destination.
  /// @param value Value to be set.
  /// @param count Number of uint32_t elements to be set to the value.
  virtual hsa_status_t SubmitLinearFillCommand(void* ptr, uint32_t value, size_t count) override;

  virtual hsa_status_t EnableProfiling(bool enable) override;

 private:
  union KernelArgs {
    struct __ALIGNED__(16) {
      uint64_t phase1_src_start;
      uint64_t phase1_dst_start;
      uint64_t phase2_src_start;
      uint64_t phase2_dst_start;
      uint64_t phase3_src_start;
      uint64_t phase3_dst_start;
      uint64_t phase4_src_start;
      uint64_t phase4_dst_start;
      uint64_t phase4_src_end;
      uint64_t phase4_dst_end;
      uint32_t num_workitems;
    } copy_aligned;

    struct __ALIGNED__(16) {
      uint64_t phase1_src_start;
      uint64_t phase1_dst_start;
      uint64_t phase2_src_start;
      uint64_t phase2_dst_start;
      uint64_t phase2_src_end;
      uint64_t phase2_dst_end;
      uint32_t num_workitems;
    } copy_misaligned;

    struct __ALIGNED__(16) {
      uint64_t phase1_dst_start;
      uint64_t phase2_dst_start;
      uint64_t phase2_dst_end;
      uint32_t fill_value;
      uint32_t num_workitems;
    } fill;
  };

  /// Reserve slots for num_packet packets in the queue buffer. The call will
  /// wait until the queue buffer has room.
  uint64_t AcquireWriteIndex(uint32_t num_packet);

  /// Update the queue doorbell register with ::write_index. This
  /// function also serializes concurrent doorbell updates to ensure that the
  /// packet processor doesn't get an invalid packet.
  void ReleaseWriteIndex(uint64_t write_index, uint32_t num_packet);

  void PopulateQueue(uint64_t index, uint64_t code_handle, void* args,
                     uint32_t grid_size_x, hsa_signal_t completion_signal);

  KernelArgs* ObtainAsyncKernelCopyArg();

  /// AQL code object and size for each kernel.
  enum class KernelType {
    CopyAligned,
    CopyMisaligned,
    Fill,
  };

  struct KernelCode {
    void* code_buf_;
    size_t code_buf_size_;
  };

  std::map kernels_;

  /// AQL queue for submitting the vector copy kernel.
  core::Queue* queue_;
  uint32_t queue_bitmask_;

  /// Pointer to the kernel argument buffer.
  KernelArgs* kernarg_async_;
  uint32_t kernarg_async_mask_;
  volatile uint32_t kernarg_async_counter_;

  /// Completion signal for every kernel dispatched.
  hsa_signal_t completion_signal_;

  /// Lock to synchronize access to kernarg_ and completion_signal_
  std::mutex lock_;

  /// Number of CUs on the underlying agent.
  int num_cus_;
};
}  // namespace AMD
}  // namespace rocr

#endif  // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_blit_sdma.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_AMD_BLIT_SDMA_H_
#define HSA_RUNTIME_CORE_INC_AMD_BLIT_SDMA_H_

#include
#include
#include

#include "hsakmt.h"

#include "core/inc/amd_gpu_agent.h"
#include "core/inc/blit.h"
#include "core/inc/runtime.h"
#include "core/inc/signal.h"
#include "core/util/utils.h"

namespace rocr {
namespace AMD {

class BlitSdmaBase : public core::Blit {
 public:
  static const size_t kQueueSize;
  static const size_t kCopyPacketSize;
  static const size_t kMaxSingleCopySize;
  static const size_t kMaxSingleFillSize;

  virtual bool isSDMA() const override { return true; }
  virtual hsa_status_t Initialize(const core::Agent& agent, bool use_xgmi) = 0;
  virtual hsa_status_t SubmitCopyRectCommand(const hsa_pitched_ptr_t* dst,
                                             const hsa_dim3_t* dst_offset,
                                             const hsa_pitched_ptr_t* src,
                                             const hsa_dim3_t* src_offset, const hsa_dim3_t* range,
                                             std::vector& dep_signals,
                                             core::Signal& out_signal) = 0;
};

// RingIndexTy: 32/64-bit monotonic ring index, counting in bytes.
// HwIndexMonotonic: true if SDMA HW index is monotonic, false if it wraps at end of ring.
// SizeToCountOffset: value added to size (in bytes) to form SDMA command count field.
template
class BlitSdma : public BlitSdmaBase {
 public:
  BlitSdma();

  virtual ~BlitSdma() override;

  /// @brief Initialize a User Mode SDMA Queue object. Input parameters specify
  /// properties of the queue being created.
  ///
  /// @param agent Agent whose SDMA engine will execute the commands.
  ///
  /// @return hsa_status_t
  virtual hsa_status_t Initialize(const core::Agent& agent, bool use_xgmi) override;

  /// @brief Marks the queue object as invalid and uncouples its link with
  /// the underlying compute device's control block. Use of the queue object
  /// once it has been released is illegal and any behavior is indeterminate
  ///
  /// @note: The call will block until all packets have executed.
  ///
  /// @param agent Agent passed to Initialize.
  ///
  /// @return hsa_status_t
  virtual hsa_status_t Destroy(const core::Agent& agent) override;

  /// @brief Submit a linear copy command to the queue buffer.
  ///
  /// @param dst Memory address of the copy destination.
  /// @param src Memory address of the copy source.
  /// @param size Size of the data to be copied, in bytes.
  virtual hsa_status_t SubmitLinearCopyCommand(void* dst, const void* src, size_t size) override;

  /// @brief Submit a linear copy command to the underlying compute device's
  /// control block. The call is non-blocking. The memory transfer will start
  /// after all dependent signals are satisfied. After the transfer is
  /// completed, the out signal will be decremented.
  ///
  /// @param dst Memory address of the copy destination.
  /// @param src Memory address of the copy source.
  /// @param size Size of the data to be copied, in bytes.
  /// @param dep_signals Array of dependent signals.
  /// @param out_signal Output signal.
  virtual hsa_status_t SubmitLinearCopyCommand(void* dst, const void* src, size_t size,
                                               std::vector& dep_signals,
                                               core::Signal& out_signal) override;

  virtual hsa_status_t SubmitCopyRectCommand(const hsa_pitched_ptr_t* dst,
                                             const hsa_dim3_t* dst_offset,
                                             const hsa_pitched_ptr_t* src,
                                             const hsa_dim3_t* src_offset, const hsa_dim3_t* range,
                                             std::vector& dep_signals,
                                             core::Signal& out_signal) override;

  /// @brief Submit a linear fill command to the queue buffer
  ///
  /// @param ptr Memory address of the fill destination.
  /// @param value Value to be set.
  /// @param count Number of uint32_t elements to be set to the value.
  virtual hsa_status_t SubmitLinearFillCommand(void* ptr, uint32_t value, size_t count) override;

  virtual hsa_status_t EnableProfiling(bool enable) override;

 private:
  /// @brief Acquires the address into the queue buffer where a new command
  /// packet of the specified size could be written. The address that is
  /// returned is guaranteed to be unique even in a multi-threaded access
  /// scenario. Unless @p cmd_size exceeds the size of the queue buffer, this
  /// function is guaranteed to return a pointer for writing data into the
  /// queue buffer.
  ///
  /// @param cmd_size Command packet size in bytes.
  ///
  /// @param curr_index (output) Index to pass to ReleaseWriteAddress.
  ///
  /// @return pointer into the queue buffer where an SDMA packet of the specified
  /// size could be written. NULL if the input size is greater than the size of
  /// the queue buffer.
  char* AcquireWriteAddress(uint32_t cmd_size, RingIndexTy& curr_index);

  void UpdateWriteAndDoorbellRegister(RingIndexTy curr_index, RingIndexTy new_index);

  /// @brief Updates the Write Register of the compute device to the end of the
  /// SDMA packet written into the queue buffer. The update to the Write Register
  /// is safe under multi-threaded usage. Furthermore, updates
  /// to the Write Register block until all prior updates are completed,
  /// i.e. if two threads T1 & T2 were to call release, then the update by T2
  /// will block until T1 has completed its update (assumes T1 acquired the
  /// write address first).
  ///
  /// @param curr_index Index passed back from AcquireWriteAddress.
  ///
  /// @param cmd_size Command packet size in bytes.
  void ReleaseWriteAddress(RingIndexTy curr_index, uint32_t cmd_size);

  /// @brief Writes NO-OP words into the queue buffer in case writing a command
  /// causes the queue buffer to wrap.
  ///
  /// @param curr_index Index to begin padding from.
  void PadRingToEnd(RingIndexTy curr_index);

  uint32_t WrapIntoRing(RingIndexTy index);
  bool CanWriteUpto(RingIndexTy upto_index);

  /// @brief Build fence command
  void BuildFenceCommand(char* fence_command_addr, uint32_t* fence, uint32_t fence_value);

  /// @brief Build Hdp Flush command
  void BuildHdpFlushCommand(char* cmd_addr);

  void BuildCopyCommand(char* cmd_addr, uint32_t num_copy_command, void* dst, const void* src,
                        size_t size);

  void BuildCopyRectCommand(const std::function& append, const hsa_pitched_ptr_t* dst,
                            const hsa_dim3_t* dst_offset, const hsa_pitched_ptr_t* src,
                            const hsa_dim3_t* src_offset, const hsa_dim3_t* range);

  void BuildFillCommand(char* cmd_addr, uint32_t num_fill_command, void* ptr, uint32_t value,
                        size_t count);

  void BuildPollCommand(char* cmd_addr, void* addr, uint32_t reference);

  void BuildAtomicDecrementCommand(char* cmd_addr, void* addr);

  void BuildGetGlobalTimestampCommand(char* cmd_addr, void* write_address);

  void BuildTrapCommand(char* cmd_addr, uint32_t event_id);

  void BuildGCRCommand(char* cmd_addr, bool invalidate);

  hsa_status_t SubmitCommand(const void* cmds, size_t cmd_size,
                             const std::vector& dep_signals, core::Signal& out_signal);

  hsa_status_t SubmitBlockingCommand(const void* cmds, size_t cmd_size);

  // Agent object owning the SDMA engine.
  GpuAgent* agent_;

  /// Base address of the Queue buffer at construction time.
  char* queue_start_addr_;

  // Internal signals for blocking APIs
  core::unique_signal_ptr signals_[2];
  KernelMutex lock_;
  bool parity_;

  /// Queue resource descriptor for doorbell, read
  /// and write indices
  HsaQueueResource queue_resource_;

  // Monotonic ring indices, in bytes, tracking written and submitted commands.
  RingIndexTy cached_reserve_index_;
  RingIndexTy cached_commit_index_;

  static const uint32_t linear_copy_command_size_;
  static const uint32_t fill_command_size_;
  static const uint32_t fence_command_size_;
  static const uint32_t poll_command_size_;
  static const uint32_t flush_command_size_;
  static const uint32_t atomic_command_size_;
  static const uint32_t timestamp_command_size_;
  static const uint32_t trap_command_size_;
  static const uint32_t gcr_command_size_;

  // Max copy size of a single linear copy command packet.
  size_t max_single_linear_copy_size_;

  /// Max total copy size supported by the queue.
  size_t max_total_linear_copy_size_;

  /// Max number of uint32_t elements in a single fill command packet.
  size_t max_single_fill_size_;

  /// Max total fill count supported by the queue.
  size_t max_total_fill_size_;

  /// True if platform atomics are supported.
  bool platform_atomic_support_;

  /// True if SDMA supports HDP flush
  bool hdp_flush_support_;
};

// Ring indices are 32-bit.
// HW ring indices are not monotonic (wrap at end of ring).
// Count fields of SDMA commands are 0-based.
typedef BlitSdma BlitSdmaV2V3;

// Ring indices are 64-bit.
// HW ring indices are monotonic (do not wrap at end of ring).
// Count fields of SDMA commands are 1-based.
typedef BlitSdma BlitSdmaV4;

// Ring indices are 64-bit.
// HW ring indices are monotonic (do not wrap at end of ring).
// Count fields of SDMA commands are 1-based.
// SDMA is connected to gL2.
typedef BlitSdma BlitSdmaV5;

}  // namespace AMD
}  // namespace rocr

#endif  // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_cpu_agent.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

// AMD specific HSA backend.
#ifndef HSA_RUNTIME_CORE_INC_AMD_CPU_AGENT_H_
#define HSA_RUNTIME_CORE_INC_AMD_CPU_AGENT_H_

#include

#include "hsakmt.h"

#include "core/inc/runtime.h"
#include "core/inc/agent.h"
#include "core/inc/queue.h"
#include "core/inc/cache.h"

namespace rocr {
namespace AMD {
// @brief Class to represent a CPU device.
class CpuAgent : public core::Agent {
 public:
  // @brief CpuAgent constructor.
  //
  // @param [in] node Node id. Each CPU in a different socket will get a
  // distinct id.
  // @param [in] node_props Node property.
  CpuAgent(HSAuint32 node, const HsaNodeProperties& node_props);

  // @brief CpuAgent destructor.
  ~CpuAgent();

  // @brief Invoke the user provided callback for each region accessible by
  // this agent.
  //
  // @param [in] include_peer If true, the callback will also be invoked on each
  // peer memory region accessible by this agent. If false, only invoke the
  // callback on memory regions owned by this agent.
  // @param [in] callback User provided callback function.
  // @param [in] data User provided pointer as input for @p callback.
  //
  // @retval ::HSA_STATUS_SUCCESS if the callback function for each traversed
  // region returns ::HSA_STATUS_SUCCESS.
  hsa_status_t VisitRegion(bool include_peer,
                           hsa_status_t (*callback)(hsa_region_t region, void* data),
                           void* data) const;

  // @brief Override from core::Agent.
  hsa_status_t IterateRegion(hsa_status_t (*callback)(hsa_region_t region, void* data),
                             void* data) const override;

  // @brief Override from core::Agent.
  hsa_status_t IterateCache(hsa_status_t (*callback)(hsa_cache_t cache, void* data),
                            void* value) const override;

  // @brief Override from core::Agent.
  hsa_status_t GetInfo(hsa_agent_info_t attribute, void* value) const override;

  // @brief Override from core::Agent.
  hsa_status_t QueueCreate(size_t size, hsa_queue_type32_t queue_type,
                           core::HsaEventCallback event_callback, void* data,
                           uint32_t private_segment_size, uint32_t group_segment_size,
                           core::Queue** queue) override;

  // @brief Returns number of data caches.
  __forceinline size_t num_cache() const { return cache_props_.size(); }

  // @brief Returns Hive ID
  __forceinline uint64_t HiveId() const override { return properties_.HiveID; }

  // @brief Returns data cache property.
  //
  // @param [in] idx Cache level.
  __forceinline const HsaCacheProperties& cache_prop(int idx) const {
    return cache_props_[idx];
  }

  // @brief Override from core::Agent.
  const std::vector& regions() const override {
    return regions_;
  }

  // @brief Override from core::Agent.
  const core::Isa* isa() const override { return NULL; }

 private:
  // @brief Query the driver to get the region list owned by this agent.
  void InitRegionList();

  // @brief Query the driver to get the cache properties.
  void InitCacheList();

  // @brief Invoke the user provided callback for every region in @p regions.
  //
  // @param [in] regions Array of region objects.
  // @param [in] callback User provided callback function.
  // @param [in] data User provided pointer as input for @p callback.
  //
  // @retval ::HSA_STATUS_SUCCESS if the callback function for each traversed
  // region returns ::HSA_STATUS_SUCCESS.
  hsa_status_t VisitRegion(
      const std::vector& regions,
      hsa_status_t (*callback)(hsa_region_t region, void* data),
      void* data) const;

  // @brief Node property.
  const HsaNodeProperties properties_;

  // @brief Array of data cache properties. The array index represents the cache
  // level.
  std::vector cache_props_;

  // @brief Array of HSA cache objects.
  std::vector> caches_;

  // @brief Array of regions owned by this agent.
  std::vector regions_;

  DISALLOW_COPY_AND_ASSIGN(CpuAgent);
};

}  // namespace AMD
}  // namespace rocr

#endif  // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_elf_image.hpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
// // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
//
////////////////////////////////////////////////////////////////////////////////

#ifndef AMD_ELF_IMAGE_HPP_
#define AMD_ELF_IMAGE_HPP_

#include 
#include 
#include 
#include 

namespace rocr {
namespace amd {
namespace elf {

class Symbol;
class SymbolTable;
class Section;
class RelocationSection;

class Segment {
 public:
  virtual ~Segment() { }
  virtual uint64_t type() const = 0;
  virtual uint64_t memSize() const = 0;
  virtual uint64_t align() const = 0;
  virtual uint64_t imageSize() const = 0;
  virtual uint64_t vaddr() const = 0;
  virtual uint64_t flags() const = 0;
  virtual uint64_t offset() const = 0;
  virtual const char* data() const = 0;
  virtual uint16_t getSegmentIndex() = 0;
  virtual bool updateAddSection(Section *section) = 0;
};

class Section {
 public:
  virtual ~Section() { }
  virtual uint16_t getSectionIndex() const = 0;
  virtual uint32_t type() const = 0;
  virtual std::string Name() const = 0;
  virtual uint64_t offset() const = 0;
  virtual uint64_t addr() const = 0;
  virtual bool updateAddr(uint64_t addr) = 0;
  virtual uint64_t addralign() const = 0;
  virtual uint64_t flags() const = 0;
  virtual uint64_t size() const = 0;
  virtual uint64_t nextDataOffset(uint64_t align) const = 0;
  virtual uint64_t addData(const void *src, uint64_t size, uint64_t align) = 0;
  virtual bool getData(uint64_t offset, void* dest, uint64_t size) = 0;
  virtual Segment* segment() = 0;
  virtual RelocationSection* asRelocationSection() = 0;
  virtual bool hasRelocationSection() const = 0;
  virtual RelocationSection* relocationSection(SymbolTable* symtab = 0) = 0;
  virtual bool setMemSize(uint64_t s) = 0;
  virtual uint64_t memSize() const = 0;
  virtual bool setAlign(uint64_t a) = 0;
  virtual uint64_t memAlign() const = 0;
};

class Relocation {
 public:
  virtual ~Relocation() { }
  virtual RelocationSection* section() = 0;
  virtual uint32_t type() = 0;
  virtual uint32_t symbolIndex() = 0;
  virtual Symbol* symbol() = 0;
  virtual uint64_t offset() = 0;
  virtual int64_t addend() = 0;
};

class RelocationSection : public virtual Section {
 public:
  virtual Relocation* addRelocation(uint32_t type, Symbol* symbol, uint64_t offset, int64_t addend) = 0;
  virtual size_t relocationCount() const = 0;
  virtual Relocation* relocation(size_t i) = 0;
  virtual Section* targetSection() = 0;
};

class StringTable : public virtual Section {
 public:
  virtual const char* addString(const std::string& s) = 0;
  virtual size_t addString1(const std::string& s) = 0;
  virtual const char* getString(size_t ndx) = 0;
  virtual size_t getStringIndex(const char* name) = 0;
};

class Symbol {
 public:
  virtual ~Symbol() { }
  virtual uint32_t index() = 0;
  virtual uint32_t type() = 0;
  virtual uint32_t binding() = 0;
  virtual uint64_t size() = 0;
  virtual uint64_t value() = 0;
  virtual unsigned char other() = 0;
  virtual std::string name() = 0;
  virtual Section* section() = 0;
  virtual void setValue(uint64_t value) = 0;
  virtual void setSize(uint64_t size) = 0;
};

class SymbolTable : public virtual Section {
 public:
  virtual Symbol* addSymbol(Section* section, const std::string& name, uint64_t value, uint64_t size, unsigned char type, unsigned char binding, unsigned char other = 0) = 0;
  virtual size_t symbolCount() = 0;
  virtual Symbol* symbol(size_t i) = 0;
};

class NoteSection : public virtual Section {
 public:
  virtual bool addNote(const std::string& name, uint32_t type, const void* desc = 0, uint32_t desc_size = 0) = 0;
  virtual bool getNote(const std::string& name, uint32_t type, void** desc, uint32_t* desc_size) = 0;
};

class Image {
 public:
  virtual ~Image() { }
  virtual bool initNew(uint16_t machine, uint16_t type, uint8_t os_abi = 0, uint8_t abi_version = 0, uint32_t e_flags = 0) = 0;
  virtual bool loadFromFile(const std::string& filename) = 0;
  virtual bool saveToFile(const std::string& filename) = 0;
  virtual bool initFromBuffer(const void* buffer, size_t size) = 0;
  virtual bool initAsBuffer(const void* buffer, size_t size) = 0;
  virtual bool writeTo(const std::string& filename) = 0;
  virtual bool copyToBuffer(void** buf, size_t* size = 0) = 0;  // Copy to new buffer allocated with malloc
  virtual bool copyToBuffer(void* buf, size_t size) = 0;  // Copy to existing buffer of given size.
  virtual const char* data() = 0;
  virtual uint64_t size() = 0;
  virtual uint16_t Machine() = 0;
  virtual uint16_t Type() = 0;
  virtual uint32_t EFlags() = 0;
  virtual uint32_t ABIVersion() = 0;
  virtual uint32_t EClass() = 0;
  virtual uint32_t OsAbi() = 0;

  std::string output() { return out.str(); }

  virtual bool Freeze() = 0;
  virtual bool Validate() = 0;

  virtual StringTable* shstrtab() = 0;
  virtual StringTable* strtab() = 0;
  virtual SymbolTable* symtab() = 0;
  virtual SymbolTable* getSymtab(uint16_t index) = 0;

  virtual StringTable* addStringTable(const std::string& name) = 0;
  virtual StringTable* getStringTable(uint16_t index) = 0;
  virtual SymbolTable* addSymbolTable(const std::string& name, StringTable* stab = 0) = 0;

  virtual size_t segmentCount() = 0;
  virtual Segment* segment(size_t i) = 0;
  virtual Segment* segmentByVAddr(uint64_t vaddr) = 0;

  virtual size_t sectionCount() = 0;
  virtual Section* section(size_t i) = 0;
  virtual Section* sectionByVAddr(uint64_t vaddr) = 0;

  virtual NoteSection* note() = 0;
  virtual NoteSection* addNoteSection(const std::string& name) = 0;

  virtual Segment* initSegment(uint32_t type, uint32_t flags, uint64_t paddr = 0) = 0;
  virtual bool addSegments() = 0;

  virtual Section* addSection(const std::string &name, uint32_t type, uint64_t flags = 0, uint64_t entsize = 0, Segment* segment = 0) = 0;

  virtual RelocationSection* relocationSection(Section* sec, SymbolTable* symtab = 0) = 0;

 protected:
  std::ostringstream out;
};

Image* NewElf32Image();
Image* NewElf64Image();

uint64_t ElfSize(const void* buffer);

std::string GetNoteString(uint32_t s_size, const char* s);

}  // namespace elf
}  // namespace amd
}  // namespace rocr

#endif  // AMD_ELF_IMAGE_HPP_
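The `Image` interface above documents its two `copyToBuffer` overloads only through trailing comments: one copies into a new buffer allocated with `malloc`, the other into an existing buffer of a given size. The sketch models just those two ownership conventions on a hypothetical `ToyImage` class that is not part of ROCR. Treating a too-small destination buffer as a failure is an assumption; the header does not say what happens in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for amd::elf::Image (not part of ROCR). It models only
// the two ownership conventions documented on Image::copyToBuffer.
class ToyImage {
 public:
  explicit ToyImage(std::string bytes) : bytes_(std::move(bytes)) {}

  // Convention 1: allocate a fresh buffer with malloc and return it through
  // `buf`. The caller owns the memory and must release it with free().
  // `size` is optional, mirroring the `size_t* size = 0` default.
  bool copyToBuffer(void** buf, std::size_t* size = nullptr) const {
    if (buf == nullptr) return false;
    // malloc(0) may return nullptr, so always request at least one byte.
    void* mem = std::malloc(bytes_.empty() ? 1 : bytes_.size());
    if (mem == nullptr) return false;
    std::memcpy(mem, bytes_.data(), bytes_.size());
    *buf = mem;
    if (size != nullptr) *size = bytes_.size();
    return true;
  }

  // Convention 2: copy into a caller-owned buffer of `size` bytes.
  // Assumption: the copy is refused when the buffer is too small.
  bool copyToBuffer(void* buf, std::size_t size) const {
    if (buf == nullptr || size < bytes_.size()) return false;
    std::memcpy(buf, bytes_.data(), bytes_.size());
    return true;
  }

  std::size_t size() const { return bytes_.size(); }

 private:
  std::string bytes_;
};
```

With the first form the callee chooses the allocation size and the caller must `free()` the result; the second form lets the caller reuse memory it already owns, at the cost of sizing it correctly up front.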
ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_filter_device.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_AMD_FILTER_DEVICE_H_
#define HSA_RUNTIME_CORE_INC_AMD_FILTER_DEVICE_H_

#include 
#include 
#include 
#include 
#include 
#include 

#include "hsakmt.h"

namespace rocr {
namespace AMD {

// ROCr allows users to filter and reorder the Gpu devices that are
// present on a ROCm system. This ability is exposed through the environment
// variable ROCR_VISIBLE_DEVICES (RVD). Users can specify a comma-separated
// list of Gpu identifiers as the value of this variable.
//
// On a ROCm platform instance, a Gpu device can be identified by its:
//
//   Index - Position at which ROCr reports it upon device enumeration
//   UUID  - A string that is unique and immutable, i.e. it tags a Gpu
//           instance across systems and power cycles. UUID values
//           are defined to begin with the "GPU-" prefix
//
// @note: Not all Gpu devices report valid UUIDs. For example, only
// devices from Gfx9 and later encode valid UUIDs. To account for this
// and other reasons, the UUID string "GPU-XX" is defined as identifying
// those devices. Users can still select such Gpu devices by using their
// enumeration index.
//
// Users can select a device by specifying its UUID string in full or in
// part. A UUID string that does not uniquely match the prefix of an agent's
// valid UUID is interpreted as terminating. The UUID string "GPU-XX" never
// matches and therefore always terminates.
//
// The RVD interpreter treats an empty token list as filtering out all devices.
// Users can use this mode to report ZERO Gpu devices.
//
// The RVD interpreter treats a token as Illegal if it can't be evaluated into
// an instance of a Device UUID or an Enumeration Index.
//
// The RVD interpreter treats a Legal instance of Enumeration Index as
// Terminating if any ONE of the following conditions applies:
//   Value of index lies outside the interval [0 - (numGpuDevices - 1)]
//   Value of index maps to a device that has been previously selected
//
// The RVD interpreter treats a Legal instance of Device UUID as Terminating
// if any ONE of the following conditions applies:
//   Value of UUID is the literal "GPU-XX"
//   Value of UUID matches ZERO devices on the system
//   Value of UUID matches TWO or more devices on the system
//   Value of UUID maps to a device that has been previously selected
//
// The RVD interpreter builds the list of Gpu devices to surface from the
// tokens that are Legal and NOT Terminating.
//
// Following are some examples of RVD value strings and their interpretation
// on a ROCm system with four Gpu devices. Assume the UUIDs of the four Gpu
// devices are:
//   Gpu-0: "GPU-BABABABABABABABA"
//   Gpu-1: "GPU-ABBAABBAABBAABBA"
//   Gpu-2: "GPU-BABAABBAABBABABA"
//   Gpu-3: "GPU-ABBABABABABAABBA"
//
// Surface ZERO devices
//   A1) ROCR_VISIBLE_DEVICES=""
//   A2) ROCR_VISIBLE_DEVICES="-1"
//   A3) ROCR_VISIBLE_DEVICES="GPU-XX"
//
// Surface Gpu-3 and Gpu-0 devices in that order
//   B) ROCR_VISIBLE_DEVICES="3,GPU-BABABABABABABABA,4"
//
// Surface Gpu-1 and Gpu-2 devices in that order
//   C) ROCR_VISIBLE_DEVICES="1,GPU-BABAABBAABBABABA,GPU-XX"
//
// Surface Gpu-3 and Gpu-2 devices in that order
//   D) ROCR_VISIBLE_DEVICES="3,GPU-BABAABBA,GPU-XX"
//
class RvdFilter {
 public:
  /// @brief Constructor
  RvdFilter() {}

  // @brief Destructor.
  ~RvdFilter() {}

  /// @brief Determine if user has specified environment variable
  /// ROCR_VISIBLE_DEVICES (RVD) to filter and reorder Gpu devices
  ///
  /// @return TRUE if user has defined the env RVD
  static bool FilterDevices();

  /// @brief Determine if user has specified environment variable
  /// ROCR_VISIBLE_DEVICES (RVD) to filter out all Gpu devices, i.e.
  /// surface ZERO devices
  ///
  /// @return TRUE if user has specified ZERO devices to be surfaced
  bool SelectZeroDevices();

  /// @brief Builds the list of tokens specified by user to filter
  /// and reorder Gpu devices. A token represents either a Gpu's
  /// enumeration index or its UUID value. It is possible for the
  /// list to have no tokens, i.e. user has selected zero devices
  void BuildRvdTokenList();

  /// @brief Build the list of Gpu device UUIDs as enumerated by ROCt
  ///
  /// @param numNodes Number of ROCm devices present on system, including
  /// both Cpu and Gpu devices
  void BuildDeviceUuidList(uint32_t numNodes);

  /// @brief Build the list of Gpu devices that will be enumerated to user
  ///
  /// @return Number of Gpu devices to surface upon device enumeration
  uint32_t BuildUsrDeviceList();

  /// @brief Processes UUID token and returns its enumeration index
  ///
  /// @param token RVD token encoding a device's UUID value
  /// @return Enumeration index of the matching device if the token is
  /// valid, -1 otherwise
  int32_t ProcessUuidToken(const std::string& token);

  /// @brief Get the number of Gpu devices that will be surfaced
  /// upon device enumeration
  ///
  /// @return Number of devices to enumerate, possibly ZERO
  uint32_t GetUsrDeviceListSize();

  /// @brief Return the rank of the queried Gpu device, i.e. its position
  /// among the Gpu devices surfaced upon device enumeration
  ///
  /// @param roctIdx Index of the Gpu device as enumerated by ROCt
  /// @return -1 if the queried device is not surfaced, else a value in
  /// the range [0 - (numGpus - 1)]
  int32_t GetUsrDeviceRank(uint32_t roctIdx);

#ifndef NDEBUG
  /// @brief Set debug UUID values to Gpu devices. This is intended to
  /// help debug and test RVD module functionality
  void SetDeviceUuidList();

  /// @brief Print the list of UUIDs of Gpu devices present on system
  void PrintDeviceUuidList();

  /// @brief Print the list of Gpu devices per their enumeration order
  void PrintUsrDeviceList();

  /// @brief Print the list of tokens specified by user to filter
  /// and reorder Gpu devices
  void PrintRvdTokenList();
#endif

 private:
  /// @brief List of tokens specified by user to select and reorder
  std::vector rvdTokenList_;

  /// @brief Ordered list of ROCt enumerated Gpu device's UUID values
  std::vector devUuidList_;

  /// @brief Ordered list of ROCr enumerated Gpu devices
  std::map usrDeviceList_;
};  // End of class RvdFilter

}  // namespace AMD
}  // namespace rocr

#endif  // header guard - HSA_RUNTIME_CORE_INC_AMD_FILTER_DEVICE_H_
ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_gpu_agent.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

// AMD specific HSA backend.

#ifndef HSA_RUNTIME_CORE_INC_AMD_GPU_AGENT_H_
#define HSA_RUNTIME_CORE_INC_AMD_GPU_AGENT_H_

#include 
#include 

#include "hsakmt.h"

#include "core/inc/runtime.h"
#include "core/inc/agent.h"
#include "core/inc/blit.h"
#include "core/inc/signal.h"
#include "core/inc/cache.h"
#include "core/inc/scratch_cache.h"
#include "core/util/small_heap.h"
#include "core/util/locks.h"
#include "core/util/lazy_ptr.h"

namespace rocr {
namespace AMD {
class MemoryRegion;

typedef ScratchCache::ScratchInfo ScratchInfo;

// @brief Interface to represent a GPU agent.
class GpuAgentInt : public core::Agent {
 public:
  // @brief Constructor
  GpuAgentInt(uint32_t node_id)
      : core::Agent(node_id, core::Agent::DeviceType::kAmdGpuDevice) {}

  // @brief Ensure blits are ready (performance hint).
  virtual void PreloadBlits() {}

  // @brief Initialization hook invoked after tools library has loaded,
  // to allow tools interception of interface functions.
  //
  // @retval HSA_STATUS_SUCCESS if initialization is successful.
  virtual hsa_status_t PostToolsInit() = 0;

  // @brief Invoke the user provided callback for each region accessible by
  // this agent.
  //
  // @param [in] include_peer If true, the callback is also invoked on each
  // peer memory region accessible by this agent. If false, the callback is
  // only invoked on memory regions owned by this agent.
  // @param [in] callback User provided callback function.
  // @param [in] data User provided pointer as input for @p callback.
  //
  // @retval ::HSA_STATUS_SUCCESS if the callback function for each traversed
  // region returns ::HSA_STATUS_SUCCESS.
  virtual hsa_status_t VisitRegion(bool include_peer,
                                   hsa_status_t (*callback)(hsa_region_t region,
                                                            void* data),
                                   void* data) const = 0;

  // @brief Carve scratch memory from scratch pool.
  //
  // @param [in/out] scratch Structure to be populated with the carved memory
  // information.
  virtual void AcquireQueueScratch(ScratchInfo& scratch) = 0;

  // @brief Release scratch memory back to scratch pool.
  //
  // @param [in/out] scratch Scratch memory previously acquired with call to
  // ::AcquireQueueScratch.
  virtual void ReleaseQueueScratch(ScratchInfo& scratch) = 0;

  // @brief Translate the kernel start and end dispatch timestamp from agent
  // domain to host domain.
  //
  // @param [in] signal Pointer to signal that provides the dispatch timing.
  // @param [out] time Structure to be populated with the host domain value.
  virtual void TranslateTime(core::Signal* signal,
                             hsa_amd_profiling_dispatch_time_t& time) = 0;

  // @brief Translate the async copy start and end timestamp from agent
  // domain to host domain.
  //
  // @param [in] signal Pointer to signal that provides the async copy timing.
  // @param [out] time Structure to be populated with the host domain value.
  virtual void TranslateTime(core::Signal* signal,
                             hsa_amd_profiling_async_copy_time_t& time) = 0;

  // @brief Translate a timestamp from agent domain to host domain.
  //
  // @param [in] tick Timestamp in agent domain.
  //
  // @retval Timestamp in host domain.
  virtual uint64_t TranslateTime(uint64_t tick) = 0;

  // @brief Invalidate caches on the agent which may hold code object data.
  virtual void InvalidateCodeCaches() = 0;

  // @brief Sets the coherency type of this agent.
  //
  // @param [in] type New coherency type.
  //
  // @retval true The new coherency type is set successfully.
  virtual bool current_coherency_type(hsa_amd_coherency_type_t type) = 0;

  // @brief Returns the current coherency type of this agent.
  //
  // @retval Coherency type.
  virtual hsa_amd_coherency_type_t current_coherency_type() const = 0;

  // @brief Query if agent represents a Kaveri GPU.
  //
  // @retval true if agent is a Kaveri GPU.
  virtual bool is_kv_device() const = 0;

  // @brief Query the agent HSA profile.
  //
  // @retval HSA profile.
  virtual hsa_profile_t profile() const = 0;

  // @brief Query the agent memory bus width in bits.
  //
  // @retval Bus width in bits.
  virtual uint32_t memory_bus_width() const = 0;

  // @brief Query the agent memory maximum frequency in MHz.
  //
  // @retval Maximum memory frequency in MHz.
  virtual uint32_t memory_max_frequency() const = 0;
};

class GpuAgent : public GpuAgentInt {
 public:
  // @brief GPU agent constructor.
  //
  // @param [in] node Node id.
  // @param [in] node_props Node property.
  // @param [in] xnack_mode XNACK mode of device.
  GpuAgent(HSAuint32 node, const HsaNodeProperties& node_props, bool xnack_mode,
           uint32_t index);

  // @brief GPU agent destructor.
  ~GpuAgent();

  // @brief Ensure blits are ready (performance hint).
  void PreloadBlits() override;

  // @brief Override from AMD::GpuAgentInt.
  hsa_status_t PostToolsInit() override;

  uint16_t GetMicrocodeVersion() const;

  uint16_t GetSdmaMicrocodeVersion() const;

  // @brief Assembles SP3 shader source into ISA or AQL code object.
  //
  // @param [in] func_name Name of the SP3 function to assemble.
  // @param [in] assemble_target ISA or AQL assembly target.
  // @param [out] code_buf Code object buffer.
  // @param [out] code_buf_size Size of code object buffer in bytes.
  enum class AssembleTarget { ISA, AQL };

  void AssembleShader(const char* func_name, AssembleTarget assemble_target,
                      void*& code_buf, size_t& code_buf_size) const;

  // @brief Frees code object created by AssembleShader.
  //
  // @param [in] code_buf Code object buffer.
  // @param [in] code_buf_size Size of code object buffer in bytes.
  void ReleaseShader(void* code_buf, size_t code_buf_size) const;

  // @brief Override from AMD::GpuAgentInt.
  hsa_status_t VisitRegion(bool include_peer,
                           hsa_status_t (*callback)(hsa_region_t region,
                                                    void* data),
                           void* data) const override;

  // @brief Override from core::Agent.
  hsa_status_t IterateRegion(hsa_status_t (*callback)(hsa_region_t region,
                                                      void* data),
                             void* data) const override;

  // @brief Override from core::Agent.
  hsa_status_t IterateCache(hsa_status_t (*callback)(hsa_cache_t cache, void* data),
                            void* value) const override;

  // @brief Override from core::Agent.
  hsa_status_t DmaCopy(void* dst, const void* src, size_t size) override;

  // @brief Override from core::Agent.
  hsa_status_t DmaCopy(void* dst, core::Agent& dst_agent, const void* src,
                       core::Agent& src_agent, size_t size,
                       std::vector& dep_signals,
                       core::Signal& out_signal) override;

  // @brief Override from core::Agent.
  hsa_status_t DmaCopyRect(const hsa_pitched_ptr_t* dst, const hsa_dim3_t* dst_offset,
                           const hsa_pitched_ptr_t* src, const hsa_dim3_t* src_offset,
                           const hsa_dim3_t* range, hsa_amd_copy_direction_t dir,
                           std::vector& dep_signals, core::Signal& out_signal);

  // @brief Override from core::Agent.
  hsa_status_t DmaFill(void* ptr, uint32_t value, size_t count) override;

  // @brief Override from core::Agent.
  hsa_status_t GetInfo(hsa_agent_info_t attribute, void* value) const override;

  // @brief Override from core::Agent.
  hsa_status_t QueueCreate(size_t size, hsa_queue_type32_t queue_type,
                           core::HsaEventCallback event_callback, void* data,
                           uint32_t private_segment_size,
                           uint32_t group_segment_size,
                           core::Queue** queue) override;

  // @brief Decrement GWS ref count.
  void GWSRelease();

  // @brief Override from AMD::GpuAgentInt.
  void AcquireQueueScratch(ScratchInfo& scratch) override;

  // @brief Override from AMD::GpuAgentInt.
  void ReleaseQueueScratch(ScratchInfo& scratch) override;

  // @brief Override from AMD::GpuAgentInt.
  void TranslateTime(core::Signal* signal,
                     hsa_amd_profiling_dispatch_time_t& time) override;

  // @brief Override from AMD::GpuAgentInt.
  void TranslateTime(core::Signal* signal,
                     hsa_amd_profiling_async_copy_time_t& time) override;

  // @brief Override from AMD::GpuAgentInt.
  uint64_t TranslateTime(uint64_t tick) override;

  // @brief Override from AMD::GpuAgentInt.
  void InvalidateCodeCaches() override;

  // @brief Override from AMD::GpuAgentInt.
  bool current_coherency_type(hsa_amd_coherency_type_t type) override;

  hsa_amd_coherency_type_t current_coherency_type() const override {
    return current_coherency_type_;
  }

  // Getter & setters.

  // @brief Returns Hive ID
  __forceinline uint64_t HiveId() const override { return properties_.HiveID; }

  // @brief Returns node property.
  __forceinline const HsaNodeProperties& properties() const {
    return properties_;
  }

  // @brief Returns number of data caches.
  __forceinline size_t num_cache() const { return cache_props_.size(); }

  // @brief Returns data cache property.
  //
  // @param [in] idx Cache level.
  __forceinline const HsaCacheProperties& cache_prop(int idx) const {
    return cache_props_[idx];
  }

  // @brief Override from core::Agent.
  const std::vector& regions() const override {
    return regions_;
  }

  // @brief Override from core::Agent.
  const core::Isa* isa() const override { return isa_; }

  // @brief Override from AMD::GpuAgentInt.
  __forceinline bool is_kv_device() const override { return is_kv_device_; }

  // @brief Override from AMD::GpuAgentInt.
  __forceinline hsa_profile_t profile() const override { return profile_; }

  // @brief Override from AMD::GpuAgentInt.
  __forceinline uint32_t memory_bus_width() const override {
    return memory_bus_width_;
  }

  // @brief Override from AMD::GpuAgentInt.
  __forceinline uint32_t memory_max_frequency() const override {
    return memory_max_frequency_;
  }

  // @brief Order the device is surfaced in hsa_iterate_agents counting only
  // GPU devices.
  __forceinline uint32_t enumeration_index() const { return enum_index_; }

  void Trim() override;

  const std::function& system_allocator() const {
    return system_allocator_;
  }

  const std::function& system_deallocator() const {
    return system_deallocator_;
  }

 protected:
  // AQL queue size limits, in packets (64 bytes per AQL packet).
  static const uint32_t minAqlSize_ = 0x1000;   // 4K packets (256KB) min
  static const uint32_t maxAqlSize_ = 0x20000;  // 128K packets (8MB) max

  // @brief Create an internal queue allowing tools to be notified.
  core::Queue* CreateInterceptibleQueue() {
    return CreateInterceptibleQueue(core::Queue::DefaultErrorHandler, nullptr);
  }

  // @brief Create an internal queue, with a custom error handler, allowing
  // tools to be notified.
  core::Queue* CreateInterceptibleQueue(void (*callback)(hsa_status_t status,
                                                         hsa_queue_t* source,
                                                         void* data),
                                        void* data);

  // @brief Create SDMA blit object.
  //
  // @retval NULL if SDMA blit creation and initialization failed.
  core::Blit* CreateBlitSdma(bool use_xgmi);

  // @brief Create Kernel blit object using provided compute queue.
  //
  // @retval NULL if Kernel blit creation and initialization failed.
  core::Blit* CreateBlitKernel(core::Queue* queue);

  // @brief Invoke the user provided callback for every region in @p regions.
  //
  // @param [in] regions Array of region object.
  // @param [in] callback User provided callback function.
  // @param [in] data User provided pointer as input for @p callback.
  //
  // @retval ::HSA_STATUS_SUCCESS if the callback function for each traversed
  // region returns ::HSA_STATUS_SUCCESS.
  hsa_status_t VisitRegion(
      const std::vector& regions,
      hsa_status_t (*callback)(hsa_region_t region, void* data),
      void* data) const;

  // @brief Update ::t1_ tick count.
  void SyncClocks();

  // @brief Binds the second-level trap handler to this node.
  void BindTrapHandler();

  // @brief Override from core::Agent.
  hsa_status_t EnableDmaProfiling(bool enable) override;

  // @brief Node properties.
  const HsaNodeProperties properties_;

  // @brief Current coherency type.
  hsa_amd_coherency_type_t current_coherency_type_;

  // @brief Maximum number of queues that can be created.
  uint32_t max_queues_;

  // @brief Object to manage scratch memory.
  SmallHeap scratch_pool_;

  // @brief Current short duration scratch memory size.
  size_t scratch_used_large_;

  // @brief Notifications for scratch release.
  std::map scratch_notifiers_;

  // @brief Default scratch size per queue.
  size_t queue_scratch_len_;

  // @brief Default scratch size per work item.
  size_t scratch_per_thread_;

  // @brief Blit interfaces for each data path.
  enum BlitEnum { BlitDevToDev, BlitHostToDev, BlitDevToHost, DefaultBlitCount };

  // Blit objects managed by an instance of GpuAgent
  std::vector> blits_;

  // List of agents connected via xGMI
  std::vector xgmi_peer_list_;

  // Protects xgmi_peer_list_
  KernelMutex xgmi_peer_list_lock_;

  // @brief AQL queues for cache management and blit compute usage.
  enum QueueEnum {
    QueueUtility,   // Cache management and device to {host,device} blit compute
    QueueBlitOnly,  // Host to device blit
    QueueCount
  };

  lazy_ptr queues_[QueueCount];

  // @brief Mutex to protect the update to coherency type.
  KernelMutex coherency_lock_;

  // @brief Mutex to protect access to scratch pool.
  KernelMutex scratch_lock_;

  // @brief Mutex to protect access to ::t1_.
  KernelMutex t1_lock_;

  // @brief Mutex to protect access to blit objects.
  KernelMutex blit_lock_;

  // @brief GPU tick on initialization.
  HsaClockCounters t0_;

  HsaClockCounters t1_;

  double historical_clock_ratio_;

  // @brief Array of GPU cache property.
  std::vector cache_props_;

  // @brief Array of HSA cache objects.
  std::vector> caches_;

  // @brief Array of regions owned by this agent.
  std::vector regions_;

  core::Isa* isa_;

  // @brief HSA profile.
  hsa_profile_t profile_;

  bool is_kv_device_;

  void* trap_code_buf_;

  size_t trap_code_buf_size_;

  // @brief Mappings from doorbell index to queue, for trap handler.
  // Correlates with output of s_sendmsg(MSG_GET_DOORBELL) for queue identification.
  amd_queue_t** doorbell_queue_map_;

  // @brief The GPU memory bus width in bits.
  uint32_t memory_bus_width_;

  // @brief The GPU memory maximum frequency in MHz.
  uint32_t memory_max_frequency_;

  // @brief Enumeration index
  uint32_t enum_index_;

  // @brief HDP flush registers
  hsa_amd_hdp_flush_t HDP_flush_ = {nullptr, nullptr};

 private:
  // @brief Query the driver to get the region list owned by this agent.
  void InitRegionList();

  // @brief Reserve memory for scratch pool to be used by AQL queue of this
  // agent.
  void InitScratchPool();

  // @brief Query the driver to get the cache properties.
  void InitCacheList();

  // @brief Create internal queues and blits.
  void InitDma();

  // @brief Setup GWS accessing queue.
  void InitGWS();

  // @brief Setup NUMA aware system memory allocator.
  void InitNumaAllocator();

  // @brief Register signal for notification when scratch may become available.
  // @p signal is notified by OR'ing with @p value.
  //
  // @retval false if @p signal is the null handle, true once it is registered.
  bool AddScratchNotifier(hsa_signal_t signal, hsa_signal_value_t value) {
    if (signal.handle == 0) return false;  // A null signal cannot be notified.
    scratch_notifiers_[signal] = value;
    return true;
  }

  // @brief Deregister scratch notification signals.
  void ClearScratchNotifiers() { scratch_notifiers_.clear(); }

  // @brief Releases scratch back to the driver.
  // caller must hold scratch_lock_.
  void ReleaseScratch(void* base, size_t size, bool large);

  // Bind the Blit object assigned to a peer device that is connected via xGMI links
  lazy_ptr& GetXgmiBlit(const core::Agent& peer_agent);

  // Bind the Blit object that will drive the copy operation
  // across PCIe links (H2D or D2H) or within the same device (D2D)
  lazy_ptr& GetPcieBlit(const core::Agent& dst_agent, const core::Agent& src_agent);

  // Bind the Blit object that will drive the copy operation
  lazy_ptr& GetBlitObject(const core::Agent& dst_agent, const core::Agent& src_agent,
                          const size_t size);

  // @brief Alternative aperture base address. Only on KV.
  uintptr_t ape1_base_;

  // @brief Alternative aperture size. Only on KV.
  size_t ape1_size_;

  // @brief Queue with GWS access.
  struct {
    lazy_ptr queue_;
    int ref_ct_;
    KernelMutex lock_;
  } gws_queue_;

  ScratchCache scratch_cache_;

  // System memory allocator in the nearest NUMA node.
  std::function system_allocator_;

  std::function system_deallocator_;

  DISALLOW_COPY_AND_ASSIGN(GpuAgent);
};

}  // namespace amd
}  // namespace rocr

#endif  // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_gpu_pm4.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in
// the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
// nor the names of its contributors may be used to endorse or promote
// products derived from this Software without specific prior written
// permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_AMD_GPU_PM4_H_
#define HSA_RUNTIME_CORE_INC_AMD_GPU_PM4_H_

#define PM4_HDR_IT_OPCODE_NOP 0x10
#define PM4_HDR_IT_OPCODE_INDIRECT_BUFFER 0x3F
#define PM4_HDR_IT_OPCODE_RELEASE_MEM 0x49
#define PM4_HDR_IT_OPCODE_ACQUIRE_MEM 0x58

#define PM4_HDR_SHADER_TYPE(x) (((x) & 0x1) << 1)
#define PM4_HDR_IT_OPCODE(x) (((x) & 0xFF) << 8)
#define PM4_HDR_COUNT(x) (((x) & 0x3FFF) << 16)
#define PM4_HDR_TYPE(x) (((x) & 0x3) << 30)

#define PM4_HDR(it_opcode, pkt_size_dw, gfxip_ver) ( \
  PM4_HDR_SHADER_TYPE((gfxip_ver) == 7 ? 1 : 0) | \
  PM4_HDR_IT_OPCODE(it_opcode) | \
  PM4_HDR_COUNT((pkt_size_dw) - 2) | \
  PM4_HDR_TYPE(3) \
)

#define PM4_INDIRECT_BUFFER_DW1_IB_BASE_LO(x) (((x) & 0x3FFFFFFF) << 2)
#define PM4_INDIRECT_BUFFER_DW2_IB_BASE_HI(x) (((x) & 0xFFFF) << 0)
#define PM4_INDIRECT_BUFFER_DW3_IB_SIZE(x) (((x) & 0xFFFFF) << 0)
#define PM4_INDIRECT_BUFFER_DW3_IB_VALID(x) (((x) & 0x1) << 23)

#define PM4_ACQUIRE_MEM_DW1_COHER_CNTL(x) (((x) & 0x7FFFFFFF) << 0)
# define PM4_ACQUIRE_MEM_COHER_CNTL_TC_WB_ACTION_ENA (1 << 18)
# define PM4_ACQUIRE_MEM_COHER_CNTL_TC_ACTION_ENA (1 << 23)
# define PM4_ACQUIRE_MEM_COHER_CNTL_SH_KCACHE_ACTION_ENA (1 << 27)
# define PM4_ACQUIRE_MEM_COHER_CNTL_SH_ICACHE_ACTION_ENA (1 << 29)
#define PM4_ACQUIRE_MEM_DW2_COHER_SIZE(x) (((x) & 0xFFFFFFFF) << 0)
#define PM4_ACQUIRE_MEM_DW3_COHER_SIZE_HI(x) (((x) & 0xFF) << 0)
#define PM4_ACQUIRE_MEM_DW7_GCR_CNTL(x) (((x) & 0x7FFFF) << 0)
# define PM4_ACQUIRE_MEM_GCR_CNTL_GLI_INV(x) (((x) & 0x3) << 0)
# define PM4_ACQUIRE_MEM_GCR_CNTL_GLK_INV (1 << 7)
# define PM4_ACQUIRE_MEM_GCR_CNTL_GLV_INV (1 << 8)
# define PM4_ACQUIRE_MEM_GCR_CNTL_GL1_INV (1 << 9)
# define PM4_ACQUIRE_MEM_GCR_CNTL_GL2_INV (1 << 14)

#define PM4_RELEASE_MEM_DW1_EVENT_INDEX(x) (((x) & 0xF) << 8)
# define PM4_RELEASE_MEM_EVENT_INDEX_AQL 0x7

#endif  // header guard
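The PM4 macros in amd_gpu_pm4.h encode fields of type-3 command-processor packets. As a minimal sketch of how they might be combined, the block below fills a four-dword INDIRECT_BUFFER packet. The helper MakeIndirectBufferPacket, its parameters, and the example address are hypothetical and not part of ROCR. The macro definitions are local copies so the block compiles on its own, and the PM4_HDR copy parenthesizes pkt_size_dw.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the amd_gpu_pm4.h macros used below (assumption: copied
// verbatim, except that pkt_size_dw is parenthesized inside PM4_HDR).
#define PM4_HDR_IT_OPCODE_INDIRECT_BUFFER 0x3F
#define PM4_HDR_SHADER_TYPE(x) (((x) & 0x1) << 1)
#define PM4_HDR_IT_OPCODE(x) (((x) & 0xFF) << 8)
#define PM4_HDR_COUNT(x) (((x) & 0x3FFF) << 16)
#define PM4_HDR_TYPE(x) (((x) & 0x3) << 30)
#define PM4_HDR(it_opcode, pkt_size_dw, gfxip_ver) ( \
  PM4_HDR_SHADER_TYPE((gfxip_ver) == 7 ? 1 : 0) |    \
  PM4_HDR_IT_OPCODE(it_opcode) |                     \
  PM4_HDR_COUNT((pkt_size_dw) - 2) |                 \
  PM4_HDR_TYPE(3)                                    \
)
#define PM4_INDIRECT_BUFFER_DW1_IB_BASE_LO(x) (((x) & 0x3FFFFFFF) << 2)
#define PM4_INDIRECT_BUFFER_DW2_IB_BASE_HI(x) (((x) & 0xFFFF) << 0)
#define PM4_INDIRECT_BUFFER_DW3_IB_SIZE(x) (((x) & 0xFFFFF) << 0)
#define PM4_INDIRECT_BUFFER_DW3_IB_VALID(x) (((x) & 0x1) << 23)

// Hypothetical helper (not a ROCR function): fills a 4-dword INDIRECT_BUFFER
// packet that points the command processor at size_dw dwords of PM4 commands
// located at ib_addr. Assumes ib_addr is dword aligned and fits in 48 bits.
void MakeIndirectBufferPacket(uint32_t pkt[4], uint64_t ib_addr,
                              uint32_t size_dw, uint32_t gfxip_major) {
  assert((ib_addr & 0x3) == 0 && "IB address must be dword aligned");

  // Header: type 3, opcode 0x3F, COUNT = packet size in dwords minus 2.
  // The expression is computed as int, so cast the bit pattern explicitly.
  pkt[0] = static_cast<uint32_t>(
      PM4_HDR(PM4_HDR_IT_OPCODE_INDIRECT_BUFFER, 4, gfxip_major));

  // DW1 holds address bits [31:2]: pass the dword address (addr >> 2) and
  // the macro shifts it back into byte position.
  pkt[1] = static_cast<uint32_t>(PM4_INDIRECT_BUFFER_DW1_IB_BASE_LO(
      static_cast<uint32_t>(ib_addr >> 2)));

  // DW2 holds address bits [47:32].
  pkt[2] = static_cast<uint32_t>(PM4_INDIRECT_BUFFER_DW2_IB_BASE_HI(
      static_cast<uint32_t>(ib_addr >> 32)));

  // DW3: IB length in dwords plus the VALID bit (bit 23).
  pkt[3] = static_cast<uint32_t>(PM4_INDIRECT_BUFFER_DW3_IB_SIZE(size_dw) |
                                 PM4_INDIRECT_BUFFER_DW3_IB_VALID(1));
}
```

Under these assumptions, MakeIndirectBufferPacket(pkt, 0x123456789AB0, 16, 9) produces {0xC0023F00, 0x56789AB0, 0x1234, 0x00800010}. Passing 7 as gfxip_major changes only the header, to 0xC0023F02, because PM4_HDR sets the SHADER_TYPE bit for gfx7 alone. A four-dword packet carries COUNT = 2, which matches the size-minus-two encoding in PM4_HDR.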
ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_gpu_shaders.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in
// the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
// nor the names of its contributors may be used to endorse or promote
// products derived from this Software without specific prior written
// permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
// //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_INC_AMD_GPU_SHADERS_H_ #define HSA_RUNTIME_CORE_INC_AMD_GPU_SHADERS_H_ namespace rocr { namespace AMD { static const unsigned int kCodeCopyAligned7[] = { 0xC0820100, 0xC0840104, 0xC0860108, 0xC088010C, 0xC08A0110, 0xC00C0114, 0xBF8C007F, 0x8F028602, 0x4A000002, 0x7E060205, 0xD24A6A02, 0x00000900, 0xD2506A03, 0x01A90103, 0x7E0A0207, 0xD24A6A04, 0x00000D00, 0xD2506A05, 0x01A90105, 0xD1C2006A, 0x00001102, 0xBF86000F, 0x87FE6A7E, 0xDC200000, 0x01000002, 0xBF8C0F70, 0xD24A6A02, 0x00003102, 0xD2506A03, 0x01A90103, 0xDC600000, 0x00000104, 0xD24A6A04, 0x00003104, 0xD2506A05, 0x01A90105, 0xBF82FFEE, 0xBEFE04C1, 0x8F198418, 0x34020084, 0x7E060209, 0xD24A6A02, 0x00001101, 0xD2506A03, 0x01A90103, 0x7E0A020B, 0xD24A6A04, 0x00001501, 0xD2506A05, 0x01A90105, 0xD1C2006A, 0x00001902, 0xBF86000E, 0xDC380000, 0x08000002, 0xD24A6A02, 0x00003302, 0xD2506A03, 0x01A90103, 0xBF8C0F70, 0xDC780000, 0x00000804, 0xD24A6A04, 0x00003304, 0xD2506A05, 0x01A90105, 0xBF82FFEF, 0x8F198218, 0x34020082, 0x7E06020D, 0xD24A6A02, 0x00001901, 0xD2506A03, 0x01A90103, 0x7E0A020F, 0xD24A6A04, 0x00001D01, 0xD2506A05, 0x01A90105, 0xD1C2006A, 0x00002102, 0xBF86000F, 0x87FE6A7E, 0xDC300000, 0x01000002, 0xD24A6A02, 0x00003302, 0xD2506A03, 0x01A90103, 0xBF8C0F70, 0xDC700000, 0x00000104, 0xD24A6A04, 0x00003304, 0xD2506A05, 0x01A90105, 0xBF82FFEE, 0xBEFE04C1, 0x7E060211, 0xD24A6A02, 0x00002100, 0xD2506A03, 0x01A90103, 0x7E0A0213, 0xD24A6A04, 0x00002500, 0xD2506A05, 0x01A90105, 0xD1C2006A, 0x00002902, 0xBF860006, 0x87FE6A7E, 0xDC200000, 0x01000002, 0xBF8C0F70, 0xDC600000, 0x00000104, 0xBF810000, }; static const unsigned int kCodeCopyMisaligned7[] = { 0xC0820100, 0xC0840104, 0xC0860108, 0xC008010C, 0xBF8C007F, 0x8F028602, 0x4A000002, 0x7E060205, 0xD24A6A02, 0x00000900, 0xD2506A03, 0x01A90103, 0x7E0A0207, 0xD24A6A04, 0x00000D00, 0xD2506A05, 0x01A90105, 0xD1C2006A, 0x00001102, 0xBF860032, 0xDC200000, 0x06000002, 
0xD24A6A02, 0x00002102, 0xD2506A03, 0x01A90103, 0xDC200000, 0x07000002, 0xD24A6A02, 0x00002102, 0xD2506A03, 0x01A90103, 0xDC200000, 0x08000002, 0xD24A6A02, 0x00002102, 0xD2506A03, 0x01A90103, 0xDC200000, 0x09000002, 0xD24A6A02, 0x00002102, 0xD2506A03, 0x01A90103, 0xBF8C0F70, 0xDC600000, 0x00000604, 0xD24A6A04, 0x00002104, 0xD2506A05, 0x01A90105, 0xDC600000, 0x00000704, 0xD24A6A04, 0x00002104, 0xD2506A05, 0x01A90105, 0xDC600000, 0x00000804, 0xD24A6A04, 0x00002104, 0xD2506A05, 0x01A90105, 0xDC600000, 0x00000904, 0xD24A6A04, 0x00002104, 0xD2506A05, 0x01A90105, 0xBF82FFCB, 0x7E060209, 0xD24A6A02, 0x00001100, 0xD2506A03, 0x01A90103, 0x7E0A020B, 0xD24A6A04, 0x00001500, 0xD2506A05, 0x01A90105, 0xD1C2006A, 0x00001902, 0xBF86000F, 0x87FE6A7E, 0xDC200000, 0x01000002, 0xD24A6A02, 0x00002102, 0xD2506A03, 0x01A90103, 0xBF8C0F70, 0xDC600000, 0x00000104, 0xD24A6A04, 0x00002104, 0xD2506A05, 0x01A90105, 0xBF82FFEE, 0xBF810000, }; static const unsigned int kCodeFill7[] = { 0xC0820100, 0xC0840104, 0xBF8C007F, 0x8F028602, 0x4A000002, 0x7E08020A, 0x7E0A020A, 0x7E0C020A, 0x7E0E020A, 0x8F0C840B, 0x34020084, 0x7E060205, 0xD24A6A02, 0x00000901, 0xD2506A03, 0x01A90103, 0xD1C2006A, 0x00000D02, 0xBF860007, 0xDC780000, 0x00000402, 0xD24A6A02, 0x00001902, 0xD2506A03, 0x01A90103, 0xBF82FFF6, 0x8F0C820B, 0x34020082, 0x7E060207, 0xD24A6A02, 0x00000D01, 0xD2506A03, 0x01A90103, 0xD1C2006A, 0x00001102, 0xBF860008, 0x87FE6A7E, 0xDC700000, 0x00000402, 0xD24A6A02, 0x00001902, 0xD2506A03, 0x01A90103, 0xBF82FFF5, 0xBF810000, }; static const unsigned int kCodeTrapHandler8[] = { 0xC0061C80, 0x000000C0, 0xBF8C007F, 0xBEFE0181, 0x80728872, 0x82738073, 0x7E000272, 0x7E020273, 0x7E0402FF, 0x80000000, 0x7E060280, 0xDD800000, 0x00000200, 0xBF8C0F70, 0x7DD40500, 0xBF870011, 0xC0061D39, 0x00000008, 0xBF8C007F, 0x86F47474, 0xBF84000C, 0x80729072, 0x82738073, 0xC0021CB9, 0x00000000, 0xBF8C007F, 0x7E000274, 0x7E020275, 0x7E040272, 0xDC700000, 0x00000200, 0xBF8C0F70, 0xBF900001, 0xBF8D0001, 0xBE801F70, }; static const 
unsigned int kCodeTrapHandler9[] = {
/*
.set SQ_WAVE_PC_HI_ADDRESS_MASK , 0xFFFF
.set SQ_WAVE_PC_HI_TRAP_ID_SHIFT , 16
.set SQ_WAVE_PC_HI_TRAP_ID_SIZE , 8
.set SQ_WAVE_PC_HI_TRAP_ID_BFE , (SQ_WAVE_PC_HI_TRAP_ID_SHIFT | (SQ_WAVE_PC_HI_TRAP_ID_SIZE << 16))
.set SQ_WAVE_PC_HI_HT_MASK , 0x1000000
.set SQ_WAVE_STATUS_HALT_BIT , 13
.set SQ_WAVE_STATUS_HALT_BFE , (SQ_WAVE_STATUS_HALT_BIT | (1 << 16))
.set SQ_WAVE_TRAPSTS_ADDRESS_WATCH_MASK , 0x7080
.set SQ_WAVE_TRAPSTS_MEM_VIOL_MASK , 0x100
.set SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK , 0x800
.set SQ_WAVE_TRAPSTS_XNACK_ERROR_MASK , 0x10000000
.set SQ_WAVE_MODE_DEBUG_EN_SHIFT , 11
.set SIGNAL_CODE_MEM_VIOL , (1 << 29)
.set SIGNAL_CODE_ILLEGAL_INST , (1 << 30)
.set SIGNAL_CODE_LLVM_TRAP , (1 << 31)
.set MAX_NUM_DOORBELLS_MASK , ((1 << 10) - 1)
.set SENDMSG_M0_DOORBELL_ID_BITS , 12
.set SENDMSG_M0_DOORBELL_ID_MASK , ((1 << SENDMSG_M0_DOORBELL_ID_BITS) - 1)

.set TTMP7_DISPATCH_ID_CONVERTED_BIT , 31
.set TTMP7_WAVE_STOPPED_BIT , 30
.set TTMP7_SAVED_STATUS_HALT_BIT , 29
.set TTMP7_SAVED_TRAP_ID_SHIFT , 25
.set TTMP7_SAVED_TRAP_ID_BITS , 4
.set TTMP7_SAVED_TRAP_ID_MASK , ((1 << TTMP7_SAVED_TRAP_ID_BITS) - 1)
.set TTMP7_PACKET_INDEX_BITS , 25
.set TTMP7_PACKET_INDEX_MASK , ((1 << TTMP7_PACKET_INDEX_BITS) - 1)
.set TTMP11_PC_HI_SHIFT , 7

.if .amdgcn.gfx_generation_number == 9
  .set DEBUG_INTERRUPT_CONTEXT_ID_BIT , 23
  .set TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT , 26
  .set SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT , 15
  .set SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK , 0x1F8000
.elseif .amdgcn.gfx_generation_number == 10
  .set DEBUG_INTERRUPT_CONTEXT_ID_BIT , 22
  .set TTMP11_SAVE_REPLAY_W64H_SHIFT , 31
  .set TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT , 24
  .set SQ_WAVE_IB_STS_REPLAY_W64H_SHIFT , 25
  .set SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT , 15
  .set SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK , 0x3F8000
  .set SQ_WAVE_IB_STS_REPLAY_W64H_MASK , 0x2000000
.else
  .error "unsupported target"
.endif

// ABI between first and second level trap handler:
// ttmp0 = PC[31:0]
// ttmp1 = 0[2:0], PCRewind[3:0], HostTrap[0], TrapId[7:0], PC[47:32]
// ttmp12 = SQ_WAVE_STATUS
// ttmp14 = TMA[31:0]
// ttmp15 = TMA[63:32]
// gfx9:
// ttmp11 = SQ_WAVE_IB_STS[20:15], 0[18:0], NoScratch[0], WaveIdInWG[5:0]
// gfx10:
// ttmp11 = SQ_WAVE_IB_STS[25], SQ_WAVE_IB_STS[21:15], 0[16:0], NoScratch[0], WaveIdInWG[5:0]

.macro mGetDoorbellId
  s_mov_b32 exec_lo, 0x80000000
  s_sendmsg sendmsg(MSG_GET_DOORBELL)
.wait_sendmsg_\@:
  s_nop 7
  s_bitcmp0_b32 exec_lo, 0x1F
  s_cbranch_scc0 .wait_sendmsg_\@
.endm

.macro mExitTrap
  // Restore SQ_WAVE_IB_STS.
.if .amdgcn.gfx_generation_number == 9
  s_lshr_b32 ttmp2, ttmp11, (TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT - SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT)
  s_and_b32 ttmp2, ttmp2, SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK
  s_setreg_b32 hwreg(HW_REG_IB_STS), ttmp2
.endif
.if .amdgcn.gfx_generation_number == 10
  s_lshr_b32 ttmp2, ttmp11, (TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT - SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT)
  s_and_b32 ttmp3, ttmp2, SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK
  s_lshr_b32 ttmp2, ttmp11, (TTMP11_SAVE_REPLAY_W64H_SHIFT - SQ_WAVE_IB_STS_REPLAY_W64H_SHIFT)
  s_and_b32 ttmp2, ttmp2, SQ_WAVE_IB_STS_REPLAY_W64H_MASK
  s_or_b32 ttmp2, ttmp2, ttmp3
  s_setreg_b32 hwreg(HW_REG_IB_STS), ttmp2
.endif

  // Restore SQ_WAVE_STATUS.
  s_and_b64 exec, exec, exec // Restore STATUS.EXECZ, not writable by s_setreg_b32
  s_and_b64 vcc, vcc, vcc // Restore STATUS.VCCZ, not writable by s_setreg_b32
  s_setreg_b32 hwreg(HW_REG_STATUS), ttmp12

  // Return to shader at unmodified PC.
  s_rfe_b64 [ttmp0, ttmp1]
.endm

trap_entry:
  s_andn2_b32 ttmp7, ttmp7, (TTMP7_SAVED_TRAP_ID_MASK << TTMP7_SAVED_TRAP_ID_SHIFT) | (1 << TTMP7_SAVED_STATUS_HALT_BIT)

  // Save the entry status.halt in ttmp7.saved_status_halt
  s_bfe_u32 ttmp2, ttmp12, SQ_WAVE_STATUS_HALT_BFE
  s_lshl_b32 ttmp2, ttmp2, TTMP7_SAVED_STATUS_HALT_BIT
  s_or_b32 ttmp7, ttmp7, ttmp2

  // If trap raised (non-zero trap id) then branch.
  s_bfe_u32 ttmp2, ttmp1, SQ_WAVE_PC_HI_TRAP_ID_BFE
  s_cbranch_scc1 .trap_raised

  // If non-masked exception raised then branch.
  s_getreg_b32 ttmp2, hwreg(HW_REG_TRAPSTS)
  s_and_b32 ttmp3, ttmp2, (SQ_WAVE_TRAPSTS_MEM_VIOL_MASK | SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK)
  s_cbranch_scc1 .excp_raised

.signal_debugger:
  // Fetch doorbell index for our queue.
  s_mov_b32 ttmp2, exec_lo
  s_mov_b32 ttmp3, exec_hi
  mGetDoorbellId
  s_mov_b32 exec_hi, ttmp3

  // Restore exec_lo, move the doorbell_id into ttmp3
  s_and_b32 ttmp3, exec_lo, SENDMSG_M0_DOORBELL_ID_MASK
  s_mov_b32 exec_lo, ttmp2

  // Set the debug interrupt context id.
  // FIXME: Make conditional when exceptions are handled.
  s_bitset1_b32 ttmp3, DEBUG_INTERRUPT_CONTEXT_ID_BIT

  // Send an interrupt to trigger event notification.
  s_mov_b32 ttmp2, m0
  s_mov_b32 m0, ttmp3
  s_nop 0x0 // Manually inserted wait states
  s_sendmsg sendmsg(MSG_INTERRUPT)

  // Restore m0
  s_mov_b32 m0, ttmp2

  // Parking the wave requires saving the original pc in the preserved ttmps.
  // Since all ttmps are used, we must first free ttmp6 by compressing the
  // 40bit dispatch ptr in ttmp6:7 into a 25bit queue packet id.
  //
  // Register layout before parking the wave:
  //
  // ttmp6: dispatch_ptr[31:6] 0[5:0]
  // ttmp7: 0[0] wave_stopped[0] status_halt[0] trap_id[3:0] 0[16:0] dispatch_ptr[39:32]
  // ttmp11: 1st_level_ttmp11[31:23] 0[15:0] 1st_level_ttmp11[6:0]
  //
  // After parking the wave:
  //
  // ttmp6: pc_lo[31:0]
  // ttmp7: 1[0] wave_stopped[0] status_halt[0] trap_id[3:0] packet_id[24:0]
  // ttmp11: 1st_level_ttmp11[31:23] pc_hi[15:0] 1st_level_ttmp11[6:0]
  //
  // The conversion from dispatch ptr to queue packet index only needs to be
  // done once, the first time the wave executes the trap handler.
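  //
  // Worked example (illustrative addresses only, not taken from a real
  // queue): AQL packets are 64 bytes, so the ">> 6" in the conversion
  // divides the byte offset by the packet size. With base_address.lo =
  // 0x00100000 and dispatch_ptr.lo = 0x00100140, the packet index is
  // (0x00100140 - 0x00100000) >> 6 = 0x140 >> 6 = 5.
  // The doorbell index is likewise shifted left by 3 before the TMA load
  // because doorbell_queue_map holds one 8-byte amd_queue_t pointer per
  // doorbell.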
.if ((.amdgcn.gfx_generation_number == 10 && .amdgcn.gfx_generation_minor >= 3) || .amdgcn.gfx_generation_number > 10)
  s_branch .halt_wave
.else
  s_bitcmp1_b32 ttmp7, TTMP7_DISPATCH_ID_CONVERTED_BIT
  s_cbranch_scc1 .ttmp7_has_dispatch_index

  s_and_b32 ttmp3, ttmp3, MAX_NUM_DOORBELLS_MASK
  s_lshl_b32 ttmp3, ttmp3, 0x3

  // Map doorbell index to amd_queue_t* through TMA (doorbell_queue_map).
  s_load_dwordx2 [ttmp2, ttmp3], [ttmp14, ttmp15], ttmp3 glc
  s_waitcnt lgkmcnt(0)

  // Retrieve queue base_address from hsa_queue_t*.
  s_load_dword ttmp2, [ttmp2, ttmp3], 0x8 glc
  s_waitcnt lgkmcnt(0)

  // The dispatch index is (dispatch_ptr.lo - base_address.lo) >> 6
  s_sub_u32 ttmp2, ttmp6, ttmp2
  s_lshr_b32 ttmp2, ttmp2, 0x6
  s_andn2_b32 ttmp7, ttmp7, TTMP7_PACKET_INDEX_MASK
  s_or_b32 ttmp7, ttmp7, ttmp2
  s_bitset1_b32 ttmp7, TTMP7_DISPATCH_ID_CONVERTED_BIT

.ttmp7_has_dispatch_index:
  // Save the PC
  s_mov_b32 ttmp6, ttmp0
  s_and_b32 ttmp1, ttmp1, SQ_WAVE_PC_HI_ADDRESS_MASK
  s_lshl_b32 ttmp1, ttmp1, TTMP11_PC_HI_SHIFT
  s_andn2_b32 ttmp11, ttmp11, (SQ_WAVE_PC_HI_ADDRESS_MASK << TTMP11_PC_HI_SHIFT)
  s_or_b32 ttmp11, ttmp11, ttmp1

  // Park the wave
  s_getpc_b64 [ttmp0, ttmp1]
  s_add_u32 ttmp0, ttmp0, .parked - .
  s_addc_u32 ttmp1, ttmp1, 0x0
  s_branch .halt_wave

.parked:
  s_trap 0x2
  s_branch .parked
.endif

.excp_raised:
  // If memory violation without XNACK error then signal queue error.
  // XNACK error will be handled by VM interrupt, since it has more information.
  s_and_b32 ttmp3, ttmp2, (SQ_WAVE_TRAPSTS_MEM_VIOL_MASK | SQ_WAVE_TRAPSTS_XNACK_ERROR_MASK)
  s_cmp_eq_u32 ttmp3, SQ_WAVE_TRAPSTS_MEM_VIOL_MASK
  s_mov_b32 ttmp3, SIGNAL_CODE_MEM_VIOL
  s_cbranch_scc1 .signal_error

  // If illegal instruction then signal queue error.
  s_and_b32 ttmp3, ttmp2, SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK
  s_mov_b32 ttmp3, SIGNAL_CODE_ILLEGAL_INST
  s_cbranch_scc1 .signal_error

  // Otherwise (memory violation with XNACK error) return to shader. Do not
  // send a signal as that will cause an interrupt storm. Instead let the
  // interrupt generated by the TLB miss cause the kernel to notify ROCr and
  // put the queue into an error state. This also ensures the TLB interrupt
  // is received which provides information about the page causing the fault.
  s_branch .halt_wave

.trap_raised:
  // Save the entry trap id in ttmp7.saved_trap_id
  s_min_u32 ttmp3, ttmp2, 0xF
  s_lshl_b32 ttmp3, ttmp3, TTMP7_SAVED_TRAP_ID_SHIFT
  s_or_b32 ttmp7, ttmp7, ttmp3

  // If debugger trap (s_trap >= 3) then signal debugger.
  s_cmp_ge_u32 ttmp2, 0x3;
  s_cbranch_scc1 .signal_debugger

  // If llvm.trap (s_trap 2) then signal queue error.
  s_cmp_eq_u32 ttmp2, 0x2
  s_mov_b32 ttmp3, SIGNAL_CODE_LLVM_TRAP
  s_cbranch_scc1 .signal_error

  // For other traps advance PC and return to shader.
  s_add_u32 ttmp0, ttmp0, 0x4
  s_addc_u32 ttmp1, ttmp1, 0x0
  s_branch .exit_trap

.signal_error:
.if (.amdgcn.gfx_generation_number == 10 && .amdgcn.gfx_generation_minor >= 3)
  // This needs to be rewritten for gfx10.3 as scalar stores are not available.
.else
  // FIXME: don't trash ttmp4/ttmp5 when exception handling is unified.
  s_mov_b32 ttmp4, ttmp3

  // Fetch doorbell index for our queue.
  s_mov_b32 ttmp2, exec_lo
  s_mov_b32 ttmp3, exec_hi
  mGetDoorbellId
  s_mov_b32 exec_hi, ttmp3

  // Restore exec_lo, move the doorbell index into ttmp3
  s_and_b32 exec_lo, exec_lo, MAX_NUM_DOORBELLS_MASK
  s_lshl_b32 ttmp3, exec_lo, 0x3
  s_mov_b32 exec_lo, ttmp2

  // Map doorbell index to amd_queue_t* through TMA (doorbell_queue_map).
  s_load_dwordx2 [ttmp2, ttmp3], [ttmp14, ttmp15], ttmp3 glc
  s_waitcnt lgkmcnt(0)

  // Retrieve queue_inactive_signal from amd_queue_t*.
  s_load_dwordx2 [ttmp2, ttmp3], [ttmp2, ttmp3], 0xC0 glc
  s_waitcnt lgkmcnt(0)

  // Set queue signal value to error code.
  s_mov_b32 ttmp5, 0x0
  s_atomic_swap_x2 [ttmp4, ttmp5], [ttmp2, ttmp3], 0x8 glc
  s_waitcnt lgkmcnt(0)

  // Skip event trigger if the signal value was already non-zero.
  s_or_b32 ttmp4, ttmp4, ttmp5
  s_cbranch_scc1 .skip_event_trigger

  // Check for a non-NULL signal event mailbox.
s_load_dwordx2 [ttmp4, ttmp5], [ttmp2, ttmp3], 0x10 glc s_waitcnt lgkmcnt(0) s_and_b64 [ttmp4, ttmp5], [ttmp4, ttmp5], [ttmp4, ttmp5] s_cbranch_scc0 .skip_event_trigger // Load the signal event value. s_load_dword ttmp2, [ttmp2, ttmp3], 0x18 glc s_waitcnt lgkmcnt(0) // Write the signal event value to the mailbox. s_store_dword ttmp2, [ttmp4, ttmp5], 0x0 glc s_waitcnt lgkmcnt(0) // Send an interrupt to trigger event notification. s_mov_b32 m0, 0x0 s_nop 0 s_sendmsg sendmsg(MSG_INTERRUPT) .endif .skip_event_trigger: // Since we trashed ttmp4/ttmp5, reset the wave_id to 0 s_mov_b32 ttmp4, 0x0 s_mov_b32 ttmp5, 0x0 .halt_wave: s_bitset1_b32 ttmp7, TTMP7_WAVE_STOPPED_BIT // Halt the wavefront. s_bitset1_b32 ttmp12, SQ_WAVE_STATUS_HALT_BIT .exit_trap: mExitTrap */ 0x8973ff73, 0x3e000000, 0x92eeff78, 0x0001000d, 0x8e6e9d6e, 0x87736e73, 0x92eeff6d, 0x00080010, 0xbf850041, 0xb8eef803, 0x866fff6e, 0x00000900, 0xbf850031, 0xbeee007e, 0xbeef007f, 0xbefe00ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff006f, 0x866fff7e, 0x00000fff, 0xbefe006e, 0xbeef1a97, 0xbeee007c, 0xbefc006f, 0xbf800000, 0xbf900001, 0xbefc006e, 0xbf0d9f73, 0xbf85000f, 0x866fff6f, 0x000003ff, 0x8e6f836f, 0xc0051bbd, 0x0000006f, 0xbf8cc07f, 0xc0031bb7, 0x00000008, 0xbf8cc07f, 0x80ee6e72, 0x8f6e866e, 0x8973ff73, 0x01ffffff, 0x87736e73, 0xbef31a9f, 0xbef2006c, 0x866dff6d, 0x0000ffff, 0x8e6d876d, 0x8977ff77, 0x007fff80, 0x87776d77, 0xbeec1c00, 0x806cff6c, 0x00000010, 0x826d806d, 0xbf820044, 0xbf920002, 0xbf82fffe, 0x866fff6e, 0x10000100, 0xbf06ff6f, 0x00000100, 0xbeef00ff, 0x20000000, 0xbf850011, 0x866fff6e, 0x00000800, 0xbeef00f4, 0xbf85000d, 0xbf820036, 0x83ef8f6e, 0x8e6f996f, 0x87736f73, 0xbf09836e, 0xbf85ffbe, 0xbf06826e, 0xbeef00ff, 0x80000000, 0xbf850003, 0x806c846c, 0x826d806d, 0xbf82002c, 0xbef0006f, 0xbeee007e, 0xbeef007f, 0xbefe00ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff006f, 0x867eff7e, 0x000003ff, 0x8e6f837e, 0xbefe006e, 0xc0051bbd, 0x0000006f, 
0xbf8cc07f, 0xc0071bb7, 0x000000c0, 0xbf8cc07f, 0xbef10080, 0xc2831c37, 0x00000008, 0xbf8cc07f, 0x87707170, 0xbf85000e, 0xc0071c37, 0x00000010, 0xbf8cc07f, 0x86f07070, 0xbf840009, 0xc0031bb7, 0x00000018, 0xbf8cc07f, 0xc0431bb8, 0x00000000, 0xbf8cc07f, 0xbefc0080, 0xbf800000, 0xbf900001, 0xbef00080, 0xbef10080, 0xbef31a9e, 0xbef81a8d, 0x8f6e8b77, 0x866eff6e, 0x001f8000, 0xb96ef807, 0x86fe7e7e, 0x86ea6a6a, 0xb978f802, 0xbe801f6c, }; static const unsigned int kCodeTrapHandler90a[] = { 0x8973ff73, 0x3e000000, 0x92eeff78, 0x0001000d, 0x8e6e9d6e, 0x87736e73, 0x92eeff6d, 0x00080010, 0xbf850041, 0xb8eef803, 0x866fff6e, 0x00000900, 0xbf850031, 0xbeee007e, 0xbeef007f, 0xbefe00ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff006f, 0x866fff7e, 0x00000fff, 0xbefe006e, 0xbeef1a97, 0xbeee007c, 0xbefc006f, 0xbf800000, 0xbf900001, 0xbefc006e, 0xbf0d9f73, 0xbf85000f, 0x866fff6f, 0x000003ff, 0x8e6f836f, 0xc0051bbd, 0x0000006f, 0xbf8cc07f, 0xc0031bb7, 0x00000008, 0xbf8cc07f, 0x80ee6e72, 0x8f6e866e, 0x8973ff73, 0x01ffffff, 0x87736e73, 0xbef31a9f, 0xbef2006c, 0x866dff6d, 0x0000ffff, 0x8e6d876d, 0x8977ff77, 0x007fff80, 0x87776d77, 0xbeec1c00, 0x806cff6c, 0x00000010, 0x826d806d, 0xbf820044, 0xbf920002, 0xbf82fffe, 0x866fff6e, 0x10000100, 0xbf06ff6f, 0x00000100, 0xbeef00ff, 0x20000000, 0xbf850011, 0x866fff6e, 0x00000800, 0xbeef00f4, 0xbf85000d, 0xbf820036, 0x83ef8f6e, 0x8e6f996f, 0x87736f73, 0xbf09836e, 0xbf85ffbe, 0xbf06826e, 0xbeef00ff, 0x80000000, 0xbf850003, 0x806c846c, 0x826d806d, 0xbf82002c, 0xbef0006f, 0xbeee007e, 0xbeef007f, 0xbefe00ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff006f, 0x867eff7e, 0x000003ff, 0x8e6f837e, 0xbefe006e, 0xc0051bbd, 0x0000006f, 0xbf8cc07f, 0xc0071bb7, 0x000000c0, 0xbf8cc07f, 0xbef10080, 0xc2831c37, 0x00000008, 0xbf8cc07f, 0x87707170, 0xbf85000e, 0xc0071c37, 0x00000010, 0xbf8cc07f, 0x86f07070, 0xbf840009, 0xc0031bb7, 0x00000018, 0xbf8cc07f, 0xc0431bb8, 0x00000000, 0xbf8cc07f, 0xbefc0080, 0xbf800000, 0xbf900001, 
0xbef00080, 0xbef10080, 0xbef31a9e, 0xbef81a8d, 0x8f6e8b77, 0x866eff6e, 0x001f8000, 0xb96ef807, 0x86fe7e7e, 0x86ea6a6a, 0xb978f802, 0xbe801f6c, }; static const unsigned int kCodeCopyAligned8[] = { 0xC00A0100, 0x00000000, 0xC00A0200, 0x00000010, 0xC00A0300, 0x00000020, 0xC00A0400, 0x00000030, 0xC00A0500, 0x00000040, 0xC0020600, 0x00000050, 0xBF8C007F, 0x8E028602, 0x32000002, 0x7E060205, 0xD1196A02, 0x00000900, 0xD11C6A03, 0x01A90103, 0x7E0A0207, 0xD1196A04, 0x00000D00, 0xD11C6A05, 0x01A90105, 0xD0E9006A, 0x00001102, 0xBF86000F, 0x86FE6A7E, 0xDC400000, 0x01000002, 0xBF8C0F70, 0xD1196A02, 0x00003102, 0xD11C6A03, 0x01A90103, 0xDC600000, 0x00000104, 0xD1196A04, 0x00003104, 0xD11C6A05, 0x01A90105, 0xBF82FFEE, 0xBEFE01C1, 0x8E198418, 0x24020084, 0x7E060209, 0xD1196A02, 0x00001101, 0xD11C6A03, 0x01A90103, 0x7E0A020B, 0xD1196A04, 0x00001501, 0xD11C6A05, 0x01A90105, 0xD0E9006A, 0x00001902, 0xBF86000E, 0xDC5C0000, 0x08000002, 0xD1196A02, 0x00003302, 0xD11C6A03, 0x01A90103, 0xBF8C0F70, 0xDC7C0000, 0x00000804, 0xD1196A04, 0x00003304, 0xD11C6A05, 0x01A90105, 0xBF82FFEF, 0x8E198218, 0x24020082, 0x7E06020D, 0xD1196A02, 0x00001901, 0xD11C6A03, 0x01A90103, 0x7E0A020F, 0xD1196A04, 0x00001D01, 0xD11C6A05, 0x01A90105, 0xD0E9006A, 0x00002102, 0xBF86000F, 0x86FE6A7E, 0xDC500000, 0x01000002, 0xD1196A02, 0x00003302, 0xD11C6A03, 0x01A90103, 0xBF8C0F70, 0xDC700000, 0x00000104, 0xD1196A04, 0x00003304, 0xD11C6A05, 0x01A90105, 0xBF82FFEE, 0xBEFE01C1, 0x7E060211, 0xD1196A02, 0x00002100, 0xD11C6A03, 0x01A90103, 0x7E0A0213, 0xD1196A04, 0x00002500, 0xD11C6A05, 0x01A90105, 0xD0E9006A, 0x00002902, 0xBF860006, 0x86FE6A7E, 0xDC400000, 0x01000002, 0xBF8C0F70, 0xDC600000, 0x00000104, 0xBF810000, }; static const unsigned int kCodeCopyMisaligned8[] = { 0xC00A0100, 0x00000000, 0xC00A0200, 0x00000010, 0xC00A0300, 0x00000020, 0xC0020400, 0x00000030, 0xBF8C007F, 0x8E028602, 0x32000002, 0x7E060205, 0xD1196A02, 0x00000900, 0xD11C6A03, 0x01A90103, 0x7E0A0207, 0xD1196A04, 0x00000D00, 0xD11C6A05, 0x01A90105, 
0xD0E9006A, 0x00001102, 0xBF860032, 0xDC400000, 0x06000002, 0xD1196A02, 0x00002102, 0xD11C6A03, 0x01A90103, 0xDC400000, 0x07000002, 0xD1196A02, 0x00002102, 0xD11C6A03, 0x01A90103, 0xDC400000, 0x08000002, 0xD1196A02, 0x00002102, 0xD11C6A03, 0x01A90103, 0xDC400000, 0x09000002, 0xD1196A02, 0x00002102, 0xD11C6A03, 0x01A90103, 0xBF8C0F70, 0xDC600000, 0x00000604, 0xD1196A04, 0x00002104, 0xD11C6A05, 0x01A90105, 0xDC600000, 0x00000704, 0xD1196A04, 0x00002104, 0xD11C6A05, 0x01A90105, 0xDC600000, 0x00000804, 0xD1196A04, 0x00002104, 0xD11C6A05, 0x01A90105, 0xDC600000, 0x00000904, 0xD1196A04, 0x00002104, 0xD11C6A05, 0x01A90105, 0xBF82FFCB, 0x7E060209, 0xD1196A02, 0x00001100, 0xD11C6A03, 0x01A90103, 0x7E0A020B, 0xD1196A04, 0x00001500, 0xD11C6A05, 0x01A90105, 0xD0E9006A, 0x00001902, 0xBF86000F, 0x86FE6A7E, 0xDC400000, 0x01000002, 0xD1196A02, 0x00002102, 0xD11C6A03, 0x01A90103, 0xBF8C0F70, 0xDC600000, 0x00000104, 0xD1196A04, 0x00002104, 0xD11C6A05, 0x01A90105, 0xBF82FFEE, 0xBF810000, }; static const unsigned int kCodeFill8[] = { 0xC00A0100, 0x00000000, 0xC00A0200, 0x00000010, 0xBF8C007F, 0x8E028602, 0x32000002, 0x7E08020A, 0x7E0A020A, 0x7E0C020A, 0x7E0E020A, 0x8E0C840B, 0x24020084, 0x7E060205, 0xD1196A02, 0x00000901, 0xD11C6A03, 0x01A90103, 0xD0E9006A, 0x00000D02, 0xBF860007, 0xDC7C0000, 0x00000402, 0xD1196A02, 0x00001902, 0xD11C6A03, 0x01A90103, 0xBF82FFF6, 0x8E0C820B, 0x24020082, 0x7E060207, 0xD1196A02, 0x00000D01, 0xD11C6A03, 0x01A90103, 0xD0E9006A, 0x00001102, 0xBF860008, 0x86FE6A7E, 0xDC700000, 0x00000402, 0xD1196A02, 0x00001902, 0xD11C6A03, 0x01A90103, 0xBF82FFF5, 0xBF810000, }; static const unsigned int kCodeCopyAligned10[] = { 0xF4080100, 0xFA000000, 0xF4080200, 0xFA000010, 0xF4080300, 0xFA000020, 0xF4080400, 0xFA000030, 0xF4080500, 0xFA000040, 0xF4000600, 0xFA000050, 0xBF8CC07F, 0x8F028602, 0xD70F6A00, 0x00020002, 0x7E060205, 0xD70F6A02, 0x00020004, 0xD5286A03, 0x01A90103, 0x7E0A0207, 0xD70F6A04, 0x00020006, 0xD5286A05, 0x01A90105, 0xD4E1006A, 0x00001102, 0xBF86000F, 
0x87FE6A7E, 0xDC200000, 0x017D0002, 0xBF8C3F70, 0xD70F6A02, 0x00020418, 0xD5286A03, 0x01A90103, 0xDC600000, 0x007D0104, 0xD70F6A04, 0x00020818, 0xD5286A05, 0x01A90105, 0xBF82FFEE, 0xBEFE04C1, 0x8F198418, 0x34020084, 0x7E060209, 0xD70F6A02, 0x00020208, 0xD5286A03, 0x01A90103, 0x7E0A020B, 0xD70F6A04, 0x0002020A, 0xD5286A05, 0x01A90105, 0xD4E1006A, 0x00001902, 0xBF86000E, 0xDC380000, 0x087D0002, 0xD70F6A02, 0x00020419, 0xD5286A03, 0x01A90103, 0xBF8C3F70, 0xDC780000, 0x007D0804, 0xD70F6A04, 0x00020819, 0xD5286A05, 0x01A90105, 0xBF82FFEF, 0x8F198218, 0x34020082, 0x7E06020D, 0xD70F6A02, 0x0002020C, 0xD5286A03, 0x01A90103, 0x7E0A020F, 0xD70F6A04, 0x0002020E, 0xD5286A05, 0x01A90105, 0xD4E1006A, 0x00002102, 0xBF86000F, 0x87FE6A7E, 0xDC300000, 0x017D0002, 0xD70F6A02, 0x00020419, 0xD5286A03, 0x01A90103, 0xBF8C3F70, 0xDC700000, 0x007D0104, 0xD70F6A04, 0x00020819, 0xD5286A05, 0x01A90105, 0xBF82FFEE, 0xBEFE04C1, 0x7E060211, 0xD70F6A02, 0x00020010, 0xD5286A03, 0x01A90103, 0x7E0A0213, 0xD70F6A04, 0x00020012, 0xD5286A05, 0x01A90105, 0xD4E1006A, 0x00002902, 0xBF860006, 0x87FE6A7E, 0xDC200000, 0x017D0002, 0xBF8C3F70, 0xDC600000, 0x007D0104, 0xBF810000, }; static const unsigned int kCodeCopyMisaligned10[] = { 0xF4080100, 0xFA000000, 0xF4080200, 0xFA000010, 0xF4080300, 0xFA000020, 0xF4000400, 0xFA000030, 0xBF8CC07F, 0x8F028602, 0xD70F6A00, 0x00020002, 0x7E060205, 0xD70F6A02, 0x00020004, 0xD5286A03, 0x01A90103, 0x7E0A0207, 0xD70F6A04, 0x00020006, 0xD5286A05, 0x01A90105, 0xD4E1006A, 0x00001102, 0xBF860032, 0xDC200000, 0x067D0002, 0xD70F6A02, 0x00020410, 0xD5286A03, 0x01A90103, 0xDC200000, 0x077D0002, 0xD70F6A02, 0x00020410, 0xD5286A03, 0x01A90103, 0xDC200000, 0x087D0002, 0xD70F6A02, 0x00020410, 0xD5286A03, 0x01A90103, 0xDC200000, 0x097D0002, 0xD70F6A02, 0x00020410, 0xD5286A03, 0x01A90103, 0xBF8C3F70, 0xDC600000, 0x007D0604, 0xD70F6A04, 0x00020810, 0xD5286A05, 0x01A90105, 0xDC600000, 0x007D0704, 0xD70F6A04, 0x00020810, 0xD5286A05, 0x01A90105, 0xDC600000, 0x007D0804, 0xD70F6A04, 
0x00020810, 0xD5286A05, 0x01A90105, 0xDC600000, 0x007D0904, 0xD70F6A04, 0x00020810, 0xD5286A05, 0x01A90105, 0xBF82FFCB, 0x7E060209, 0xD70F6A02, 0x00020008, 0xD5286A03, 0x01A90103, 0x7E0A020B, 0xD70F6A04, 0x0002000A, 0xD5286A05, 0x01A90105, 0xD4E1006A, 0x00001902, 0xBF86000F, 0x87FE6A7E, 0xDC200000, 0x017D0002, 0xD70F6A02, 0x00020410, 0xD5286A03, 0x01A90103, 0xBF8C3F70, 0xDC600000, 0x007D0104, 0xD70F6A04, 0x00020810, 0xD5286A05, 0x01A90105, 0xBF82FFEE, 0xBF810000, }; static const unsigned int kCodeFill10[] = { 0xF4080100, 0xFA000000, 0xF4080200, 0xFA000010, 0xBF8CC07F, 0x8F028602, 0xD70F6A00, 0x00020002, 0x7E08020A, 0x7E0A020A, 0x7E0C020A, 0x7E0E020A, 0x8F0C840B, 0x34020084, 0x7E060205, 0xD70F6A02, 0x00020204, 0xD5286A03, 0x01A90103, 0xD4E1006A, 0x00000D02, 0xBF860007, 0xDC780000, 0x007D0402, 0xD70F6A02, 0x0002040C, 0xD5286A03, 0x01A90103, 0xBF82FFF6, 0x8F0C820B, 0x34020082, 0x7E060207, 0xD70F6A02, 0x00020206, 0xD5286A03, 0x01A90103, 0xD4E1006A, 0x00001102, 0xBF860008, 0x87FE6A7E, 0xDC700000, 0x007D0402, 0xD70F6A02, 0x0002040C, 0xD5286A03, 0x01A90103, 0xBF82FFF5, 0xBF810000, }; static const unsigned int kCodeTrapHandler1010[] = { 0x8a73ff73, 0x3e000000, 0x93eeff78, 0x0001000d, 0x8f6e9d6e, 0x88736e73, 0x93eeff6d, 0x00080010, 0xbf850041, 0xb96ef803, 0x876fff6e, 0x00000900, 0xbf850031, 0xbeee037e, 0xbeef037f, 0xbefe03ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff036f, 0x876fff7e, 0x00000fff, 0xbefe036e, 0xbeef1d96, 0xbeee037c, 0xbefc036f, 0xbf800000, 0xbf900001, 0xbefc036e, 0xbf0d9f73, 0xbf85000f, 0x876fff6f, 0x000003ff, 0x8f6f836f, 0xf4051bbd, 0xde000000, 0xbf8cc07f, 0xf4011bb7, 0xfa000008, 0xbf8cc07f, 0x80ee6e72, 0x906e866e, 0x8a73ff73, 0x01ffffff, 0x88736e73, 0xbef31d9f, 0xbef2036c, 0x876dff6d, 0x0000ffff, 0x8f6d876d, 0x8a77ff77, 0x007fff80, 0x88776d77, 0xbeec1f00, 0x806cff6c, 0x00000010, 0x826d806d, 0xbf820044, 0xbf920002, 0xbf82fffe, 0x876fff6e, 0x10000100, 0xbf06ff6f, 0x00000100, 0xbeef03ff, 0x20000000, 0xbf850011, 0x876fff6e, 0x00000800, 
0xbeef03f4, 0xbf85000d, 0xbf820036, 0x83ef8f6e, 0x8f6f996f, 0x88736f73, 0xbf09836e, 0xbf85ffbe, 0xbf06826e, 0xbeef03ff, 0x80000000, 0xbf850003, 0x806c846c, 0x826d806d, 0xbf82002c, 0xbef0036f, 0xbeee037e, 0xbeef037f, 0xbefe03ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff036f, 0x877eff7e, 0x000003ff, 0x8f6f837e, 0xbefe036e, 0xf4051bbd, 0xde000000, 0xbf8cc07f, 0xf4051bb7, 0xfa0000c0, 0xbf8cc07f, 0xbef10380, 0xf6811c37, 0xfa000008, 0xbf8cc07f, 0x88707170, 0xbf85000e, 0xf4051c37, 0xfa000010, 0xbf8cc07f, 0x87f07070, 0xbf840009, 0xf4011bb7, 0xfa000018, 0xbf8cc07f, 0xf4411bb8, 0xfa000000, 0xbf8cc07f, 0xbefc0380, 0xbf800000, 0xbf900001, 0xbef00380, 0xbef10380, 0xbef31d9e, 0xbef81d8d, 0x906e8977, 0x876fff6e, 0x003f8000, 0x906e8677, 0x876eff6e, 0x02000000, 0x886e6f6e, 0xb9eef807, 0x87fe7e7e, 0x87ea6a6a, 0xb9f8f802, 0xbe80226c, }; static const unsigned int kCodeTrapHandler10[] = { 0x8a73ff73, 0x3e000000, 0x93eeff78, 0x0001000d, 0x8f6e9d6e, 0x88736e73, 0x93eeff6d, 0x00080010, 0xbf850023, 0xb96ef803, 0x876fff6e, 0x00000900, 0xbf850013, 0xbeee037e, 0xbeef037f, 0xbefe03ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff036f, 0x876fff7e, 0x00000fff, 0xbefe036e, 0xbeef1d96, 0xbeee037c, 0xbefc036f, 0xbf800000, 0xbf900001, 0xbefc036e, 0xbf82001a, 0x876fff6e, 0x10000100, 0xbf06ff6f, 0x00000100, 0xbeef03ff, 0x20000000, 0xbf850011, 0x876fff6e, 0x00000800, 0xbeef03f4, 0xbf85000d, 0xbf82000e, 0x83ef8f6e, 0x8f6f996f, 0x88736f73, 0xbf09836e, 0xbf85ffdc, 0xbf06826e, 0xbeef03ff, 0x80000000, 0xbf850003, 0x806c846c, 0x826d806d, 0xbf820004, 0xbef00380, 0xbef10380, 0xbef31d9e, 0xbef81d8d, 0x906e8977, 0x876fff6e, 0x003f8000, 0x906e8677, 0x876eff6e, 0x02000000, 0x886e6f6e, 0xb9eef807, 0x87fe7e7e, 0x87ea6a6a, 0xb9f8f802, 0xbe80226c, }; /* .set SQ_WAVE_PC_HI_ADDRESS_MASK , 0xFFFF .set SQ_WAVE_PC_HI_HT_SHIFT , 24 .set SQ_WAVE_PC_HI_TRAP_ID_SHIFT , 16 .set SQ_WAVE_PC_HI_TRAP_ID_SIZE , 8 .set SQ_WAVE_PC_HI_TRAP_ID_BFE , (SQ_WAVE_PC_HI_TRAP_ID_SHIFT | 
(SQ_WAVE_PC_HI_TRAP_ID_SIZE << 16)) .set SQ_WAVE_STATUS_HALT_SHIFT , 13 .set SQ_WAVE_STATUS_HALT_BFE , (SQ_WAVE_STATUS_HALT_SHIFT | (1 << 16)) .set SQ_WAVE_TRAPSTS_MEM_VIOL_SHIFT , 8 .set SQ_WAVE_TRAPSTS_ILLEGAL_INST_SHIFT , 11 .set SQ_WAVE_TRAPSTS_XNACK_ERROR_SHIFT , 28 .set SQ_WAVE_TRAPSTS_MATH_EXCP , 0x7F .set SQ_WAVE_MODE_EXCP_EN_SHIFT , 12 .set TRAP_ID_ABORT , 2 .set TRAP_ID_DEBUGTRAP , 3 .set DOORBELL_ID_SIZE , 10 .set DOORBELL_ID_MASK , ((1 << DOORBELL_ID_SIZE) - 1) .set EC_QUEUE_WAVE_ABORT_M0 , (1 << (DOORBELL_ID_SIZE + 0)) .set EC_QUEUE_WAVE_TRAP_M0 , (1 << (DOORBELL_ID_SIZE + 1)) .set EC_QUEUE_WAVE_MATH_ERROR_M0 , (1 << (DOORBELL_ID_SIZE + 2)) .set EC_QUEUE_WAVE_ILLEGAL_INSTRUCTION_M0 , (1 << (DOORBELL_ID_SIZE + 3)) .set EC_QUEUE_WAVE_MEMORY_VIOLATION_M0 , (1 << (DOORBELL_ID_SIZE + 4)) .set EC_QUEUE_WAVE_APERTURE_VIOLATION_M0 , (1 << (DOORBELL_ID_SIZE + 5)) .set TTMP6_WAVE_STOPPED_SHIFT , 30 .set TTMP6_SAVED_STATUS_HALT_SHIFT , 29 .set TTMP6_SAVED_STATUS_HALT_MASK , (1 << TTMP6_SAVED_STATUS_HALT_SHIFT) .set TTMP6_SAVED_TRAP_ID_SHIFT , 25 .set TTMP6_SAVED_TRAP_ID_SIZE , 4 .set TTMP6_SAVED_TRAP_ID_MASK , (((1 << TTMP6_SAVED_TRAP_ID_SIZE) - 1) << TTMP6_SAVED_TRAP_ID_SHIFT) .set TTMP6_SAVED_TRAP_ID_BFE , (TTMP6_SAVED_TRAP_ID_SHIFT | (TTMP6_SAVED_TRAP_ID_SIZE << 16)) .set TTMP11_PC_HI_SHIFT , 7 .set TTMP11_DEBUG_ENABLED_SHIFT , 23 .if .amdgcn.gfx_generation_number == 9 .set TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT , 26 .set SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT , 15 .set SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK , 0x1F8000 .elseif .amdgcn.gfx_generation_number == 10 && .amdgcn.gfx_generation_minor < 3 .set TTMP11_SAVE_REPLAY_W64H_SHIFT , 31 .set TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT , 24 .set SQ_WAVE_IB_STS_REPLAY_W64H_SHIFT , 25 .set SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT , 15 .set SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK , 0x3F8000 .set SQ_WAVE_IB_STS_REPLAY_W64H_MASK , 0x2000000 .endif // ABI between first and second level trap handler: // ttmp0 = PC[31:0] // ttmp12 = 
SQ_WAVE_STATUS // ttmp14 = TMA[31:0] // ttmp15 = TMA[63:32] // gfx9: // ttmp1 = 0[2:0], PCRewind[3:0], HostTrap[0], TrapId[7:0], PC[47:32] // ttmp11 = SQ_WAVE_IB_STS[20:15], 0[1:0], DebugEnabled[0], 0[15:0], NoScratch[0], WaveIdInWG[5:0] // gfx10: // ttmp1 = 0[0], PCRewind[5:0], HostTrap[0], TrapId[7:0], PC[47:32] // gfx1010: // ttmp11 = SQ_WAVE_IB_STS[25], SQ_WAVE_IB_STS[21:15], DebugEnabled[0], 0[15:0], NoScratch[0], WaveIdInWG[5:0] // gfx1030: // ttmp11 = 0[7:0], DebugEnabled[0], 0[15:0], NoScratch[0], WaveIdInWG[5:0] trap_entry: // Branch if not a trap (an exception instead). s_bfe_u32 ttmp2, ttmp1, SQ_WAVE_PC_HI_TRAP_ID_BFE s_cbranch_scc0 .no_skip_debugtrap // If caused by s_trap then advance PC. s_bitcmp1_b32 ttmp1, SQ_WAVE_PC_HI_HT_SHIFT s_cbranch_scc1 .not_s_trap s_add_u32 ttmp0, ttmp0, 0x4 s_addc_u32 ttmp1, ttmp1, 0x0 .not_s_trap: // If llvm.debugtrap and debugger is not attached. s_cmp_eq_u32 ttmp2, TRAP_ID_DEBUGTRAP s_cbranch_scc0 .no_skip_debugtrap s_bitcmp0_b32 ttmp11, TTMP11_DEBUG_ENABLED_SHIFT s_cbranch_scc0 .no_skip_debugtrap // Ignore llvm.debugtrap. s_branch .exit_trap .no_skip_debugtrap: // Save trap id and halt status in ttmp6. s_andn2_b32 ttmp6, ttmp6, (TTMP6_SAVED_TRAP_ID_MASK | TTMP6_SAVED_STATUS_HALT_MASK) s_min_u32 ttmp2, ttmp2, 0xF s_lshl_b32 ttmp2, ttmp2, TTMP6_SAVED_TRAP_ID_SHIFT s_or_b32 ttmp6, ttmp6, ttmp2 s_bfe_u32 ttmp2, ttmp12, SQ_WAVE_STATUS_HALT_BFE s_lshl_b32 ttmp2, ttmp2, TTMP6_SAVED_STATUS_HALT_SHIFT s_or_b32 ttmp6, ttmp6, ttmp2 // Fetch doorbell id for our queue. s_mov_b32 ttmp2, exec_lo s_mov_b32 ttmp3, exec_hi s_mov_b32 exec_lo, 0x80000000 s_sendmsg sendmsg(MSG_GET_DOORBELL) .wait_sendmsg: s_nop 0x7 s_bitcmp0_b32 exec_lo, 0x1F s_cbranch_scc0 .wait_sendmsg s_mov_b32 exec_hi, ttmp3 // Restore exec_lo, move the doorbell_id into ttmp3 s_and_b32 ttmp3, exec_lo, DOORBELL_ID_MASK s_mov_b32 exec_lo, ttmp2 // Map trap reason to an exception code. 
s_getreg_b32 ttmp2, hwreg(HW_REG_TRAPSTS) s_bitcmp1_b32 ttmp2, SQ_WAVE_TRAPSTS_XNACK_ERROR_SHIFT s_cbranch_scc0 .not_memory_violation s_or_b32 ttmp3, ttmp3, EC_QUEUE_WAVE_MEMORY_VIOLATION_M0 // Aperture violation requires XNACK_ERROR == 0. s_branch .not_aperture_violation .not_memory_violation: s_bitcmp1_b32 ttmp2, SQ_WAVE_TRAPSTS_MEM_VIOL_SHIFT s_cbranch_scc0 .not_aperture_violation s_or_b32 ttmp3, ttmp3, EC_QUEUE_WAVE_APERTURE_VIOLATION_M0 .not_aperture_violation: s_bitcmp1_b32 ttmp2, SQ_WAVE_TRAPSTS_ILLEGAL_INST_SHIFT s_cbranch_scc0 .not_illegal_instruction s_or_b32 ttmp3, ttmp3, EC_QUEUE_WAVE_ILLEGAL_INSTRUCTION_M0 .not_illegal_instruction: s_and_b32 ttmp2, ttmp2, SQ_WAVE_TRAPSTS_MATH_EXCP s_cbranch_scc0 .not_math_exception s_getreg_b32 ttmp7, hwreg(HW_REG_MODE) s_lshl_b32 ttmp2, ttmp2, SQ_WAVE_MODE_EXCP_EN_SHIFT s_and_b32 ttmp2, ttmp2, ttmp7 s_cbranch_scc0 .not_math_exception s_or_b32 ttmp3, ttmp3, EC_QUEUE_WAVE_MATH_ERROR_M0 .not_math_exception: s_bfe_u32 ttmp2, ttmp6, TTMP6_SAVED_TRAP_ID_BFE s_cmp_eq_u32 ttmp2, TRAP_ID_ABORT s_cbranch_scc0 .not_abort_trap s_or_b32 ttmp3, ttmp3, EC_QUEUE_WAVE_ABORT_M0 .not_abort_trap: // If no other exception was flagged then report a generic error. s_andn2_b32 ttmp2, ttmp3, DOORBELL_ID_MASK s_cbranch_scc1 .send_interrupt s_or_b32 ttmp3, ttmp3, EC_QUEUE_WAVE_TRAP_M0 .send_interrupt: // m0 = interrupt data = (exception_code << DOORBELL_ID_SIZE) | doorbell_id s_mov_b32 ttmp2, m0 s_mov_b32 m0, ttmp3 s_nop 0x0 // Manually inserted wait states s_sendmsg sendmsg(MSG_INTERRUPT) s_mov_b32 m0, ttmp2 // Parking the wave requires saving the original pc in the preserved ttmps. 
// Register layout before parking the wave: // // ttmp7: 0[31:0] // ttmp11: 1st_level_ttmp11[31:23] 0[15:0] 1st_level_ttmp11[6:0] // // After parking the wave: // // ttmp7: pc_lo[31:0] // ttmp11: 1st_level_ttmp11[31:23] pc_hi[15:0] 1st_level_ttmp11[6:0] .if ((.amdgcn.gfx_generation_number == 10 && .amdgcn.gfx_generation_minor >= 3) || .amdgcn.gfx_generation_number > 10) s_branch .halt_wave .else // Save the PC s_mov_b32 ttmp7, ttmp0 s_and_b32 ttmp1, ttmp1, SQ_WAVE_PC_HI_ADDRESS_MASK s_lshl_b32 ttmp1, ttmp1, TTMP11_PC_HI_SHIFT s_andn2_b32 ttmp11, ttmp11, (SQ_WAVE_PC_HI_ADDRESS_MASK << TTMP11_PC_HI_SHIFT) s_or_b32 ttmp11, ttmp11, ttmp1 // Park the wave s_getpc_b64 [ttmp0, ttmp1] s_add_u32 ttmp0, ttmp0, .parked - . s_addc_u32 ttmp1, ttmp1, 0x0 s_branch .halt_wave .parked: s_trap 0x2 s_branch .parked .endif .halt_wave: // Halt the wavefront upon restoring STATUS below. s_bitset1_b32 ttmp6, TTMP6_WAVE_STOPPED_SHIFT s_bitset1_b32 ttmp12, SQ_WAVE_STATUS_HALT_SHIFT .exit_trap: // Restore SQ_WAVE_IB_STS. .if .amdgcn.gfx_generation_number == 9 s_lshr_b32 ttmp2, ttmp11, (TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT - SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT) s_and_b32 ttmp2, ttmp2, SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK s_setreg_b32 hwreg(HW_REG_IB_STS), ttmp2 .endif .if .amdgcn.gfx_generation_number == 10 && .amdgcn.gfx_generation_minor < 3 s_lshr_b32 ttmp2, ttmp11, (TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT - SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT) s_and_b32 ttmp3, ttmp2, SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK s_lshr_b32 ttmp2, ttmp11, (TTMP11_SAVE_REPLAY_W64H_SHIFT - SQ_WAVE_IB_STS_REPLAY_W64H_SHIFT) s_and_b32 ttmp2, ttmp2, SQ_WAVE_IB_STS_REPLAY_W64H_MASK s_or_b32 ttmp2, ttmp2, ttmp3 s_setreg_b32 hwreg(HW_REG_IB_STS), ttmp2 .endif // Restore SQ_WAVE_STATUS. s_and_b64 exec, exec, exec // Restore STATUS.EXECZ, not writable by s_setreg_b32 s_and_b64 vcc, vcc, vcc // Restore STATUS.VCCZ, not writable by s_setreg_b32 s_setreg_b32 hwreg(HW_REG_STATUS), ttmp12 // Return to original (possibly modified) PC. 
s_rfe_b64 [ttmp0, ttmp1] */ static const unsigned int kCodeTrapHandlerV2_9[] = { 0x92eeff6d, 0x00080010, 0xbf840009, 0xbf0d986d, 0xbf850002, 0x806c846c, 0x826d806d, 0xbf06836e, 0xbf840003, 0xbf0c9777, 0xbf840001, 0xbf82004c, 0x8972ff72, 0x3e000000, 0x83ee8f6e, 0x8e6e996e, 0x87726e72, 0x92eeff78, 0x0001000d, 0x8e6e9d6e, 0x87726e72, 0xbeee007e, 0xbeef007f, 0xbefe00ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff006f, 0x866fff7e, 0x000003ff, 0xbefe006e, 0xb8eef803, 0xbf0d9c6e, 0xbf840003, 0x876fff6f, 0x00004000, 0xbf820004, 0xbf0d886e, 0xbf840002, 0x876fff6f, 0x00008000, 0xbf0d8b6e, 0xbf840002, 0x876fff6f, 0x00002000, 0x866eff6e, 0x0000007f, 0xbf840006, 0xb8f3f801, 0x8e6e8c6e, 0x866e736e, 0xbf840002, 0x876fff6f, 0x00001000, 0x92eeff72, 0x00040019, 0xbf06826e, 0xbf840002, 0x876fff6f, 0x00000400, 0x896eff6f, 0x000003ff, 0xbf850002, 0x876fff6f, 0x00000800, 0xbeee007c, 0xbefc006f, 0xbf800000, 0xbf900001, 0xbefc006e, 0xbef3006c, 0x866dff6d, 0x0000ffff, 0x8e6d876d, 0x8977ff77, 0x007fff80, 0x87776d77, 0xbeec1c00, 0x806cff6c, 0x00000010, 0x826d806d, 0xbf820002, 0xbf920002, 0xbf82fffe, 0xbef21a9e, 0xbef81a8d, 0x8f6e8b77, 0x866eff6e, 0x001f8000, 0xb96ef807, 0x86fe7e7e, 0x86ea6a6a, 0xb978f802, 0xbe801f6c, }; static const unsigned int kCodeTrapHandlerV2_1010[] = { 0x93eeff6d, 0x00080010, 0xbf840009, 0xbf0d986d, 0xbf850002, 0x806c846c, 0x826d806d, 0xbf06836e, 0xbf840003, 0xbf0c9777, 0xbf840001, 0xbf82004c, 0x8a72ff72, 0x3e000000, 0x83ee8f6e, 0x8f6e996e, 0x88726e72, 0x93eeff78, 0x0001000d, 0x8f6e9d6e, 0x88726e72, 0xbeee037e, 0xbeef037f, 0xbefe03ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff036f, 0x876fff7e, 0x000003ff, 0xbefe036e, 0xb96ef803, 0xbf0d9c6e, 0xbf840003, 0x886fff6f, 0x00004000, 0xbf820004, 0xbf0d886e, 0xbf840002, 0x886fff6f, 0x00008000, 0xbf0d8b6e, 0xbf840002, 0x886fff6f, 0x00002000, 0x876eff6e, 0x0000007f, 0xbf840006, 0xb973f801, 0x8f6e8c6e, 0x876e736e, 0xbf840002, 0x886fff6f, 0x00001000, 0x93eeff72, 0x00040019, 0xbf06826e, 
0xbf840002, 0x886fff6f, 0x00000400, 0x8a6eff6f, 0x000003ff, 0xbf850002, 0x886fff6f, 0x00000800, 0xbeee037c, 0xbefc036f, 0xbf800000, 0xbf900001, 0xbefc036e, 0xbef3036c, 0x876dff6d, 0x0000ffff, 0x8f6d876d, 0x8a77ff77, 0x007fff80, 0x88776d77, 0xbeec1f00, 0x806cff6c, 0x00000010, 0x826d806d, 0xbf820002, 0xbf920002, 0xbf82fffe, 0xbef21d9e, 0xbef81d8d, 0x906e8977, 0x876fff6e, 0x003f8000, 0x906e8677, 0x876eff6e, 0x02000000, 0x886e6f6e, 0xb9eef807, 0x87fe7e7e, 0x87ea6a6a, 0xb9f8f802, 0xbe80226c, }; static const unsigned int kCodeTrapHandlerV2_10[] = { 0x93eeff6d, 0x00080010, 0xbf840009, 0xbf0d986d, 0xbf850002, 0x806c846c, 0x826d806d, 0xbf06836e, 0xbf840003, 0xbf0c9777, 0xbf840001, 0xbf82003f, 0x8a72ff72, 0x3e000000, 0x83ee8f6e, 0x8f6e996e, 0x88726e72, 0x93eeff78, 0x0001000d, 0x8f6e9d6e, 0x88726e72, 0xbeee037e, 0xbeef037f, 0xbefe03ff, 0x80000000, 0xbf90000a, 0xbf800007, 0xbf0c9f7e, 0xbf84fffd, 0xbeff036f, 0x876fff7e, 0x000003ff, 0xbefe036e, 0xb96ef803, 0xbf0d9c6e, 0xbf840003, 0x886fff6f, 0x00004000, 0xbf820004, 0xbf0d886e, 0xbf840002, 0x886fff6f, 0x00008000, 0xbf0d8b6e, 0xbf840002, 0x886fff6f, 0x00002000, 0x876eff6e, 0x0000007f, 0xbf840006, 0xb973f801, 0x8f6e8c6e, 0x876e736e, 0xbf840002, 0x886fff6f, 0x00001000, 0x93eeff72, 0x00040019, 0xbf06826e, 0xbf840002, 0x886fff6f, 0x00000400, 0x8a6eff6f, 0x000003ff, 0xbf850002, 0x886fff6f, 0x00000800, 0xbeee037c, 0xbefc036f, 0xbf800000, 0xbf900001, 0xbefc036e, 0xbf820000, 0xbef21d9e, 0xbef81d8d, 0x87fe7e7e, 0x87ea6a6a, 0xb9f8f802, 0xbe80226c, }; } // namespace amd } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_hsa_code.hpp000066400000000000000000000402711420110115200222160ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. 
// // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #ifndef AMD_HSA_CODE_HPP_ #define AMD_HSA_CODE_HPP_ #include "core/inc/amd_elf_image.hpp" #include "inc/amd_hsa_elf.h" #include "inc/amd_hsa_kernel_code.h" #include "inc/hsa.h" #include "inc/hsa_ext_finalize.h" #include #include #include #include namespace rocr { namespace amd { namespace hsa { namespace common { template <uint64_t signature> class Signed { public: static const uint64_t CT_SIGNATURE; const uint64_t RT_SIGNATURE; protected: Signed(): RT_SIGNATURE(signature) {} virtual ~Signed() {} }; template <uint64_t signature> const uint64_t Signed<signature>::CT_SIGNATURE = signature; bool IsAccessibleMemoryAddress(uint64_t address); template <typename class_type, typename member_type> size_t OffsetOf(member_type class_type::*member) { return (char*)&((class_type*)nullptr->*member) - (char*)nullptr; } template <typename class_type> class_type* ObjectAt(uint64_t address) { if (!IsAccessibleMemoryAddress(address)) { return nullptr; } const uint64_t *rt_signature = (const uint64_t*)(address + OffsetOf(&class_type::RT_SIGNATURE)); if (nullptr == rt_signature) { return nullptr; } if (class_type::CT_SIGNATURE != *rt_signature) { return nullptr; } return (class_type*)address; } } // namespace common namespace code { typedef amd::elf::Segment Segment; typedef amd::elf::Section Section; typedef amd::elf::RelocationSection RelocationSection; typedef amd::elf::Relocation Relocation; class KernelSymbol; class VariableSymbol; class Symbol { protected: amd::elf::Symbol* elfsym; public: explicit Symbol(amd::elf::Symbol* elfsym_) : elfsym(elfsym_) { } virtual ~Symbol() { } virtual bool IsKernelSymbol() const { return false; } virtual KernelSymbol* AsKernelSymbol() { assert(false); return 0; } virtual bool IsVariableSymbol() const { return false; } virtual VariableSymbol* AsVariableSymbol() { assert(false); return 0; } amd::elf::Symbol* elfSym() { return elfsym; } std::string Name() const { return elfsym ? 
elfsym->name() : ""; } Section* GetSection() { return elfsym->section(); } virtual uint64_t SectionOffset() const { return elfsym->value(); } virtual uint64_t VAddr() const { return elfsym->section()->addr() + elfsym->value(); } uint32_t Index() const { return elfsym ? elfsym->index() : 0; } bool IsDeclaration() const; bool IsDefinition() const; virtual bool IsAgent() const; virtual hsa_symbol_kind_t Kind() const = 0; hsa_symbol_linkage_t Linkage() const; hsa_variable_allocation_t Allocation() const; hsa_variable_segment_t Segment() const; uint64_t Size() const; uint32_t Size32() const; uint32_t Alignment() const; bool IsConst() const; virtual hsa_status_t GetInfo(hsa_code_symbol_info_t attribute, void *value); static hsa_code_symbol_t ToHandle(Symbol* sym); static Symbol* FromHandle(hsa_code_symbol_t handle); void setValue(uint64_t value) { elfsym->setValue(value); } void setSize(uint32_t size) { elfsym->setSize(size); } std::string GetModuleName() const; std::string GetSymbolName() const; }; class KernelSymbol : public Symbol { private: uint32_t kernarg_segment_size, kernarg_segment_alignment; uint32_t group_segment_size, private_segment_size; bool is_dynamic_callstack; public: explicit KernelSymbol(amd::elf::Symbol* elfsym_, const amd_kernel_code_t* akc); bool IsKernelSymbol() const override { return true; } KernelSymbol* AsKernelSymbol() override { return this; } hsa_symbol_kind_t Kind() const override { return HSA_SYMBOL_KIND_KERNEL; } hsa_status_t GetInfo(hsa_code_symbol_info_t attribute, void *value) override; }; class VariableSymbol : public Symbol { public: explicit VariableSymbol(amd::elf::Symbol* elfsym_) : Symbol(elfsym_) { } bool IsVariableSymbol() const override { return true; } VariableSymbol* AsVariableSymbol() override { return this; } hsa_symbol_kind_t Kind() const override { return HSA_SYMBOL_KIND_VARIABLE; } hsa_status_t GetInfo(hsa_code_symbol_info_t attribute, void *value) override; }; class AmdHsaCode { private: std::ostringstream out; 
std::unique_ptr<amd::elf::Image> img; std::vector<Segment*> dataSegments; std::vector<Section*> dataSections; std::vector<RelocationSection*> relocationSections; std::vector<Symbol*> symbols; bool combineDataSegments; Segment* hsaSegments[AMDGPU_HSA_SEGMENT_LAST][2]; Section* hsaSections[AMDGPU_HSA_SECTION_LAST]; amd::elf::Section* hsatext; amd::elf::Section* imageInit; amd::elf::Section* samplerInit; amd::elf::Section* debugInfo; amd::elf::Section* debugLine; amd::elf::Section* debugAbbrev; bool PullElf(); bool PullElfV1(); bool PullElfV2(); void AddAmdNote(uint32_t type, const void* desc, uint32_t desc_size); template <typename S> bool GetAmdNote(uint32_t type, S** desc) { uint32_t desc_size; if (!img->note()->getNote("AMD", type, (void**) desc, &desc_size)) { out << "Failed to find note, type: " << type << std::endl; return false; } if (desc_size < sizeof(S)) { out << "Note size mismatch, type: " << type << " size: " << desc_size << " expected at least " << sizeof(S) << std::endl; return false; } return true; } void PrintSegment(std::ostream& out, Segment* segment); void PrintSection(std::ostream& out, Section* section); void PrintRawData(std::ostream& out, Section* section); void PrintRawData(std::ostream& out, const unsigned char *data, size_t size); void PrintRelocationData(std::ostream& out, RelocationSection* section); void PrintSymbol(std::ostream& out, Symbol* sym); void PrintDisassembly(std::ostream& out, const unsigned char *isa, size_t size, uint32_t isa_offset = 0); std::string MangleSymbolName(const std::string& module_name, const std::string symbol_name); bool ElfImageError(); public: bool HasHsaText() const { return hsatext != 0; } amd::elf::Section* HsaText() { assert(hsatext); return hsatext; } const amd::elf::Section* HsaText() const { assert(hsatext); return hsatext; } amd::elf::SymbolTable* Symtab() { assert(img); return img->symtab(); } uint16_t Machine() const { return img->Machine(); } uint32_t EFlags() const { return img->EFlags(); } uint32_t EClass() const { return img->EClass(); } uint32_t OsAbi() const { return 
img->OsAbi(); } AmdHsaCode(bool combineDataSegments = true); virtual ~AmdHsaCode(); std::string output() { return out.str(); } bool LoadFromFile(const std::string& filename); bool SaveToFile(const std::string& filename); bool WriteToBuffer(void* buffer); bool InitFromBuffer(const void* buffer, size_t size); bool InitAsBuffer(const void* buffer, size_t size); bool InitAsHandle(hsa_code_object_t code_handle); bool InitNew(bool xnack = false); bool Freeze(); hsa_code_object_t GetHandle(); const char* ElfData(); uint64_t ElfSize(); bool Validate(); void Print(std::ostream& out); void PrintNotes(std::ostream& out); void PrintSegments(std::ostream& out); void PrintSections(std::ostream& out); void PrintSymbols(std::ostream& out); void PrintMachineCode(std::ostream& out); void PrintMachineCode(std::ostream& out, KernelSymbol* sym); bool PrintToFile(const std::string& filename); void AddNoteCodeObjectVersion(uint32_t major, uint32_t minor); bool GetNoteCodeObjectVersion(std::string& version); void AddNoteHsail(uint32_t hsail_major, uint32_t hsail_minor, hsa_profile_t profile, hsa_machine_model_t machine_model, hsa_default_float_rounding_mode_t rounding_mode); bool GetNoteHsail(uint32_t* hsail_major, uint32_t* hsail_minor, hsa_profile_t* profile, hsa_machine_model_t* machine_model, hsa_default_float_rounding_mode_t* default_float_round); void AddNoteIsa(const std::string& vendor_name, const std::string& architecture_name, uint32_t major, uint32_t minor, uint32_t stepping); bool GetNoteIsa(std::string& vendor_name, std::string& architecture_name, uint32_t* major_version, uint32_t* minor_version, uint32_t* stepping); void AddNoteProducer(uint32_t major, uint32_t minor, const std::string& producer); bool GetNoteProducer(uint32_t* major, uint32_t* minor, std::string& producer_name); void AddNoteProducerOptions(const std::string& options); void AddNoteProducerOptions(int32_t call_convention, const hsa_ext_control_directives_t& user_directives, const std::string& user_options); 
bool GetNoteProducerOptions(std::string& options); bool GetIsa(std::string& isaName); bool GetCodeObjectVersion(uint32_t* major, uint32_t* minor); hsa_status_t GetInfo(hsa_code_object_info_t attribute, void *value); hsa_status_t GetSymbol(const char *module_name, const char *symbol_name, hsa_code_symbol_t *sym); hsa_status_t IterateSymbols(hsa_code_object_t code_object, hsa_status_t (*callback)( hsa_code_object_t code_object, hsa_code_symbol_t symbol, void* data), void* data); void AddHsaTextData(const void* buffer, size_t size); uint64_t NextKernelCodeOffset() const; bool AddKernelCode(KernelSymbol* sym, const void* code, size_t size); Symbol* AddKernelDefinition(const std::string& name, const void* isa, size_t isa_size); size_t DataSegmentCount() const { return dataSegments.size(); } Segment* DataSegment(size_t i) const { return dataSegments[i]; } size_t DataSectionCount() { return dataSections.size(); } Section* DataSection(size_t i) { return dataSections[i]; } Section* AddEmptySection(); Section* AddCodeSection(Segment* segment); Section* AddDataSection(const std::string &name, uint32_t type, uint64_t flags, Segment* segment); bool HasImageInitSection() const { return imageInit != 0; } Section* ImageInitSection(); void AddImageInitializer(Symbol* image, uint64_t destOffset, const amdgpu_hsa_image_descriptor_t& init); void AddImageInitializer(Symbol* image, uint64_t destOffset, amdgpu_hsa_metadata_kind16_t kind, amdgpu_hsa_image_geometry8_t geometry, amdgpu_hsa_image_channel_order8_t channel_order, amdgpu_hsa_image_channel_type8_t channel_type, uint64_t width, uint64_t height, uint64_t depth, uint64_t array); bool HasSamplerInitSection() const { return samplerInit != 0; } amd::elf::Section* SamplerInitSection(); amd::elf::Section* AddSamplerInit(); void AddSamplerInitializer(Symbol* sampler, uint64_t destOffset, const amdgpu_hsa_sampler_descriptor_t& init); void AddSamplerInitializer(Symbol* sampler, uint64_t destOffset, amdgpu_hsa_sampler_coord8_t coord, 
amdgpu_hsa_sampler_filter8_t filter, amdgpu_hsa_sampler_addressing8_t addressing); void AddInitVarWithAddress(bool large, Symbol* dest, uint64_t destOffset, Symbol* addrOf, uint64_t addrAddend); void InitHsaSegment(amdgpu_hsa_elf_segment_t segment, bool writable); bool AddHsaSegments(); Segment* HsaSegment(amdgpu_hsa_elf_segment_t segment, bool writable); void InitHsaSectionSegment(amdgpu_hsa_elf_section_t section, bool combineSegments = true); Section* HsaDataSection(amdgpu_hsa_elf_section_t section, bool combineSegments = true); Symbol* AddExecutableSymbol(const std::string &name, unsigned char type, unsigned char binding, unsigned char other, Section *section = 0); Symbol* AddVariableSymbol(const std::string &name, unsigned char type, unsigned char binding, unsigned char other, Section *section, uint64_t value, uint64_t size); void AddSectionSymbols(); size_t RelocationSectionCount() { return relocationSections.size(); } RelocationSection* GetRelocationSection(size_t i) { return relocationSections[i]; } size_t SymbolCount() { return symbols.size(); } Symbol* GetSymbol(size_t i) { return symbols[i]; } Symbol* GetSymbolByElfIndex(size_t index); Symbol* FindSymbol(const std::string &n); void AddData(amdgpu_hsa_elf_section_t section, const void* data = 0, size_t size = 0); Section* DebugInfo(); Section* DebugLine(); Section* DebugAbbrev(); Section* AddHsaHlDebug(const std::string& name, const void* data, size_t size); }; class AmdHsaCodeManager { private: typedef std::unordered_map CodeMap; CodeMap codeMap; public: AmdHsaCode* FromHandle(hsa_code_object_t handle); bool Destroy(hsa_code_object_t handle); }; class KernelSymbolV2 : public KernelSymbol { private: public: explicit KernelSymbolV2(amd::elf::Symbol* elfsym_, const amd_kernel_code_t* akc); bool IsAgent() const override { return true; } uint64_t SectionOffset() const override { return elfsym->value() - elfsym->section()->addr(); } uint64_t VAddr() const override { return elfsym->value(); } }; class 
VariableSymbolV2 : public VariableSymbol { private: public: explicit VariableSymbolV2(amd::elf::Symbol* elfsym_) : VariableSymbol(elfsym_) { } bool IsAgent() const override { return false; } uint64_t SectionOffset() const override { return elfsym->value() - elfsym->section()->addr(); } uint64_t VAddr() const override { return elfsym->value(); } }; } // namespace code } // namespace hsa } // namespace amd } // namespace rocr #endif // AMD_HSA_CODE_HPP_ ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_hsa_loader.hpp000066400000000000000000000371221420110115200225530ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. 
// // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef AMD_HSA_LOADER_HPP #define AMD_HSA_LOADER_HPP #include #include #include "inc/hsa.h" #include "inc/hsa_ext_image.h" #include "inc/hsa_ven_amd_loader.h" #include "inc/amd_hsa_elf.h" #include #include #include #if defined(_WIN32) || defined(_WIN64) #include <io.h> #define __read__ _read #define __lseek__ _lseek #else #include <unistd.h> #define __read__ read #define __lseek__ lseek #endif // _WIN32 || _WIN64 /// @brief Major version of the AMD HSA Loader. Major versions are not backwards /// compatible. #define AMD_HSA_LOADER_VERSION_MAJOR 0 /// @brief Minor version of the AMD HSA Loader. Minor versions are backwards /// compatible. #define AMD_HSA_LOADER_VERSION_MINOR 5 /// @brief Descriptive version of the AMD HSA Loader. #define AMD_HSA_LOADER_VERSION "AMD HSA Loader v0.05 (June 16, 2015)" enum hsa_ext_symbol_info_t { HSA_EXT_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT_SIZE = 100, HSA_EXT_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT_ALIGN = 101, }; typedef uint32_t hsa_symbol_info32_t; typedef hsa_executable_symbol_t hsa_symbol_t; typedef hsa_executable_symbol_info_t hsa_symbol_info_t; /// @brief Loaded code object attributes. enum amd_loaded_code_object_info_t { AMD_LOADED_CODE_OBJECT_INFO_ELF_IMAGE = 0, AMD_LOADED_CODE_OBJECT_INFO_ELF_IMAGE_SIZE = 1 }; /// @brief Loaded segment handle. typedef struct amd_loaded_segment_s { uint64_t handle; } amd_loaded_segment_t; /// @brief Loaded segment attributes.
enum amd_loaded_segment_info_t { AMD_LOADED_SEGMENT_INFO_TYPE = 0, AMD_LOADED_SEGMENT_INFO_ELF_BASE_ADDRESS = 1, AMD_LOADED_SEGMENT_INFO_LOAD_BASE_ADDRESS = 2, AMD_LOADED_SEGMENT_INFO_SIZE = 3 }; namespace rocr { namespace amd { namespace hsa { namespace loader { /// @class CodeObjectReaderImpl. /// @brief Code Object Reader Wrapper. struct CodeObjectReaderImpl final { public: /// @returns Handle equivalent of @p object. static hsa_code_object_reader_t Handle( const CodeObjectReaderImpl *object) { hsa_code_object_reader_t handle = {reinterpret_cast<uint64_t>(object)}; return handle; } /// @returns Object equivalent of @p handle. static CodeObjectReaderImpl *Object( const hsa_code_object_reader_t &handle) { CodeObjectReaderImpl *object = reinterpret_cast<CodeObjectReaderImpl*>(handle.handle); return object; } /// @brief Default constructor. CodeObjectReaderImpl() {} /// @brief Default destructor. ~CodeObjectReaderImpl(); hsa_status_t SetFile( hsa_file_t _code_object_file_descriptor, size_t _code_object_offset = 0, size_t _code_object_size = 0); hsa_status_t SetMemory( const void *_code_object_memory, size_t _code_object_size); const void *GetCodeObjectMemory() const { return code_object_memory; }; std::string GetUri() const { return uri; }; private: const void *code_object_memory{nullptr}; size_t code_object_size{0}; std::string uri{}; bool is_mmap{false}; }; //===----------------------------------------------------------------------===// // Context.
// //===----------------------------------------------------------------------===// class Context { public: virtual ~Context() {} virtual hsa_isa_t IsaFromName(const char *name) = 0; virtual bool IsaSupportedByAgent(hsa_agent_t agent, hsa_isa_t isa) = 0; virtual void* SegmentAlloc(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, size_t size, size_t align, bool zero) = 0; virtual bool SegmentCopy(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* dst, size_t offset, const void* src, size_t size) = 0; virtual void SegmentFree(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* seg, size_t size) = 0; virtual void* SegmentAddress(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* seg, size_t offset) = 0; virtual void* SegmentHostAddress(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* seg, size_t offset) = 0; virtual bool SegmentFreeze(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* seg, size_t size) = 0; virtual bool ImageExtensionSupported() = 0; virtual hsa_status_t ImageCreate( hsa_agent_t agent, hsa_access_permission_t image_permission, const hsa_ext_image_descriptor_t *image_descriptor, const void *image_data, hsa_ext_image_t *image_handle) = 0; virtual hsa_status_t ImageDestroy( hsa_agent_t agent, hsa_ext_image_t image_handle) = 0; virtual hsa_status_t SamplerCreate( hsa_agent_t agent, const hsa_ext_sampler_descriptor_t *sampler_descriptor, hsa_ext_sampler_t *sampler_handle) = 0; virtual hsa_status_t SamplerDestroy( hsa_agent_t agent, hsa_ext_sampler_t sampler_handle) = 0; protected: Context() {} private: Context(const Context &c); Context& operator=(const Context &c); }; //===----------------------------------------------------------------------===// // Symbol. 
// //===----------------------------------------------------------------------===// class Symbol { public: static hsa_symbol_t Handle(Symbol *symbol) { hsa_symbol_t symbol_handle = {reinterpret_cast(symbol)}; return symbol_handle; } static Symbol* Object(hsa_symbol_t symbol_handle) { Symbol *symbol = reinterpret_cast(symbol_handle.handle); return symbol; } virtual ~Symbol() {} virtual bool GetInfo(hsa_symbol_info32_t symbol_info, void *value) = 0; virtual hsa_agent_t GetAgent() = 0; protected: Symbol() {} private: Symbol(const Symbol &s); Symbol& operator=(const Symbol &s); }; //===----------------------------------------------------------------------===// // LoadedCodeObject. // //===----------------------------------------------------------------------===// class LoadedCodeObject { public: static hsa_loaded_code_object_t Handle(LoadedCodeObject *object) { hsa_loaded_code_object_t handle = {reinterpret_cast(object)}; return handle; } static LoadedCodeObject* Object(hsa_loaded_code_object_t handle) { LoadedCodeObject *object = reinterpret_cast(handle.handle); return object; } virtual ~LoadedCodeObject() {} virtual bool GetInfo(amd_loaded_code_object_info_t attribute, void *value) = 0; virtual hsa_status_t IterateLoadedSegments( hsa_status_t (*callback)( amd_loaded_segment_t loaded_segment, void *data), void *data) = 0; virtual hsa_agent_t getAgent() const = 0; virtual hsa_executable_t getExecutable() const = 0; virtual uint64_t getElfData() const = 0; virtual uint64_t getElfSize() const = 0; virtual uint64_t getStorageOffset() const = 0; virtual uint64_t getLoadBase() const = 0; virtual uint64_t getLoadSize() const = 0; virtual int64_t getDelta() const = 0; virtual std::string getUri() const = 0; protected: LoadedCodeObject() {} private: LoadedCodeObject(const LoadedCodeObject&); LoadedCodeObject& operator=(const LoadedCodeObject&); }; //===----------------------------------------------------------------------===// // LoadedSegment. 
// //===----------------------------------------------------------------------===// class LoadedSegment { public: static amd_loaded_segment_t Handle(LoadedSegment *object) { amd_loaded_segment_t handle = {reinterpret_cast(object)}; return handle; } static LoadedSegment* Object(amd_loaded_segment_t handle) { LoadedSegment *object = reinterpret_cast(handle.handle); return object; } virtual ~LoadedSegment() {} virtual bool GetInfo(amd_loaded_segment_info_t attribute, void *value) = 0; protected: LoadedSegment() {} private: LoadedSegment(const LoadedSegment&); LoadedSegment& operator=(const LoadedSegment&); }; //===----------------------------------------------------------------------===// // Executable. // //===----------------------------------------------------------------------===// class Executable { public: static hsa_executable_t Handle(Executable *executable) { hsa_executable_t executable_handle = {reinterpret_cast(executable)}; return executable_handle; } static Executable* Object(hsa_executable_t executable_handle) { Executable *executable = reinterpret_cast(executable_handle.handle); return executable; } virtual ~Executable() {} virtual hsa_status_t GetInfo( hsa_executable_info_t executable_info, void *value) = 0; virtual hsa_status_t DefineProgramExternalVariable( const char *name, void *address) = 0; virtual hsa_status_t DefineAgentExternalVariable( const char *name, hsa_agent_t agent, hsa_variable_segment_t segment, void *address) = 0; virtual hsa_status_t LoadCodeObject( hsa_agent_t agent, hsa_code_object_t code_object, const char *options, const std::string &uri, hsa_loaded_code_object_t *loaded_code_object = nullptr) = 0; virtual hsa_status_t LoadCodeObject( hsa_agent_t agent, hsa_code_object_t code_object, size_t code_object_size, const char *options, const std::string &uri, hsa_loaded_code_object_t *loaded_code_object = nullptr) = 0; virtual hsa_status_t Freeze(const char *options) = 0; virtual hsa_status_t Validate(uint32_t *result) = 0; /// @note 
needed for hsa v1.0. /// @todo remove during loader refactoring. virtual bool IsProgramSymbol(const char *symbol_name) = 0; virtual Symbol* GetSymbol( const char *symbol_name, const hsa_agent_t *agent) = 0; typedef hsa_status_t (*iterate_symbols_f)( hsa_executable_t executable, hsa_symbol_t symbol_handle, void *data); virtual hsa_status_t IterateSymbols( iterate_symbols_f callback, void *data) = 0; /// @since hsa v1.1. virtual hsa_status_t IterateAgentSymbols( hsa_agent_t agent, hsa_status_t (*callback)(hsa_executable_t exec, hsa_agent_t agent, hsa_executable_symbol_t symbol, void *data), void *data) = 0; /// @since hsa v1.1. virtual hsa_status_t IterateProgramSymbols( hsa_status_t (*callback)(hsa_executable_t exec, hsa_executable_symbol_t symbol, void *data), void *data) = 0; virtual hsa_status_t IterateLoadedCodeObjects( hsa_status_t (*callback)( hsa_executable_t executable, hsa_loaded_code_object_t loaded_code_object, void *data), void *data) = 0; virtual size_t GetNumSegmentDescriptors() = 0; virtual size_t QuerySegmentDescriptors( hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors, size_t total_num_segment_descriptors, size_t first_empty_segment_descriptor) = 0; virtual uint64_t FindHostAddress(uint64_t device_address) = 0; virtual void Print(std::ostream& out) = 0; virtual bool PrintToFile(const std::string& filename) = 0; protected: Executable() {} private: Executable(const Executable &e); Executable& operator=(const Executable &e); static std::vector executables; static std::mutex executables_mutex; }; /// @class Loader class Loader { public: /// @brief Destructor. virtual ~Loader() {} /// @brief Creates an AMD HSA Loader with the specified @p context. /// /// @param[in] context Context. Must not be null. /// /// @returns AMD HSA Loader on success, null on failure. static Loader* Create(Context* context); /// @brief Destroys the AMD HSA Loader @p loader. /// /// @param[in] loader AMD HSA Loader to destroy. Must not be null.
static void Destroy(Loader *loader); /// @returns Context associated with this Loader. virtual Context* GetContext() const = 0; /// @brief Creates an empty AMD HSA Executable with the specified @p profile /// and @p options. virtual Executable* CreateExecutable( hsa_profile_t profile, const char *options, hsa_default_float_rounding_mode_t default_float_rounding_mode = HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT) = 0; /// @brief Freezes @p executable. virtual hsa_status_t FreezeExecutable(Executable *executable, const char *options) = 0; /// @brief Destroys @p executable. virtual void DestroyExecutable(Executable *executable) = 0; /// @brief Invokes @p callback for each created executable. virtual hsa_status_t IterateExecutables( hsa_status_t (*callback)( hsa_executable_t executable, void *data), void *data) = 0; /// @brief Same as hsa_ven_amd_loader_query_segment_descriptors. virtual hsa_status_t QuerySegmentDescriptors( hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors, size_t *num_segment_descriptors) = 0; /// @brief Finds the handle of the executable to which @p device_address /// belongs. Returns a null handle if @p device_address is invalid. virtual hsa_executable_t FindExecutable(uint64_t device_address) = 0; /// @brief Returns the host address corresponding to @p device_address. /// Returns a null pointer if @p device_address is already a host address /// or is invalid. virtual uint64_t FindHostAddress(uint64_t device_address) = 0; /// @brief Prints loader help. virtual void PrintHelp(std::ostream& out) = 0; protected: /// @brief Default constructor. Loader() {} private: /// @brief Copy constructor - not available. Loader(const Loader&); /// @brief Assignment operator - not available.
Loader& operator=(const Loader&); }; } // namespace loader } // namespace hsa } // namespace amd } // namespace rocr #endif // AMD_HSA_LOADER_HPP ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_loader_context.hpp000066400000000000000000000077721420110115200234740ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_INC_AMD_LOADER_CONTEXT_HPP #define HSA_RUNTIME_CORE_INC_AMD_LOADER_CONTEXT_HPP #include "core/inc/amd_hsa_loader.hpp" namespace rocr { namespace amd { class LoaderContext final: public amd::hsa::loader::Context { public: LoaderContext(): amd::hsa::loader::Context() {} ~LoaderContext() {} hsa_isa_t IsaFromName(const char *name) override; bool IsaSupportedByAgent(hsa_agent_t agent, hsa_isa_t code_object_isa) override; void* SegmentAlloc(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, size_t size, size_t align, bool zero) override; bool SegmentCopy(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* dst, size_t offset, const void* src, size_t size) override; void SegmentFree(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* seg, size_t size = 0) override; void* SegmentAddress(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* seg, size_t offset) override; void* SegmentHostAddress(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* seg, size_t offset) override; bool SegmentFreeze(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, void* seg, size_t size) override; bool ImageExtensionSupported() override; hsa_status_t ImageCreate(hsa_agent_t agent, hsa_access_permission_t image_permission, const hsa_ext_image_descriptor_t* image_descriptor, const void* image_data, hsa_ext_image_t* image_handle) override; hsa_status_t ImageDestroy(hsa_agent_t agent, hsa_ext_image_t image_handle) override; hsa_status_t SamplerCreate(hsa_agent_t agent, const hsa_ext_sampler_descriptor_t* sampler_descriptor, hsa_ext_sampler_t* sampler_handle) override; hsa_status_t 
SamplerDestroy(hsa_agent_t agent, hsa_ext_sampler_t sampler_handle) override; private: LoaderContext(const LoaderContext&); LoaderContext& operator=(const LoaderContext&); }; } // namespace amd } // namespace rocr #endif // HSA_RUNTIME_CORE_INC_AMD_LOADER_CONTEXT_HPP ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_memory_region.h000066400000000000000000000173311420110115200227650ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // AMD specific HSA backend. #ifndef HSA_RUNTIME_CORE_INC_AMD_MEMORY_REGION_H_ #define HSA_RUNTIME_CORE_INC_AMD_MEMORY_REGION_H_ #include "hsakmt.h" #include "core/inc/agent.h" #include "core/inc/runtime.h" #include "core/inc/memory_region.h" #include "core/util/simple_heap.h" #include "core/util/locks.h" #include "inc/hsa_ext_amd.h" namespace rocr { namespace AMD { class MemoryRegion : public core::MemoryRegion { public: /// @brief Convert this object into hsa_region_t. static __forceinline hsa_region_t Convert(MemoryRegion* region) { const hsa_region_t region_handle = { static_cast(reinterpret_cast(region))}; return region_handle; } static __forceinline const hsa_region_t Convert(const MemoryRegion* region) { const hsa_region_t region_handle = { static_cast(reinterpret_cast(region))}; return region_handle; } /// @brief Convert hsa_region_t into AMD::MemoryRegion *. static __forceinline MemoryRegion* Convert(hsa_region_t region) { return reinterpret_cast(region.handle); } /// @brief Allocate agent accessible memory (system / local memory). static void* AllocateKfdMemory(const HsaMemFlags& flag, HSAuint32 node_id, size_t size); /// @brief Free agent accessible memory (system / local memory). static void FreeKfdMemory(void* ptr, size_t size); static bool RegisterMemory(void* ptr, size_t size, const HsaMemFlags& MemFlags); static void DeregisterMemory(void* ptr); /// @brief Pin memory. static bool MakeKfdMemoryResident(size_t num_node, const uint32_t* nodes, const void* ptr, size_t size, uint64_t* alternate_va, HsaMemMapFlags map_flag); /// @brief Unpin memory. 
static void MakeKfdMemoryUnresident(const void* ptr); MemoryRegion(bool fine_grain, bool kernarg, bool full_profile, core::Agent* owner, const HsaMemoryProperties& mem_props); ~MemoryRegion(); hsa_status_t Allocate(size_t& size, AllocateFlags alloc_flags, void** address) const; hsa_status_t Free(void* address, size_t size) const; hsa_status_t IPCFragmentExport(void* address) const; hsa_status_t GetInfo(hsa_region_info_t attribute, void* value) const; hsa_status_t GetPoolInfo(hsa_amd_memory_pool_info_t attribute, void* value) const; hsa_status_t GetAgentPoolInfo(const core::Agent& agent, hsa_amd_agent_memory_pool_info_t attribute, void* value) const; hsa_status_t AllowAccess(uint32_t num_agents, const hsa_agent_t* agents, const void* ptr, size_t size) const; hsa_status_t CanMigrate(const MemoryRegion& dst, bool& result) const; hsa_status_t Migrate(uint32_t flag, const void* ptr) const; hsa_status_t Lock(uint32_t num_agents, const hsa_agent_t* agents, void* host_ptr, size_t size, void** agent_ptr) const; hsa_status_t Unlock(void* host_ptr) const; HSAuint64 GetBaseAddress() const { return mem_props_.VirtualBaseAddress; } HSAuint64 GetPhysicalSize() const { return mem_props_.SizeInBytes; } HSAuint64 GetVirtualSize() const { return virtual_size_; } hsa_status_t AssignAgent(void* ptr, size_t size, const core::Agent& agent, hsa_access_permission_t access) const; void Trim() const; __forceinline bool IsLocalMemory() const { return ((mem_props_.HeapType == HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE) || (mem_props_.HeapType == HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC)); } __forceinline bool IsPublic() const { return (mem_props_.HeapType == HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC); } __forceinline bool IsSystem() const { return mem_props_.HeapType == HSA_HEAPTYPE_SYSTEM; } __forceinline bool IsLDS() const { return mem_props_.HeapType == HSA_HEAPTYPE_GPU_LDS; } __forceinline bool IsGDS() const { return mem_props_.HeapType == HSA_HEAPTYPE_GPU_GDS; } __forceinline bool IsScratch() const { return 
mem_props_.HeapType == HSA_HEAPTYPE_GPU_SCRATCH; } __forceinline uint32_t BusWidth() const { return static_cast(mem_props_.Width); } __forceinline uint32_t MaxMemCloc() const { return static_cast(mem_props_.MemoryClockMax); } private: const HsaMemoryProperties mem_props_; HsaMemFlags mem_flag_; HsaMemMapFlags map_flag_; size_t max_single_alloc_size_; // Used to collect total system memory static size_t max_sysmem_alloc_size_; HSAuint64 virtual_size_; // Protects against concurrent allow_access calls to fragments of the same block by virtue of all // fragments of the block routing to the same MemoryRegion. mutable KernelMutex access_lock_; static const size_t kPageSize_ = 4096; // Determine access type allowed to requesting device hsa_amd_memory_pool_access_t GetAccessInfo(const core::Agent& agent, const core::Runtime::LinkInfo& link_info) const; // Operational body for Allocate. Recursive. hsa_status_t AllocateImpl(size_t& size, AllocateFlags alloc_flags, void** address) const; // Operational body for Free. Recursive. hsa_status_t FreeImpl(void* address, size_t size) const; class BlockAllocator { private: MemoryRegion& region_; static const size_t block_size_ = 2 * 1024 * 1024; // 2MB blocks. public: explicit BlockAllocator(MemoryRegion& region) : region_(region) {} void* alloc(size_t request_size, size_t& allocated_size) const; void free(void* ptr, size_t length) const { region_.FreeImpl(ptr, length); } size_t block_size() const { return block_size_; } }; mutable SimpleHeap fragment_allocator_; }; } // namespace amd } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/amd_topology.h000066400000000000000000000046611420110115200217700ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. 
// // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_INC_AMD_TOPOLOGY_H_ #define HSA_RUNTIME_CORE_INC_AMD_TOPOLOGY_H_ namespace rocr { namespace AMD { /// @brief Initializes the runtime. 
/// Should not be called directly, must be called only from Runtime::Acquire() bool Load(); /// @brief Shutdown/cleanup of runtime. /// Should not be called directly, must be called only from Runtime::Release() bool Unload(); } // namespace amd } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/blit.h000066400000000000000000000114621420110115200202220ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_INC_BLIT_H_ #define HSA_RUNTIME_CORE_INC_BLIT_H_ #include #include "core/inc/agent.h" namespace rocr { namespace core { class Blit { public: explicit Blit() {} virtual ~Blit() {} /// @brief Marks the blit object as invalid and uncouples its link with /// the underlying compute device's control block. Use of the blit object /// once it has been released is illegal, and any resulting behavior is /// indeterminate. /// /// @note: The call will block until all commands have executed. /// /// @param agent Agent passed to Initialize. /// /// @return hsa_status_t virtual hsa_status_t Destroy(const core::Agent& agent) = 0; /// @brief Submits a linear copy command to the underlying compute device's /// control block. The call blocks until the command has finished /// executing. /// /// @param dst Memory address of the copy destination. /// @param src Memory address of the copy source. /// @param size Size of the data to be copied. virtual hsa_status_t SubmitLinearCopyCommand(void* dst, const void* src, size_t size) = 0; /// @brief Submits a linear copy command to the underlying compute device's /// control block. The call is non-blocking. The memory transfer starts /// after all dependent signals are satisfied. After the transfer /// completes, the out signal is decremented. /// /// @param dst Memory address of the copy destination. /// @param src Memory address of the copy source. /// @param size Size of the data to be copied. /// @param dep_signals Array of dependent signals. /// @param out_signal Output signal.
virtual hsa_status_t SubmitLinearCopyCommand( void* dst, const void* src, size_t size, std::vector& dep_signals, core::Signal& out_signal) = 0; /// @brief Submits a linear fill command to the underlying compute device's /// control block. The call blocks until the command has finished /// executing. /// /// @param ptr Memory address of the fill destination. /// @param value Value to be set. /// @param num Number of uint32_t elements to set to @p value. virtual hsa_status_t SubmitLinearFillCommand(void* ptr, uint32_t value, size_t num) = 0; /// @brief Enables profiling of asynchronous copy commands. The timestamp /// of each copy request is stored in the completion signal structure. /// /// @param enable True to enable profiling, false to disable it. /// /// @return HSA_STATUS_SUCCESS if the request to enable/disable profiling is /// successful. virtual hsa_status_t EnableProfiling(bool enable) = 0; /// @brief Returns true if blit operations use SDMA. virtual bool isSDMA() const { return false; } }; } // namespace core } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/cache.h000066400000000000000000000061641420110115200203360ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_INC_CACHE_H #define HSA_RUNTIME_CORE_INC_CACHE_H #include "core/inc/hsa_internal.h" #include "core/inc/checked.h" #include "core/util/utils.h" #include #include namespace rocr { namespace core { class Cache : public Checked<0x39A6C7AD3F135B06> { public: static __forceinline hsa_cache_t Convert(const Cache* cache) { const hsa_cache_t handle = {static_cast(reinterpret_cast(cache))}; return handle; } static __forceinline Cache* Convert(const hsa_cache_t cache) { return reinterpret_cast(static_cast(cache.handle)); } Cache(const std::string& name, uint8_t level, uint32_t size) : name_(name), level_(level), size_(size) {} Cache(std::string&& name, uint8_t level, uint32_t size) : name_(std::move(name)), level_(level), size_(size) {} hsa_status_t GetInfo(hsa_cache_info_t attribute, void* value); private: std::string name_; uint32_t level_; uint32_t size_; // Forbid copying and moving of this object DISALLOW_COPY_AND_ASSIGN(Cache); }; } // namespace core } // namespace rocr #endif // HSA_RUNTIME_CORE_INC_CACHE_H ROCR-Runtime-rocm-5.0.0/src/core/inc/checked.h000066400000000000000000000072271420110115200206620ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTME_CORE_INC_CHECKED_H_ #define HSA_RUNTME_CORE_INC_CHECKED_H_ #include #include namespace rocr { namespace core { /// @brief Compares type codes and pointers to check object validity. Used for cast validation. 
// Single-process variant: the stamp mixes the type code with this object's address.
template <uint64_t code, bool multiProcess = false> class Check final {
 public:
  typedef Check<code, multiProcess> CheckType;

  Check() { object_ = uintptr_t(this) ^ uintptr_t(code); }
  Check(const Check&) { object_ = uintptr_t(this) ^ uintptr_t(code); }
  Check(Check&&) { object_ = uintptr_t(this) ^ uintptr_t(code); }

  ~Check() { object_ = NULL; }

  const Check& operator=(Check&& rhs) { return *this; }
  const Check& operator=(const Check& rhs) { return *this; }

  bool IsValid() const { return object_ == (uintptr_t(this) ^ uintptr_t(code)); }

  uint64_t check_code() const { return code; }

 private:
  uintptr_t object_;
};

// Multi-process variant: stamps only the type code, since a shared object's
// address need not be the same in every process that maps it.
template <uint64_t code> class Check<code, true> final {
 public:
  typedef Check<code, true> CheckType;

  Check() { object_ = uintptr_t(code); }
  Check(const Check&) { object_ = uintptr_t(code); }
  Check(Check&&) { object_ = uintptr_t(code); }

  const Check& operator=(Check&& rhs) { return *this; }
  const Check& operator=(const Check& rhs) { return *this; }

  bool IsValid() const { return object_ == uintptr_t(code); }

  uint64_t check_code() const { return code; }

 private:
  uintptr_t object_;
};

/// @brief Base class for validating objects.
template <uint64_t code, bool multiProcess = false> class Checked {
 public:
  typedef Checked<code, multiProcess> CheckedType;

  bool IsValid() const { return id.IsValid(); }

  virtual ~Checked() {}

 private:
  Check<code, multiProcess> id;
};

} // namespace core
} // namespace rocr

#endif // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/default_signal.h000066400000000000000000000143101420110115200222440ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA runtime C++ interface file. #ifndef HSA_RUNTME_CORE_INC_DEFAULT_SIGNAL_H_ #define HSA_RUNTME_CORE_INC_DEFAULT_SIGNAL_H_ #include "core/inc/runtime.h" #include "core/inc/signal.h" #include "core/util/utils.h" namespace rocr { namespace core { /// @brief Operations for a simple pure memory based signal. /// @brief See base class Signal. 
class BusyWaitSignal : public Signal {
 public:
  /// @brief Determines if a Signal* can be safely converted to BusyWaitSignal*
  /// via static_cast.
  static __forceinline bool IsType(Signal* ptr) { return ptr->IsType(&rtti_id_); }

  /// @brief See base class Signal.
  explicit BusyWaitSignal(SharedSignal* abi_block, bool enableIPC);

  // Below are various methods corresponding to the APIs, which load/store the
  // signal value or modify the existing signal value atomically and with the
  // specified memory ordering semantics.
  hsa_signal_value_t LoadRelaxed();
  hsa_signal_value_t LoadAcquire();

  void StoreRelaxed(hsa_signal_value_t value);
  void StoreRelease(hsa_signal_value_t value);

  hsa_signal_value_t WaitRelaxed(hsa_signal_condition_t condition,
                                 hsa_signal_value_t compare_value, uint64_t timeout,
                                 hsa_wait_state_t wait_hint);
  hsa_signal_value_t WaitAcquire(hsa_signal_condition_t condition,
                                 hsa_signal_value_t compare_value, uint64_t timeout,
                                 hsa_wait_state_t wait_hint);

  void AndRelaxed(hsa_signal_value_t value);
  void AndAcquire(hsa_signal_value_t value);
  void AndRelease(hsa_signal_value_t value);
  void AndAcqRel(hsa_signal_value_t value);

  void OrRelaxed(hsa_signal_value_t value);
  void OrAcquire(hsa_signal_value_t value);
  void OrRelease(hsa_signal_value_t value);
  void OrAcqRel(hsa_signal_value_t value);

  void XorRelaxed(hsa_signal_value_t value);
  void XorAcquire(hsa_signal_value_t value);
  void XorRelease(hsa_signal_value_t value);
  void XorAcqRel(hsa_signal_value_t value);

  void AddRelaxed(hsa_signal_value_t value);
  void AddAcquire(hsa_signal_value_t value);
  void AddRelease(hsa_signal_value_t value);
  void AddAcqRel(hsa_signal_value_t value);

  void SubRelaxed(hsa_signal_value_t value);
  void SubAcquire(hsa_signal_value_t value);
  void SubRelease(hsa_signal_value_t value);
  void SubAcqRel(hsa_signal_value_t value);

  hsa_signal_value_t ExchRelaxed(hsa_signal_value_t value);
  hsa_signal_value_t ExchAcquire(hsa_signal_value_t value);
  hsa_signal_value_t ExchRelease(hsa_signal_value_t value);
  hsa_signal_value_t ExchAcqRel(hsa_signal_value_t value);

  hsa_signal_value_t CasRelaxed(hsa_signal_value_t expected, hsa_signal_value_t value);
  hsa_signal_value_t CasAcquire(hsa_signal_value_t expected, hsa_signal_value_t value);
  hsa_signal_value_t CasRelease(hsa_signal_value_t expected, hsa_signal_value_t value);
  hsa_signal_value_t CasAcqRel(hsa_signal_value_t expected, hsa_signal_value_t value);

  /// @brief See the base class Signal.
  __forceinline hsa_signal_value_t* ValueLocation() const {
    return (hsa_signal_value_t*)&signal_.value;
  }

  /// @brief See the base class Signal.
  __forceinline HsaEvent* EopEvent() { return NULL; }

 protected:
  bool _IsA(rtti_t id) const { return id == &rtti_id_; }

 private:
  static int rtti_id_;

  DISALLOW_COPY_AND_ASSIGN(BusyWaitSignal);
};

/// @brief Simple memory-only signal using a new ABI block.
class DefaultSignal : private LocalSignal, public BusyWaitSignal {
 public:
  /// @brief Determines if a Signal* can be safely converted to DefaultSignal*
  /// via static_cast.
  static __forceinline bool IsType(Signal* ptr) { return ptr->IsType(&rtti_id_); }

  /// @brief See base class Signal.
  explicit DefaultSignal(hsa_signal_value_t initial_value, bool enableIPC = false)
      : LocalSignal(initial_value, enableIPC), BusyWaitSignal(signal(), enableIPC) {}

 protected:
  bool _IsA(rtti_t id) const {
    if (id == &rtti_id_) return true;
    return BusyWaitSignal::_IsA(id);
  }

 private:
  static int rtti_id_;

  DISALLOW_COPY_AND_ASSIGN(DefaultSignal);
};

} // namespace core
} // namespace rocr

#endif // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/exceptions.h000066400000000000000000000073251420110115200214540ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. //
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_EXCEPTIONS_H
#define HSA_RUNTIME_CORE_INC_EXCEPTIONS_H

#include <exception>
#include <string>

#include "core/inc/hsa_internal.h"

namespace rocr {
namespace AMD {

/// @brief Exception type which carries an error code to return to the user.
class hsa_exception : public std::exception {
 public:
  hsa_exception(hsa_status_t error, const char* description)
      : err_(error), desc_(description) {}
  hsa_status_t error_code() const noexcept { return err_; }
  const char* what() const noexcept override { return desc_.c_str(); }

 private:
  hsa_status_t err_;
  std::string desc_;
};

/// @brief Holds and invokes callbacks, capturing any exceptions and forwarding those to the user
/// after unwinding the runtime stack.
template <class F> class callback_t;

template <class R, class... Args> class callback_t<R (*)(Args...)> {
 public:
  typedef R (*func_t)(Args...);

  callback_t() : function(nullptr) {}

  // Should not be marked explicit.
  callback_t(func_t function_ptr) : function(function_ptr) {}
  callback_t& operator=(func_t function_ptr) {
    function = function_ptr;
    return *this;
  }

  bool operator==(func_t function_ptr) { return function == function_ptr; }
  bool operator!=(func_t function_ptr) { return function != function_ptr; }

  // Allows common function pointer idioms, such as if( func != nullptr )...
  // without allowing silent reversion to the original function pointer type.
  operator void*() { return reinterpret_cast<void*>(function); }

  R operator()(Args... args) {
    try {
      return function(args...);
    } catch (...) {
      // std::nested_exception captures the in-flight exception
      // (std::current_exception()) so it can be rethrown to the user later.
      throw std::nested_exception();
      return R();
    }
  }

 private:
  func_t function;
};

} // namespace AMD
} // namespace rocr

#endif // HSA_RUNTIME_CORE_INC_EXCEPTIONS_H
ROCR-Runtime-rocm-5.0.0/src/core/inc/host_queue.h000066400000000000000000000150131420110115200214450ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
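The HostQueue declaration that follows exposes its read and write dispatch indices only through atomic loads, stores, adds and compare-and-swaps with relaxed, acquire, release and acq_rel ordering. The sketch below is a hypothetical, self-contained illustration of how a producer might use such primitives to reserve ring slots; `ToyQueue` and all of its members are invented for this example and are not part of ROCR. It assumes, as with `std::atomic::fetch_add` and `compare_exchange_strong`, that the add and CAS operations return the index value observed before the update.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy queue illustrating read/write dispatch indices.
// Not the ROCR HostQueue; every name here is illustrative.
class ToyQueue {
 public:
  explicit ToyQueue(uint32_t size) : size_(size), ring_(size) {}

  // Like AddWriteIndex*: returns the write index *before* the add.
  uint64_t AddWriteIndexRelaxed(uint64_t n) {
    return write_.fetch_add(n, std::memory_order_relaxed);
  }

  // Like CasWriteIndex*: returns the value observed before the attempt.
  uint64_t CasWriteIndexAcqRel(uint64_t expected, uint64_t value) {
    // On failure compare_exchange_strong stores the current value in
    // `expected`; on success `expected` already equals the old value.
    write_.compare_exchange_strong(expected, value, std::memory_order_acq_rel);
    return expected;
  }

  // Reserve one slot only if the ring is not full; false when full.
  bool TryReserve(uint64_t* slot) {
    uint64_t w = write_.load(std::memory_order_relaxed);
    for (;;) {
      uint64_t r = read_.load(std::memory_order_acquire);
      if (w - r >= size_) return false;  // producer would overrun the consumer
      uint64_t observed = CasWriteIndexAcqRel(w, w + 1);
      if (observed == w) {
        *slot = w;
        return true;
      }
      w = observed;  // another producer won the race; retry with its index
    }
  }

  void Write(uint64_t slot, uint64_t packet) {
    assert(slot < write_.load(std::memory_order_relaxed) && "slot must be reserved");
    ring_[slot % size_] = packet;
  }
  uint64_t Read(uint64_t slot) const { return ring_[slot % size_]; }

  // Consumer side: publish progress with release, observe it with acquire.
  void StoreReadIndexRelease(uint64_t v) { read_.store(v, std::memory_order_release); }
  uint64_t LoadReadIndexAcquire() const { return read_.load(std::memory_order_acquire); }
  uint64_t LoadWriteIndexAcquire() const { return write_.load(std::memory_order_acquire); }

 private:
  const uint32_t size_;
  std::vector<uint64_t> ring_;
  std::atomic<uint64_t> read_{0};
  std::atomic<uint64_t> write_{0};
};
```

Reserving with a CAS loop rather than a blind add keeps the write index from running more than `size` packets ahead of the read index. A real AQL producer would also publish each packet header with release semantics so the consumer never sees a half-written packet; this sketch omits that step.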
// //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_INC_HOST_QUEUE_H_ #define HSA_RUNTIME_CORE_INC_HOST_QUEUE_H_ #include "core/inc/memory_region.h" #include "core/inc/queue.h" #include "core/inc/runtime.h" #include "core/inc/signal.h" namespace rocr { namespace core { class HostQueue : public Queue { public: static __forceinline bool IsType(core::Queue* queue) { return queue->IsType(&rtti_id_); } HostQueue(hsa_region_t region, uint32_t ring_size, hsa_queue_type32_t type, uint32_t features, hsa_signal_t doorbell_signal); ~HostQueue(); hsa_status_t Inactivate() override { return HSA_STATUS_SUCCESS; } hsa_status_t SetPriority(HSA_QUEUE_PRIORITY priority) override { return HSA_STATUS_ERROR_INVALID_QUEUE; } uint64_t LoadReadIndexAcquire() override { return atomic::Load(&amd_queue_.read_dispatch_id, std::memory_order_acquire); } uint64_t LoadReadIndexRelaxed() override { return atomic::Load(&amd_queue_.read_dispatch_id, std::memory_order_relaxed); } uint64_t LoadWriteIndexAcquire() override { return atomic::Load(&amd_queue_.write_dispatch_id, std::memory_order_acquire); } uint64_t LoadWriteIndexRelaxed() override { return atomic::Load(&amd_queue_.write_dispatch_id, std::memory_order_relaxed); } void StoreReadIndexRelaxed(uint64_t value) override { atomic::Store(&amd_queue_.read_dispatch_id, value, std::memory_order_relaxed); } void StoreReadIndexRelease(uint64_t value) override { atomic::Store(&amd_queue_.read_dispatch_id, value, std::memory_order_release); } void StoreWriteIndexRelaxed(uint64_t value) override { atomic::Store(&amd_queue_.write_dispatch_id, value, std::memory_order_relaxed); } void StoreWriteIndexRelease(uint64_t value) override { atomic::Store(&amd_queue_.write_dispatch_id, value, std::memory_order_release); } uint64_t CasWriteIndexAcqRel(uint64_t expected, uint64_t value) override { return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected, std::memory_order_acq_rel); } uint64_t 
CasWriteIndexAcquire(uint64_t expected, uint64_t value) override {
    return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected,
                       std::memory_order_acquire);
  }

  uint64_t CasWriteIndexRelaxed(uint64_t expected, uint64_t value) override {
    return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected,
                       std::memory_order_relaxed);
  }

  uint64_t CasWriteIndexRelease(uint64_t expected, uint64_t value) override {
    return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected,
                       std::memory_order_release);
  }

  uint64_t AddWriteIndexAcqRel(uint64_t value) override {
    return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_acq_rel);
  }

  uint64_t AddWriteIndexAcquire(uint64_t value) override {
    return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_acquire);
  }

  uint64_t AddWriteIndexRelaxed(uint64_t value) override {
    return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_relaxed);
  }

  uint64_t AddWriteIndexRelease(uint64_t value) override {
    return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_release);
  }

  hsa_status_t SetCUMasking(uint32_t num_cu_mask_count, const uint32_t* cu_mask) override {
    return HSA_STATUS_ERROR_INVALID_QUEUE;
  }

  hsa_status_t GetCUMasking(uint32_t num_cu_mask_count, uint32_t* cu_mask) override {
    return HSA_STATUS_ERROR_INVALID_QUEUE;
  }

  void ExecutePM4(uint32_t* cmd_data, size_t cmd_size_b) override {
    assert(false && "HostQueue::ExecutePM4 is unimplemented");
  }

  void* operator new(size_t size) { return _aligned_malloc(size, HSA_QUEUE_ALIGN_BYTES); }

  void* operator new(size_t size, void* ptr) { return ptr; }

  void operator delete(void* ptr) { _aligned_free(ptr); }

  void operator delete(void*, void*) {}

 protected:
  bool _IsA(Queue::rtti_t id) const override { return id == &rtti_id_; }

 private:
  static int rtti_id_;
  static const size_t kRingAlignment = 256;
  const uint32_t size_;
  void* ring_;

  // Host queue id counter, starting from 0x80000000 to avoid overlapping
  // with AQL queue ids.
  static std::atomic<uint32_t> queue_count_;

  DISALLOW_COPY_AND_ASSIGN(HostQueue);
};

} // namespace core
} // namespace rocr

#endif // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/hsa_api_trace_int.h000066400000000000000000000056351420110115200227310ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_INC_HSA_API_TRACE_INT_H #define HSA_RUNTIME_CORE_INC_HSA_API_TRACE_INT_H #include "inc/hsa_api_trace.h" #include "core/inc/hsa_internal.h" namespace rocr { namespace core { struct HsaApiTable { static const uint32_t HSA_EXT_FINALIZER_API_TABLE_ID = 0; static const uint32_t HSA_EXT_IMAGE_API_TABLE_ID = 1; static const uint32_t HSA_EXT_AQLPROFILE_API_TABLE_ID = 2; ::HsaApiTable hsa_api; ::CoreApiTable core_api; ::AmdExtTable amd_ext_api; ::FinalizerExtTable finalizer_api; ::ImageExtTable image_api; HsaApiTable(); void Init(); void UpdateCore(); void UpdateAmdExts(); void CloneExts(void* ptr, uint32_t table_id); void LinkExts(void* ptr, uint32_t table_id); void Reset(); }; extern HsaApiTable hsa_api_table_; extern HsaApiTable hsa_internal_api_table_; void LoadInitialHsaApiTable(); } // namespace core } // namespace rocr #endif ROCR-Runtime-rocm-5.0.0/src/core/inc/hsa_ext_amd_impl.h000066400000000000000000000264141420110115200225700ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA AMD extension. 
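The `HsaApiTable` structure above groups `::CoreApiTable`, `::AmdExtTable` and the extension tables, and the `rocr::AMD` functions declared below mirror the public `hsa_amd_*` entry points. A plausible reading is that public calls are forwarded through these function-pointer tables so that tools can interpose on them. The following sketch illustrates that indirection under that assumption; every name in it (`toy_memory_fill`, `ToyExtTable`, `ActiveTable`, `InstallTracer`, and so on) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical status codes standing in for hsa_status_t.
enum toy_status_t { TOY_STATUS_SUCCESS = 0, TOY_STATUS_ERROR = 1 };

// Internal implementation in its own namespace, the way rocr::AMD
// mirrors the public hsa_amd_* functions.
namespace toy_internal {
inline toy_status_t memory_fill(uint32_t* ptr, uint32_t value, size_t count) {
  if (ptr == nullptr) return TOY_STATUS_ERROR;
  for (size_t i = 0; i < count; ++i) ptr[i] = value;
  return TOY_STATUS_SUCCESS;
}
}  // namespace toy_internal

// Pointer type derived with decltype, in the same style as
// decltype(::hsa_amd_image_get_info_max_dim)* in these headers.
using fill_fn_t = decltype(toy_internal::memory_fill)*;

// A table of function pointers, loosely analogous to AmdExtTable.
struct ToyExtTable {
  fill_fn_t memory_fill_fn;
};

// The table that public entry points dispatch through.
inline ToyExtTable& ActiveTable() {
  static ToyExtTable table{&toy_internal::memory_fill};
  return table;
}

// Public entry point: does no work itself, only forwards through the table.
inline toy_status_t toy_memory_fill(uint32_t* ptr, uint32_t value, size_t count) {
  assert(ActiveTable().memory_fill_fn != nullptr);
  return ActiveTable().memory_fill_fn(ptr, value, count);
}

// A tool can interpose by saving the current entry and installing a wrapper.
inline fill_fn_t& SavedFill() {
  static fill_fn_t saved = nullptr;
  return saved;
}
inline int& TraceCount() {
  static int count = 0;
  return count;
}
inline toy_status_t TracedFill(uint32_t* ptr, uint32_t value, size_t count) {
  ++TraceCount();  // record the call, then chain to the saved entry
  return SavedFill()(ptr, value, count);
}
inline void InstallTracer() {
  SavedFill() = ActiveTable().memory_fill_fn;
  ActiveTable().memory_fill_fn = &TracedFill;
}
```

Because callers only ever see `toy_memory_fill`, swapping a table entry changes behavior without relinking; the wrapper keeps the previous pointer so it can chain to whatever was installed before it.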
#ifndef HSA_RUNTIME_CORE_INC_EXT_AMD_H_ #define HSA_RUNTIME_CORE_INC_EXT_AMD_H_ #include "inc/hsa.h" #include "inc/hsa_ext_image.h" #include "inc/hsa_ext_amd.h" // Wrap internal implementation inside AMD namespace namespace rocr { namespace AMD { // Mirrors Amd Extension Apis hsa_status_t hsa_amd_coherency_get_type(hsa_agent_t agent, hsa_amd_coherency_type_t* type); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_coherency_set_type(hsa_agent_t agent, hsa_amd_coherency_type_t type); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_profiling_set_profiler_enabled(hsa_queue_t* queue, int enable); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_profiling_async_copy_enable(bool enable); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_profiling_get_dispatch_time( hsa_agent_t agent, hsa_signal_t signal, hsa_amd_profiling_dispatch_time_t* time); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_profiling_get_async_copy_time( hsa_signal_t signal, hsa_amd_profiling_async_copy_time_t* time); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_profiling_convert_tick_to_system_domain(hsa_agent_t agent, uint64_t agent_tick, uint64_t* system_tick); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_signal_async_handler(hsa_signal_t signal, hsa_signal_condition_t cond, hsa_signal_value_t value, hsa_amd_signal_handler handler, void* arg); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_async_function(void (*callback)(void* arg), void* arg); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_signal_create(hsa_signal_value_t initial_value, uint32_t num_consumers, const hsa_agent_t* consumers, uint64_t attributes, hsa_signal_t* signal); // Mirrors Amd Extension Apis uint32_t hsa_amd_signal_wait_any(uint32_t signal_count, hsa_signal_t* signals, hsa_signal_condition_t* conds, hsa_signal_value_t* values, uint64_t timeout_hint, hsa_wait_state_t wait_hint, hsa_signal_value_t* satisfying_value); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_queue_cu_set_mask(const 
hsa_queue_t* queue, uint32_t num_cu_mask_count, const uint32_t* cu_mask); // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_queue_cu_get_mask(const hsa_queue_t* queue, uint32_t num_cu_mask_count, uint32_t* cu_mask); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_pool_get_info(hsa_amd_memory_pool_t memory_pool, hsa_amd_memory_pool_info_t attribute, void* value); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_agent_iterate_memory_pools( hsa_agent_t agent, hsa_status_t (*callback)(hsa_amd_memory_pool_t memory_pool, void* data), void* data); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_pool_allocate(hsa_amd_memory_pool_t memory_pool, size_t size, uint32_t flags, void** ptr); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_pool_free(void* ptr); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_async_copy(void* dst, hsa_agent_t dst_agent, const void* src, hsa_agent_t src_agent, size_t size, uint32_t num_dep_signals, const hsa_signal_t* dep_signals, hsa_signal_t completion_signal); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_async_copy_rect( const hsa_pitched_ptr_t* dst, const hsa_dim3_t* dst_offset, const hsa_pitched_ptr_t* src, const hsa_dim3_t* src_offset, const hsa_dim3_t* range, hsa_agent_t copy_agent, hsa_amd_copy_direction_t dir, uint32_t num_dep_signals, const hsa_signal_t* dep_signals, hsa_signal_t completion_signal); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_agent_memory_pool_get_info( hsa_agent_t agent, hsa_amd_memory_pool_t memory_pool, hsa_amd_agent_memory_pool_info_t attribute, void* value); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_agents_allow_access(uint32_t num_agents, const hsa_agent_t* agents, const uint32_t* flags, const void* ptr); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_pool_can_migrate(hsa_amd_memory_pool_t src_memory_pool, hsa_amd_memory_pool_t dst_memory_pool, bool* result); // Mirrors Amd Extension Apis hsa_status_t 
hsa_amd_memory_migrate(const void* ptr, hsa_amd_memory_pool_t memory_pool, uint32_t flags); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_lock(void* host_ptr, size_t size, hsa_agent_t* agents, int num_agent, void** agent_ptr); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_lock_to_pool(void* host_ptr, size_t size, hsa_agent_t* agents, int num_agent, hsa_amd_memory_pool_t pool, uint32_t flags, void** agent_ptr); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_unlock(void* host_ptr); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_memory_fill(void* ptr, uint32_t value, size_t count); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_interop_map_buffer(uint32_t num_agents, hsa_agent_t* agents, int interop_handle, uint32_t flags, size_t* size, void** ptr, size_t* metadata_size, const void** metadata); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_interop_unmap_buffer(void* ptr); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_pointer_info(const void* ptr, hsa_amd_pointer_info_t* info, void* (*alloc)(size_t), uint32_t* num_agents_accessible, hsa_agent_t** accessible); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_pointer_info_set_userdata(const void* ptr, void* userdata); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_ipc_memory_create(void* ptr, size_t len, hsa_amd_ipc_memory_t* handle); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_ipc_memory_attach(const hsa_amd_ipc_memory_t* handle, size_t len, uint32_t num_agents, const hsa_agent_t* mapping_agents, void** mapped_ptr); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_ipc_memory_detach(void* mapped_ptr); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_ipc_signal_create(hsa_signal_t signal, hsa_amd_ipc_signal_t* handle); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_ipc_signal_attach(const hsa_amd_ipc_signal_t* handle, hsa_signal_t* signal); // Mirrors Amd Extension Apis hsa_status_t 
hsa_amd_register_system_event_handler(hsa_amd_system_event_callback_t callback, void* data); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_queue_set_priority(hsa_queue_t* queue, hsa_amd_queue_priority_t priority); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_register_deallocation_callback( void* ptr, hsa_amd_deallocation_callback_t callback, void* user_data); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_deregister_deallocation_callback( void* ptr, hsa_amd_deallocation_callback_t callback); // Mirrors Amd Extension Apis hsa_status_t hsa_amd_signal_value_pointer(hsa_signal_t signal, volatile hsa_signal_value_t** value_ptr); // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_svm_attributes_set(void* ptr, size_t size, hsa_amd_svm_attribute_pair_t* attribute_list, size_t attribute_count); // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_svm_attributes_get(void* ptr, size_t size, hsa_amd_svm_attribute_pair_t* attribute_list, size_t attribute_count); // Mirrors Amd Extension Apis hsa_status_t HSA_API hsa_amd_svm_prefetch_async(void* ptr, size_t size, hsa_agent_t agent, uint32_t num_dep_signals, const hsa_signal_t* dep_signals, hsa_signal_t completion_signal); } // namespace amd } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/hsa_ext_interface.h000066400000000000000000000067111420110115200227440ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
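`ExtensionEntryPoints`, declared next, keeps function-pointer tables for the image and finalizer extensions: `LoadImage()` updates the image table with handles to the implementation, and `UnloadImage()` resets it to null implementations. The sketch below models only that swap, using an in-process stand-in instead of a shared library. The status codes, the `ToyImageTable` and `ToyExtensionEntryPoints` types, and the choice that the null implementation returns a "not initialized" error are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status codes; the real tables return hsa_status_t.
enum ext_status_t { EXT_OK = 0, EXT_ERROR_NOT_INITIALIZED = 1 };

// Function-pointer table for a hypothetical image extension.
struct ToyImageTable {
  ext_status_t (*image_create_fn)(uint32_t width, uint32_t* handle);
};

// Null implementation: installed whenever the extension is not loaded.
inline ext_status_t NullImageCreate(uint32_t, uint32_t*) {
  return EXT_ERROR_NOT_INITIALIZED;
}

// Stand-in for an implementation that would normally be resolved from a
// shared library at load time.
inline ext_status_t RealImageCreate(uint32_t width, uint32_t* handle) {
  assert(handle != nullptr);
  *handle = width * 2;  // placeholder handle value, for illustration only
  return EXT_OK;
}

class ToyExtensionEntryPoints {
 public:
  ToyExtensionEntryPoints() { UnloadImage(); }  // start with null entries

  // Point the table at the real implementation.
  bool LoadImage() {
    image_api.image_create_fn = &RealImageCreate;
    return true;
  }

  // Reset the table so callers get a clean error instead of a dangling call.
  void UnloadImage() { image_api.image_create_fn = &NullImageCreate; }

  ToyImageTable image_api{};
};
```

Keeping a valid null entry in every slot means callers never need to test for `nullptr` before calling through the table, and unloading cannot leave pointers into an unmapped library behind.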
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTME_CORE_INC_AMD_EXT_INTERFACE_H_
#define HSA_RUNTME_CORE_INC_AMD_EXT_INTERFACE_H_

#include <string>
#include <vector>

#include "core/inc/hsa_api_trace_int.h"
#include "core/util/os.h"
#include "core/util/utils.h"

namespace rocr {
namespace core {

struct ImageExtTableInternal : public ImageExtTable {
  decltype(::hsa_amd_image_get_info_max_dim)* hsa_amd_image_get_info_max_dim_fn;
};

class ExtensionEntryPoints {
 public:
  // Table of function pointers for the HSA Image extension
  ImageExtTableInternal image_api;

  // Table of function pointers for the HSA Finalizer extension
  FinalizerExtTable finalizer_api;

  ExtensionEntryPoints();

  bool LoadFinalizer(std::string library_name);

  void Unload();

  // Update the Image API table with handles to the implementation
  bool LoadImage();

  // Reset API tables to point to null implementations
  void UnloadImage();

 private:
  typedef void (*Load_t)(const ::HsaApiTable* table);
  typedef void (*Unload_t)();

  std::vector<os::LibHandle> libs_;

  // Initialize table for HSA Finalizer Extension APIs
  void InitFinalizerExtTable();

  // Initialize table for HSA Image Extension APIs
  void InitImageExtTable();

  // Initialize AMD Ext table for APIs related to Images
  void InitAmdExtTable();

  // Update AMD Ext table for APIs related to Images
  void UpdateAmdExtTable(decltype(::hsa_amd_image_create)* func_ptr);

  DISALLOW_COPY_AND_ASSIGN(ExtensionEntryPoints);
};

} // namespace core
} // namespace rocr

#endif
ROCR-Runtime-rocm-5.0.0/src/core/inc/hsa_internal.h000066400000000000000000000502441420110115200217400ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_HSA_INTERNAL_H
#define HSA_RUNTIME_CORE_INC_HSA_INTERNAL_H

#include "inc/hsa.h"

namespace rocr {
namespace HSA {

// Define core namespace interfaces - copy of function declarations in hsa.h
hsa_status_t hsa_init();
hsa_status_t hsa_shut_down();
hsa_status_t hsa_system_get_info(hsa_system_info_t attribute, void *value);
hsa_status_t hsa_extension_get_name(uint16_t extension, const char** name);
hsa_status_t hsa_system_extension_supported(uint16_t extension, uint16_t version_major,
    uint16_t version_minor, bool* result);
hsa_status_t hsa_system_major_extension_supported(uint16_t extension, uint16_t version_major,
    uint16_t* version_minor, bool* result);
hsa_status_t hsa_system_get_extension_table(uint16_t extension, uint16_t version_major,
    uint16_t version_minor, void *table);
hsa_status_t hsa_system_get_major_extension_table(uint16_t extension, uint16_t version_major,
    size_t table_length, void* table);
hsa_status_t hsa_iterate_agents(hsa_status_t (*callback)(hsa_agent_t agent, void *data),
    void *data);
hsa_status_t hsa_agent_get_info(hsa_agent_t agent, hsa_agent_info_t attribute, void *value);
hsa_status_t hsa_agent_get_exception_policies(hsa_agent_t agent, hsa_profile_t profile,
    uint16_t *mask);
hsa_status_t hsa_cache_get_info(hsa_cache_t cache, hsa_cache_info_t attribute, void* value);
hsa_status_t hsa_agent_iterate_caches(
    hsa_agent_t agent, hsa_status_t (*callback)(hsa_cache_t cache, void* data), void* value);
hsa_status_t hsa_agent_extension_supported(uint16_t extension, hsa_agent_t agent,
    uint16_t version_major, uint16_t version_minor, bool *result);
hsa_status_t hsa_agent_major_extension_supported(uint16_t extension, hsa_agent_t agent,
    uint16_t version_major, uint16_t* version_minor, bool* result);
hsa_status_t hsa_queue_create(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
    void (*callback)(hsa_status_t status, hsa_queue_t *source, void *data),
    void *data, uint32_t private_segment_size, uint32_t group_segment_size,
    hsa_queue_t **queue);
hsa_status_t hsa_soft_queue_create(hsa_region_t region, uint32_t size,
    hsa_queue_type32_t type, uint32_t features, hsa_signal_t completion_signal,
    hsa_queue_t **queue);
hsa_status_t hsa_queue_destroy(hsa_queue_t *queue);
hsa_status_t hsa_queue_inactivate(hsa_queue_t *queue);
uint64_t hsa_queue_load_read_index_scacquire(const hsa_queue_t* queue);
uint64_t hsa_queue_load_read_index_relaxed(const hsa_queue_t *queue);
uint64_t hsa_queue_load_write_index_scacquire(const hsa_queue_t* queue);
uint64_t hsa_queue_load_write_index_relaxed(const hsa_queue_t *queue);
void hsa_queue_store_write_index_relaxed(const hsa_queue_t *queue, uint64_t value);
void hsa_queue_store_write_index_screlease(const hsa_queue_t* queue, uint64_t value);
uint64_t hsa_queue_cas_write_index_scacq_screl(const hsa_queue_t* queue, uint64_t expected,
    uint64_t value);
uint64_t hsa_queue_cas_write_index_scacquire(const hsa_queue_t* queue, uint64_t expected,
    uint64_t value);
uint64_t hsa_queue_cas_write_index_relaxed(const hsa_queue_t *queue, uint64_t expected,
    uint64_t value);
uint64_t hsa_queue_cas_write_index_screlease(const hsa_queue_t* queue, uint64_t expected,
    uint64_t value);
uint64_t hsa_queue_add_write_index_scacq_screl(const hsa_queue_t* queue, uint64_t value);
uint64_t hsa_queue_add_write_index_scacquire(const hsa_queue_t* queue, uint64_t value);
uint64_t hsa_queue_add_write_index_relaxed(const hsa_queue_t *queue, uint64_t value);
uint64_t hsa_queue_add_write_index_screlease(const hsa_queue_t* queue, uint64_t value);
void hsa_queue_store_read_index_relaxed(const hsa_queue_t *queue, uint64_t value);
void hsa_queue_store_read_index_screlease(const hsa_queue_t* queue, uint64_t value);
hsa_status_t hsa_agent_iterate_regions(
    hsa_agent_t agent, hsa_status_t (*callback)(hsa_region_t region, void *data),
    void *data);
hsa_status_t hsa_region_get_info(hsa_region_t region, hsa_region_info_t attribute,
    void *value);
hsa_status_t hsa_memory_register(void *address, size_t size);
hsa_status_t hsa_memory_deregister(void *address, size_t size);
hsa_status_t hsa_memory_allocate(hsa_region_t region, size_t size, void **ptr);
hsa_status_t hsa_memory_free(void *ptr);
hsa_status_t hsa_memory_copy(void *dst, const void *src, size_t size);
hsa_status_t hsa_memory_assign_agent(void *ptr, hsa_agent_t agent,
    hsa_access_permission_t access);
hsa_status_t hsa_signal_create(hsa_signal_value_t initial_value, uint32_t num_consumers,
    const hsa_agent_t *consumers, hsa_signal_t *signal);
hsa_status_t hsa_signal_destroy(hsa_signal_t signal);
hsa_signal_value_t hsa_signal_load_relaxed(hsa_signal_t signal);
hsa_signal_value_t hsa_signal_load_scacquire(hsa_signal_t signal);
void hsa_signal_store_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_store_screlease(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_silent_store_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_silent_store_screlease(hsa_signal_t signal, hsa_signal_value_t value);
hsa_signal_value_t hsa_signal_wait_relaxed(hsa_signal_t signal,
    hsa_signal_condition_t condition, hsa_signal_value_t compare_value,
    uint64_t timeout_hint, hsa_wait_state_t wait_expectancy_hint);
hsa_signal_value_t hsa_signal_wait_scacquire(hsa_signal_t signal,
    hsa_signal_condition_t condition, hsa_signal_value_t compare_value,
    uint64_t timeout_hint, hsa_wait_state_t wait_expectancy_hint);
hsa_status_t hsa_signal_group_create(uint32_t num_signals, const hsa_signal_t* signals,
    uint32_t num_consumers, const hsa_agent_t* consumers,
    hsa_signal_group_t* signal_group);
hsa_status_t hsa_signal_group_destroy(hsa_signal_group_t signal_group);
hsa_status_t hsa_signal_group_wait_any_scacquire(hsa_signal_group_t signal_group,
    const hsa_signal_condition_t* conditions, const hsa_signal_value_t* compare_values,
    hsa_wait_state_t wait_state_hint, hsa_signal_t* signal, hsa_signal_value_t* value);
hsa_status_t hsa_signal_group_wait_any_relaxed(hsa_signal_group_t signal_group,
    const hsa_signal_condition_t* conditions, const hsa_signal_value_t* compare_values,
    hsa_wait_state_t wait_state_hint, hsa_signal_t* signal, hsa_signal_value_t* value);
void hsa_signal_and_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_and_scacquire(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_and_screlease(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_and_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_or_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_or_scacquire(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_or_screlease(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_or_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_xor_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_xor_scacquire(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_xor_screlease(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_xor_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_add_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_add_scacquire(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_add_screlease(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_add_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_subtract_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_subtract_scacquire(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_subtract_screlease(hsa_signal_t signal, hsa_signal_value_t value);
void hsa_signal_subtract_scacq_screl(hsa_signal_t signal, hsa_signal_value_t value);
hsa_signal_value_t hsa_signal_exchange_relaxed(hsa_signal_t signal,
    hsa_signal_value_t value);
hsa_signal_value_t hsa_signal_exchange_scacquire(hsa_signal_t signal,
    hsa_signal_value_t value);
hsa_signal_value_t hsa_signal_exchange_screlease(hsa_signal_t signal,
    hsa_signal_value_t value);
hsa_signal_value_t hsa_signal_exchange_scacq_screl(hsa_signal_t signal,
    hsa_signal_value_t value);
hsa_signal_value_t hsa_signal_cas_relaxed(hsa_signal_t signal,
    hsa_signal_value_t expected, hsa_signal_value_t value);
hsa_signal_value_t hsa_signal_cas_scacquire(hsa_signal_t signal,
    hsa_signal_value_t expected, hsa_signal_value_t value);
hsa_signal_value_t hsa_signal_cas_screlease(hsa_signal_t signal,
    hsa_signal_value_t expected, hsa_signal_value_t value);
hsa_signal_value_t hsa_signal_cas_scacq_screl(hsa_signal_t signal,
    hsa_signal_value_t expected, hsa_signal_value_t value);

//===--- Instruction Set Architecture -----------------------------------===//

hsa_status_t hsa_isa_from_name(
    const char *name,
    hsa_isa_t *isa);
hsa_status_t hsa_agent_iterate_isas(
    hsa_agent_t agent,
    hsa_status_t (*callback)(hsa_isa_t isa, void *data),
    void *data);
/* deprecated */ hsa_status_t hsa_isa_get_info(
    hsa_isa_t isa,
    hsa_isa_info_t attribute,
    uint32_t index,
    void *value);
hsa_status_t hsa_isa_get_info_alt(
    hsa_isa_t isa,
    hsa_isa_info_t attribute,
    void *value);
hsa_status_t hsa_isa_get_exception_policies(
    hsa_isa_t isa,
    hsa_profile_t profile,
    uint16_t *mask);
hsa_status_t hsa_isa_get_round_method(
    hsa_isa_t isa,
    hsa_fp_type_t fp_type,
    hsa_flush_mode_t flush_mode,
    hsa_round_method_t *round_method);
hsa_status_t hsa_wavefront_get_info(
    hsa_wavefront_t wavefront,
    hsa_wavefront_info_t attribute,
    void *value);
hsa_status_t hsa_isa_iterate_wavefronts(
    hsa_isa_t isa,
    hsa_status_t (*callback)(hsa_wavefront_t wavefront, void *data),
    void *data);
/* deprecated */ hsa_status_t hsa_isa_compatible(
    hsa_isa_t code_object_isa,
    hsa_isa_t agent_isa,
    bool *result);

//===--- Code Objects (deprecated) --------------------------------------===//

/* deprecated */ hsa_status_t hsa_code_object_serialize(
    hsa_code_object_t code_object,
    hsa_status_t (*alloc_callback)(size_t size,
                                   hsa_callback_data_t data,
                                   void **address),
    hsa_callback_data_t callback_data,
    const char *options,
    void **serialized_code_object,
    size_t *serialized_code_object_size);
/* deprecated */ hsa_status_t hsa_code_object_deserialize(
    void *serialized_code_object,
    size_t serialized_code_object_size,
    const char *options,
    hsa_code_object_t *code_object);
/* deprecated */ hsa_status_t hsa_code_object_destroy(
    hsa_code_object_t code_object);
/* deprecated */ hsa_status_t hsa_code_object_get_info(
    hsa_code_object_t code_object,
    hsa_code_object_info_t attribute,
    void *value);
/* deprecated */ hsa_status_t hsa_code_object_get_symbol(
    hsa_code_object_t code_object,
    const char *symbol_name,
    hsa_code_symbol_t *symbol);
/* deprecated */ hsa_status_t hsa_code_object_get_symbol_from_name(
    hsa_code_object_t code_object,
    const char *module_name,
    const char *symbol_name,
    hsa_code_symbol_t *symbol);
/* deprecated */ hsa_status_t hsa_code_symbol_get_info(
    hsa_code_symbol_t code_symbol,
    hsa_code_symbol_info_t attribute,
    void *value);
/* deprecated */ hsa_status_t hsa_code_object_iterate_symbols(
    hsa_code_object_t code_object,
    hsa_status_t (*callback)(hsa_code_object_t code_object,
                             hsa_code_symbol_t symbol,
                             void *data),
    void *data);

//===--- Executable -----------------------------------------------------===//

hsa_status_t hsa_code_object_reader_create_from_file(
    hsa_file_t file,
    hsa_code_object_reader_t *code_object_reader);
hsa_status_t hsa_code_object_reader_create_from_memory(
    const void *code_object,
    size_t size,
    hsa_code_object_reader_t *code_object_reader);
hsa_status_t hsa_code_object_reader_destroy(
    hsa_code_object_reader_t code_object_reader);
/* deprecated */ hsa_status_t hsa_executable_create(
    hsa_profile_t profile,
    hsa_executable_state_t executable_state,
    const char *options,
    hsa_executable_t *executable);
hsa_status_t hsa_executable_create_alt(
    hsa_profile_t profile,
    hsa_default_float_rounding_mode_t default_float_rounding_mode,
    const char *options,
    hsa_executable_t *executable);
hsa_status_t hsa_executable_destroy(
    hsa_executable_t executable);
/* deprecated */ hsa_status_t hsa_executable_load_code_object(
    hsa_executable_t executable,
    hsa_agent_t agent,
    hsa_code_object_t code_object,
    const char *options);
hsa_status_t hsa_executable_load_program_code_object(
    hsa_executable_t executable,
    hsa_code_object_reader_t code_object_reader,
    const char *options,
    hsa_loaded_code_object_t *loaded_code_object);
hsa_status_t hsa_executable_load_agent_code_object(
    hsa_executable_t executable,
    hsa_agent_t agent,
    hsa_code_object_reader_t code_object_reader,
    const char *options,
    hsa_loaded_code_object_t *loaded_code_object);
hsa_status_t hsa_executable_freeze(
    hsa_executable_t executable,
    const char *options);
hsa_status_t hsa_executable_get_info(
    hsa_executable_t executable,
    hsa_executable_info_t attribute,
    void *value);
hsa_status_t hsa_executable_global_variable_define(
    hsa_executable_t executable,
    const char *variable_name,
    void *address);
hsa_status_t hsa_executable_agent_global_variable_define(
    hsa_executable_t executable,
    hsa_agent_t agent,
    const char *variable_name,
    void *address);
hsa_status_t hsa_executable_readonly_variable_define(
    hsa_executable_t executable,
    hsa_agent_t agent,
    const char *variable_name,
    void *address);
hsa_status_t hsa_executable_validate(
    hsa_executable_t executable,
    uint32_t *result);
hsa_status_t hsa_executable_validate_alt(
    hsa_executable_t executable,
    const char *options,
    uint32_t *result);
/* deprecated */ hsa_status_t hsa_executable_get_symbol(
    hsa_executable_t executable,
    const char *module_name,
    const char *symbol_name,
    hsa_agent_t agent,
    int32_t call_convention,
    hsa_executable_symbol_t *symbol);
hsa_status_t hsa_executable_get_symbol_by_name(
    hsa_executable_t executable,
    const char *symbol_name,
    const hsa_agent_t *agent,
    hsa_executable_symbol_t *symbol);
hsa_status_t hsa_executable_symbol_get_info(
    hsa_executable_symbol_t executable_symbol,
    hsa_executable_symbol_info_t attribute,
    void *value);
/* deprecated */ hsa_status_t hsa_executable_iterate_symbols(
    hsa_executable_t executable,
    hsa_status_t (*callback)(hsa_executable_t executable,
                             hsa_executable_symbol_t symbol,
                             void *data),
    void *data);
hsa_status_t hsa_executable_iterate_agent_symbols(
    hsa_executable_t executable,
    hsa_agent_t agent,
    hsa_status_t (*callback)(hsa_executable_t exec,
                             hsa_agent_t agent,
                             hsa_executable_symbol_t symbol,
                             void *data),
    void *data);
hsa_status_t hsa_executable_iterate_program_symbols(
    hsa_executable_t executable,
    hsa_status_t (*callback)(hsa_executable_t exec,
                             hsa_executable_symbol_t symbol,
                             void *data),
    void *data);

//===--- Runtime Notifications ------------------------------------------===//

hsa_status_t hsa_status_string(
    hsa_status_t status,
    const char **status_string);

}  // namespace HSA
}  // namespace rocr

#ifdef BUILDING_HSA_CORE_RUNTIME
//This using declaration is deliberate!
//We want unqualified name resolution to fail when building the runtime. This is a guard against
//accidental use of the intercept layer in the runtime.
//using namespace rocr::HSA;
#endif

#endif

ROCR-Runtime-rocm-5.0.0/src/core/inc/hsa_table_interface.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
//
////////////////////////////////////////////////////////////////////////////////

#include "inc/hsa_api_trace.h"

void hsa_table_interface_init(const HsaApiTable* apiTable);

const HsaApiTable* hsa_table_interface_get_table();

ROCR-Runtime-rocm-5.0.0/src/core/inc/hsa_ven_amd_loader_impl.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2020-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTME_CORE_INC_HSA_VEN_AMD_LOADER_IMPL_H_
#define HSA_RUNTME_CORE_INC_HSA_VEN_AMD_LOADER_IMPL_H_

#include "inc/hsa_ven_amd_loader.h"

namespace rocr {

hsa_status_t hsa_ven_amd_loader_query_host_address(
    const void *device_address,
    const void **host_address);

hsa_status_t hsa_ven_amd_loader_query_segment_descriptors(
    hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors,
    size_t *num_segment_descriptors);

hsa_status_t hsa_ven_amd_loader_query_executable(
    const void *device_address,
    hsa_executable_t *executable);

hsa_status_t hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
    hsa_executable_t executable,
    hsa_status_t (*callback)(
        hsa_executable_t executable,
        hsa_loaded_code_object_t loaded_code_object,
        void *data),
    void *data);

hsa_status_t hsa_ven_amd_loader_loaded_code_object_get_info(
    hsa_loaded_code_object_t loaded_code_object,
    hsa_ven_amd_loader_loaded_code_object_info_t attribute,
    void *value);

hsa_status_t
hsa_ven_amd_loader_code_object_reader_create_from_file_with_offset_size(
    hsa_file_t file,
    size_t offset,
    size_t size,
    hsa_code_object_reader_t *code_object_reader);

hsa_status_t hsa_ven_amd_loader_iterate_executables(
    hsa_status_t (*callback)(
        hsa_executable_t executable,
        void *data),
    void *data);

}  // namespace rocr

#endif

ROCR-Runtime-rocm-5.0.0/src/core/inc/intercept_queue.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_INTERCEPT_QUEUE_H_
#define HSA_RUNTIME_CORE_INC_INTERCEPT_QUEUE_H_

#include
#include
#include

#include "core/inc/runtime.h"
#include "core/inc/queue.h"
#include "core/inc/signal.h"
#include "core/inc/interrupt_signal.h"
#include "core/inc/exceptions.h"
#include "core/util/locks.h"

namespace rocr {
namespace core {

// @brief Generic container to forward Queue interfaces into Queue* member.
// Class is only useful as a base type for customized Queue wrappers.
class QueueWrapper : public Queue {
 public:
  std::unique_ptr<Queue> wrapped;

  explicit QueueWrapper(std::unique_ptr<Queue> queue) : Queue(), wrapped(std::move(queue)) {
    memcpy(&amd_queue_, &wrapped->amd_queue_, sizeof(amd_queue_t));
    wrapped->set_public_handle(wrapped.get(), public_handle_);
  }
  ~QueueWrapper() {}

  hsa_status_t Inactivate() override { return wrapped->Inactivate(); }
  hsa_status_t SetPriority(HSA_QUEUE_PRIORITY priority) override {
    return wrapped->SetPriority(priority);
  }
  uint64_t LoadReadIndexAcquire() override { return wrapped->LoadReadIndexAcquire(); }
  uint64_t LoadReadIndexRelaxed() override { return wrapped->LoadReadIndexRelaxed(); }
  uint64_t LoadWriteIndexRelaxed() override { return wrapped->LoadWriteIndexRelaxed(); }
  uint64_t LoadWriteIndexAcquire() override { return wrapped->LoadWriteIndexAcquire(); }
  void StoreReadIndexRelaxed(uint64_t value) override {
    return wrapped->StoreReadIndexRelaxed(value);
  }
  void StoreReadIndexRelease(uint64_t value) override {
    return wrapped->StoreReadIndexRelease(value);
  }
  void StoreWriteIndexRelaxed(uint64_t value) override {
    return wrapped->StoreWriteIndexRelaxed(value);
  }
  void StoreWriteIndexRelease(uint64_t value) override {
    return wrapped->StoreWriteIndexRelease(value);
  }
  uint64_t CasWriteIndexAcqRel(uint64_t expected, uint64_t value) override {
    return wrapped->CasWriteIndexAcqRel(expected, value);
  }
  uint64_t CasWriteIndexAcquire(uint64_t expected, uint64_t value) override {
    return wrapped->CasWriteIndexAcquire(expected, value);
  }
  uint64_t CasWriteIndexRelaxed(uint64_t expected, uint64_t value) override {
    return wrapped->CasWriteIndexRelaxed(expected, value);
  }
  uint64_t CasWriteIndexRelease(uint64_t expected, uint64_t value) override {
    return wrapped->CasWriteIndexRelease(expected, value);
  }
  uint64_t AddWriteIndexAcqRel(uint64_t value) override {
    return wrapped->AddWriteIndexAcqRel(value);
  }
  uint64_t AddWriteIndexAcquire(uint64_t value) override {
    return wrapped->AddWriteIndexAcquire(value);
  }
  uint64_t AddWriteIndexRelaxed(uint64_t value) override {
    return wrapped->AddWriteIndexRelaxed(value);
  }
  uint64_t AddWriteIndexRelease(uint64_t value) override {
    return wrapped->AddWriteIndexRelease(value);
  }
  hsa_status_t SetCUMasking(uint32_t num_cu_mask_count, const uint32_t* cu_mask) override {
    return wrapped->SetCUMasking(num_cu_mask_count, cu_mask);
  }
  hsa_status_t GetCUMasking(uint32_t num_cu_mask_count, uint32_t* cu_mask) override {
    return wrapped->GetCUMasking(num_cu_mask_count, cu_mask);
  }
  void ExecutePM4(uint32_t* cmd_data, size_t cmd_size_b) override {
    wrapped->ExecutePM4(cmd_data, cmd_size_b);
  }
  void SetProfiling(bool enabled) override { wrapped->SetProfiling(enabled); }

 protected:
  void do_set_public_handle(hsa_queue_t* handle) override {
    public_handle_ = handle;
    wrapped->set_public_handle(wrapped.get(), handle);
  }
};

// @brief Generic container for a proxy queue.
// Presents a proxy packet buffer and doorbell signal for an underlying Queue. Write index
// operations act on the proxy buffer while all other operations pass through to the underlying
// queue.
class QueueProxy : public QueueWrapper {
 public:
  explicit QueueProxy(std::unique_ptr<Queue> queue) : QueueWrapper(std::move(queue)) {}

  uint64_t LoadReadIndexAcquire() override {
    return atomic::Load(&amd_queue_.read_dispatch_id, std::memory_order_acquire);
  }
  uint64_t LoadReadIndexRelaxed() override {
    return atomic::Load(&amd_queue_.read_dispatch_id, std::memory_order_relaxed);
  }
  void StoreReadIndexRelaxed(uint64_t value) override { assert(false); }
  void StoreReadIndexRelease(uint64_t value) override { assert(false); }

  uint64_t LoadWriteIndexRelaxed() override {
    return atomic::Load(&amd_queue_.write_dispatch_id, std::memory_order_relaxed);
  }
  uint64_t LoadWriteIndexAcquire() override {
    return atomic::Load(&amd_queue_.write_dispatch_id, std::memory_order_acquire);
  }
  void StoreWriteIndexRelaxed(uint64_t value) override {
    atomic::Store(&amd_queue_.write_dispatch_id, value, std::memory_order_relaxed);
  }
  void StoreWriteIndexRelease(uint64_t value) override {
    atomic::Store(&amd_queue_.write_dispatch_id, value, std::memory_order_release);
  }
  uint64_t CasWriteIndexAcqRel(uint64_t expected, uint64_t value) override {
    return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected,
                       std::memory_order_acq_rel);
  }
  uint64_t CasWriteIndexAcquire(uint64_t expected, uint64_t value) override {
    return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected,
                       std::memory_order_acquire);
  }
  uint64_t CasWriteIndexRelaxed(uint64_t expected, uint64_t value) override {
    return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected,
                       std::memory_order_relaxed);
  }
  uint64_t CasWriteIndexRelease(uint64_t expected, uint64_t value) override {
    return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected,
                       std::memory_order_release);
  }
  uint64_t AddWriteIndexAcqRel(uint64_t value) override {
    return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_acq_rel);
  }
  uint64_t AddWriteIndexAcquire(uint64_t value) override {
    return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_acquire);
  }
  uint64_t AddWriteIndexRelaxed(uint64_t value) override {
    return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_relaxed);
  }
  uint64_t AddWriteIndexRelease(uint64_t value) override {
    return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_release);
  }
};

// @brief Provides packet intercept and rewrite capability for a queue.
// Host-side dispatches are processed during doorbell ring.
// Device-side dispatches are processed as an asynchronous signal event.
class InterceptQueue : public QueueProxy, private LocalSignal, public DoorbellSignal {
 public:
  explicit InterceptQueue(std::unique_ptr<Queue> queue);
  ~InterceptQueue();

  void AddInterceptor(hsa_amd_queue_intercept_handler interceptor, void* data) {
    assert(interceptor != nullptr && "Packet intercept callback was nullptr.");
    interceptors.push_back(std::make_pair(interceptor, data));
  }

  hsa_status_t Inactivate() override {
    active_ = false;
    return wrapped->Inactivate();
  }

 private:
  // Serialize packet interception processing.
  KernelMutex lock_;

  // Largest processed packet index.
  uint64_t next_packet_;

  // Post interception packet overflow buffer
  std::vector<AqlPacket> overflow_;

  // Index at which async intercept processing was scheduled.
  uint64_t retry_index_;

  // Event signal to use for async packet processing and control flag.
  InterruptSignal* async_doorbell_;
  std::atomic<bool> quit_;

  // Indicates queue active/inactive state.
  std::atomic<bool> active_;

  // Proxy packet buffer
  SharedArray buffer_;

  // Packet transform callbacks
  std::vector, void*>> interceptors;

  static const hsa_signal_value_t DOORBELL_MAX = 0xFFFFFFFFFFFFFFFFull;

  static bool HandleAsyncDoorbell(hsa_signal_value_t value, void* arg);
  static void PacketWriter(const void* pkts, uint64_t pkt_count);

  bool Submit(const AqlPacket* packets, uint64_t count);
  static void Submit(const void* pkts, uint64_t pkt_count, uint64_t user_pkt_index, void* data,
                     hsa_amd_queue_intercept_packet_writer writer);

  /*
   * Remaining Queue and Signal interface definitions.
   */
 public:
  /// @brief Update signal value using Relaxed semantics
  ///
  /// @param value Value of signal to update with
  void StoreRelaxed(hsa_signal_value_t value) override;

  /// @brief Update signal value using Release semantics
  ///
  /// @param value Value of signal to update with
  void StoreRelease(hsa_signal_value_t value) override {
    std::atomic_thread_fence(std::memory_order_release);
    StoreRelaxed(value);
  }

  static __forceinline bool IsType(core::Signal* signal) {
    return signal->IsType(&rtti_id_);
  }
  static __forceinline bool IsType(core::Queue* queue) { return queue->IsType(&rtti_id_); }

 protected:
  bool _IsA(Queue::rtti_t id) const override { return id == &rtti_id_; }

 private:
  static int rtti_id_;
};

}  // namespace core
}  // namespace rocr

#endif  // HSA_RUNTIME_CORE_INC_INTERCEPT_QUEUE_H_

ROCR-Runtime-rocm-5.0.0/src/core/inc/interrupt_signal.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA runtime C++ interface file. #ifndef HSA_RUNTME_CORE_INC_INTERRUPT_SIGNAL_H_ #define HSA_RUNTME_CORE_INC_INTERRUPT_SIGNAL_H_ #include #include #include "hsakmt.h" #include "core/inc/signal.h" #include "core/util/utils.h" namespace rocr { namespace core { /// @brief A Signal implementation based on interrupts rather than plain /// memory. Also see base class Signal.
/// /// Breaks common/vendor separation - signals in general need to be reworked /// at the foundation level to make sense in a multi-device system. /// Supports only one waiter for now. /// KFD changes are needed to support multiple waiters and device /// signaling. class InterruptSignal : private LocalSignal, public Signal { public: class EventPool { public: struct Deleter { void operator()(HsaEvent* evt) { InterruptSignal::DestroyEvent(evt); } }; using unique_event_ptr = ::std::unique_ptr; EventPool() : allEventsAllocated(false) {} HsaEvent* alloc(); void free(HsaEvent* evt); void clear() { events_.clear(); allEventsAllocated = false; } private: KernelMutex lock_; std::vector events_; bool allEventsAllocated; }; static HsaEvent* CreateEvent(HSA_EVENTTYPE type, bool manual_reset); static void DestroyEvent(HsaEvent* evt); /// @brief Determines if a Signal* can be safely converted to an /// InterruptSignal* via static_cast. static __forceinline bool IsType(Signal* ptr) { return ptr->IsType(&rtti_id_); } explicit InterruptSignal(hsa_signal_value_t initial_value, HsaEvent* use_event = NULL); ~InterruptSignal(); // Below are various methods corresponding to the APIs, which load/store the // signal value or modify the existing signal value atomically and with // specified memory ordering semantics.
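The Relaxed/Acquire/Release/AcqRel suffixes on the signal methods in this class correspond to `std::memory_order` values. The following is a minimal sketch of a plain memory-backed signal built on `std::atomic`, not ROCr's InterruptSignal implementation: the class name `SketchSignal` and the alias `signal_value_t` are hypothetical, and `signal_value_t` is assumed to be `int64_t`, as `hsa_signal_value_t` is in the HSA large model. Only a representative subset of the operations is shown.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical alias; assumed to match hsa_signal_value_t in the HSA large
// model (int64_t).
using signal_value_t = int64_t;

// Minimal sketch of a memory-backed signal whose method names mirror the
// Relaxed/Acquire/Release/AcqRel variants declared by InterruptSignal.
// It never notifies waiters; InterruptSignal additionally signals a KFD
// event (hsaKmtSetEvent) when a waiter is present.
class SketchSignal {
 public:
  explicit SketchSignal(signal_value_t initial) : value_(initial) {}

  signal_value_t LoadRelaxed() const { return value_.load(std::memory_order_relaxed); }
  signal_value_t LoadAcquire() const { return value_.load(std::memory_order_acquire); }
  void StoreRelaxed(signal_value_t v) { value_.store(v, std::memory_order_relaxed); }
  void StoreRelease(signal_value_t v) { value_.store(v, std::memory_order_release); }

  // Read-modify-write updates whose result is discarded, like the void
  // AddXxx/SubXxx declarations.
  void AddAcqRel(signal_value_t v) { value_.fetch_add(v, std::memory_order_acq_rel); }
  void SubRelease(signal_value_t v) { value_.fetch_sub(v, std::memory_order_release); }

  // Returns the value held before the exchange.
  signal_value_t ExchAcqRel(signal_value_t v) {
    return value_.exchange(v, std::memory_order_acq_rel);
  }

  // Stores `v` only if the current value equals `expected`; returns the value
  // observed before the operation in either case.
  signal_value_t CasAcqRel(signal_value_t expected, signal_value_t v) {
    // On failure compare_exchange_strong writes the observed value into
    // `expected`; on success `expected` already equals the old value.
    value_.compare_exchange_strong(expected, v, std::memory_order_acq_rel,
                                   std::memory_order_acquire);
    return expected;
  }

 private:
  std::atomic<signal_value_t> value_;
};
```

Returning the previously observed value from exchange and compare-and-swap matches how the ExchXxx and CasXxx declarations in this class are typed: they return an `hsa_signal_value_t` rather than a success flag, so a caller detects a successful CAS by comparing the result with `expected`.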
hsa_signal_value_t LoadRelaxed(); hsa_signal_value_t LoadAcquire(); void StoreRelaxed(hsa_signal_value_t value); void StoreRelease(hsa_signal_value_t value); hsa_signal_value_t WaitRelaxed(hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint); hsa_signal_value_t WaitAcquire(hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint); void AndRelaxed(hsa_signal_value_t value); void AndAcquire(hsa_signal_value_t value); void AndRelease(hsa_signal_value_t value); void AndAcqRel(hsa_signal_value_t value); void OrRelaxed(hsa_signal_value_t value); void OrAcquire(hsa_signal_value_t value); void OrRelease(hsa_signal_value_t value); void OrAcqRel(hsa_signal_value_t value); void XorRelaxed(hsa_signal_value_t value); void XorAcquire(hsa_signal_value_t value); void XorRelease(hsa_signal_value_t value); void XorAcqRel(hsa_signal_value_t value); void AddRelaxed(hsa_signal_value_t value); void AddAcquire(hsa_signal_value_t value); void AddRelease(hsa_signal_value_t value); void AddAcqRel(hsa_signal_value_t value); void SubRelaxed(hsa_signal_value_t value); void SubAcquire(hsa_signal_value_t value); void SubRelease(hsa_signal_value_t value); void SubAcqRel(hsa_signal_value_t value); hsa_signal_value_t ExchRelaxed(hsa_signal_value_t value); hsa_signal_value_t ExchAcquire(hsa_signal_value_t value); hsa_signal_value_t ExchRelease(hsa_signal_value_t value); hsa_signal_value_t ExchAcqRel(hsa_signal_value_t value); hsa_signal_value_t CasRelaxed(hsa_signal_value_t expected, hsa_signal_value_t value); hsa_signal_value_t CasAcquire(hsa_signal_value_t expected, hsa_signal_value_t value); hsa_signal_value_t CasRelease(hsa_signal_value_t expected, hsa_signal_value_t value); hsa_signal_value_t CasAcqRel(hsa_signal_value_t expected, hsa_signal_value_t value); /// @brief See base class Signal. 
__forceinline hsa_signal_value_t* ValueLocation() const { return (hsa_signal_value_t*)&signal_.value; } /// @brief See base class Signal. __forceinline HsaEvent* EopEvent() { return event_; } protected: bool _IsA(rtti_t id) const { return id == &rtti_id_; } private: /// @variable KFD event on which the interrupt signal is based. HsaEvent* event_; /// @variable Indicates whether the signal should release the event when it /// closes. bool free_event_; /// Used to obtain a globally unique value (address) for rtti. static int rtti_id_; /// @brief Notify driver of signal value change if necessary. __forceinline void SetEvent() { std::atomic_signal_fence(std::memory_order_seq_cst); if (InWaiting()) hsaKmtSetEvent(event_); } DISALLOW_COPY_AND_ASSIGN(InterruptSignal); }; } // namespace core } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/ipc_signal.h000066400000000000000000000076601420110115200214050ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTME_CORE_INC_IPC_SIGNAL_H_ #define HSA_RUNTME_CORE_INC_IPC_SIGNAL_H_ #include #include #include "core/inc/signal.h" #include "core/inc/default_signal.h" #include "core/util/locks.h" namespace rocr { namespace core { /// @brief Container for ipc shared memory. class SharedMemory { public: SharedMemory(const hsa_amd_ipc_memory_t* handle, size_t len); ~SharedMemory(); SharedMemory(SharedMemory&&); void* ptr() const { return ptr_; } private: void* ptr_; }; /// @brief Container for ipc signal abi block. class SharedMemorySignal { public: explicit SharedMemorySignal(const hsa_amd_ipc_memory_t* handle) : signal_(handle, 4096) { if (!signal()->IsValid()) throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT, "IPC Signal handle is invalid."); } SharedSignal* signal() const { return reinterpret_cast(signal_.ptr()); } private: SharedMemory signal_; }; /// @brief Memory only signal using a shared memory ABI block. 
class IPCSignal : private SharedMemorySignal, public BusyWaitSignal { public: /// @brief Creates a sharable handle for an IPC-enabled signal. static void CreateHandle(Signal* signal, hsa_amd_ipc_signal_t* ipc_handle); /// @brief Opens an IPC signal from its IPC handle. static Signal* Attach(const hsa_amd_ipc_signal_t* ipc_handle); /// @brief Determines if a Signal* can be safely converted to IPCSignal* /// via static_cast. static __forceinline bool IsType(Signal* ptr) { return ptr->IsType(&rtti_id_); } protected: bool _IsA(rtti_t id) const { if (id == &rtti_id_) return true; return BusyWaitSignal::_IsA(id); } private: static int rtti_id_; static KernelMutex lock_; explicit IPCSignal(SharedMemorySignal&& abi_block) : SharedMemorySignal(std::move(abi_block)), BusyWaitSignal(signal(), true) {} DISALLOW_COPY_AND_ASSIGN(IPCSignal); }; } // namespace core } // namespace rocr #endif // HSA_RUNTME_CORE_INC_IPC_SIGNAL_H_ ROCR-Runtime-rocm-5.0.0/src/core/inc/isa.h000066400000000000000000000164221420110115200200450ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_ISA_H_ #define HSA_RUNTIME_CORE_ISA_H_ #include #include #include #include #include #include "core/inc/amd_hsa_code.hpp" namespace rocr { namespace core { /// @class Wavefront. /// @brief Wavefront. class Wavefront final: public amd::hsa::common::Signed<0xA02483F1AD7F101C> { public: /// @brief Default destructor. 
~Wavefront() {} /// @returns Handle equivalent of @p object. static hsa_wavefront_t Handle(const Wavefront *object) { hsa_wavefront_t handle = { reinterpret_cast(object) }; return handle; } /// @returns Object equivalent of @p handle. static Wavefront *Object(const hsa_wavefront_t &handle) { Wavefront *object = amd::hsa::common::ObjectAt(handle.handle); return object; } /// @brief Query value of requested @p attribute and record it in @p value. bool GetInfo(const hsa_wavefront_info_t &attribute, void *value) const; private: /// @brief Default constructor. Wavefront() {} /// @brief Wavefront's friends. friend class Isa; }; enum class IsaFeature : uint8_t { Unsupported, Any, Disabled, Enabled, }; /// @class Isa. /// @brief Instruction Set Architecture. class Isa final: public amd::hsa::common::Signed<0xB13594F2BD8F212D> { public: /// @brief Isa's version type. typedef std::tuple Version; /// @brief Default destructor. ~Isa() {} /// @returns Handle equivalent of @p isa_object. static hsa_isa_t Handle(const Isa *isa_object) { hsa_isa_t isa_handle = { reinterpret_cast(isa_object) }; return isa_handle; } /// @returns Object equivalent of @p isa_handle. static Isa *Object(const hsa_isa_t &isa_handle) { Isa *isa_object = amd::hsa::common::ObjectAt(isa_handle.handle); return isa_object; } /// @returns True if @p code_object_isa and @p agent_isa are compatible, /// false otherwise. static bool IsCompatible(const Isa &code_object_isa, const Isa &agent_isa); /// @returns This Isa's version. const Version &GetVersion() const { return version_; } /// @returns SRAM ECC feature status. IsaFeature GetSramecc() const { return sramecc_; } /// @returns XNACK feature status. IsaFeature GetXnack() const { return xnack_; } /// @returns This Isa's supported wavefront. const Wavefront &GetWavefront() const { return wavefront_; } /// @returns True if SRAMECC feature is supported, false otherwise. 
bool IsSrameccSupported() const { return sramecc_ != IsaFeature::Unsupported; } /// @returns True if XNACK feature is supported, false otherwise. bool IsXnackSupported() const { return xnack_ != IsaFeature::Unsupported; } /// @returns This Isa's major version. int32_t GetMajorVersion() const { return std::get<0>(version_); } /// @returns This Isa's minor version. int32_t GetMinorVersion() const { return std::get<1>(version_); } /// @returns This Isa's stepping. int32_t GetStepping() const { return std::get<2>(version_); } /// @brief Isa is always in a valid state. bool IsValid() const { return true; } /// @returns This Isa's processor name. std::string GetProcessorName() const; /// @returns This Isa's name consisting of the target triple and target ID. std::string GetIsaName() const; /// @brief Query value of requested @p attribute and record it in @p value. bool GetInfo(const hsa_isa_info_t &attribute, void *value) const; /// @returns Round method (single or double) used to implement the floating- /// point multiply add instruction (mad) for a given combination of @p fp_type /// and @p flush_mode. hsa_round_method_t GetRoundMethod( hsa_fp_type_t fp_type, hsa_flush_mode_t flush_mode) const; private: /// @brief Default constructor. Isa() : targetid_(nullptr), version_(Version(-1, -1, -1)), sramecc_(IsaFeature::Unsupported), xnack_(IsaFeature::Unsupported) {} /// @brief Isa's target ID name. const char* targetid_; /// @brief Isa's version. Version version_; /// @brief SRAMECC feature. IsaFeature sramecc_; /// @brief XNACK feature. IsaFeature xnack_; /// @brief Isa's supported wavefront. Wavefront wavefront_; /// @brief Isa's friends. friend class IsaRegistry; }; // class Isa /// @class IsaRegistry. /// @brief Instruction Set Architecture Registry. class IsaRegistry final { public: /// @returns Isa for requested @p full_name, null pointer if not supported.
static const Isa *GetIsa(const std::string &full_name); /// @returns Isa for requested @p version, null pointer if not supported. static const Isa *GetIsa(const Isa::Version &version, IsaFeature sramecc = IsaFeature::Any, IsaFeature xnack = IsaFeature::Any); private: /// @brief IsaRegistry's map type. typedef std::unordered_map IsaMap; /// @brief Supported instruction set architectures. static const IsaMap supported_isas_; /// @brief Default constructor - not available. IsaRegistry(); /// @brief Default destructor - not available. ~IsaRegistry(); /// @returns Supported instruction set architectures. static const IsaMap GetSupportedIsas(); }; // class IsaRegistry } // namespace core } // namespace rocr #endif // HSA_RUNTIME_CORE_ISA_H_ ROCR-Runtime-rocm-5.0.0/src/core/inc/memory_region.h000066400000000000000000000114331420110115200221410ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA runtime C++ interface file. #ifndef HSA_RUNTME_CORE_INC_MEMORY_REGION_H_ #define HSA_RUNTME_CORE_INC_MEMORY_REGION_H_ #include #include "core/inc/hsa_internal.h" #include "core/inc/checked.h" #include "core/util/utils.h" namespace rocr { namespace core { class Agent; class MemoryRegion : public Checked<0x9C961F19EE175BB3> { public: MemoryRegion(bool fine_grain, bool kernarg, bool full_profile, core::Agent* owner) : fine_grain_(fine_grain), kernarg_(kernarg), full_profile_(full_profile), owner_(owner) { assert(owner_ != NULL); } virtual ~MemoryRegion() {} // Convert this object into hsa_region_t. 
static __forceinline hsa_region_t Convert(MemoryRegion* region) { const hsa_region_t region_handle = { static_cast(reinterpret_cast(region))}; return region_handle; } static __forceinline const hsa_region_t Convert(const MemoryRegion* region) { const hsa_region_t region_handle = { static_cast(reinterpret_cast(region))}; return region_handle; } // Convert hsa_region_t into MemoryRegion *. static __forceinline MemoryRegion* Convert(hsa_region_t region) { return reinterpret_cast(region.handle); } enum AllocateEnum { AllocateNoFlags = 0, AllocateRestrict = (1 << 0), // Don't map system memory to GPU agents AllocateExecutable = (1 << 1), // Set executable permission AllocateDoubleMap = (1 << 2), // Map twice VA allocation to backing store AllocateDirect = (1 << 3), // Bypass fragment cache. AllocateIPC = (1 << 4), // System memory that can be IPC-shared }; typedef uint32_t AllocateFlags; virtual hsa_status_t Allocate(size_t& size, AllocateFlags alloc_flags, void** address) const = 0; virtual hsa_status_t Free(void* address, size_t size) const = 0; // Prepares suballocated memory for IPC export. virtual hsa_status_t IPCFragmentExport(void* address) const = 0; // Translate memory properties into HSA region attribute. virtual hsa_status_t GetInfo(hsa_region_info_t attribute, void* value) const = 0; virtual hsa_status_t AssignAgent(void* ptr, size_t size, const Agent& agent, hsa_access_permission_t access) const = 0; // Releases any cached memory that may be held within the allocator. 
virtual void Trim() const {} __forceinline bool fine_grain() const { return fine_grain_; } __forceinline bool kernarg() const { return kernarg_; } __forceinline bool full_profile() const { return full_profile_; } __forceinline core::Agent* owner() const { return owner_; } private: const bool fine_grain_; const bool kernarg_; const bool full_profile_; core::Agent* owner_; }; } // namespace core } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/queue.h000066400000000000000000000302371420110115200204150ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. 
// // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA runtime C++ interface file. #ifndef HSA_RUNTME_CORE_INC_COMMAND_QUEUE_H_ #define HSA_RUNTME_CORE_INC_COMMAND_QUEUE_H_ #include #include "core/common/shared.h" #include "core/inc/checked.h" #include "core/util/utils.h" #include "inc/amd_hsa_queue.h" #include "hsakmt.h" namespace rocr { namespace core { struct AqlPacket { union { hsa_kernel_dispatch_packet_t dispatch; hsa_barrier_and_packet_t barrier_and; hsa_barrier_or_packet_t barrier_or; hsa_agent_dispatch_packet_t agent; }; uint8_t type() const { return ((dispatch.header >> HSA_PACKET_HEADER_TYPE) & ((1 << HSA_PACKET_HEADER_WIDTH_TYPE) - 1)); } bool IsValid() const { return int(type() <= HSA_PACKET_TYPE_BARRIER_OR) & (type() != HSA_PACKET_TYPE_INVALID); } std::string string() const { std::stringstream string; uint8_t type = this->type(); const char* type_names[] = { "HSA_PACKET_TYPE_VENDOR_SPECIFIC", "HSA_PACKET_TYPE_INVALID", "HSA_PACKET_TYPE_KERNEL_DISPATCH", "HSA_PACKET_TYPE_BARRIER_AND", "HSA_PACKET_TYPE_AGENT_DISPATCH", "HSA_PACKET_TYPE_BARRIER_OR"}; /* The type field is 8 bits wide, so guard the table lookup. */ string << "type: " << ((type < sizeof(type_names) / sizeof(type_names[0])) ? 
type_names[type] : "UNKNOWN") << "\nbarrier: " << ((dispatch.header >> HSA_PACKET_HEADER_BARRIER) & ((1 << HSA_PACKET_HEADER_WIDTH_BARRIER) - 1)) << "\nacquire: " << ((dispatch.header >> HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) & ((1 << HSA_PACKET_HEADER_WIDTH_SCACQUIRE_FENCE_SCOPE) - 1)) << "\nrelease: " << ((dispatch.header >> HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE) & ((1 <<
HSA_PACKET_HEADER_WIDTH_SCRELEASE_FENCE_SCOPE) - 1)); if (type == HSA_PACKET_TYPE_KERNEL_DISPATCH) { string << "\nDim: " << dispatch.setup << "\nworkgroup_size: " << dispatch.workgroup_size_x << ", " << dispatch.workgroup_size_y << ", " << dispatch.workgroup_size_z << "\ngrid_size: " << dispatch.grid_size_x << ", " << dispatch.grid_size_y << ", " << dispatch.grid_size_z << "\nprivate_size: " << dispatch.private_segment_size << "\ngroup_size: " << dispatch.group_segment_size << "\nkernel_object: " << dispatch.kernel_object << "\nkern_arg: " << dispatch.kernarg_address << "\nsignal: " << dispatch.completion_signal.handle; } if ((type == HSA_PACKET_TYPE_BARRIER_AND) || (type == HSA_PACKET_TYPE_BARRIER_OR)) { for (int i = 0; i < 5; i++) string << "\ndep[" << i << "]: " << barrier_and.dep_signal[i].handle; string << "\nsignal: " << barrier_and.completion_signal.handle; } return string.str(); } }; class Queue; /// @brief Helper structure to simplify conversion between amd_queue_t and /// core::Queue objects. struct SharedQueue { amd_queue_t amd_queue; Queue* core_queue; }; class LocalQueue { public: SharedQueue* queue() const { return local_queue_.shared_object(); } private: Shared local_queue_; }; /// @brief Class Queue encapsulates user-mode queues and /// provides an API to access their read and write indices using Acquire, /// Release and Relaxed semantics. /* Queue is intended to be a pure interface class and may be wrapped or replaced by tools. All functions other than Convert and public_handle must be virtual.
*/ class Queue : public Checked<0xFA3906A679F9DB49>, private LocalQueue { public: Queue() : LocalQueue(), amd_queue_(queue()->amd_queue) { queue()->core_queue = this; public_handle_ = Convert(this); } virtual ~Queue() {} virtual void Destroy() { delete this; } /// @brief Returns the handle of Queue's public data type /// /// @param queue Pointer to an instance of a Queue implementation object /// /// @return hsa_queue_t * Pointer to the public data type of a queue static __forceinline hsa_queue_t* Convert(Queue* queue) { return (queue != nullptr) ? &queue->amd_queue_.hsa_queue : nullptr; } /// @brief Transforms a queue's public data type into the corresponding /// Queue implementation object /// /// @param queue Handle of public data type of a queue /// /// @return Queue * Pointer to the Queue's implementation object static __forceinline Queue* Convert(const hsa_queue_t* queue) { return (queue != nullptr) ? reinterpret_cast(reinterpret_cast(queue) - offsetof(SharedQueue, amd_queue.hsa_queue))->core_queue : nullptr; } /// @brief Inactivate the queue object.
Once inactivated, a /// queue cannot be used anymore and must be destroyed. /// /// @return hsa_status_t Status of request virtual hsa_status_t Inactivate() = 0; /// @brief Change the scheduling priority of the queue virtual hsa_status_t SetPriority(HSA_QUEUE_PRIORITY priority) = 0; /// @brief Reads the Read Index of Queue using Acquire semantics /// /// @return uint64_t Value of Read index virtual uint64_t LoadReadIndexAcquire() = 0; /// @brief Reads the Read Index of Queue using Relaxed semantics /// /// @return uint64_t Value of Read index virtual uint64_t LoadReadIndexRelaxed() = 0; /// @brief Reads the Write Index of Queue using Acquire semantics /// /// @return uint64_t Value of Write index virtual uint64_t LoadWriteIndexAcquire() = 0; /// @brief Reads the Write Index of Queue using Relaxed semantics /// /// @return uint64_t Value of Write index virtual uint64_t LoadWriteIndexRelaxed() = 0; /// @brief Updates the Read Index of Queue using Relaxed semantics /// /// @param value New value of Read index virtual void StoreReadIndexRelaxed(uint64_t value) = 0; /// @brief Updates the Read Index of Queue using Release semantics /// /// @param value New value of Read index virtual void StoreReadIndexRelease(uint64_t value) = 0; /// @brief Updates the Write Index of Queue using Relaxed semantics /// /// @param value New value of Write index virtual void StoreWriteIndexRelaxed(uint64_t value) = 0; /// @brief Updates the Write Index of Queue using Release semantics /// /// @param value New value of Write index virtual void StoreWriteIndexRelease(uint64_t value) = 0; /// @brief Compares and swaps Write index using Acquire and Release semantics /// /// @param expected Expected current value of write index /// /// @param value Value of new write index /// /// @return uint64_t Value of write index before the update virtual uint64_t CasWriteIndexAcqRel(uint64_t expected, uint64_t value) = 0; /// @brief Compares and swaps Write index using Acquire
semantics /// /// @param expected Expected current value of write index /// /// @param value Value of new write index /// /// @return uint64_t Value of write index before the update virtual uint64_t CasWriteIndexAcquire(uint64_t expected, uint64_t value) = 0; /// @brief Compares and swaps Write index using Relaxed semantics /// /// @param expected Expected current value of write index /// /// @param value Value of new write index /// /// @return uint64_t Value of write index before the update virtual uint64_t CasWriteIndexRelaxed(uint64_t expected, uint64_t value) = 0; /// @brief Compares and swaps Write index using Release semantics /// /// @param expected Expected current value of write index /// /// @param value Value of new write index /// /// @return uint64_t Value of write index before the update virtual uint64_t CasWriteIndexRelease(uint64_t expected, uint64_t value) = 0; /// @brief Increments the Write index using Acquire and Release semantics /// /// @param value Amount to add to the write index /// /// @return uint64_t Value of write index before the update virtual uint64_t AddWriteIndexAcqRel(uint64_t value) = 0; /// @brief Increments the Write index using Acquire semantics /// /// @param value Amount to add to the write index /// /// @return uint64_t Value of write index before the update virtual uint64_t AddWriteIndexAcquire(uint64_t value) = 0; /// @brief Increments the Write index using Relaxed semantics /// /// @param value Amount to add to the write index /// /// @return uint64_t Value of write index before the update virtual uint64_t AddWriteIndexRelaxed(uint64_t value) = 0; /// @brief Increments the Write index using Release semantics /// /// @param value Amount to add to the write index /// /// @return uint64_t Value of write index before the update virtual uint64_t AddWriteIndexRelease(uint64_t value) = 0; /// @brief Set CU Masking /// /// @param num_cu_mask_count size of mask bit array /// /// @param cu_mask pointer to cu mask /// 
/// @return hsa_status_t virtual hsa_status_t SetCUMasking(uint32_t
num_cu_mask_count, const uint32_t* cu_mask) = 0; /// @brief Get CU Masking /// /// @param num_cu_mask_count size of mask bit array /// /// @param cu_mask pointer to cu mask /// /// @return hsa_status_t virtual hsa_status_t GetCUMasking(uint32_t num_cu_mask_count, uint32_t* cu_mask) = 0; /// @brief Submits a block of PM4 commands and waits until it has been executed. virtual void ExecutePM4(uint32_t* cmd_data, size_t cmd_size_b) = 0; virtual void SetProfiling(bool enabled) { AMD_HSA_BITS_SET(amd_queue_.queue_properties, AMD_QUEUE_PROPERTIES_ENABLE_PROFILING, (enabled != 0)); } /// @brief Reports async queue errors to stderr if no other error handler was registered. static void DefaultErrorHandler(hsa_status_t status, hsa_queue_t* source, void* data); // Handle of AMD Queue struct amd_queue_t& amd_queue_; hsa_queue_t* public_handle() const { return public_handle_; } typedef void* rtti_t; bool IsType(rtti_t id) { return _IsA(id); } protected: static void set_public_handle(Queue* ptr, hsa_queue_t* handle) { ptr->do_set_public_handle(handle); } virtual void do_set_public_handle(hsa_queue_t* handle) { public_handle_ = handle; } virtual bool _IsA(rtti_t id) const = 0; hsa_queue_t* public_handle_; /// @brief Returns the next available queue ID. uint64_t GetQueueId() { return hsa_queue_counter_++; } private: // HSA Queue ID - used to bind a unique ID static std::atomic hsa_queue_counter_; DISALLOW_COPY_AND_ASSIGN(Queue); }; } // namespace core } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/inc/registers.h000066400000000000000000000237731420110115200213070ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

// This file is used only for open-source CMake builds; if the register values
// were hardcoded in amd_aql_queue.cpp, this file would not be required. For
// now, the register details are spelled out in the structs/unions below.
#ifndef HSA_RUNTME_CORE_INC_REGISTERS_H_
#define HSA_RUNTME_CORE_INC_REGISTERS_H_

typedef enum SQ_RSRC_BUF_TYPE {
  SQ_RSRC_BUF = 0x00000000,
  SQ_RSRC_BUF_RSVD_1 = 0x00000001,
  SQ_RSRC_BUF_RSVD_2 = 0x00000002,
  SQ_RSRC_BUF_RSVD_3 = 0x00000003,
} SQ_RSRC_BUF_TYPE;

typedef enum BUF_DATA_FORMAT {
  BUF_DATA_FORMAT_INVALID = 0x00000000,
  BUF_DATA_FORMAT_8 = 0x00000001,
  BUF_DATA_FORMAT_16 = 0x00000002,
  BUF_DATA_FORMAT_8_8 = 0x00000003,
  BUF_DATA_FORMAT_32 = 0x00000004,
  BUF_DATA_FORMAT_16_16 = 0x00000005,
  BUF_DATA_FORMAT_10_11_11 = 0x00000006,
  BUF_DATA_FORMAT_11_11_10 = 0x00000007,
  BUF_DATA_FORMAT_10_10_10_2 = 0x00000008,
  BUF_DATA_FORMAT_2_10_10_10 = 0x00000009,
  BUF_DATA_FORMAT_8_8_8_8 = 0x0000000a,
  BUF_DATA_FORMAT_32_32 = 0x0000000b,
  BUF_DATA_FORMAT_16_16_16_16 = 0x0000000c,
  BUF_DATA_FORMAT_32_32_32 = 0x0000000d,
  BUF_DATA_FORMAT_32_32_32_32 = 0x0000000e,
  BUF_DATA_FORMAT_RESERVED_15 = 0x0000000f,
} BUF_DATA_FORMAT;

typedef enum BUF_NUM_FORMAT {
  BUF_NUM_FORMAT_UNORM = 0x00000000,
  BUF_NUM_FORMAT_SNORM = 0x00000001,
  BUF_NUM_FORMAT_USCALED = 0x00000002,
  BUF_NUM_FORMAT_SSCALED = 0x00000003,
  BUF_NUM_FORMAT_UINT = 0x00000004,
  BUF_NUM_FORMAT_SINT = 0x00000005,
  BUF_NUM_FORMAT_SNORM_OGL__SI__CI = 0x00000006,
  BUF_NUM_FORMAT_RESERVED_6__VI = 0x00000006,
  BUF_NUM_FORMAT_FLOAT = 0x00000007,
} BUF_NUM_FORMAT;

typedef enum BUF_FORMAT {
  BUF_FORMAT_32_UINT = 0x00000014,
} BUF_FORMAT;

typedef enum SQ_SEL_XYZW01 {
  SQ_SEL_0 = 0x00000000,
  SQ_SEL_1 = 0x00000001,
  SQ_SEL_RESERVED_0 = 0x00000002,
  SQ_SEL_RESERVED_1 = 0x00000003,
  SQ_SEL_X = 0x00000004,
  SQ_SEL_Y = 0x00000005,
  SQ_SEL_Z = 0x00000006,
  SQ_SEL_W = 0x00000007,
} SQ_SEL_XYZW01;

union COMPUTE_TMPRING_SIZE {
  struct {
#if defined(LITTLEENDIAN_CPU)
    unsigned int WAVES : 12;
    unsigned int WAVESIZE : 13;
    unsigned int : 7;
#elif defined(BIGENDIAN_CPU)
    unsigned int : 7;
    unsigned int WAVESIZE : 13;
    unsigned int WAVES : 12;
#endif
  } bitfields, bits;
  unsigned int u32All;
  signed int i32All;
  float f32All;
};

union SQ_BUF_RSRC_WORD0 {
  struct {
#if defined(LITTLEENDIAN_CPU)
    unsigned int BASE_ADDRESS : 32;
#elif defined(BIGENDIAN_CPU)
    unsigned int BASE_ADDRESS : 32;
#endif
  } bitfields, bits;
  unsigned int u32All;
  signed int i32All;
  float f32All;
};

union SQ_BUF_RSRC_WORD1 {
  struct {
#if defined(LITTLEENDIAN_CPU)
    unsigned int BASE_ADDRESS_HI : 16;
    unsigned int STRIDE : 14;
    unsigned int CACHE_SWIZZLE : 1;
    unsigned int SWIZZLE_ENABLE : 1;
#elif defined(BIGENDIAN_CPU)
    unsigned int SWIZZLE_ENABLE : 1;
    unsigned int CACHE_SWIZZLE : 1;
    unsigned int STRIDE : 14;
    unsigned int BASE_ADDRESS_HI : 16;
#endif
  } bitfields, bits;
  unsigned int u32All;
  signed int i32All;
  float f32All;
};

union SQ_BUF_RSRC_WORD2 {
  struct {
#if defined(LITTLEENDIAN_CPU)
    unsigned int NUM_RECORDS : 32;
#elif defined(BIGENDIAN_CPU)
    unsigned int NUM_RECORDS : 32;
#endif
  } bitfields, bits;
  unsigned int u32All;
  signed int i32All;
  float f32All;
};

union SQ_BUF_RSRC_WORD3 {
  struct {
#if defined(LITTLEENDIAN_CPU)
    unsigned int DST_SEL_X : 3;
    unsigned int DST_SEL_Y : 3;
    unsigned int DST_SEL_Z : 3;
    unsigned int DST_SEL_W : 3;
    unsigned int NUM_FORMAT : 3;
    unsigned int DATA_FORMAT : 4;
    unsigned int ELEMENT_SIZE : 2;
    unsigned int INDEX_STRIDE : 2;
    unsigned int ADD_TID_ENABLE : 1;
    unsigned int ATC__CI__VI : 1;
    unsigned int HASH_ENABLE : 1;
    unsigned int HEAP : 1;
    unsigned int MTYPE__CI__VI : 3;
    unsigned int TYPE : 2;
#elif defined(BIGENDIAN_CPU)
    unsigned int TYPE : 2;
    unsigned int MTYPE__CI__VI : 3;
    unsigned int HEAP : 1;
    unsigned int HASH_ENABLE : 1;
    unsigned int ATC__CI__VI : 1;
    unsigned int ADD_TID_ENABLE : 1;
    unsigned int INDEX_STRIDE : 2;
    unsigned int ELEMENT_SIZE : 2;
    unsigned int DATA_FORMAT : 4;
    unsigned int NUM_FORMAT : 3;
    unsigned int DST_SEL_W : 3;
    unsigned int DST_SEL_Z : 3;
    unsigned int DST_SEL_Y : 3;
    unsigned int DST_SEL_X : 3;
#endif
  } bitfields, bits;
  unsigned int u32All;
  signed int i32All;
  float f32All;
};

union SQ_BUF_RSRC_WORD3_GFX10 {
  struct {
#if defined(LITTLEENDIAN_CPU)
    unsigned int DST_SEL_X : 3;
    unsigned int DST_SEL_Y : 3;
    unsigned int DST_SEL_Z : 3;
    unsigned int DST_SEL_W : 3;
    unsigned int FORMAT : 7;
    unsigned int RESERVED1 : 2;
    unsigned int INDEX_STRIDE : 2;
    unsigned int ADD_TID_ENABLE : 1;
    unsigned int RESOURCE_LEVEL : 1;
    unsigned int RESERVED2 : 3;
    unsigned int OOB_SELECT : 2;
    unsigned int TYPE : 2;
#elif defined(BIGENDIAN_CPU)
    unsigned int TYPE : 2;
    unsigned int OOB_SELECT : 2;
    unsigned int RESERVED2 : 3;
    unsigned int RESOURCE_LEVEL : 1;
    unsigned int ADD_TID_ENABLE : 1;
    unsigned int INDEX_STRIDE : 2;
    unsigned int RESERVED1 : 2;
    unsigned int FORMAT : 7;
    unsigned int DST_SEL_W : 3;
    unsigned int DST_SEL_Z : 3;
    unsigned int DST_SEL_Y : 3;
    unsigned int DST_SEL_X : 3;
#endif
  } bitfields, bits;
  unsigned int u32All;
  signed int i32All;
  float f32All;
};

#endif  // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/runtime.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA runtime C++ interface file. 
#ifndef HSA_RUNTME_CORE_INC_RUNTIME_H_
#define HSA_RUNTME_CORE_INC_RUNTIME_H_

#include
#include
#include
#include
#include

#include "core/inc/hsa_ext_interface.h"
#include "core/inc/hsa_internal.h"
#include "core/inc/hsa_ext_amd_impl.h"

#include "core/inc/agent.h"
#include "core/inc/exceptions.h"
#include "core/inc/memory_region.h"
#include "core/inc/signal.h"
#include "core/inc/interrupt_signal.h"
#include "core/util/flag.h"
#include "core/util/locks.h"
#include "core/util/os.h"
#include "core/util/utils.h"

#include "core/inc/amd_loader_context.hpp"
#include "core/inc/amd_hsa_code.hpp"

//---------------------------------------------------------------------------//
//    Constants                                                              //
//---------------------------------------------------------------------------//

#define HSA_ARGUMENT_ALIGN_BYTES 16
#define HSA_QUEUE_ALIGN_BYTES 64
#define HSA_PACKET_ALIGN_BYTES 64

namespace rocr {

// Forward declaration; avoids an #include.
namespace AMD {
class MemoryRegion;
}  // namespace AMD

namespace core {
extern bool g_use_interrupt_wait;

/// @brief Runtime class provides the following functions:
/// - open and close connection to kernel driver.
/// - load supported extension libraries (image and finalizer).
/// - load tool libraries.
/// - expose supported agents.
/// - allocate and free memory.
/// - memory copy and fill.
/// - grant access to memory (dGPU memory pool extension).
/// - maintain loader state.
/// - monitor asynchronous events from agents.
class Runtime {
  friend class AMD::MemoryRegion;

 public:
  /// @brief Structure to describe connectivity between agents.
  struct LinkInfo {
    LinkInfo() : num_hop(0), info{0} {}

    uint32_t num_hop;
    hsa_amd_memory_pool_link_info_t info;
  };

  struct KfdVersion_t {
    HsaVersionInfo version;
    bool supports_exception_debugging;
  };

  /// @brief Open connection to kernel driver and increment reference count.
  static hsa_status_t Acquire();

  /// @brief Decrement reference count and close connection to kernel driver.
  static hsa_status_t Release();

  /// @brief Checks if connection to kernel driver is open.
  /// @retval True if the connection to kernel driver is open.
  static bool IsOpen();

  // @brief Callback handler for VM fault access.
  static bool VMFaultHandler(hsa_signal_value_t val, void* arg);

  // @brief Print known allocations near ptr.
  static void PrintMemoryMapNear(void* ptr);

  /// @brief Singleton object of the runtime.
  static Runtime* runtime_singleton_;

  /// @brief Insert agent into agent list ::agents_.
  /// @param [in] agent Pointer to the agent object.
  void RegisterAgent(Agent* agent);

  /// @brief Delete all agent objects from ::agents_.
  void DestroyAgents();

  /// @brief Set the number of links connecting the agents in the platform.
  void SetLinkCount(size_t num_link);

  /// @brief Register link information connecting @p node_id_from and @p
  /// node_id_to.
  /// @param [in] node_id_from Node id of the source node.
  /// @param [in] node_id_to Node id of the destination node.
  /// @param [in] num_hop Number of hops between the source and destination
  /// nodes.
  /// @param [in] link_info The link information between source and destination
  /// nodes.
  void RegisterLinkInfo(uint32_t node_id_from, uint32_t node_id_to,
                        uint32_t num_hop,
                        hsa_amd_memory_pool_link_info_t& link_info);

  /// @brief Query link information between two nodes.
  /// @param [in] node_id_from Node id of the source node.
  /// @param [in] node_id_to Node id of the destination node.
  /// @retval The link information between source and destination nodes.
  const LinkInfo GetLinkInfo(uint32_t node_id_from, uint32_t node_id_to);

  /// @brief Invoke the user-provided callback for each agent in the agent
  /// list.
  ///
  /// @param [in] callback User-provided callback function.
  /// @param [in] data User-provided pointer passed as input to @p callback.
  ///
  /// @retval ::HSA_STATUS_SUCCESS if the callback function for each traversed
  /// agent returns ::HSA_STATUS_SUCCESS.
  hsa_status_t IterateAgent(hsa_status_t (*callback)(hsa_agent_t agent,
                                                     void* data),
                            void* data);

  /// @brief Allocate memory on a particular region.
  ///
  /// @param [in] region Pointer to region object.
  /// @param [in] size Allocation size in bytes.
  /// @param [in] alloc_flags Modifiers to pass to MemoryRegion allocator.
  /// @param [out] address Pointer to store the allocation result.
  ///
  /// @retval ::HSA_STATUS_SUCCESS If allocation is successful.
  hsa_status_t AllocateMemory(const MemoryRegion* region, size_t size,
                              MemoryRegion::AllocateFlags alloc_flags,
                              void** address);

  /// @brief Free memory previously allocated with AllocateMemory.
  ///
  /// @param [in] ptr Address of the memory to be freed.
  ///
  /// @retval ::HSA_STATUS_ERROR If @p ptr is not the address of a previous
  /// allocation via ::core::Runtime::AllocateMemory
  /// @retval ::HSA_STATUS_SUCCESS if @p ptr is successfully released.
  hsa_status_t FreeMemory(void* ptr);

  hsa_status_t RegisterReleaseNotifier(void* ptr, hsa_amd_deallocation_callback_t callback,
                                       void* user_data);

  hsa_status_t DeregisterReleaseNotifier(void* ptr, hsa_amd_deallocation_callback_t callback);

  /// @brief Blocking memory copy from src to dst.
  ///
  /// @param [in] dst Memory address of the destination.
  /// @param [in] src Memory address of the source.
  /// @param [in] size Copy size in bytes.
  ///
  /// @retval ::HSA_STATUS_SUCCESS if memory copy is successful and completed.
  hsa_status_t CopyMemory(void* dst, const void* src, size_t size);

  /// @brief Non-blocking memory copy from src to dst.
  ///
  /// @details The memory copy will be performed after all signals in
  /// @p dep_signals have a value of 0. On completion @p completion_signal
  /// will be decremented.
  ///
  /// @param [in] dst Memory address of the destination.
  /// @param [in] dst_agent Agent object associated with the destination. This
  /// agent should be able to access the destination and source.
  /// @param [in] src Memory address of the source.
  /// @param [in] src_agent Agent object associated with the source. This
  /// agent should be able to access the destination and source.
  /// @param [in] size Copy size in bytes.
  /// @param [in] dep_signals Array of signal dependencies.
  /// @param [in] completion_signal Completion signal object.
  ///
  /// @retval ::HSA_STATUS_SUCCESS if copy command has been submitted
  /// successfully to the agent DMA queue.
  hsa_status_t CopyMemory(void* dst, core::Agent& dst_agent, const void* src,
                          core::Agent& src_agent, size_t size,
                          std::vector& dep_signals,
                          core::Signal& completion_signal);

  /// @brief Fill the first @p count uint32_t elements at @p ptr with @p value.
  ///
  /// @param [in] ptr Memory address to be filled.
  /// @param [in] value The value/pattern that will be used to set @p ptr.
  /// @param [in] count Number of uint32_t elements to be set.
  ///
  /// @retval ::HSA_STATUS_SUCCESS if memory fill is successful and completed.
  hsa_status_t FillMemory(void* ptr, uint32_t value, size_t count);

  /// @brief Set agents as the whitelist to access ptr.
  ///
  /// @param [in] num_agents The number of agent handles in @p agents array.
  /// @param [in] agents Agent handle array.
  /// @param [in] ptr Pointer of memory previously allocated via
  /// core::Runtime::AllocateMemory.
  ///
  /// @retval ::HSA_STATUS_SUCCESS The whitelist has been configured
  /// successfully and all agents in @p agents can start accessing @p ptr.
  hsa_status_t AllowAccess(uint32_t num_agents, const hsa_agent_t* agents,
                           const void* ptr);

  /// @brief Query system information.
  ///
  /// @param [in] attribute System info attribute to query.
  /// @param [out] value Pointer to store the attribute value.
  ///
  /// @retval HSA_STATUS_SUCCESS The attribute is valid and the @p value is
  /// set.
  hsa_status_t GetSystemInfo(hsa_system_info_t attribute, void* value);

  /// @brief Register a callback function @p handler that is associated with
  /// @p signal to the asynchronous event monitor thread.
  ///
  /// @param [in] signal Signal handle associated with @p handler.
  /// @param [in] cond The condition to execute the @p handler.
  /// @param [in] value The value to compare with @p signal value. If the
  /// comparison satisfies @p cond, the @p handler will be called.
  /// @param [in] arg Pointer to the argument that will be provided to @p
  /// handler.
  ///
  /// @retval ::HSA_STATUS_SUCCESS Registration is successful.
  hsa_status_t SetAsyncSignalHandler(hsa_signal_t signal,
                                     hsa_signal_condition_t cond,
                                     hsa_signal_value_t value,
                                     hsa_amd_signal_handler handler, void* arg);

  hsa_status_t InteropMap(uint32_t num_agents, Agent** agents,
                          int interop_handle, uint32_t flags, size_t* size,
                          void** ptr, size_t* metadata_size,
                          const void** metadata);

  hsa_status_t InteropUnmap(void* ptr);

  struct PtrInfoBlockData {
    void* base;
    size_t length;
  };

  hsa_status_t PtrInfo(const void* ptr, hsa_amd_pointer_info_t* info, void* (*alloc)(size_t),
                       uint32_t* num_agents_accessible, hsa_agent_t** accessible,
                       PtrInfoBlockData* block_info = nullptr);

  hsa_status_t SetPtrInfoData(const void* ptr, void* userptr);

  hsa_status_t IPCCreate(void* ptr, size_t len, hsa_amd_ipc_memory_t* handle);

  hsa_status_t IPCAttach(const hsa_amd_ipc_memory_t* handle, size_t len, uint32_t num_agents,
                         Agent** mapping_agents, void** mapped_ptr);

  hsa_status_t IPCDetach(void* ptr);

  hsa_status_t SetSvmAttrib(void* ptr, size_t size, hsa_amd_svm_attribute_pair_t* attribute_list,
                            size_t attribute_count);

  hsa_status_t GetSvmAttrib(void* ptr, size_t size, hsa_amd_svm_attribute_pair_t* attribute_list,
                            size_t attribute_count);

  hsa_status_t SvmPrefetch(void* ptr, size_t size, hsa_agent_t agent, uint32_t num_dep_signals,
                           const hsa_signal_t* dep_signals, hsa_signal_t completion_signal);

  const std::vector& cpu_agents() { return cpu_agents_; }

  const std::vector& gpu_agents() { return gpu_agents_; }

  const std::vector& gpu_ids() { return gpu_ids_; }

  Agent* region_gpu() { return region_gpu_; }

  const std::vector& system_regions_fine() const {
    return system_regions_fine_;
  }

  const std::vector& system_regions_coarse() const {
    return system_regions_coarse_;
  }

  amd::hsa::loader::Loader* loader() { return loader_; }

  amd::LoaderContext* loader_context() { return &loader_context_; }

  amd::hsa::code::AmdHsaCodeManager* code_manager() { return &code_manager_; }

  std::function& system_allocator() { return system_allocator_; }

  std::function& system_deallocator() { return system_deallocator_; }

  const Flag& flag() const { return flag_; }

  ExtensionEntryPoints extensions_;

  hsa_status_t SetCustomSystemEventHandler(hsa_amd_system_event_callback_t callback,
                                           void* data);

  hsa_status_t SetInternalQueueCreateNotifier(hsa_amd_runtime_queue_notifier callback,
                                              void* user_data);

  void InternalQueueCreateNotify(const hsa_queue_t* queue, hsa_agent_t agent);

  SharedSignalPool_t* GetSharedSignalPool() { return &SharedSignalPool; }

  InterruptSignal::EventPool* GetEventPool() { return &EventPool; }

  uint64_t sys_clock_freq() const { return sys_clock_freq_; }

  void KfdVersion(const HsaVersionInfo& version) { kfd_version.version = version; }

  void KfdVersion(bool exception_debugging) {
    kfd_version.supports_exception_debugging = exception_debugging;
  }

  KfdVersion_t KfdVersion() const { return kfd_version; }

 protected:
  static void AsyncEventsLoop(void*);

  struct AllocationRegion {
    AllocationRegion() : region(NULL), size(0), user_ptr(nullptr) {}
    AllocationRegion(const MemoryRegion* region_arg, size_t size_arg)
        : region(region_arg), size(size_arg), user_ptr(nullptr) {}

    struct notifier_t {
      void* ptr;
      AMD::callback_t callback;
      void* user_data;
    };

    const MemoryRegion* region;
    size_t size;
    void* user_ptr;
    std::unique_ptr> notifiers;
  };

  struct AsyncEventsControl {
    AsyncEventsControl() : async_events_thread_(NULL) {}
    void Shutdown();

    hsa_signal_t wake;
    os::Thread async_events_thread_;
    KernelMutex lock;
    bool exit;
  };

  struct AsyncEvents {
    void PushBack(hsa_signal_t signal, hsa_signal_condition_t cond,
                  hsa_signal_value_t value, hsa_amd_signal_handler handler,
                  void* arg);

    void CopyIndex(size_t dst, size_t src);

    size_t Size();

    void PopBack();

    void Clear();

    std::vector<hsa_signal_t> signal_;
    std::vector<hsa_signal_condition_t> cond_;
    std::vector<hsa_signal_value_t> value_;
    std::vector<hsa_amd_signal_handler> handler_;
    std::vector<void*> arg_;
  };

  struct PrefetchRange;
  typedef std::map prefetch_map_t;

  struct PrefetchOp {
    void* base;
    size_t size;
    uint32_t node_id;
    int remaining_deps;
    hsa_signal_t completion;
    std::vector dep_signals;
    prefetch_map_t::iterator prefetch_map_entry;
  };

  struct PrefetchRange {
    PrefetchRange() {}
    PrefetchRange(size_t Bytes, PrefetchOp* Op) : bytes(Bytes), op(Op) {}
    size_t bytes;
    PrefetchOp* op;
    prefetch_map_t::iterator prev;
    prefetch_map_t::iterator next;
  };

  // Will be created before any user could call hsa_init but also could be
  // destroyed before incorrectly written programs call hsa_shutdown.
  static KernelMutex bootstrap_lock_;

  Runtime();

  Runtime(const Runtime&);

  Runtime& operator=(const Runtime&);

  ~Runtime() {}

  /// @brief Open connection to kernel driver.
  hsa_status_t Load();

  /// @brief Close connection to kernel driver and clean up resources.
  void Unload();

  /// @brief Dynamically load extension libraries (images, finalizer) and
  /// call OnLoad method on each loaded library.
  void LoadExtensions();

  /// @brief Call OnUnload method on each extension library then close it.
  void UnloadExtensions();

  /// @brief Dynamically load tool libraries and call OnLoad method on each
  /// loaded library.
  void LoadTools();

  /// @brief Call OnUnload method of each tool library.
  void UnloadTools();

  /// @brief Close tool libraries.
  void CloseTools();

  // @brief Binds virtual memory access fault handler to this node.
  void BindVmFaultHandler();

  // @brief Acquire snapshot of system event handlers.
  // Returns a copy to avoid holding a lock during callbacks.
  std::vector, void*>> GetSystemEventHandlers();

  /// @brief Get the index of ::link_matrix_.
  /// @param [in] node_id_from Node id of the source node.
  /// @param [in] node_id_to Node id of the destination node.
  /// @retval Index in ::link_matrix_.
  uint32_t GetIndexLinkInfo(uint32_t node_id_from, uint32_t node_id_to);

  /// @brief Get most recently issued SVM prefetch agent for the range in question.
  Agent* GetSVMPrefetchAgent(void* ptr, size_t size);

  /// @brief Get the highest used node id.
  uint32_t max_node_id() const { return agents_by_node_.rbegin()->first; }

  // Mutex object to protect multithreaded access to ::allocation_map_.
  // Also ensures atomicity of pointer info queries by interlocking
  // KFD map/unmap, register/unregister, and access to hsaKmtQueryPointerInfo
  // registered & mapped arrays.
  KernelSharedMutex memory_lock_;

  // Array containing tool library handles.
  std::vector tool_libs_;

  // Agent list containing all CPU agents in the platform.
  std::vector cpu_agents_;

  // Agent list containing all compatible GPU agents in the platform.
  std::vector gpu_agents_;

  // Agent map containing all agents indexed by their KFD node IDs.
  std::map > agents_by_node_;

  // Agent list containing all compatible GPU agent ids in the platform.
  std::vector gpu_ids_;

  // List of all fine-grained system memory regions in the platform.
  std::vector system_regions_fine_;

  // List of all coarse-grained system memory regions in the platform.
  std::vector system_regions_coarse_;

  // Matrix of IO links.
  std::vector<LinkInfo> link_matrix_;

  // Loader instance.
  amd::hsa::loader::Loader* loader_;

  // Loader context.
  amd::LoaderContext loader_context_;

  // Code object manager.
  amd::hsa::code::AmdHsaCodeManager code_manager_;

  // Contains the region, address, and size of previously allocated memory.
  std::map allocation_map_;

  // Pending prefetch containers.
  KernelMutex prefetch_lock_;
  prefetch_map_t prefetch_map_;

  // Allocator using ::system_region_
  std::function system_allocator_;

  // Deallocator using ::system_region_
  std::function system_deallocator_;

  // Deprecated HSA Region API GPU (for legacy APU support only)
  Agent* region_gpu_;

  AsyncEventsControl async_events_control_;

  AsyncEvents async_events_;

  AsyncEvents new_async_events_;

  // System clock frequency.
  uint64_t sys_clock_freq_;

  // Number of NUMA nodes
  size_t num_nodes_;

  // @brief AMD HSA event to monitor for virtual memory access fault.
  HsaEvent* vm_fault_event_;

  // @brief HSA signal to contain the VM fault event.
  Signal* vm_fault_signal_;

  // Custom system event handlers.
  std::vector, void*>> system_event_handlers_;

  // System event handler lock
  KernelMutex system_event_lock_;

  // Internal queue creation notifier
  AMD::callback_t internal_queue_create_notifier_;
  void* internal_queue_create_notifier_user_data_;

  // Holds reference count to runtime object.
  std::atomic ref_count_;

  // Track environment variables.
  Flag flag_;

  // Pools memory for SharedSignal (Signal ABI blocks)
  SharedSignalPool_t SharedSignalPool;

  // Pools KFD Events for InterruptSignal
  InterruptSignal::EventPool EventPool;

  // KFD version
  KfdVersion_t kfd_version;

  // Frees runtime memory when the runtime library is unloaded if safe to do so.
  // Failure to release the runtime indicates an incorrect application but is
  // common (example: calls library routines at process exit).
  friend class RuntimeCleanup;
};

}  // namespace core
}  // namespace rocr

#endif  // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/scratch_cache.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2020-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_SCRATCH_CACHE_H_
#define HSA_RUNTIME_CORE_INC_SCRATCH_CACHE_H_

#include "core/inc/amd_gpu_agent.h"
#include "core/util/locks.h"
#include "core/util/utils.h"

#include
#include

namespace rocr {
namespace AMD {

class ScratchCache {
 public:
  struct node {
    enum STATE { FREE = 0, ALLOC = 1, TRIM = 2, STEAL = 4 };

    void* base;
    bool large;
    uint32_t state;

    node() : base(nullptr), large(false), state(FREE) {}
    bool isFree() const { return state == FREE; }
    bool trimPending() const { return state == (ALLOC | TRIM); }
    void trim() {
      assert(!isFree() && "Trim of free scratch node.");
      state |= TRIM;
    }
    void free() {
      assert(!isFree() && "Free of free scratch node.");
      state = FREE;
    }
    void alloc() {
      assert(isFree() && "Alloc of non-free scratch node.");
      state = ALLOC;
    }
  };

  typedef ::std::multimap<size_t, node> map_t;
  typedef map_t::iterator ref_t;
  typedef ::std::function<void(void*, size_t, bool)> deallocator_t;

  // @brief Contains scratch memory information.
  struct ScratchInfo {
    void* queue_base;
    // Size to fill the machine with size_per_thread
    size_t size;
    // Size to satisfy the present dispatch without throttling.
    size_t dispatch_size;
    size_t size_per_thread;
    uint32_t lanes_per_wave;
    uint32_t waves_per_group;
    ptrdiff_t queue_process_offset;
    bool large;
    bool retry;
    hsa_signal_t queue_retry;
    uint64_t wanted_slots;
    ScratchCache::ref_t scratch_node;
  };

  ScratchCache(const ScratchCache& rhs) = delete;
  ScratchCache(ScratchCache&& rhs) = delete;
  ScratchCache& operator=(const ScratchCache& rhs) = delete;
  ScratchCache& operator=(ScratchCache&& rhs) = delete;

  ScratchCache(deallocator_t deallocator) : dealloc(deallocator), available_bytes(0) {}

  ~ScratchCache() { assert(map.empty() && "ScratchCache not empty at shutdown."); }

  bool alloc(ScratchInfo& info) {
    ref_t it = map.upper_bound(info.size - 1);
    if (it == map.end()) return false;

    // Small requests must have an exact size match and be small.
    if (!info.large) {
      while ((it != map.end()) && (it->first == info.size)) {
        if (it->second.isFree() && (!it->second.large)) {
          it->second.alloc();
          info.queue_base = it->second.base;
          info.scratch_node = it;
          available_bytes -= it->first;
          return true;
        }
        it++;
      }
      return false;
    }

    // Large requests may use a small allocation and do not require an exact size match.
    while (it != map.end()) {
      if (it->second.isFree()) {
        it->second.alloc();
        info.queue_base = it->second.base;
        info.size = it->first;
        info.scratch_node = it;
        available_bytes -= it->first;
        return true;
      }
      it++;
    }
    return false;
  }

  void free(ScratchInfo& info) {
    assert(!info.scratch_node->second.isFree() && "free called on free scratch node.");
    auto it = info.scratch_node;
    if (it->second.trimPending()) {
      dealloc(it->second.base, it->first, it->second.large);
      map.erase(it);
      return;
    }
    it->second.free();
    available_bytes += it->first;
    assert(it->first == info.size && "Scratch cache size mismatch.");
  }

  bool trim(bool trim_nodes_in_use) {
    bool ret = !map.empty();
    auto it = map.begin();
    while (it != map.end()) {
      if (it->second.isFree()) {
        available_bytes -= it->first;
        dealloc(it->second.base, it->first, it->second.large);
        auto temp = it;
        it++;
        map.erase(temp);
      } else {
        if (trim_nodes_in_use) it->second.trim();
        it++;
      }
    }
    return ret;
  }

  void insert(ScratchInfo& info) {
    node n;
    n.base = info.queue_base;
    n.large = info.large;
    n.alloc();
    auto it = map.insert(std::make_pair(info.size, n));
    info.scratch_node = it;
  }

  size_t free_bytes() const { return available_bytes; }

 private:
  map_t map;
  deallocator_t dealloc;
  size_t available_bytes;
};

}  // namespace AMD
}  // namespace rocr

#endif  // header guard
ROCR-Runtime-rocm-5.0.0/src/core/inc/sdma_registers.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_SDMA_REGISTERS_H_
#define HSA_RUNTIME_CORE_INC_SDMA_REGISTERS_H_

#include
#include

namespace rocr {
namespace AMD {

// SDMA packet for VI device.
// Reference: http://people.freedesktop.org/~agd5f/dma_packets.txt const unsigned int SDMA_OP_COPY = 1; const unsigned int SDMA_OP_FENCE = 5; const unsigned int SDMA_OP_TRAP = 6; const unsigned int SDMA_OP_POLL_REGMEM = 8; const unsigned int SDMA_OP_ATOMIC = 10; const unsigned int SDMA_OP_CONST_FILL = 11; const unsigned int SDMA_OP_TIMESTAMP = 13; const unsigned int SDMA_OP_GCR = 17; const unsigned int SDMA_SUBOP_COPY_LINEAR = 0; const unsigned int SDMA_SUBOP_COPY_LINEAR_RECT = 4; const unsigned int SDMA_SUBOP_TIMESTAMP_GET_GLOBAL = 2; const unsigned int SDMA_SUBOP_USER_GCR = 1; const unsigned int SDMA_ATOMIC_ADD64 = 47; typedef struct SDMA_PKT_COPY_LINEAR_TAG { union { struct { unsigned int op : 8; unsigned int sub_op : 8; unsigned int extra_info : 16; }; unsigned int DW_0_DATA; } HEADER_UNION; union { struct { unsigned int count : 22; unsigned int reserved_0 : 10; }; unsigned int DW_1_DATA; } COUNT_UNION; union { struct { unsigned int reserved_0 : 16; unsigned int dst_swap : 2; unsigned int reserved_1 : 6; unsigned int src_swap : 2; unsigned int reserved_2 : 6; }; unsigned int DW_2_DATA; } PARAMETER_UNION; union { struct { unsigned int src_addr_31_0 : 32; }; unsigned int DW_3_DATA; } SRC_ADDR_LO_UNION; union { struct { unsigned int src_addr_63_32 : 32; }; unsigned int DW_4_DATA; } SRC_ADDR_HI_UNION; union { struct { unsigned int dst_addr_31_0 : 32; }; unsigned int DW_5_DATA; } DST_ADDR_LO_UNION; union { struct { unsigned int dst_addr_63_32 : 32; }; unsigned int DW_6_DATA; } DST_ADDR_HI_UNION; static const size_t kMaxSize_ = 0x3fffe0; } SDMA_PKT_COPY_LINEAR; // linear sub-window typedef struct SDMA_PKT_COPY_LINEAR_RECT_TAG { static const unsigned int pitch_bits = 19; static const unsigned int slice_bits = 28; static const unsigned int rect_xy_bits = 14; static const unsigned int rect_z_bits = 11; union { struct { unsigned int op : 8; unsigned int sub_op : 8; unsigned int reserved : 13; unsigned int element : 3; }; unsigned int DW_0_DATA; } HEADER_UNION; union { 
struct { unsigned int src_addr_31_0 : 32; }; unsigned int DW_1_DATA; } SRC_ADDR_LO_UNION; union { struct { unsigned int src_addr_63_32 : 32; }; unsigned int DW_2_DATA; } SRC_ADDR_HI_UNION; union { struct { unsigned int src_offset_x : 14; unsigned int reserved_1 : 2; unsigned int src_offset_y : 14; unsigned int reserved_2 : 2; }; unsigned int DW_3_DATA; } SRC_PARAMETER_1_UNION; union { struct { unsigned int src_offset_z : 11; unsigned int reserved_1 : 2; unsigned int src_pitch : pitch_bits; }; unsigned int DW_4_DATA; } SRC_PARAMETER_2_UNION; union { struct { unsigned int src_slice_pitch : slice_bits; unsigned int reserved_1 : 4; }; unsigned int DW_5_DATA; } SRC_PARAMETER_3_UNION; union { struct { unsigned int dst_addr_31_0 : 32; }; unsigned int DW_6_DATA; } DST_ADDR_LO_UNION; union { struct { unsigned int dst_addr_63_32 : 32; }; unsigned int DW_7_DATA; } DST_ADDR_HI_UNION; union { struct { unsigned int dst_offset_x : 14; unsigned int reserved_1 : 2; unsigned int dst_offset_y : 14; unsigned int reserved_2 : 2; }; unsigned int DW_8_DATA; } DST_PARAMETER_1_UNION; union { struct { unsigned int dst_offset_z : 11; unsigned int reserved_1 : 2; unsigned int dst_pitch : pitch_bits; }; unsigned int DW_9_DATA; } DST_PARAMETER_2_UNION; union { struct { unsigned int dst_slice_pitch : slice_bits; unsigned int reserved_1 : 4; }; unsigned int DW_10_DATA; } DST_PARAMETER_3_UNION; union { struct { unsigned int rect_x : rect_xy_bits; unsigned int reserved_1 : 2; unsigned int rect_y : rect_xy_bits; unsigned int reserved_2 : 2; }; unsigned int DW_11_DATA; } RECT_PARAMETER_1_UNION; union { struct { unsigned int rect_z : rect_z_bits; unsigned int reserved_1 : 5; unsigned int dst_swap : 2; unsigned int reserved_2 : 6; unsigned int src_swap : 2; unsigned int reserved_3 : 6; }; unsigned int DW_12_DATA; } RECT_PARAMETER_2_UNION; } SDMA_PKT_COPY_LINEAR_RECT; typedef struct SDMA_PKT_CONSTANT_FILL_TAG { union { struct { unsigned int op : 8; unsigned int sub_op : 8; unsigned int sw : 2; unsigned 
int reserved_0 : 12; unsigned int fillsize : 2; }; unsigned int DW_0_DATA; } HEADER_UNION; union { struct { unsigned int dst_addr_31_0 : 32; }; unsigned int DW_1_DATA; } DST_ADDR_LO_UNION; union { struct { unsigned int dst_addr_63_32 : 32; }; unsigned int DW_2_DATA; } DST_ADDR_HI_UNION; union { struct { unsigned int src_data_31_0 : 32; }; unsigned int DW_3_DATA; } DATA_UNION; union { struct { unsigned int count : 22; unsigned int reserved_0 : 10; }; unsigned int DW_4_DATA; } COUNT_UNION; static const size_t kMaxSize_ = 0x3fffe0; } SDMA_PKT_CONSTANT_FILL; typedef struct SDMA_PKT_FENCE_TAG { union { struct { unsigned int op : 8; unsigned int sub_op : 8; unsigned int mtype : 3; unsigned int gcc : 1; unsigned int sys : 1; unsigned int pad1 : 1; unsigned int snp : 1; unsigned int gpa : 1; unsigned int l2_policy : 2; unsigned int reserved_0 : 6; }; unsigned int DW_0_DATA; } HEADER_UNION; union { struct { unsigned int addr_31_0 : 32; }; unsigned int DW_1_DATA; } ADDR_LO_UNION; union { struct { unsigned int addr_63_32 : 32; }; unsigned int DW_2_DATA; } ADDR_HI_UNION; union { struct { unsigned int data : 32; }; unsigned int DW_3_DATA; } DATA_UNION; } SDMA_PKT_FENCE; typedef struct SDMA_PKT_POLL_REGMEM_TAG { union { struct { unsigned int op : 8; unsigned int sub_op : 8; unsigned int reserved_0 : 10; unsigned int hdp_flush : 1; unsigned int reserved_1 : 1; unsigned int func : 3; unsigned int mem_poll : 1; }; unsigned int DW_0_DATA; } HEADER_UNION; union { struct { unsigned int addr_31_0 : 32; }; unsigned int DW_1_DATA; } ADDR_LO_UNION; union { struct { unsigned int addr_63_32 : 32; }; unsigned int DW_2_DATA; } ADDR_HI_UNION; union { struct { unsigned int value : 32; }; unsigned int DW_3_DATA; } VALUE_UNION; union { struct { unsigned int mask : 32; }; unsigned int DW_4_DATA; } MASK_UNION; union { struct { unsigned int interval : 16; unsigned int retry_count : 12; unsigned int reserved_0 : 4; }; unsigned int DW_5_DATA; } DW5_UNION; } SDMA_PKT_POLL_REGMEM; typedef struct 
SDMA_PKT_ATOMIC_TAG { union { struct { unsigned int op : 8; unsigned int sub_op : 8; unsigned int l : 1; unsigned int reserved_0 : 8; unsigned int operation : 7; }; unsigned int DW_0_DATA; } HEADER_UNION; union { struct { unsigned int addr_31_0 : 32; }; unsigned int DW_1_DATA; } ADDR_LO_UNION; union { struct { unsigned int addr_63_32 : 32; }; unsigned int DW_2_DATA; } ADDR_HI_UNION; union { struct { unsigned int src_data_31_0 : 32; }; unsigned int DW_3_DATA; } SRC_DATA_LO_UNION; union { struct { unsigned int src_data_63_32 : 32; }; unsigned int DW_4_DATA; } SRC_DATA_HI_UNION; union { struct { unsigned int cmp_data_31_0 : 32; }; unsigned int DW_5_DATA; } CMP_DATA_LO_UNION; union { struct { unsigned int cmp_data_63_32 : 32; }; unsigned int DW_6_DATA; } CMP_DATA_HI_UNION; union { struct { unsigned int loop_interval : 13; unsigned int reserved_0 : 19; }; unsigned int DW_7_DATA; } LOOP_UNION; } SDMA_PKT_ATOMIC; typedef struct SDMA_PKT_TIMESTAMP_TAG { union { struct { unsigned int op : 8; unsigned int sub_op : 8; unsigned int reserved_0 : 16; }; unsigned int DW_0_DATA; } HEADER_UNION; union { struct { unsigned int addr_31_0 : 32; }; unsigned int DW_1_DATA; } ADDR_LO_UNION; union { struct { unsigned int addr_63_32 : 32; }; unsigned int DW_2_DATA; } ADDR_HI_UNION; } SDMA_PKT_TIMESTAMP; typedef struct SDMA_PKT_TRAP_TAG { union { struct { unsigned int op : 8; unsigned int sub_op : 8; unsigned int reserved_0 : 16; }; unsigned int DW_0_DATA; } HEADER_UNION; union { struct { unsigned int int_ctx : 28; unsigned int reserved_1 : 4; }; unsigned int DW_1_DATA; } INT_CONTEXT_UNION; } SDMA_PKT_TRAP; // HDP flush packet, no parameters. 
typedef struct SDMA_PKT_HDP_FLUSH_TAG { unsigned int DW_0_DATA; unsigned int DW_1_DATA; unsigned int DW_2_DATA; unsigned int DW_3_DATA; unsigned int DW_4_DATA; unsigned int DW_5_DATA; // Version of gfx9 sDMA microcode introducing SDMA_PKT_HDP_FLUSH static const uint16_t kMinVersion_ = 0x1A5; } SDMA_PKT_HDP_FLUSH; static const SDMA_PKT_HDP_FLUSH hdp_flush_cmd = {0x8, 0x0, 0x80000000, 0x0, 0x0, 0x0}; typedef struct SDMA_PKT_GCR_TAG { union { struct { unsigned int op : 8; unsigned int sub_op : 8; unsigned int : 16; }; unsigned int DW_0_DATA; } HEADER_UNION; union { struct { unsigned int : 7; unsigned int BaseVA_LO : 25; }; unsigned int DW_1_DATA; } WORD1_UNION; union { struct { unsigned int BaseVA_HI : 16; unsigned int GCR_CONTROL_GLI_INV : 2; unsigned int GCR_CONTROL_GL1_RANGE : 2; unsigned int GCR_CONTROL_GLM_WB : 1; unsigned int GCR_CONTROL_GLM_INV : 1; unsigned int GCR_CONTROL_GLK_WB : 1; unsigned int GCR_CONTROL_GLK_INV : 1; unsigned int GCR_CONTROL_GLV_INV : 1; unsigned int GCR_CONTROL_GL1_INV : 1; unsigned int GCR_CONTROL_GL2_US : 1; unsigned int GCR_CONTROL_GL2_RANGE : 2; unsigned int GCR_CONTROL_GL2_DISCARD : 1; unsigned int GCR_CONTROL_GL2_INV : 1; unsigned int GCR_CONTROL_GL2_WB : 1; }; unsigned int DW_2_DATA; } WORD2_UNION; union { struct { unsigned int GCR_CONTROL_RANGE_IS_PA : 1; unsigned int GCR_CONTROL_SEQ : 2; unsigned int : 4; unsigned int LimitVA_LO : 25; }; unsigned int DW_3_DATA; } WORD3_UNION; union { struct { unsigned int LimitVA_HI : 16; unsigned int : 8; unsigned int VMID : 4; unsigned int : 4; }; unsigned int DW_4_DATA; } WORD4_UNION; } SDMA_PKT_GCR; } // namespace amd } // namespace rocr #endif // HSA_RUNTIME_CORE_INC_SDMA_REGISTERS_H_ ROCR-Runtime-rocm-5.0.0/src/core/inc/signal.h000066400000000000000000000550021420110115200205430ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, 
Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA runtime C++ interface file. 
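The `SharedSignal::Convert` pair in this header publishes the address of the embedded `amd_signal` member as the opaque `hsa_signal_t` handle, and recovers the owning struct by subtracting `offsetof(SharedSignal, amd_signal)` from the handle. A minimal sketch of that round trip follows, assuming hypothetical stand-in types (`FakeAbiSignal`, `FakeSharedSignal`, `RoundTripOk`) rather than the runtime's real ones; the integer/pointer arithmetic through `uintptr_t` is implementation-defined in C++ but behaves as expected on flat-address-space platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for amd_signal_t: the ABI block whose address
// becomes the public handle. Fields are illustrative only.
struct FakeAbiSignal {
  int64_t kind;
  int64_t value;
};

// Hypothetical stand-in for SharedSignal: the ABI block is embedded at a
// non-zero offset, so the handle differs from the enclosing struct address.
struct FakeSharedSignal {
  uint64_t header;    // field placed before the ABI block
  FakeAbiSignal abi;  // the handle points here
  void* owner;        // runtime-side bookkeeping (like core_signal)

  // Object -> handle: publish the address of the embedded ABI block.
  static uint64_t ToHandle(const FakeSharedSignal* s) {
    return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(&s->abi));
  }

  // Handle -> object: step back by the member offset ("container_of").
  static FakeSharedSignal* FromHandle(uint64_t handle) {
    return reinterpret_cast<FakeSharedSignal*>(
        static_cast<uintptr_t>(handle) - offsetof(FakeSharedSignal, abi));
  }
};

// offsetof is only guaranteed to work on standard-layout types, the same
// property signal.h asserts for SharedSignal.
static_assert(std::is_standard_layout<FakeSharedSignal>::value,
              "FakeSharedSignal must be standard layout for offsetof");

// Returns true if the handle differs from the struct address and maps back
// to the original object.
bool RoundTripOk() {
  FakeSharedSignal s{};
  const uint64_t handle = FakeSharedSignal::ToHandle(&s);
  const uint64_t base = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(&s));
  return handle != base && FakeSharedSignal::FromHandle(handle) == &s;
}
```

Keeping the handle pointed at the ABI block (rather than at the struct start) is what lets the public handle double as a pointer to the frozen `amd_signal_t` layout while the runtime still finds its private fields.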
#ifndef HSA_RUNTME_CORE_INC_SIGNAL_H_ #define HSA_RUNTME_CORE_INC_SIGNAL_H_ #include #include #include #include #include #include "hsakmt.h" #include "core/common/shared.h" #include "core/inc/checked.h" #include "core/inc/exceptions.h" #include "core/util/utils.h" #include "core/util/locks.h" #include "inc/amd_hsa_signal.h" // Allow hsa_signal_t to be keys in STL structures. namespace std { template <> struct less { __forceinline bool operator()(const hsa_signal_t& x, const hsa_signal_t& y) const { return x.handle < y.handle; } typedef hsa_signal_t first_argument_type; typedef hsa_signal_t second_argument_type; typedef bool result_type; }; } namespace rocr { namespace core { class Agent; class Signal; /// @brief ABI and object conversion struct for signals. May be shared between processes. struct SharedSignal { amd_signal_t amd_signal; uint64_t sdma_start_ts; Signal* core_signal; Check<0x71FCCA6A3D5D5276, true> id; uint8_t reserved[8]; uint64_t sdma_end_ts; uint8_t reserved2[24]; SharedSignal() { memset(&amd_signal, 0, sizeof(amd_signal)); amd_signal.kind = AMD_SIGNAL_KIND_INVALID; core_signal = nullptr; } bool IsValid() const { return (Convert(this).handle != 0) && id.IsValid(); } bool IsIPC() const { return core_signal == nullptr; } void GetSdmaTsAddresses(uint64_t*& start, uint64_t*& end) { /* SDMA timestamps on gfx7xx/8xxx require 32 byte alignment (gfx9xx relaxes alignment to 8 bytes). This conflicts with the frozen format for amd_signal_t so we place the time stamps in sdma_start/end_ts instead (amd_signal.start_ts is also properly aligned). Reading of the timestamps occurs in GetRawTs(). */ start = &sdma_start_ts; end = &sdma_end_ts; } void CopyPrep() { // Clear sdma_end_ts before a copy so we can detect if the copy was done via // SDMA or blit kernel. sdma_start_ts = 0; sdma_end_ts = 0; } void GetRawTs(bool FetchCopyTs, uint64_t& start, uint64_t& end) { /* If the read is for a copy we need to check if it was done by blit kernel or SDMA. 
Since we clear sdma_start/end_ts during CopyPrep we know it was a SDMA copy if one of those is non-zero. Otherwise return compute kernel stamps from amd_signal. */ if (FetchCopyTs && sdma_end_ts != 0) { start = sdma_start_ts; end = sdma_end_ts; return; } start = amd_signal.start_ts; end = amd_signal.end_ts; } static __forceinline SharedSignal* Convert(hsa_signal_t signal) { SharedSignal* ret = reinterpret_cast(static_cast(signal.handle) - offsetof(SharedSignal, amd_signal)); return ret; } static __forceinline hsa_signal_t Convert(const SharedSignal* signal) { assert(signal != nullptr && "Conversion on null Signal object."); const uint64_t handle = static_cast(reinterpret_cast(&signal->amd_signal)); const hsa_signal_t signal_handle = {handle}; return signal_handle; } }; static_assert(std::is_standard_layout::value, "SharedSignal must remain standard layout for IPC use."); static_assert(std::is_trivially_destructible::value, "SharedSignal must not be modified on delete for IPC use."); static_assert((offsetof(SharedSignal, sdma_start_ts) % 32) == 0, "Bad SDMA time stamp alignment."); static_assert((offsetof(SharedSignal, sdma_end_ts) % 32) == 0, "Bad SDMA time stamp alignment."); static_assert(sizeof(SharedSignal) == 128, "Bad SharedSignal size."); /// @brief Pool class for SharedSignal suitable for use with Shared. class SharedSignalPool_t : private BaseShared { public: SharedSignalPool_t() : block_size_(minblock_) {} ~SharedSignalPool_t() { clear(); } SharedSignal* alloc(); void free(SharedSignal* ptr); void clear(); private: static const size_t minblock_ = 4096 / sizeof(SharedSignal); KernelMutex lock_; std::vector free_list_; std::vector> block_list_; size_t block_size_; }; class LocalSignal { public: // Temporary, for legacy tools lib support. 
  explicit LocalSignal(hsa_signal_value_t initial_value) {
    local_signal_.shared_object()->amd_signal.value = initial_value;
  }
  LocalSignal(hsa_signal_value_t initial_value, bool exportable);

  SharedSignal* signal() const { return local_signal_.shared_object(); }

 private:
  Shared local_signal_;
};

/// @brief An abstract base class which helps implement the public hsa_signal_t
/// type (an opaque handle) and its associated APIs. At its core, signal uses
/// a 32- or 64-bit value. This value can be waited on or signaled atomically
/// using specified memory ordering semantics.
class Signal {
 public:
  /// @brief Constructor: links and publishes the signal interface object.
  explicit Signal(SharedSignal* abi_block, bool enableIPC = false)
      : signal_(abi_block->amd_signal), async_copy_agent_(NULL), refcount_(1) {
    assert(abi_block != nullptr && "Signal abi_block must not be NULL");

    waiting_ = 0;
    retained_ = 1;

    if (enableIPC) {
      abi_block->core_signal = nullptr;
      registerIpc();
    } else {
      abi_block->core_signal = this;
    }
  }

  /// @brief Interface to discard a signal handle (hsa_signal_t).
  /// Decrements signal ref count and invokes doDestroySignal() when
  /// Signal is no longer in use.
  void DestroySignal() {
    // If handle is now invalid wake any retained sleepers.
    if (--refcount_ == 0) CasRelaxed(0, 0);
    // Release signal, last release will destroy the object.
    Release();
  }

  /// @brief Converts from this interface class to the public
  /// hsa_signal_t type - an opaque handle.
  static __forceinline hsa_signal_t Convert(Signal* signal) {
    assert(signal != nullptr && "Conversion on null Signal object.");
    const uint64_t handle =
        static_cast<uint64_t>(reinterpret_cast<uintptr_t>(&signal->signal_));
    const hsa_signal_t signal_handle = {handle};
    return signal_handle;
  }

  /// @brief Converts from this interface class to the public
  /// hsa_signal_t type - an opaque handle (const overload).
  static __forceinline const hsa_signal_t Convert(const Signal* signal) {
    assert(signal != nullptr && "Conversion on null Signal object.");
    const uint64_t handle =
        static_cast<uint64_t>(reinterpret_cast<uintptr_t>(&signal->signal_));
    const hsa_signal_t signal_handle = {handle};
    return signal_handle;
  }

  /// @brief Converts from public hsa_signal_t type (an opaque handle) to
  /// this interface class object.
  static __forceinline Signal* Convert(hsa_signal_t signal) {
    if (signal.handle == 0)
      throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT, "");
    SharedSignal* shared = SharedSignal::Convert(signal);
    if (!shared->IsValid())
      throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_SIGNAL, "Signal handle is invalid.");
    if (shared->IsIPC()) {
      Signal* ret = lookupIpc(signal);
      if (ret == nullptr)
        throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_SIGNAL, "Signal handle is invalid.");
      return ret;
    } else {
      return shared->core_signal;
    }
  }

  static Signal* DuplicateHandle(hsa_signal_t signal) {
    if (signal.handle == 0) return nullptr;
    SharedSignal* shared = SharedSignal::Convert(signal);

    if (!shared->IsIPC()) {
      if (!shared->IsValid()) return nullptr;
      shared->core_signal->refcount_++;
      shared->core_signal->Retain();
      return shared->core_signal;
    }

    // IPC signals may only be duplicated while holding the ipcMap lock.
    return duplicateIpc(signal);
  }

  bool IsValid() const { return refcount_ != 0; }

  bool __forceinline isIPC() const { return SharedSignal::Convert(Convert(this))->IsIPC(); }

  // Below are various methods corresponding to the APIs, which load/store the
  // signal value or modify the existing signal value atomically and with
  // specified memory ordering semantics.
virtual hsa_signal_value_t LoadRelaxed() = 0; virtual hsa_signal_value_t LoadAcquire() = 0; virtual void StoreRelaxed(hsa_signal_value_t value) = 0; virtual void StoreRelease(hsa_signal_value_t value) = 0; virtual hsa_signal_value_t WaitRelaxed(hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint) = 0; virtual hsa_signal_value_t WaitAcquire(hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint) = 0; virtual void AndRelaxed(hsa_signal_value_t value) = 0; virtual void AndAcquire(hsa_signal_value_t value) = 0; virtual void AndRelease(hsa_signal_value_t value) = 0; virtual void AndAcqRel(hsa_signal_value_t value) = 0; virtual void OrRelaxed(hsa_signal_value_t value) = 0; virtual void OrAcquire(hsa_signal_value_t value) = 0; virtual void OrRelease(hsa_signal_value_t value) = 0; virtual void OrAcqRel(hsa_signal_value_t value) = 0; virtual void XorRelaxed(hsa_signal_value_t value) = 0; virtual void XorAcquire(hsa_signal_value_t value) = 0; virtual void XorRelease(hsa_signal_value_t value) = 0; virtual void XorAcqRel(hsa_signal_value_t value) = 0; virtual void AddRelaxed(hsa_signal_value_t value) = 0; virtual void AddAcquire(hsa_signal_value_t value) = 0; virtual void AddRelease(hsa_signal_value_t value) = 0; virtual void AddAcqRel(hsa_signal_value_t value) = 0; virtual void SubRelaxed(hsa_signal_value_t value) = 0; virtual void SubAcquire(hsa_signal_value_t value) = 0; virtual void SubRelease(hsa_signal_value_t value) = 0; virtual void SubAcqRel(hsa_signal_value_t value) = 0; virtual hsa_signal_value_t ExchRelaxed(hsa_signal_value_t value) = 0; virtual hsa_signal_value_t ExchAcquire(hsa_signal_value_t value) = 0; virtual hsa_signal_value_t ExchRelease(hsa_signal_value_t value) = 0; virtual hsa_signal_value_t ExchAcqRel(hsa_signal_value_t value) = 0; virtual hsa_signal_value_t CasRelaxed(hsa_signal_value_t expected, hsa_signal_value_t value) = 0; virtual 
      hsa_signal_value_t CasAcquire(hsa_signal_value_t expected,
                                    hsa_signal_value_t value) = 0;
  virtual hsa_signal_value_t CasRelease(hsa_signal_value_t expected,
                                        hsa_signal_value_t value) = 0;
  virtual hsa_signal_value_t CasAcqRel(hsa_signal_value_t expected,
                                       hsa_signal_value_t value) = 0;

  //-------------------------
  // implementation specific
  //-------------------------
  typedef void* rtti_t;

  /// @brief Returns the address of the value.
  virtual hsa_signal_value_t* ValueLocation() const = 0;

  /// @brief Applies only to the InterruptSignal type; returns the HsaEvent
  /// associated with the signal. Returns NULL for the DefaultSignal type.
  virtual HsaEvent* EopEvent() = 0;

  /// @brief Waits until any signal in the list satisfies its condition or
  /// the timeout is reached.
  /// Returns the index of a satisfied signal. Returns -1 (as uint32_t) on
  /// timeout and errors.
  static uint32_t WaitAny(uint32_t signal_count, const hsa_signal_t* hsa_signals,
                          const hsa_signal_condition_t* conds,
                          const hsa_signal_value_t* values, uint64_t timeout_hint,
                          hsa_wait_state_t wait_hint,
                          hsa_signal_value_t* satisfying_value);

  __forceinline bool IsType(rtti_t id) { return _IsA(id); }

  /// @brief Prevents the signal from being destroyed until the matching Release().
  void Retain() { retained_++; }
  void Release();

  /// @brief Checks if signal is currently in use by a wait API.
  bool InWaiting() const { return waiting_ != 0; }

  // Prepares for copy profiling: stores the copy agent and readies the API block.
  __forceinline void async_copy_agent(core::Agent* agent) {
    async_copy_agent_ = agent;
    core::SharedSignal::Convert(Convert(this))->CopyPrep();
  }

  __forceinline core::Agent* async_copy_agent() { return async_copy_agent_; }

  void GetSdmaTsAddresses(uint64_t*& start, uint64_t*& end) {
    core::SharedSignal::Convert(Convert(this))->GetSdmaTsAddresses(start, end);
  }

  // Set FetchCopyTs = true when reading time stamps from a copy operation.
  void GetRawTs(bool FetchCopyTs, uint64_t& start, uint64_t& end) {
    core::SharedSignal::Convert(Convert(this))->GetRawTs(FetchCopyTs, start, end);
  }

  /// @brief Structure which defines key signal elements like type and value.
  /// Address of this struct is used as a value for the opaque handle of type
  /// hsa_signal_t provided to the public API.
  amd_signal_t& signal_;

 protected:
  virtual ~Signal();

  /// @brief Overridable deletion function.
  virtual void doDestroySignal() { delete this; }

  /// @brief Simple RTTI type checking helper.
  /// Returns true if the object can be converted to the query type via
  /// static_cast.
  /// Do not use directly. Use IsType in the desired derived type instead.
  virtual bool _IsA(rtti_t id) const = 0;

  /// @variable Indicates the number of runtime threads waiting on this signal.
  /// A value of zero means no waits.
  std::atomic waiting_;

  /// @variable Pointer to agent used to perform an async copy.
  core::Agent* async_copy_agent_;

 private:
  static KernelMutex ipcLock_;
  static std::map ipcMap_;

  static Signal* lookupIpc(hsa_signal_t signal);
  static Signal* duplicateIpc(hsa_signal_t signal);

  /// @variable Ref count of this signal's handle (see IPC APIs)
  std::atomic refcount_;

  /// @variable Count of handle references and Retain() calls for this handle (see IPC APIs)
  std::atomic retained_;

  void registerIpc();
  bool deregisterIpc();

  DISALLOW_COPY_AND_ASSIGN(Signal);
};

/// @brief Handles (by asserting) the signal operations that are not valid on doorbells.
class DoorbellSignal : public Signal { public: using Signal::Signal; /// @brief This operation is illegal hsa_signal_value_t LoadRelaxed() final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t LoadAcquire() final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t WaitRelaxed(hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint) final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t WaitAcquire(hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint) final override { assert(false); return 0; } /// @brief This operation is illegal void AndRelaxed(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void AndAcquire(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void AndRelease(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void AndAcqRel(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void OrRelaxed(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void OrAcquire(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void OrRelease(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void OrAcqRel(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void XorRelaxed(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void XorAcquire(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void XorRelease(hsa_signal_value_t value) final override { assert(false); } /// @brief 
This operation is illegal void XorAcqRel(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void AddRelaxed(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void AddAcquire(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void AddRelease(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void AddAcqRel(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void SubRelaxed(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void SubAcquire(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void SubRelease(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal void SubAcqRel(hsa_signal_value_t value) final override { assert(false); } /// @brief This operation is illegal hsa_signal_value_t ExchRelaxed(hsa_signal_value_t value) final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t ExchAcquire(hsa_signal_value_t value) final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t ExchRelease(hsa_signal_value_t value) final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t ExchAcqRel(hsa_signal_value_t value) final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t CasRelaxed(hsa_signal_value_t expected, hsa_signal_value_t value) final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t CasAcquire(hsa_signal_value_t expected, hsa_signal_value_t value) final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t CasRelease(hsa_signal_value_t expected, 
hsa_signal_value_t value) final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t CasAcqRel(hsa_signal_value_t expected, hsa_signal_value_t value) final override { assert(false); return 0; } /// @brief This operation is illegal hsa_signal_value_t* ValueLocation() const final override { assert(false); return NULL; } /// @brief This operation is illegal HsaEvent* EopEvent() final override { assert(false); return NULL; } protected: /// @brief Disallow destroying doorbell apart from its queue. void doDestroySignal() final override { assert(false); } }; struct hsa_signal_handle { hsa_signal_t signal; hsa_signal_handle() {} hsa_signal_handle(hsa_signal_t Signal) { signal = Signal; } operator hsa_signal_t() { return signal; } Signal* operator->() { return core::Signal::Convert(signal); } }; static_assert( sizeof(hsa_signal_handle) == sizeof(hsa_signal_t), "hsa_signal_handle and hsa_signal_t must have identical binary layout."); static_assert( sizeof(hsa_signal_handle[2]) == sizeof(hsa_signal_t[2]), "hsa_signal_handle and hsa_signal_t must have identical binary layout."); class SignalGroup : public Checked<0xBD35DDDD578F091> { public: static __forceinline hsa_signal_group_t Convert(SignalGroup* group) { const hsa_signal_group_t handle = {static_cast(reinterpret_cast(group))}; return handle; } static __forceinline SignalGroup* Convert(hsa_signal_group_t group) { return reinterpret_cast(static_cast(group.handle)); } SignalGroup(uint32_t num_signals, const hsa_signal_t* signals); ~SignalGroup() { delete[] signals; } bool IsValid() const { if (CheckedType::IsValid() && signals != NULL) return true; return false; } const hsa_signal_t* List() const { return signals; } uint32_t Count() const { return count; } private: hsa_signal_t* signals; const uint32_t count; DISALLOW_COPY_AND_ASSIGN(SignalGroup); }; class SignalDeleter { public: void operator()(Signal* ptr) { ptr->DestroySignal(); } }; using unique_signal_ptr = ::std::unique_ptr; } 
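`SignalDeleter` and `unique_signal_ptr` let a `std::unique_ptr` end its ownership through `DestroySignal()` instead of `delete`, which matters because `Signal`'s destructor is protected and destruction is reference counted. A minimal sketch of the same custom-deleter pattern, using a hypothetical `RefCounted` type and `RefCountedDeleter` (not the runtime's classes):

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical ref-counted type standing in for core::Signal: it cannot be
// deleted directly, only released through Destroy().
class RefCounted {
 public:
  explicit RefCounted(int* freed_count) : freed_count_(freed_count) {}

  void Retain() { ++refs_; }

  // The last Destroy() frees the object, similar in spirit to
  // DestroySignal()/Release() in signal.h.
  void Destroy() {
    if (--refs_ == 0) {
      ++*freed_count_;
      delete this;
    }
  }

 private:
  ~RefCounted() = default;  // private: a plain `delete` will not compile
  std::atomic<int> refs_{1};
  int* freed_count_;
};

// Custom deleter: unique_ptr calls this instead of `delete`.
struct RefCountedDeleter {
  void operator()(RefCounted* p) const { p->Destroy(); }
};

using unique_refcounted_ptr = std::unique_ptr<RefCounted, RefCountedDeleter>;

// Returns {objects freed after the guard's scope, objects freed at the end}.
std::pair<int, int> DemoScopedRelease() {
  int freed = 0;
  RefCounted* obj = new RefCounted(&freed);
  obj->Retain();  // a second owner takes a reference: refs = 2
  {
    unique_refcounted_ptr guard(obj);
  }  // guard calls Destroy(): refs 2 -> 1, object stays alive
  const int freed_after_guard = freed;
  obj->Destroy();  // refs 1 -> 0: object freed
  return {freed_after_guard, freed};
}
```

Because the destructor is private, `std::unique_ptr<RefCounted>` with the default deleter would fail to compile; ownership can only end through `Destroy()`, so a scoped guard never frees an object another owner still references.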
// namespace core } // namespace rocr #endif // header guard ROCR-Runtime-rocm-5.0.0/src/core/runtime/000077500000000000000000000000001420110115200200255ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_aql_queue.cpp000066400000000000000000001461611420110115200233440ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/amd_aql_queue.h" #ifdef __linux__ #include #include #include #include #include #endif #ifdef _WIN32 #include #endif #include #include #include "core/inc/runtime.h" #include "core/inc/amd_memory_region.h" #include "core/inc/signal.h" #include "core/inc/queue.h" #include "core/util/utils.h" #include "core/inc/registers.h" #include "core/inc/interrupt_signal.h" #include "core/inc/default_signal.h" #include "core/inc/hsa_ext_amd_impl.h" #include "core/inc/amd_gpu_pm4.h" namespace rocr { namespace AMD { // Queue::amd_queue_ is cache-aligned for performance. const uint32_t kAmdQueueAlignBytes = 0x40; HsaEvent* AqlQueue::queue_event_ = nullptr; std::atomic AqlQueue::queue_count_(0); KernelMutex AqlQueue::queue_lock_; int AqlQueue::rtti_id_ = 0; AqlQueue::AqlQueue(GpuAgent* agent, size_t req_size_pkts, HSAuint32 node_id, ScratchInfo& scratch, core::HsaEventCallback callback, void* err_data, bool is_kv) : Queue(), LocalSignal(0, false), DoorbellSignal(signal()), ring_buf_(nullptr), ring_buf_alloc_bytes_(0), queue_id_(HSA_QUEUEID(-1)), active_(false), agent_(agent), queue_scratch_(scratch), errors_callback_(callback), errors_data_(err_data), is_kv_queue_(is_kv), pm4_ib_buf_(nullptr), pm4_ib_size_b_(0x1000), dynamicScratchState(0), exceptionState(0), suspended_(false), priority_(HSA_QUEUE_PRIORITY_NORMAL), exception_signal_(nullptr) { // When queue_full_workaround_ is set to 1, the ring buffer is internally // doubled in size. Virtual addresses in the upper half of the ring allocation // are mapped to the same set of pages backing the lower half. 
  // Values written to the HW doorbell are modulo the doubled size.
  // This allows the HW to accept (doorbell == last_doorbell + queue_size).
  // This workaround is required for GFXIP 7 and GFXIP 8 ASICs.
  const core::Isa* isa = agent_->isa();
  queue_full_workaround_ =
      (isa->GetMajorVersion() == 7 || isa->GetMajorVersion() == 8) ? 1 : 0;

  // Identify doorbell semantics for this agent.
  doorbell_type_ = agent->properties().Capability.ui32.DoorbellType;

  // Queue size is a function of several restrictions.
  const uint32_t min_pkts = ComputeRingBufferMinPkts();
  const uint32_t max_pkts = ComputeRingBufferMaxPkts();

  // Apply sizing constraints to the ring buffer.
  uint32_t queue_size_pkts = uint32_t(req_size_pkts);
  queue_size_pkts = Min(queue_size_pkts, max_pkts);
  queue_size_pkts = Max(queue_size_pkts, min_pkts);

  uint32_t queue_size_bytes = queue_size_pkts * sizeof(core::AqlPacket);
  if ((queue_size_bytes & (queue_size_bytes - 1)) != 0)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_QUEUE_CREATION,
                             "Requested queue with non-power of two packet capacity.\n");

  // Allocate the AQL packet ring buffer.
  AllocRegisteredRingBuffer(queue_size_pkts);
  if (ring_buf_ == nullptr) throw std::bad_alloc();
  MAKE_NAMED_SCOPE_GUARD(RingGuard, [&]() { FreeRegisteredRingBuffer(); });

  // Fill the ring buffer with invalid packet headers.
  // Leave packet content uninitialized to help track errors.
  for (uint32_t pkt_id = 0; pkt_id < queue_size_pkts; ++pkt_id) {
    (((core::AqlPacket*)ring_buf_)[pkt_id]).dispatch.header = HSA_PACKET_TYPE_INVALID;
  }

  // Zero the amd_queue_ structure to clear RPTR/WPTR before queue attach.
  memset(&amd_queue_, 0, sizeof(amd_queue_));

  // Initialize and map a HW AQL queue.
  HsaQueueResource queue_rsrc = {0};
  queue_rsrc.Queue_read_ptr_aql = (uint64_t*)&amd_queue_.read_dispatch_id;

  if (doorbell_type_ == 2) {
    // Hardware write pointer supports AQL semantics.
    queue_rsrc.Queue_write_ptr_aql = (uint64_t*)&amd_queue_.write_dispatch_id;
  } else {
    // Map hardware write pointer to a software proxy.
    queue_rsrc.Queue_write_ptr_aql =
        (uint64_t*)&amd_queue_.max_legacy_doorbell_dispatch_id_plus_1;
  }

  // Populate amd_queue_ structure.
  amd_queue_.hsa_queue.type = HSA_QUEUE_TYPE_MULTI;
  amd_queue_.hsa_queue.features = HSA_QUEUE_FEATURE_KERNEL_DISPATCH;
  amd_queue_.hsa_queue.base_address = ring_buf_;
  amd_queue_.hsa_queue.doorbell_signal = Signal::Convert(this);
  amd_queue_.hsa_queue.size = queue_size_pkts;
  amd_queue_.hsa_queue.id = INVALID_QUEUEID;
  amd_queue_.read_dispatch_id_field_base_byte_offset = uint32_t(
      uintptr_t(&amd_queue_.read_dispatch_id) - uintptr_t(&amd_queue_));

  // Initialize the doorbell signal structure.
  memset(&signal_, 0, sizeof(signal_));
  signal_.kind = (doorbell_type_ == 2) ? AMD_SIGNAL_KIND_DOORBELL
                                       : AMD_SIGNAL_KIND_LEGACY_DOORBELL;
  signal_.legacy_hardware_doorbell_ptr = nullptr;
  signal_.queue_ptr = &amd_queue_;

  const auto& props = agent->properties();
  amd_queue_.max_cu_id = (props.NumFComputeCores / props.NumSIMDPerCU) - 1;
  amd_queue_.max_wave_id = (props.MaxWavesPerSIMD * props.NumSIMDPerCU) - 1;

#ifdef HSA_LARGE_MODEL
  AMD_HSA_BITS_SET(amd_queue_.queue_properties, AMD_QUEUE_PROPERTIES_IS_PTR64, 1);
#else
  AMD_HSA_BITS_SET(amd_queue_.queue_properties, AMD_QUEUE_PROPERTIES_IS_PTR64, 0);
#endif

  // Set group and private memory apertures in amd_queue_.
  auto& regions = agent->regions();

  for (auto region : regions) {
    const MemoryRegion* amdregion = static_cast<const MemoryRegion*>(region);
    uint64_t base = amdregion->GetBaseAddress();

    if (amdregion->IsLDS()) {
#ifdef HSA_LARGE_MODEL
      amd_queue_.group_segment_aperture_base_hi = uint32_t(uintptr_t(base) >> 32);
#else
      amd_queue_.group_segment_aperture_base_hi = uint32_t(base);
#endif
    }

    if (amdregion->IsScratch()) {
#ifdef HSA_LARGE_MODEL
      amd_queue_.private_segment_aperture_base_hi = uint32_t(uintptr_t(base) >> 32);
#else
      amd_queue_.private_segment_aperture_base_hi = uint32_t(base);
#endif
    }
  }

  assert(amd_queue_.group_segment_aperture_base_hi != 0 && "No group region found.");

  if (core::Runtime::runtime_singleton_->flag().check_flat_scratch()) {
    assert(amd_queue_.private_segment_aperture_base_hi != 0 && "No private region found.");
  }

  MAKE_NAMED_SCOPE_GUARD(EventGuard, [&]() {
    ScopedAcquire<KernelMutex> _lock(&queue_lock_);
    queue_count_--;
    if (queue_count_ == 0) {
      core::InterruptSignal::DestroyEvent(queue_event_);
      queue_event_ = nullptr;
    }
  });

  MAKE_NAMED_SCOPE_GUARD(SignalGuard, [&]() {
    if (amd_queue_.queue_inactive_signal.handle != 0)
      HSA::hsa_signal_destroy(amd_queue_.queue_inactive_signal);
    if (exception_signal_ != nullptr) exception_signal_->DestroySignal();
  });

  if (core::g_use_interrupt_wait) {
    ScopedAcquire<KernelMutex> _lock(&queue_lock_);
    queue_count_++;
    if (queue_event_ == nullptr) {
      assert(queue_count_ == 1 && "Inconsistency in queue event reference counting found.\n");

      queue_event_ = core::InterruptSignal::CreateEvent(HSA_EVENTTYPE_SIGNAL, false);
      if (queue_event_ == nullptr)
        throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES,
                                 "Queue event creation failed.\n");
    }
    auto Signal = new core::InterruptSignal(0, queue_event_);
    assert(Signal != nullptr && "Should have thrown!\n");
    amd_queue_.queue_inactive_signal = core::InterruptSignal::Convert(Signal);
    exception_signal_ = new core::InterruptSignal(0, queue_event_);
    assert(exception_signal_ != nullptr && "Should have thrown!\n");
  } else {
    EventGuard.Dismiss();
    auto Signal = new core::DefaultSignal(0);
    assert(Signal != nullptr && "Should have thrown!\n");
    amd_queue_.queue_inactive_signal = core::DefaultSignal::Convert(Signal);
    exception_signal_ = new core::DefaultSignal(0);
    assert(exception_signal_ != nullptr && "Should have thrown!\n");
  }

  // Ensure the amd_queue_ is fully initialized before creating the KFD queue.
  // This ensures that the debugger can access the fields once it detects there
  // is a KFD queue. The debugger may access the aperture addresses, queue
  // scratch base, and queue type.

  HSAKMT_STATUS kmt_status;
  if (core::Runtime::runtime_singleton_->KfdVersion().supports_exception_debugging) {
    queue_rsrc.ErrorReason = &exception_signal_->signal_.value;
    kmt_status = hsaKmtCreateQueue(node_id, HSA_QUEUE_COMPUTE_AQL, 100, priority_, ring_buf_,
                                   ring_buf_alloc_bytes_, queue_event_, &queue_rsrc);
  } else {
    kmt_status = hsaKmtCreateQueue(node_id, HSA_QUEUE_COMPUTE_AQL, 100, priority_, ring_buf_,
                                   ring_buf_alloc_bytes_, NULL, &queue_rsrc);
  }
  if (kmt_status != HSAKMT_STATUS_SUCCESS)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES,
                             "Queue create failed at hsaKmtCreateQueue\n");
  // Complete populating the doorbell signal structure.
  signal_.legacy_hardware_doorbell_ptr = (volatile uint32_t*)queue_rsrc.Queue_DoorBell;

  // Bind the queue ID so that it is unique, i.e. it is not reused by another
  // queue (AQL, HOST) in the same process during its lifetime.
  amd_queue_.hsa_queue.id = this->GetQueueId();

  queue_id_ = queue_rsrc.QueueId;
  MAKE_NAMED_SCOPE_GUARD(QueueGuard, [&]() { hsaKmtDestroyQueue(queue_id_); });

  // Initialize scratch memory related entities
  queue_scratch_.queue_retry = amd_queue_.queue_inactive_signal;
  InitScratchSRD();

  if (core::Runtime::runtime_singleton_->KfdVersion().supports_exception_debugging) {
    if (AMD::hsa_amd_signal_async_handler(amd_queue_.queue_inactive_signal,
                                          HSA_SIGNAL_CONDITION_NE, 0,
                                          DynamicScratchHandler<false>,
                                          this) != HSA_STATUS_SUCCESS)
      throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES,
                               "Queue event handler failed registration.\n");
    if (AMD::hsa_amd_signal_async_handler(core::Signal::Convert(exception_signal_),
                                          HSA_SIGNAL_CONDITION_NE, 0, ExceptionHandler,
                                          this) != HSA_STATUS_SUCCESS)
      throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES,
                               "Queue event handler failed registration.\n");
  } else {
    if (AMD::hsa_amd_signal_async_handler(amd_queue_.queue_inactive_signal,
                                          HSA_SIGNAL_CONDITION_NE, 0,
                                          DynamicScratchHandler<true>,
                                          this) != HSA_STATUS_SUCCESS)
      throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES,
                               "Queue event handler failed registration.\n");
    exceptionState = ERROR_HANDLER_DONE;
  }

  // Allocate IB for icache flushes.
  pm4_ib_buf_ = agent_->system_allocator()(pm4_ib_size_b_, 0x1000,
                                           core::MemoryRegion::AllocateExecutable);
  if (pm4_ib_buf_ == nullptr)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES, "PM4 IB allocation failed.\n");

  MAKE_NAMED_SCOPE_GUARD(PM4IBGuard, [&]() { agent_->system_deallocator()(pm4_ib_buf_); });

  // Set initial CU mask
  if (!core::Runtime::runtime_singleton_->flag().cu_mask_skip_init()) SetCUMasking(0, nullptr);

  active_ = true;

  PM4IBGuard.Dismiss();
  RingGuard.Dismiss();
  QueueGuard.Dismiss();
  EventGuard.Dismiss();
  SignalGuard.Dismiss();
}

AqlQueue::~AqlQueue() {
  // Remove error handler synchronously.
  // Sequences error handler callbacks with queue destroy.
  dynamicScratchState |= ERROR_HANDLER_TERMINATE;
  while ((dynamicScratchState & ERROR_HANDLER_DONE) != ERROR_HANDLER_DONE) {
    HSA::hsa_signal_store_screlease(amd_queue_.queue_inactive_signal, 0x8000000000000000ull);
    HSA::hsa_signal_wait_relaxed(amd_queue_.queue_inactive_signal, HSA_SIGNAL_CONDITION_NE,
                                 0x8000000000000000ull, -1ull, HSA_WAIT_STATE_BLOCKED);
  }

  // Remove kfd exception handler
  exceptionState |= ERROR_HANDLER_TERMINATE;
  while ((exceptionState & ERROR_HANDLER_DONE) != ERROR_HANDLER_DONE) {
    exception_signal_->StoreRelease(-1ull);
    exception_signal_->WaitRelaxed(HSA_SIGNAL_CONDITION_NE, -1ull, -1ull, HSA_WAIT_STATE_BLOCKED);
  }

  Inactivate();
  agent_->ReleaseQueueScratch(queue_scratch_);
  FreeRegisteredRingBuffer();
  exception_signal_->DestroySignal();
  HSA::hsa_signal_destroy(amd_queue_.queue_inactive_signal);
  if (core::g_use_interrupt_wait) {
    ScopedAcquire<KernelMutex> lock(&queue_lock_);
    queue_count_--;
    if (queue_count_ == 0) {
      core::InterruptSignal::DestroyEvent(queue_event_);
      queue_event_ = nullptr;
    }
  }
  agent_->system_deallocator()(pm4_ib_buf_);
}

void AqlQueue::Destroy() {
  if (amd_queue_.hsa_queue.type == HSA_QUEUE_TYPE_COOPERATIVE) {
    agent_->GWSRelease();
    return;
  }
  delete this;
}

uint64_t AqlQueue::LoadReadIndexAcquire() {
  return atomic::Load(&amd_queue_.read_dispatch_id, std::memory_order_acquire);
}

uint64_t AqlQueue::LoadReadIndexRelaxed() {
  return atomic::Load(&amd_queue_.read_dispatch_id, std::memory_order_relaxed);
}

uint64_t AqlQueue::LoadWriteIndexAcquire() {
  return atomic::Load(&amd_queue_.write_dispatch_id, std::memory_order_acquire);
}

uint64_t AqlQueue::LoadWriteIndexRelaxed() {
  return atomic::Load(&amd_queue_.write_dispatch_id, std::memory_order_relaxed);
}

void AqlQueue::StoreWriteIndexRelaxed(uint64_t value) {
  atomic::Store(&amd_queue_.write_dispatch_id, value, std::memory_order_relaxed);
}

void AqlQueue::StoreWriteIndexRelease(uint64_t value) {
  atomic::Store(&amd_queue_.write_dispatch_id, value, std::memory_order_release);
}

uint64_t AqlQueue::CasWriteIndexAcqRel(uint64_t expected, uint64_t value) {
  return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected, std::memory_order_acq_rel);
}

uint64_t AqlQueue::CasWriteIndexAcquire(uint64_t expected, uint64_t value) {
  return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected, std::memory_order_acquire);
}

uint64_t AqlQueue::CasWriteIndexRelaxed(uint64_t expected, uint64_t value) {
  return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected, std::memory_order_relaxed);
}

uint64_t AqlQueue::CasWriteIndexRelease(uint64_t expected, uint64_t value) {
  return atomic::Cas(&amd_queue_.write_dispatch_id, value, expected, std::memory_order_release);
}

uint64_t AqlQueue::AddWriteIndexAcqRel(uint64_t value) {
  return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_acq_rel);
}

uint64_t AqlQueue::AddWriteIndexAcquire(uint64_t value) {
  return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_acquire);
}

uint64_t AqlQueue::AddWriteIndexRelaxed(uint64_t value) {
  return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_relaxed);
}

uint64_t AqlQueue::AddWriteIndexRelease(uint64_t value) {
  return atomic::Add(&amd_queue_.write_dispatch_id, value, std::memory_order_release);
}

void AqlQueue::StoreRelaxed(hsa_signal_value_t value) {
  if (doorbell_type_ == 2) {
    // Hardware doorbell supports AQL semantics.
    atomic::Store(signal_.hardware_doorbell_ptr, uint64_t(value), std::memory_order_release);
    return;
  }

  // Acquire spinlock protecting the legacy doorbell.
  while (atomic::Cas(&amd_queue_.legacy_doorbell_lock, 1U, 0U, std::memory_order_acquire) != 0) {
    os::YieldThread();
  }

#ifdef HSA_LARGE_MODEL
  // AMD hardware convention expects the packet index to point beyond
  // the last packet to be processed. Packet indices written to the
  // max_legacy_doorbell_dispatch_id_plus_1 field must conform to this
  // expectation, since this field is used as the HW-visible write index.
  uint64_t legacy_dispatch_id = value + 1;
#else
  // In the small machine model it is difficult to distinguish packet index
  // wrap at 2^32 packets from a backwards doorbell. Instead, ignore the
  // doorbell value and submit the write index instead. It is OK to issue
  // a doorbell for packets in the INVALID or ALWAYS_RESERVED state.
  // The HW will stall on these packets until they enter a valid state.
  uint64_t legacy_dispatch_id = amd_queue_.write_dispatch_id;

  // The write index may extend more than a full queue of packets beyond
  // the read index. The hardware can process at most a full queue of packets
  // at a time. Clamp the write index appropriately. A doorbell for the
  // remaining packets is guaranteed to be sent at a later time.
  legacy_dispatch_id = Min(legacy_dispatch_id,
                           uint64_t(amd_queue_.read_dispatch_id) + amd_queue_.hsa_queue.size);
#endif

  // Discard backwards and duplicate doorbells.
  if (legacy_dispatch_id > amd_queue_.max_legacy_doorbell_dispatch_id_plus_1) {
    // Record the most recent packet index used in a doorbell submission.
    // This field will be interpreted as a write index upon HW queue connect.
    // Make ring buffer visible to HW before updating write index.
    atomic::Store(&amd_queue_.max_legacy_doorbell_dispatch_id_plus_1, legacy_dispatch_id,
                  std::memory_order_release);

    // Write the dispatch id to the hardware MMIO doorbell.
    // Make write index visible to HW before sending doorbell.
    if (doorbell_type_ == 0) {
      // The legacy GFXIP 7 hardware doorbell expects:
      //   1. Packet index wrapped to a point within the ring buffer
      //   2. Packet index converted to DWORD count
      uint64_t queue_size_mask =
          ((1 + queue_full_workaround_) * amd_queue_.hsa_queue.size) - 1;

      atomic::Store(signal_.legacy_hardware_doorbell_ptr,
                    uint32_t((legacy_dispatch_id & queue_size_mask) *
                             (sizeof(core::AqlPacket) / sizeof(uint32_t))),
                    std::memory_order_release);
    } else if (doorbell_type_ == 1) {
      atomic::Store(signal_.legacy_hardware_doorbell_ptr, uint32_t(legacy_dispatch_id),
                    std::memory_order_release);
    } else {
      assert(false && "Agent has unsupported doorbell semantics");
    }
  }

  // Release spinlock protecting the legacy doorbell.
  // Also ensures timely delivery of (write-combined) doorbell to HW.
  atomic::Store(&amd_queue_.legacy_doorbell_lock, 0U, std::memory_order_release);
}

void AqlQueue::StoreRelease(hsa_signal_value_t value) {
  std::atomic_thread_fence(std::memory_order_release);
  StoreRelaxed(value);
}

uint32_t AqlQueue::ComputeRingBufferMinPkts() {
  // From CP_HQD_PQ_CONTROL.QUEUE_SIZE specification:
  //   Size of the primary queue (PQ) will be: 2^(HQD_QUEUE_SIZE+1) DWs.
  //   Min Size is 7 (2^8 = 256 DWs) and max size is 29 (2^30 = 1 G-DW)
  uint32_t min_bytes = 0x400;

  if (queue_full_workaround_ == 1) {
#ifdef __linux__
    // Double mapping requires one page of backing store.
    min_bytes = Max(min_bytes, 0x1000U);
#endif
#ifdef _WIN32
    // Shared memory mapping is at system allocation granularity.
    SYSTEM_INFO sys_info;
    GetNativeSystemInfo(&sys_info);
    min_bytes = Max(min_bytes, uint32_t(sys_info.dwAllocationGranularity));
#endif
  }

  return uint32_t(min_bytes / sizeof(core::AqlPacket));
}

uint32_t AqlQueue::ComputeRingBufferMaxPkts() {
  // From CP_HQD_PQ_CONTROL.QUEUE_SIZE specification:
  //   Size of the primary queue (PQ) will be: 2^(HQD_QUEUE_SIZE+1) DWs.
  //   Min Size is 7 (2^8 = 256 DWs) and max size is 29 (2^30 = 1 G-DW)
  uint64_t max_bytes = 0x100000000;

  if (queue_full_workaround_ == 1) {
    // Double mapping halves maximum size.
    max_bytes /= 2;
  }

  return uint32_t(max_bytes / sizeof(core::AqlPacket));
}

void AqlQueue::AllocRegisteredRingBuffer(uint32_t queue_size_pkts) {
  if ((agent_->profile() == HSA_PROFILE_FULL) && queue_full_workaround_) {
    // Compute the physical and virtual size of the queue.
    uint32_t ring_buf_phys_size_bytes = uint32_t(queue_size_pkts * sizeof(core::AqlPacket));
    ring_buf_alloc_bytes_ = 2 * ring_buf_phys_size_bytes;

#ifdef __linux__
    // Create a system-unique shared memory path for this thread.
    char ring_buf_shm_path[16];
    pid_t sys_unique_tid = pid_t(syscall(__NR_gettid));
    sprintf(ring_buf_shm_path, "/%u", sys_unique_tid);

    int ring_buf_shm_fd = -1;
    void* reserve_va = NULL;

    ring_buf_shm_fd = CreateRingBufferFD(ring_buf_shm_path, ring_buf_phys_size_bytes);

    if (ring_buf_shm_fd == -1) {
      return;
    }

    // Reserve a VA range twice the size of the physical backing store.
    reserve_va = mmap(NULL, ring_buf_alloc_bytes_, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(reserve_va != MAP_FAILED && "mmap failed");

    // Remap the lower and upper halves of the VA range.
    // Map both halves to the shared memory backing store.
    // If the GPU device is KV, do not set PROT_EXEC flag.
    void* ring_buf_lower_half = NULL;
    void* ring_buf_upper_half = NULL;
    if (is_kv_queue_) {
      ring_buf_lower_half = mmap(reserve_va, ring_buf_phys_size_bytes, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_FIXED, ring_buf_shm_fd, 0);
      assert(ring_buf_lower_half != MAP_FAILED && "mmap failed");

      ring_buf_upper_half = mmap((void*)(uintptr_t(reserve_va) + ring_buf_phys_size_bytes),
                                 ring_buf_phys_size_bytes, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_FIXED, ring_buf_shm_fd, 0);
      assert(ring_buf_upper_half != MAP_FAILED && "mmap failed");
    } else {
      ring_buf_lower_half = mmap(reserve_va, ring_buf_phys_size_bytes,
                                 PROT_READ | PROT_WRITE | PROT_EXEC,
                                 MAP_SHARED | MAP_FIXED, ring_buf_shm_fd, 0);
      assert(ring_buf_lower_half != MAP_FAILED && "mmap failed");

      ring_buf_upper_half = mmap((void*)(uintptr_t(reserve_va) + ring_buf_phys_size_bytes),
                                 ring_buf_phys_size_bytes, PROT_READ | PROT_WRITE | PROT_EXEC,
                                 MAP_SHARED | MAP_FIXED, ring_buf_shm_fd, 0);
      assert(ring_buf_upper_half != MAP_FAILED && "mmap failed");
    }

    // Successfully created mapping.
    ring_buf_ = ring_buf_lower_half;

    // Release explicit reference to shared memory object.
    CloseRingBufferFD(ring_buf_shm_path, ring_buf_shm_fd);
    return;
#endif

#ifdef _WIN32
    HANDLE ring_buf_mapping = INVALID_HANDLE_VALUE;
    void* ring_buf_lower_half = NULL;
    void* ring_buf_upper_half = NULL;

    do {
      // Create a page file mapping to back the ring buffer.
      ring_buf_mapping = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                                           PAGE_EXECUTE_READWRITE | SEC_COMMIT, 0,
                                           ring_buf_phys_size_bytes, NULL);
      if (ring_buf_mapping == NULL) {
        break;
      }

      // Retry until obtaining an appropriate virtual address mapping.
      for (int num_attempts = 0; num_attempts < 1000; ++num_attempts) {
        // Find a virtual address range twice the size of the file mapping.
        void* reserve_va = VirtualAllocEx(GetCurrentProcess(), NULL, ring_buf_alloc_bytes_,
                                          MEM_TOP_DOWN | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        if (reserve_va == NULL) {
          break;
        }
        VirtualFree(reserve_va, 0, MEM_RELEASE);

        // Map the ring buffer into the free virtual range.
        // This may fail: another thread can allocate in this range.
        ring_buf_lower_half =
            MapViewOfFileEx(ring_buf_mapping, FILE_MAP_ALL_ACCESS | FILE_MAP_EXECUTE, 0, 0,
                            ring_buf_phys_size_bytes, reserve_va);

        if (ring_buf_lower_half == NULL) {
          // Virtual range allocated by another thread, try again.
          continue;
        }

        ring_buf_upper_half =
            MapViewOfFileEx(ring_buf_mapping, FILE_MAP_ALL_ACCESS | FILE_MAP_EXECUTE, 0, 0,
                            ring_buf_phys_size_bytes,
                            (void*)(uintptr_t(reserve_va) + ring_buf_phys_size_bytes));

        if (ring_buf_upper_half == NULL) {
          // Virtual range allocated by another thread, try again.
          UnmapViewOfFile(ring_buf_lower_half);
          continue;
        }

        // Successfully created mapping.
        ring_buf_ = ring_buf_lower_half;
        break;
      }

      if (ring_buf_ == NULL) {
        break;
      }

      // Release file mapping (reference counted by views).
      CloseHandle(ring_buf_mapping);

      // Don't register the memory: causes a failure in the KFD.
      // Instead use implicit registration to access the ring buffer.
      return;
    } while (false);

    // Resource cleanup on failure.
    UnmapViewOfFile(ring_buf_upper_half);
    UnmapViewOfFile(ring_buf_lower_half);
    CloseHandle(ring_buf_mapping);
#endif
  } else {
    // Allocate storage for the ring buffer.
    ring_buf_alloc_bytes_ = AlignUp(queue_size_pkts * sizeof(core::AqlPacket), 4096);

    ring_buf_ = agent_->system_allocator()(
        ring_buf_alloc_bytes_, 0x1000,
        core::MemoryRegion::AllocateExecutable |
            (queue_full_workaround_ ? core::MemoryRegion::AllocateDoubleMap : 0));

    assert(ring_buf_ != NULL && "AQL queue memory allocation failure");

    // The virtual ring allocation is twice as large as requested.
    // Each half maps to the same set of physical pages.
    if (queue_full_workaround_) ring_buf_alloc_bytes_ *= 2;
  }
}

void AqlQueue::FreeRegisteredRingBuffer() {
  if ((agent_->profile() == HSA_PROFILE_FULL) && queue_full_workaround_) {
#ifdef __linux__
    munmap(ring_buf_, ring_buf_alloc_bytes_);
#endif

#ifdef _WIN32
    UnmapViewOfFile(ring_buf_);
    UnmapViewOfFile((void*)(uintptr_t(ring_buf_) + (ring_buf_alloc_bytes_ / 2)));
#endif
  } else {
    agent_->system_deallocator()(ring_buf_);
  }

  ring_buf_ = NULL;
  ring_buf_alloc_bytes_ = 0;
}

void AqlQueue::CloseRingBufferFD(const char* ring_buf_shm_path, int fd) const {
#ifdef __linux__
#if !defined(HAVE_MEMFD_CREATE)
  shm_unlink(ring_buf_shm_path);
#endif
  close(fd);
#else
  assert(false && "Function only needed on Linux.");
#endif
}

int AqlQueue::CreateRingBufferFD(const char* ring_buf_shm_path,
                                 uint32_t ring_buf_phys_size_bytes) const {
#ifdef __linux__
  int fd;
#ifdef HAVE_MEMFD_CREATE
  fd = syscall(__NR_memfd_create, ring_buf_shm_path, 0);

  if (fd == -1) return -1;

  if (ftruncate(fd, ring_buf_phys_size_bytes) == -1) {
    CloseRingBufferFD(ring_buf_shm_path, fd);
    return -1;
  }
#else
  fd = shm_open(ring_buf_shm_path, O_CREAT | O_RDWR | O_EXCL, S_IRUSR | S_IWUSR);

  if (fd == -1) return -1;

  if (posix_fallocate(fd, 0, ring_buf_phys_size_bytes) != 0) {
    CloseRingBufferFD(ring_buf_shm_path, fd);
    return -1;
  }
#endif
  return fd;
#else
  assert(false && "Function only needed on Linux.");
  return -1;
#endif
}

void AqlQueue::Suspend() {
  suspended_ = true;
  auto err = hsaKmtUpdateQueue(queue_id_, 0, priority_, ring_buf_, ring_buf_alloc_bytes_, NULL);
  assert(err == HSAKMT_STATUS_SUCCESS && "hsaKmtUpdateQueue failed.");
}

hsa_status_t AqlQueue::Inactivate() {
  bool active = active_.exchange(false, std::memory_order_relaxed);
  if (active) {
    auto err = hsaKmtDestroyQueue(queue_id_);
    assert(err == HSAKMT_STATUS_SUCCESS && "hsaKmtDestroyQueue failed.");
    atomic::Fence(std::memory_order_acquire);
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t AqlQueue::SetPriority(HSA_QUEUE_PRIORITY priority) {
  if (suspended_) {
    return HSA_STATUS_ERROR_INVALID_QUEUE;
  }

  priority_ = priority;
  auto err = hsaKmtUpdateQueue(queue_id_, 100, priority_, ring_buf_, ring_buf_alloc_bytes_, NULL);
  return (err == HSAKMT_STATUS_SUCCESS ? HSA_STATUS_SUCCESS : HSA_STATUS_ERROR_OUT_OF_RESOURCES);
}

template <bool HandleExceptions>
bool AqlQueue::DynamicScratchHandler(hsa_signal_value_t error_code, void* arg) {
  AqlQueue* queue = (AqlQueue*)arg;
  hsa_status_t errorCode = HSA_STATUS_SUCCESS;
  bool fatal = false;
  bool changeWait = false;
  hsa_signal_value_t waitVal;

  if ((queue->dynamicScratchState & ERROR_HANDLER_SCRATCH_RETRY) == ERROR_HANDLER_SCRATCH_RETRY) {
    queue->dynamicScratchState &= ~ERROR_HANDLER_SCRATCH_RETRY;
    changeWait = true;
    waitVal = 0;
    HSA::hsa_signal_and_relaxed(queue->amd_queue_.queue_inactive_signal, ~0x8000000000000000ull);
    error_code &= ~0x8000000000000000ull;
  }

  // Process errors only if queue is not terminating.
  if ((queue->dynamicScratchState & ERROR_HANDLER_TERMINATE) != ERROR_HANDLER_TERMINATE) {
    if (error_code == 512) {  // Large scratch reclaim
      auto& scratch = queue->queue_scratch_;
      queue->agent_->ReleaseQueueScratch(scratch);
      scratch.queue_base = nullptr;
      scratch.size = 0;
      scratch.size_per_thread = 0;
      scratch.queue_process_offset = 0;
      queue->InitScratchSRD();

      HSA::hsa_signal_store_relaxed(queue->amd_queue_.queue_inactive_signal, 0);
      // Resumes queue processing.
      atomic::Store(&queue->amd_queue_.queue_properties,
                    queue->amd_queue_.queue_properties & (~AMD_QUEUE_PROPERTIES_USE_SCRATCH_ONCE),
                    std::memory_order_release);
      atomic::Fence(std::memory_order_release);
      return true;
    }

    // Process only one queue error.
    if (error_code & 0x401) {  // insufficient scratch, wave64 or wave32
      // Insufficient scratch - recoverable, don't process dynamic scratch if errors are present.
      auto& scratch = queue->queue_scratch_;
      queue->agent_->ReleaseQueueScratch(scratch);

      uint64_t pkt_slot_idx =
          queue->amd_queue_.read_dispatch_id & (queue->amd_queue_.hsa_queue.size - 1);

      core::AqlPacket& pkt =
          ((core::AqlPacket*)queue->amd_queue_.hsa_queue.base_address)[pkt_slot_idx];

      assert(pkt.IsValid() && "Invalid packet in dynamic scratch handler.");
      assert(pkt.type() == HSA_PACKET_TYPE_KERNEL_DISPATCH &&
             "Invalid packet in dynamic scratch handler.");

      uint32_t scratch_request = pkt.dispatch.private_segment_size;

      const uint32_t MaxScratchSlots =
          (queue->amd_queue_.max_cu_id + 1) * queue->agent_->properties().MaxSlotsScratchCU;

      scratch.size_per_thread = scratch_request;
      scratch.lanes_per_wave = (error_code & 0x400) ? 32 : 64;
      // Align whole waves to 1KB.
      scratch.size_per_thread = AlignUp(scratch.size_per_thread, 1024 / scratch.lanes_per_wave);
      scratch.size = scratch.size_per_thread * MaxScratchSlots * scratch.lanes_per_wave;

      uint64_t lanes_per_group =
          (uint64_t(pkt.dispatch.workgroup_size_x) * pkt.dispatch.workgroup_size_y) *
          pkt.dispatch.workgroup_size_z;

      uint64_t waves_per_group =
          (lanes_per_group + scratch.lanes_per_wave - 1) / scratch.lanes_per_wave;

      scratch.waves_per_group = waves_per_group;

      uint64_t groups =
          ((uint64_t(pkt.dispatch.grid_size_x) + pkt.dispatch.workgroup_size_x - 1) /
           pkt.dispatch.workgroup_size_x) *
          ((uint64_t(pkt.dispatch.grid_size_y) + pkt.dispatch.workgroup_size_y - 1) /
           pkt.dispatch.workgroup_size_y) *
          ((uint64_t(pkt.dispatch.grid_size_z) + pkt.dispatch.workgroup_size_z - 1) /
           pkt.dispatch.workgroup_size_z);

      // Assign an equal number of groups to each engine, clipping to capacity limits
      const uint32_t engines = queue->agent_->properties().NumShaderBanks;
      groups = ((groups + engines - 1) / engines) * engines;
      scratch.wanted_slots = groups * waves_per_group;
      scratch.wanted_slots = Min(scratch.wanted_slots, uint64_t(MaxScratchSlots));
      scratch.dispatch_size =
          scratch.size_per_thread * scratch.wanted_slots * scratch.lanes_per_wave;
      queue->agent_->AcquireQueueScratch(scratch);

      if (scratch.retry) {
        queue->dynamicScratchState |= ERROR_HANDLER_SCRATCH_RETRY;
        changeWait = true;
        waitVal = error_code;
      } else {
        // Out of scratch - promote error
        if (scratch.queue_base == nullptr) {
          errorCode = HSA_STATUS_ERROR_OUT_OF_RESOURCES;
        } else {
          // Mark large scratch allocation for single use.
          if (scratch.large) {
            queue->amd_queue_.queue_properties |= AMD_QUEUE_PROPERTIES_USE_SCRATCH_ONCE;
            // Set system release fence to flush scratch stores with older firmware versions.
            if ((queue->agent_->isa()->GetMajorVersion() == 8) &&
                (queue->agent_->GetMicrocodeVersion() < 729)) {
              pkt.dispatch.header &= ~(((1 << HSA_PACKET_HEADER_WIDTH_SCRELEASE_FENCE_SCOPE) - 1)
                                       << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);
              pkt.dispatch.header |=
                  (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);
            }
          }
          // Reset scratch memory related entities for the queue
          queue->InitScratchSRD();
          // Restart the queue.
          HSA::hsa_signal_store_screlease(queue->amd_queue_.queue_inactive_signal, 0);
        }
      }

    } else if (HandleExceptions) {
      if ((error_code & 2) == 2) {  // Invalid dim
        errorCode = HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS;

      } else if ((error_code & 4) == 4) {  // Invalid group memory
        errorCode = HSA_STATUS_ERROR_INVALID_ALLOCATION;

      } else if ((error_code & 8) == 8) {  // Invalid (or NULL) code
        errorCode = HSA_STATUS_ERROR_INVALID_CODE_OBJECT;

      } else if (((error_code & 32) == 32) ||    // Invalid format: 32 is generic,
                 ((error_code & 256) == 256)) {  // 256 is vendor specific packets
        errorCode = HSA_STATUS_ERROR_INVALID_PACKET_FORMAT;

      } else if ((error_code & 64) == 64) {  // Group is too large
        errorCode = HSA_STATUS_ERROR_INVALID_ARGUMENT;

      } else if ((error_code & 128) == 128) {  // Out of VGPRs
        errorCode = HSA_STATUS_ERROR_INVALID_ISA;

      } else if ((error_code & 0x20000000) == 0x20000000) {  // Memory violation (>48-bit)
        errorCode = hsa_status_t(HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION);

      } else if ((error_code & 0x40000000) == 0x40000000) {  // Illegal instruction
        errorCode = hsa_status_t(HSA_STATUS_ERROR_ILLEGAL_INSTRUCTION);

      } else if ((error_code & 0x80000000) == 0x80000000) {  // Debug trap
        errorCode = HSA_STATUS_ERROR_EXCEPTION;
        fatal = true;

      } else {  // Undefined code
        assert(false && "Undefined queue error code");
        errorCode = HSA_STATUS_ERROR;
        fatal = true;
      }
    } else {
      // Not handling exceptions, clear so that ExceptionHandler can run.
      HSA::hsa_signal_store_relaxed(queue->amd_queue_.queue_inactive_signal, 0);
    }

    if (errorCode == HSA_STATUS_SUCCESS) {
      if (changeWait) {
        core::Runtime::runtime_singleton_->SetAsyncSignalHandler(
            queue->amd_queue_.queue_inactive_signal, HSA_SIGNAL_CONDITION_NE, waitVal,
            DynamicScratchHandler<HandleExceptions>, queue);
        return false;
      }
      return true;
    }

    queue->Suspend();
    if (queue->errors_callback_ != nullptr) {
      queue->errors_callback_(errorCode, queue->public_handle(), queue->errors_data_);
    }
    if (fatal) {
      // Temporarily removed until there is clarity on exactly what debugtrap's semantics are.
      // assert(false && "Fatal queue error");
      // std::abort();
    }
  }
  // Copy here is to protect against queue being released between setting the scratch state and
  // updating the signal value. The signal itself is safe to use because it is ref counted rather
  // than being released with the queue.
  hsa_signal_t signal = queue->amd_queue_.queue_inactive_signal;
  queue->dynamicScratchState = ERROR_HANDLER_DONE;
  HSA::hsa_signal_store_screlease(signal, -1ull);
  return false;
}

bool AqlQueue::ExceptionHandler(hsa_signal_value_t error_code, void* arg) {
  struct queue_error_t {
    uint32_t code;
    hsa_status_t status;
  };
  static const queue_error_t QueueErrors[] = {
      // EC_QUEUE_WAVE_ABORT
      1, HSA_STATUS_ERROR_EXCEPTION,
      // EC_QUEUE_WAVE_TRAP
      2, HSA_STATUS_ERROR_EXCEPTION,
      // EC_QUEUE_WAVE_MATH_ERROR
      3, HSA_STATUS_ERROR_EXCEPTION,
      // EC_QUEUE_WAVE_ILLEGAL_INSTRUCTION
      4, (hsa_status_t)HSA_STATUS_ERROR_ILLEGAL_INSTRUCTION,
      // EC_QUEUE_WAVE_MEMORY_VIOLATION
      5, (hsa_status_t)HSA_STATUS_ERROR_MEMORY_FAULT,
      // EC_QUEUE_WAVE_APERTURE_VIOLATION
      6, (hsa_status_t)HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION,
      // EC_QUEUE_PACKET_DISPATCH_DIM_INVALID
      16, HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS,
      // EC_QUEUE_PACKET_DISPATCH_GROUP_SEGMENT_SIZE_INVALID
      17, HSA_STATUS_ERROR_INVALID_ALLOCATION,
      // EC_QUEUE_PACKET_DISPATCH_CODE_INVALID
      18, HSA_STATUS_ERROR_INVALID_CODE_OBJECT,
      // EC_QUEUE_PACKET_UNSUPPORTED
      20, HSA_STATUS_ERROR_INVALID_PACKET_FORMAT,
      // EC_QUEUE_PACKET_DISPATCH_WORK_GROUP_SIZE_INVALID
      21, HSA_STATUS_ERROR_INVALID_ARGUMENT,
      // EC_QUEUE_PACKET_DISPATCH_REGISTER_SIZE_INVALID
      22, HSA_STATUS_ERROR_INVALID_ISA,
      // EC_QUEUE_PACKET_VENDOR_UNSUPPORTED
      23, HSA_STATUS_ERROR_INVALID_PACKET_FORMAT,
      // EC_QUEUE_PREEMPTION_ERROR
      31, HSA_STATUS_ERROR,
      // EC_DEVICE_MEMORY_VIOLATION
      33, (hsa_status_t)HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION,
      // EC_DEVICE_RAS_ERROR
      34, HSA_STATUS_ERROR,
      // EC_DEVICE_FATAL_HALT
      35, HSA_STATUS_ERROR,
      // EC_DEVICE_NEW
      36, HSA_STATUS_ERROR,
      // EC_PROCESS_DEVICE_REMOVE
      50, HSA_STATUS_ERROR};

  AqlQueue* queue = (AqlQueue*)arg;
  hsa_status_t errorCode = HSA_STATUS_ERROR;

  if (queue->exceptionState == ERROR_HANDLER_TERMINATE) {
    Signal* signal = queue->exception_signal_;
    queue->exceptionState = ERROR_HANDLER_DONE;
    signal->StoreRelease(0);
    return false;
  }

  for (auto& error : QueueErrors) {
    // Error codes above 32 need a 64-bit shift; shifting a 32-bit int that far is undefined.
    if (error_code & (1ull << (error.code - 1))) {
      errorCode = error.status;
      break;
    }
  }

  // Undefined or unexpected code
  assert((errorCode != HSA_STATUS_ERROR) && "Undefined or unexpected queue error code");

  queue->Suspend();
  if (queue->errors_callback_ != nullptr) {
    queue->errors_callback_(errorCode, queue->public_handle(), queue->errors_data_);
  }

  Signal* signal = queue->exception_signal_;
  queue->exceptionState = ERROR_HANDLER_DONE;
  signal->StoreRelease(0);

  return false;
}

hsa_status_t AqlQueue::SetCUMasking(uint32_t num_cu_mask_count, const uint32_t* cu_mask) {
  uint32_t cu_count;
  agent_->GetInfo((hsa_agent_info_t)HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, &cu_count);
  size_t mask_dwords = (cu_count + 31) / 32;
  // Mask to trim the last uint32_t in cu_mask to the physical CU count
  uint32_t tail_mask = (1 << (cu_count % 32)) - 1;

  auto global_mask = core::Runtime::runtime_singleton_->flag().cu_mask(agent_->enumeration_index());
  std::vector<uint32_t> mask;
  bool clipped = false;

  // num_cu_mask_count = 0 resets the CU mask.
  if (num_cu_mask_count == 0) {
    for (size_t i = 0; i < mask_dwords; i++) mask.push_back(-1);
  } else {
    for (uint32_t i = 0; i < num_cu_mask_count / 32; i++) mask.push_back(cu_mask[i]);
  }

  // Apply global mask to user mask
  if (!global_mask.empty()) {
    // Limit mask processing to smallest needed dword range
    size_t limit = Min(global_mask.size(), mask.size(), mask_dwords);

    // Any CU requested beyond the usable dword range will be dropped, so flag it as clipped.
    for (size_t i = limit; i < mask.size(); i++) {
      if (mask[i] != 0) {
        clipped = true;
        break;
      }
    }
    mask.resize(limit, 0);

    for (size_t i = 0; i < limit; i++) {
      clipped |= ((mask[i] & (~global_mask[i])) != 0);
      mask[i] &= global_mask[i];
    }
  } else {
    // Limit to physical CU range only
    size_t limit = Min(mask.size(), mask_dwords);
    mask.resize(limit, 0);
  }

  // Clip last dword to physical CU limit if necessary
  if ((mask.size() == mask_dwords) && (tail_mask != 0)) mask[mask_dwords - 1] &= tail_mask;

  // Apply mask if non-default or not queue initialization.
  ScopedAcquire<KernelMutex> lock(&mask_lock_);
  if ((!cu_mask_.empty()) || (num_cu_mask_count != 0) || (!global_mask.empty())) {
    HSAKMT_STATUS ret = hsaKmtSetQueueCUMask(queue_id_, mask.size() * 32,
                                             reinterpret_cast<HSAuint32*>(&mask[0]));
    if (ret != HSAKMT_STATUS_SUCCESS) return HSA_STATUS_ERROR;
  }

  // Update the tracked CU mask.
  cu_mask_ = std::move(mask);

  return clipped ? (hsa_status_t)HSA_STATUS_CU_MASK_REDUCED : HSA_STATUS_SUCCESS;
}

hsa_status_t AqlQueue::GetCUMasking(uint32_t num_cu_mask_count, uint32_t* cu_mask) {
  ScopedAcquire<KernelMutex> lock(&mask_lock_);
  assert(!cu_mask_.empty() && "No current cu_mask!");

  uint32_t user_dword_count = num_cu_mask_count / 32;
  if (user_dword_count > cu_mask_.size()) {
    memset(&cu_mask[cu_mask_.size()], 0,
           sizeof(uint32_t) * (user_dword_count - cu_mask_.size()));
    user_dword_count = cu_mask_.size();
  }

  memcpy(cu_mask, &cu_mask_[0], sizeof(uint32_t) * user_dword_count);
  return HSA_STATUS_SUCCESS;
}

void AqlQueue::ExecutePM4(uint32_t* cmd_data, size_t cmd_size_b) {
  // pm4_ib_buf_ is a shared resource, so mutually exclude here.
  ScopedAcquire<KernelMutex> lock(&pm4_ib_mutex_);

  // Obtain reference to any container queue.
  core::Queue* queue = core::Queue::Convert(public_handle());

  // Obtain a queue slot for a single AQL packet.
  uint64_t write_idx = queue->AddWriteIndexAcqRel(1);

  while ((write_idx - queue->LoadReadIndexRelaxed()) >= queue->amd_queue_.hsa_queue.size) {
    os::YieldThread();
  }

  uint32_t slot_idx = uint32_t(write_idx % queue->amd_queue_.hsa_queue.size);
  constexpr uint32_t slot_size_b = 0x40;
  uint32_t* queue_slot =
      (uint32_t*)(uintptr_t(queue->amd_queue_.hsa_queue.base_address) + (slot_idx * slot_size_b));

  // Copy client PM4 command into IB.
  assert(cmd_size_b < pm4_ib_size_b_ && "PM4 exceeds IB size");
  memcpy(pm4_ib_buf_, cmd_data, cmd_size_b);

  // Construct a PM4 command to execute the IB.
  constexpr uint32_t ib_jump_size_dw = 4;

  uint32_t ib_jump_cmd[ib_jump_size_dw] = {
      PM4_HDR(PM4_HDR_IT_OPCODE_INDIRECT_BUFFER, ib_jump_size_dw,
              agent_->isa()->GetMajorVersion()),
      PM4_INDIRECT_BUFFER_DW1_IB_BASE_LO(uint32_t(uintptr_t(pm4_ib_buf_) >> 2)),
      PM4_INDIRECT_BUFFER_DW2_IB_BASE_HI(uint32_t(uintptr_t(pm4_ib_buf_) >> 32)),
      (PM4_INDIRECT_BUFFER_DW3_IB_SIZE(uint32_t(cmd_size_b / sizeof(uint32_t))) |
       PM4_INDIRECT_BUFFER_DW3_IB_VALID(1))};

  // To respect multi-producer semantics, first buffer commands for the queue slot.
  constexpr uint32_t slot_size_dw = uint32_t(slot_size_b / sizeof(uint32_t));
  uint32_t slot_data[slot_size_dw];

  if (agent_->isa()->GetMajorVersion() <= 8) {
    // Construct a set of PM4 commands that fit inside the AQL packet slot.
    uint32_t slot_dw_idx = 0;

    // Construct a no-op command to pad the queue slot.
    constexpr uint32_t rel_mem_size_dw = 7;
    constexpr uint32_t nop_pad_size_dw = slot_size_dw - (ib_jump_size_dw + rel_mem_size_dw);

    uint32_t* nop_pad = &slot_data[slot_dw_idx];
    slot_dw_idx += nop_pad_size_dw;

    nop_pad[0] = PM4_HDR(PM4_HDR_IT_OPCODE_NOP, nop_pad_size_dw, agent_->isa()->GetMajorVersion());

    for (uint32_t i = 1; i < nop_pad_size_dw; ++i) {
      nop_pad[i] = 0;
    }

    // Copy in command to execute the IB.
    assert(slot_dw_idx + ib_jump_size_dw <= slot_size_dw && "PM4 exceeded queue slot size");
    uint32_t* ib_jump = &slot_data[slot_dw_idx];
    slot_dw_idx += ib_jump_size_dw;

    memcpy(ib_jump, ib_jump_cmd, sizeof(ib_jump_cmd));

    // Construct a command to advance the read index and invalidate the packet
    // header. This must be the last command since this releases the queue slot
    // for writing.
    assert(slot_dw_idx + rel_mem_size_dw <= slot_size_dw && "PM4 exceeded queue slot size");
    uint32_t* rel_mem = &slot_data[slot_dw_idx];

    rel_mem[0] =
        PM4_HDR(PM4_HDR_IT_OPCODE_RELEASE_MEM, rel_mem_size_dw, agent_->isa()->GetMajorVersion());
    rel_mem[1] = PM4_RELEASE_MEM_DW1_EVENT_INDEX(PM4_RELEASE_MEM_EVENT_INDEX_AQL);
    rel_mem[2] = 0;
    rel_mem[3] = 0;
    rel_mem[4] = 0;
    rel_mem[5] = 0;
    rel_mem[6] = 0;
  } else if (agent_->isa()->GetMajorVersion() >= 9) {
    // Construct an AQL packet to jump to the PM4 IB.
    struct amd_aql_pm4_ib {
      uint16_t header;
      uint16_t ven_hdr;
      uint32_t ib_jump_cmd[4];
      uint32_t dw_cnt_remain;
      uint32_t reserved[8];
      hsa_signal_t completion_signal;
    };

    constexpr uint32_t AMD_AQL_FORMAT_PM4_IB = 0x1;

    amd_aql_pm4_ib aql_pm4_ib{};
    aql_pm4_ib.header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE;
    aql_pm4_ib.ven_hdr = AMD_AQL_FORMAT_PM4_IB;
    aql_pm4_ib.ib_jump_cmd[0] = ib_jump_cmd[0];
    aql_pm4_ib.ib_jump_cmd[1] = ib_jump_cmd[1];
    aql_pm4_ib.ib_jump_cmd[2] = ib_jump_cmd[2];
    aql_pm4_ib.ib_jump_cmd[3] = ib_jump_cmd[3];
    aql_pm4_ib.dw_cnt_remain = 0xA;

    memcpy(slot_data, &aql_pm4_ib, sizeof(aql_pm4_ib));
  } else {
    assert(false && "AqlQueue::ExecutePM4 not implemented");
  }

  // Copy buffered commands into the queue slot.
  // Overwrite the AQL invalid header (first dword) last.
  // This prevents the slot from being read until it's fully written.
  memcpy(&queue_slot[1], &slot_data[1], slot_size_b - sizeof(uint32_t));
  atomic::Store(&queue_slot[0], slot_data[0], std::memory_order_release);

  // Submit the packet slot.
  core::Signal* doorbell = core::Signal::Convert(queue->amd_queue_.hsa_queue.doorbell_signal);
  doorbell->StoreRelease(write_idx);

  // Wait for the packet to be consumed.
  // Should be switched to a signal wait when aql_pm4_ib can be used on all
  // supported platforms.
  while (queue->LoadReadIndexRelaxed() <= write_idx) {
    os::YieldThread();
  }
}

// @brief Define the Scratch Buffer Descriptor and related parameters
// that enable kernel access to scratch memory
void AqlQueue::InitScratchSRD() {
  // Populate scratch resource descriptor
  SQ_BUF_RSRC_WORD0 srd0;
  SQ_BUF_RSRC_WORD1 srd1;
  SQ_BUF_RSRC_WORD2 srd2;
  uint32_t srd3_u32;

  uint32_t scratch_base_hi = 0;
  uintptr_t scratch_base = uintptr_t(queue_scratch_.queue_base);
#ifdef HSA_LARGE_MODEL
  scratch_base_hi = uint32_t(scratch_base >> 32);
#endif
  srd0.bits.BASE_ADDRESS = uint32_t(scratch_base);

  srd1.bits.BASE_ADDRESS_HI = scratch_base_hi;
  srd1.bits.STRIDE = 0;
  srd1.bits.CACHE_SWIZZLE = 0;
  srd1.bits.SWIZZLE_ENABLE = 1;

  srd2.bits.NUM_RECORDS = uint32_t(queue_scratch_.size);

  if (agent_->isa()->GetMajorVersion() < 10) {
    SQ_BUF_RSRC_WORD3 srd3;

    srd3.bits.DST_SEL_X = SQ_SEL_X;
    srd3.bits.DST_SEL_Y = SQ_SEL_Y;
    srd3.bits.DST_SEL_Z = SQ_SEL_Z;
    srd3.bits.DST_SEL_W = SQ_SEL_W;
    srd3.bits.NUM_FORMAT = BUF_NUM_FORMAT_UINT;
    srd3.bits.DATA_FORMAT = BUF_DATA_FORMAT_32;
    srd3.bits.ELEMENT_SIZE = 1;  // 4
    srd3.bits.INDEX_STRIDE = 3;  // 64
    srd3.bits.ADD_TID_ENABLE = 1;
    srd3.bits.ATC__CI__VI = (agent_->profile() == HSA_PROFILE_FULL);
    srd3.bits.HASH_ENABLE = 0;
    srd3.bits.HEAP = 0;
    srd3.bits.MTYPE__CI__VI = 0;
    srd3.bits.TYPE = SQ_RSRC_BUF;

    srd3_u32 = srd3.u32All;
  } else {
    SQ_BUF_RSRC_WORD3_GFX10 srd3;

    srd3.bits.DST_SEL_X = SQ_SEL_X;
    srd3.bits.DST_SEL_Y = SQ_SEL_Y;
    srd3.bits.DST_SEL_Z = SQ_SEL_Z;
    srd3.bits.DST_SEL_W = SQ_SEL_W;
    srd3.bits.FORMAT = BUF_FORMAT_32_UINT;
    srd3.bits.RESERVED1 = 0;
    srd3.bits.INDEX_STRIDE = 0;  // filled in by CP
    srd3.bits.ADD_TID_ENABLE = 1;
    srd3.bits.RESOURCE_LEVEL = 1;
    srd3.bits.RESERVED2 = 0;
    srd3.bits.OOB_SELECT = 2;  // no bounds check in swizzle mode
    srd3.bits.TYPE = SQ_RSRC_BUF;

    srd3_u32 = srd3.u32All;
  }

  // Store the scratch resource descriptor in the queue.
  amd_queue_.scratch_resource_descriptor[0] = srd0.u32All;
  amd_queue_.scratch_resource_descriptor[1] = srd1.u32All;
  amd_queue_.scratch_resource_descriptor[2] = srd2.u32All;
  amd_queue_.scratch_resource_descriptor[3] = srd3_u32;

  // Populate flat scratch parameters in amd_queue_.
  amd_queue_.scratch_backing_memory_location = queue_scratch_.queue_process_offset;
  amd_queue_.scratch_backing_memory_byte_size = queue_scratch_.size;

  // For backwards compatibility this field records the per-lane scratch
  // for a 64 lane wavefront. If scratch was allocated for 32 lane waves
  // then the effective size for a 64 lane wave is halved.
  amd_queue_.scratch_wave64_lane_byte_size =
      uint32_t((queue_scratch_.size_per_thread * queue_scratch_.lanes_per_wave) / 64);

  // Set concurrent wavefront limits only when scratch is being used.
  COMPUTE_TMPRING_SIZE tmpring_size = {};
  if (queue_scratch_.size == 0) {
    amd_queue_.compute_tmpring_size = tmpring_size.u32All;
    return;
  }

  // Determine the maximum number of waves the device can support.
  const auto& agent_props = agent_->properties();
  uint32_t num_cus = agent_props.NumFComputeCores / agent_props.NumSIMDPerCU;
  uint32_t max_scratch_waves = num_cus * agent_props.MaxSlotsScratchCU;

  // Program the COMPUTE_TMPRING_SIZE register for the allocated scratch.
  // Scratch size per wave is specified in kilobytes.
  uint32_t wave_scratch =
      (((queue_scratch_.lanes_per_wave * queue_scratch_.size_per_thread) + 1023) / 1024);
  tmpring_size.bits.WAVESIZE = wave_scratch;
  assert(wave_scratch == tmpring_size.bits.WAVESIZE && "WAVESIZE Overflow.");
  uint32_t num_waves = queue_scratch_.size / (tmpring_size.bits.WAVESIZE * 1024);
  tmpring_size.bits.WAVES = std::min(num_waves, max_scratch_waves);
  amd_queue_.compute_tmpring_size = tmpring_size.u32All;
  return;
}

hsa_status_t AqlQueue::EnableGWS(int gws_slot_count) {
  uint32_t discard;
  auto status = hsaKmtAllocQueueGWS(queue_id_, gws_slot_count, &discard);
  if (status != HSAKMT_STATUS_SUCCESS) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  amd_queue_.hsa_queue.type = HSA_QUEUE_TYPE_COOPERATIVE;
  return HSA_STATUS_SUCCESS;
}

}  // namespace amd
}  // namespace rocr

ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_blit_kernel.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "core/inc/amd_blit_kernel.h"

#include <algorithm>
#include <sstream>
#include <string>

#include "core/inc/amd_gpu_agent.h"
#include "core/inc/hsa_internal.h"
#include "core/util/utils.h"

namespace rocr {
namespace AMD {

static const uint16_t kInvalidPacketHeader = HSA_PACKET_TYPE_INVALID;

static std::string kBlitKernelSource(R"(
// Compatibility function for GFXIP 7.

function s_load_dword_offset(byte_offset)
  if kGFXIPVersion == 7
    return byte_offset / 4
  else
    return byte_offset
  end
end

// Memory copy for all cases except:
//  (src_addr & 0x3) != (dst_addr & 0x3)
//
// Kernel argument buffer:
//   [DW  0, 1]  Phase 1 src start address
//   [DW  2, 3]  Phase 1 dst start address
//   [DW  4, 5]  Phase 2 src start address
//   [DW  6, 7]  Phase 2 dst start address
//   [DW  8, 9]  Phase 3 src start address
//   [DW 10,11]  Phase 3 dst start address
//   [DW 12,13]  Phase 4 src start address
//   [DW 14,15]  Phase 4 dst start address
//   [DW 16,17]  Phase 4 src end address
//   [DW 18,19]  Phase 4 dst end address
//   [DW 20   ]  Total number of workitems

var kCopyAlignedVecWidth = 4
var kCopyAlignedUnroll = 1

shader CopyAligned
type(CS)
user_sgpr_count(2)
sgpr_count(32)
vgpr_count(8 + (kCopyAlignedUnroll * kCopyAlignedVecWidth))

  // Retrieve kernel arguments.
  s_load_dwordx4 s[4:7], s[0:1], s_load_dword_offset(0x0)
  s_load_dwordx4 s[8:11], s[0:1], s_load_dword_offset(0x10)
  s_load_dwordx4 s[12:15], s[0:1], s_load_dword_offset(0x20)
  s_load_dwordx4 s[16:19], s[0:1], s_load_dword_offset(0x30)
  s_load_dwordx4 s[20:23], s[0:1], s_load_dword_offset(0x40)
  s_load_dword s24, s[0:1], s_load_dword_offset(0x50)
  s_waitcnt lgkmcnt(0)

  // Compute workitem id.
  s_lshl_b32 s2, s2, 0x6
  v_add_u32 v0, vcc, s2, v0

  // =====================================================
  // Phase 1: Byte copy up to 0x100 destination alignment.
  // =====================================================

  // Compute phase source address.
  v_mov_b32 v3, s5
  v_add_u32 v2, vcc, v0, s4
  v_addc_u32 v3, vcc, v3, 0x0, vcc

  // Compute phase destination address.
  v_mov_b32 v5, s7
  v_add_u32 v4, vcc, v0, s6
  v_addc_u32 v5, vcc, v5, 0x0, vcc

L_COPY_ALIGNED_PHASE_1_LOOP:
  // Mask off lanes (or branch out) after phase end.
  v_cmp_lt_u64 vcc, v[2:3], s[8:9]
  s_cbranch_vccz L_COPY_ALIGNED_PHASE_1_DONE
  s_and_b64 exec, exec, vcc

  // Load from/advance the source address.
  flat_load_ubyte v1, v[2:3]
  s_waitcnt vmcnt(0)
  v_add_u32 v2, vcc, v2, s24
  v_addc_u32 v3, vcc, v3, 0x0, vcc

  // Write to/advance the destination address.
  flat_store_byte v[4:5], v1
  v_add_u32 v4, vcc, v4, s24
  v_addc_u32 v5, vcc, v5, 0x0, vcc

  // Repeat until branched out.
  s_branch L_COPY_ALIGNED_PHASE_1_LOOP

L_COPY_ALIGNED_PHASE_1_DONE:
  // Restore EXEC mask for all lanes.
  s_mov_b64 exec, 0xFFFFFFFFFFFFFFFF

  // ========================================================
  // Phase 2: Unrolled dword[x4] copy up to last whole block.
  // ========================================================

  // Compute unrolled dword[x4] stride across all threads.
  if kCopyAlignedVecWidth == 4
    s_lshl_b32 s25, s24, 0x4
  else
    s_lshl_b32 s25, s24, 0x2
  end

  // Compute phase source address.
  if kCopyAlignedVecWidth == 4
    v_lshlrev_b32 v1, 0x4, v0
  else
    v_lshlrev_b32 v1, 0x2, v0
  end
  v_mov_b32 v3, s9
  v_add_u32 v2, vcc, v1, s8
  v_addc_u32 v3, vcc, v3, 0x0, vcc

  // Compute phase destination address.
  v_mov_b32 v5, s11
  v_add_u32 v4, vcc, v1, s10
  v_addc_u32 v5, vcc, v5, 0x0, vcc

L_COPY_ALIGNED_PHASE_2_LOOP:
  // Branch out after phase end.
  v_cmp_lt_u64 vcc, v[2:3], s[12:13]
  s_cbranch_vccz L_COPY_ALIGNED_PHASE_2_DONE

  // Load from/advance the source address.
  for var i = 0; i < kCopyAlignedUnroll; i ++
    if kCopyAlignedVecWidth == 4
      flat_load_dwordx4 v[8 + (i * 4)], v[2:3]
    else
      flat_load_dword v[8 + i], v[2:3]
    end
    v_add_u32 v2, vcc, v2, s25
    v_addc_u32 v3, vcc, v3, 0x0, vcc
  end

  // Write to/advance the destination address.
  s_waitcnt vmcnt(0)
  for var i = 0; i < kCopyAlignedUnroll; i ++
    if kCopyAlignedVecWidth == 4
      flat_store_dwordx4 v[4:5], v[8 + (i * 4)]
    else
      flat_store_dword v[4:5], v[8 + i]
    end
    v_add_u32 v4, vcc, v4, s25
    v_addc_u32 v5, vcc, v5, 0x0, vcc
  end

  // Repeat until branched out.
  s_branch L_COPY_ALIGNED_PHASE_2_LOOP

L_COPY_ALIGNED_PHASE_2_DONE:

  // ===========================================
  // Phase 3: Dword copy up to last whole dword.
  // ===========================================

  // Compute dword stride across all threads.
  s_lshl_b32 s25, s24, 0x2

  // Compute phase source address.
  v_lshlrev_b32 v1, 0x2, v0
  v_mov_b32 v3, s13
  v_add_u32 v2, vcc, v1, s12
  v_addc_u32 v3, vcc, v3, 0x0, vcc

  // Compute phase destination address.
  v_mov_b32 v5, s15
  v_add_u32 v4, vcc, v1, s14
  v_addc_u32 v5, vcc, v5, 0x0, vcc

L_COPY_ALIGNED_PHASE_3_LOOP:
  // Mask off lanes (or branch out) after phase end.
  v_cmp_lt_u64 vcc, v[2:3], s[16:17]
  s_cbranch_vccz L_COPY_ALIGNED_PHASE_3_DONE
  s_and_b64 exec, exec, vcc

  // Load from/advance the source address.
  flat_load_dword v1, v[2:3]
  v_add_u32 v2, vcc, v2, s25
  v_addc_u32 v3, vcc, v3, 0x0, vcc
  s_waitcnt vmcnt(0)

  // Write to/advance the destination address.
  flat_store_dword v[4:5], v1
  v_add_u32 v4, vcc, v4, s25
  v_addc_u32 v5, vcc, v5, 0x0, vcc

  // Repeat until branched out.
  s_branch L_COPY_ALIGNED_PHASE_3_LOOP

L_COPY_ALIGNED_PHASE_3_DONE:
  // Restore EXEC mask for all lanes.
  s_mov_b64 exec, 0xFFFFFFFFFFFFFFFF

  // =============================
  // Phase 4: Byte copy up to end.
  // =============================

  // Compute phase source address.
  v_mov_b32 v3, s17
  v_add_u32 v2, vcc, v0, s16
  v_addc_u32 v3, vcc, v3, 0x0, vcc

  // Compute phase destination address.
  v_mov_b32 v5, s19
  v_add_u32 v4, vcc, v0, s18
  v_addc_u32 v5, vcc, v5, 0x0, vcc

  // Mask off lanes (or branch out) after phase end.
  v_cmp_lt_u64 vcc, v[2:3], s[20:21]
  s_cbranch_vccz L_COPY_ALIGNED_PHASE_4_DONE
  s_and_b64 exec, exec, vcc

  // Load from the source address.
  flat_load_ubyte v1, v[2:3]
  s_waitcnt vmcnt(0)

  // Write to the destination address.
  flat_store_byte v[4:5], v1

L_COPY_ALIGNED_PHASE_4_DONE:
  s_endpgm
end

// Memory copy for this case:
//  (src_addr & 0x3) != (dst_addr & 0x3)
//
// Kernel argument buffer:
//   [DW  0, 1]  Phase 1 src start address
//   [DW  2, 3]  Phase 1 dst start address
//   [DW  4, 5]  Phase 2 src start address
//   [DW  6, 7]  Phase 2 dst start address
//   [DW  8, 9]  Phase 2 src end address
//   [DW 10,11]  Phase 2 dst end address
//   [DW 12   ]  Total number of workitems

var kCopyMisalignedUnroll = 4

shader CopyMisaligned
type(CS)
user_sgpr_count(2)
sgpr_count(23)
vgpr_count(6 + kCopyMisalignedUnroll)

  // Retrieve kernel arguments.
  s_load_dwordx4 s[4:7], s[0:1], s_load_dword_offset(0x0)
  s_load_dwordx4 s[8:11], s[0:1], s_load_dword_offset(0x10)
  s_load_dwordx4 s[12:15], s[0:1], s_load_dword_offset(0x20)
  s_load_dword s16, s[0:1], s_load_dword_offset(0x30)
  s_waitcnt lgkmcnt(0)

  // Compute workitem id.
  s_lshl_b32 s2, s2, 0x6
  v_add_u32 v0, vcc, s2, v0

  // ===================================================
  // Phase 1: Unrolled byte copy up to last whole block.
  // ===================================================

  // Compute phase source address.
  v_mov_b32 v3, s5
  v_add_u32 v2, vcc, v0, s4
  v_addc_u32 v3, vcc, v3, 0x0, vcc

  // Compute phase destination address.
  v_mov_b32 v5, s7
  v_add_u32 v4, vcc, v0, s6
  v_addc_u32 v5, vcc, v5, 0x0, vcc

L_COPY_MISALIGNED_PHASE_1_LOOP:
  // Branch out after phase end.
  v_cmp_lt_u64 vcc, v[2:3], s[8:9]
  s_cbranch_vccz L_COPY_MISALIGNED_PHASE_1_DONE

  // Load from/advance the source address.
  for var i = 0; i < kCopyMisalignedUnroll; i ++
    flat_load_ubyte v[6 + i], v[2:3]
    v_add_u32 v2, vcc, v2, s16
    v_addc_u32 v3, vcc, v3, 0x0, vcc
  end

  // Write to/advance the destination address.
  s_waitcnt vmcnt(0)
  for var i = 0; i < kCopyMisalignedUnroll; i ++
    flat_store_byte v[4:5], v[6 + i]
    v_add_u32 v4, vcc, v4, s16
    v_addc_u32 v5, vcc, v5, 0x0, vcc
  end

  // Repeat until branched out.
  s_branch L_COPY_MISALIGNED_PHASE_1_LOOP

L_COPY_MISALIGNED_PHASE_1_DONE:

  // =============================
  // Phase 2: Byte copy up to end.
  // =============================

  // Compute phase source address.
  v_mov_b32 v3, s9
  v_add_u32 v2, vcc, v0, s8
  v_addc_u32 v3, vcc, v3, 0x0, vcc

  // Compute phase destination address.
  v_mov_b32 v5, s11
  v_add_u32 v4, vcc, v0, s10
  v_addc_u32 v5, vcc, v5, 0x0, vcc

L_COPY_MISALIGNED_PHASE_2_LOOP:
  // Mask off lanes (or branch out) after phase end.
  v_cmp_lt_u64 vcc, v[2:3], s[12:13]
  s_cbranch_vccz L_COPY_MISALIGNED_PHASE_2_DONE
  s_and_b64 exec, exec, vcc

  // Load from/advance the source address.
  flat_load_ubyte v1, v[2:3]
  v_add_u32 v2, vcc, v2, s16
  v_addc_u32 v3, vcc, v3, 0x0, vcc
  s_waitcnt vmcnt(0)

  // Write to/advance the destination address.
  flat_store_byte v[4:5], v1
  v_add_u32 v4, vcc, v4, s16
  v_addc_u32 v5, vcc, v5, 0x0, vcc

  // Repeat until branched out.
  s_branch L_COPY_MISALIGNED_PHASE_2_LOOP

L_COPY_MISALIGNED_PHASE_2_DONE:
  s_endpgm
end

// Memory fill for dword-aligned region.
//
// Kernel argument buffer:
//   [DW 0, 1]  Phase 1 dst start address
//   [DW 2, 3]  Phase 2 dst start address
//   [DW 4, 5]  Phase 2 dst end address
//   [DW 6   ]  Value to fill memory with
//   [DW 7   ]  Total number of workitems

var kFillVecWidth = 4
var kFillUnroll = 1

shader Fill
type(CS)
user_sgpr_count(2)
sgpr_count(19)
vgpr_count(8)

  // Retrieve kernel arguments.
  s_load_dwordx4 s[4:7], s[0:1], s_load_dword_offset(0x0)
  s_load_dwordx4 s[8:11], s[0:1], s_load_dword_offset(0x10)
  s_waitcnt lgkmcnt(0)

  // Compute workitem id.
  s_lshl_b32 s2, s2, 0x6
  v_add_u32 v0, vcc, s2, v0

  // Copy fill pattern into VGPRs.
  for var i = 0; i < kFillVecWidth; i ++
    v_mov_b32 v[4 + i], s10
  end

  // ========================================================
  // Phase 1: Unrolled dword[x4] fill up to last whole block.
  // ========================================================

  // Compute unrolled dword[x4] stride across all threads.
  if kFillVecWidth == 4
    s_lshl_b32 s12, s11, 0x4
  else
    s_lshl_b32 s12, s11, 0x2
  end

  // Compute phase destination address.
  if kFillVecWidth == 4
    v_lshlrev_b32 v1, 0x4, v0
  else
    v_lshlrev_b32 v1, 0x2, v0
  end
  v_mov_b32 v3, s5
  v_add_u32 v2, vcc, v1, s4
  v_addc_u32 v3, vcc, v3, 0x0, vcc

L_FILL_PHASE_1_LOOP:
  // Branch out after phase end.
  v_cmp_lt_u64 vcc, v[2:3], s[6:7]
  s_cbranch_vccz L_FILL_PHASE_1_DONE

  // Write to/advance the destination address.
  for var i = 0; i < kFillUnroll; i ++
    if kFillVecWidth == 4
      flat_store_dwordx4 v[2:3], v[4:7]
    else
      flat_store_dword v[2:3], v4
    end
    v_add_u32 v2, vcc, v2, s12
    v_addc_u32 v3, vcc, v3, 0x0, vcc
  end

  // Repeat until branched out.
  s_branch L_FILL_PHASE_1_LOOP

L_FILL_PHASE_1_DONE:

  // ==============================
  // Phase 2: Dword fill up to end.
  // ==============================

  // Compute dword stride across all threads.
  s_lshl_b32 s12, s11, 0x2

  // Compute phase destination address.
  v_lshlrev_b32 v1, 0x2, v0
  v_mov_b32 v3, s7
  v_add_u32 v2, vcc, v1, s6
  v_addc_u32 v3, vcc, v3, 0x0, vcc

L_FILL_PHASE_2_LOOP:
  // Mask off lanes (or branch out) after phase end.
  v_cmp_lt_u64 vcc, v[2:3], s[8:9]
  s_cbranch_vccz L_FILL_PHASE_2_DONE
  s_and_b64 exec, exec, vcc

  // Write to/advance the destination address.
  flat_store_dword v[2:3], v4
  v_add_u32 v2, vcc, v2, s12
  v_addc_u32 v3, vcc, v3, 0x0, vcc

  // Repeat until branched out.
  s_branch L_FILL_PHASE_2_LOOP

L_FILL_PHASE_2_DONE:
  s_endpgm
end

)");

// Search kernel source for variable definition and return value.
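// Example: the line "var kCopyMisalignedUnroll = 4" in kBlitKernelSource makes
// GetKernelSourceParam("kCopyMisalignedUnroll") return 4. The host-side phase
// sizes computed below therefore always use the same unroll and vector-width
// values the shaders are assembled with.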
int GetKernelSourceParam(const char* paramName) {
  std::stringstream paramDef;
  paramDef << "var " << paramName << " = ";

  std::string::size_type paramDefLoc = kBlitKernelSource.find(paramDef.str());
  assert(paramDefLoc != std::string::npos);
  std::string::size_type paramValLoc = paramDefLoc + paramDef.str().size();
  std::string::size_type paramEndLoc = kBlitKernelSource.find('\n', paramDefLoc);
  assert(paramEndLoc != std::string::npos);

  std::string paramVal(&kBlitKernelSource[paramValLoc], &kBlitKernelSource[paramEndLoc]);
  return std::stoi(paramVal);
}

static int kCopyAlignedVecWidth = GetKernelSourceParam("kCopyAlignedVecWidth");
static int kCopyAlignedUnroll = GetKernelSourceParam("kCopyAlignedUnroll");
static int kCopyMisalignedUnroll = GetKernelSourceParam("kCopyMisalignedUnroll");
static int kFillVecWidth = GetKernelSourceParam("kFillVecWidth");
static int kFillUnroll = GetKernelSourceParam("kFillUnroll");

BlitKernel::BlitKernel(core::Queue* queue)
    : core::Blit(),
      queue_(queue),
      kernarg_async_(NULL),
      kernarg_async_mask_(0),
      kernarg_async_counter_(0),
      num_cus_(0) {
  completion_signal_.handle = 0;
}

BlitKernel::~BlitKernel() {}

hsa_status_t BlitKernel::Initialize(const core::Agent& agent) {
  queue_bitmask_ = queue_->public_handle()->size - 1;

  hsa_status_t status = HSA::hsa_signal_create(1, 0, NULL, &completion_signal_);
  if (HSA_STATUS_SUCCESS != status) {
    return status;
  }

  const AMD::GpuAgent& gpuAgent = static_cast<const AMD::GpuAgent&>(agent);

  kernarg_async_ = reinterpret_cast<KernelArgs*>(
      gpuAgent.system_allocator()(queue_->public_handle()->size * AlignUp(sizeof(KernelArgs), 16),
                                  16, core::MemoryRegion::AllocateNoFlags));

  kernarg_async_mask_ = queue_->public_handle()->size - 1;

  // Obtain the number of compute units in the underlying agent.
  num_cus_ = gpuAgent.properties().NumFComputeCores / 4;

  // Assemble shaders to AQL code objects.
  std::map<KernelType, const char*> kernel_names = {
      {KernelType::CopyAligned, "CopyAligned"},
      {KernelType::CopyMisaligned, "CopyMisaligned"},
      {KernelType::Fill, "Fill"}};

  for (auto kernel_name : kernel_names) {
    KernelCode& kernel = kernels_[kernel_name.first];
    gpuAgent.AssembleShader(kernel_name.second, AMD::GpuAgent::AssembleTarget::AQL,
                            kernel.code_buf_, kernel.code_buf_size_);
  }

  if (agent.profiling_enabled()) {
    return EnableProfiling(true);
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t BlitKernel::Destroy(const core::Agent& agent) {
  std::lock_guard<std::mutex> guard(lock_);

  const AMD::GpuAgent& gpuAgent = static_cast<const AMD::GpuAgent&>(agent);

  for (auto kernel_pair : kernels_) {
    gpuAgent.ReleaseShader(kernel_pair.second.code_buf_, kernel_pair.second.code_buf_size_);
  }

  if (kernarg_async_ != NULL) {
    gpuAgent.system_deallocator()(kernarg_async_);
  }

  if (completion_signal_.handle != 0) {
    HSA::hsa_signal_destroy(completion_signal_);
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t BlitKernel::SubmitLinearCopyCommand(void* dst, const void* src, size_t size) {
  // Protect completion_signal_.
  std::lock_guard<std::mutex> guard(lock_);

  HSA::hsa_signal_store_relaxed(completion_signal_, 1);

  std::vector<core::Signal*> dep_signals(0);

  hsa_status_t stat = SubmitLinearCopyCommand(dst, src, size, dep_signals,
                                              *core::Signal::Convert(completion_signal_));

  if (stat != HSA_STATUS_SUCCESS) {
    return stat;
  }

  // Wait for the packet to finish.
  if (HSA::hsa_signal_wait_scacquire(completion_signal_, HSA_SIGNAL_CONDITION_LT, 1, uint64_t(-1),
                                     HSA_WAIT_STATE_ACTIVE) != 0) {
    // Signal wait returned unexpected value.
    return HSA_STATUS_ERROR;
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t BlitKernel::SubmitLinearCopyCommand(void* dst, const void* src, size_t size,
                                                 std::vector<core::Signal*>& dep_signals,
                                                 core::Signal& out_signal) {
  // Reserve write index for barrier(s) + dispatch packet.
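  // Each barrier-AND packet holds at most five dependency signals, so
  // (n + 4) / 5 rounds the barrier count up. For example, n = 0 needs no
  // barrier, n = 5 needs one and n = 6 needs two. One dispatch packet is
  // always reserved on top of the barriers.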
  const uint32_t num_barrier_packet = uint32_t((dep_signals.size() + 4) / 5);
  const uint32_t total_num_packet = num_barrier_packet + 1;

  uint64_t write_index = AcquireWriteIndex(total_num_packet);
  uint64_t write_index_temp = write_index;

  // Insert barrier packets to handle dependent signals.
  // Barrier bit keeps signal checking traffic from competing with a copy.
  const uint16_t kBarrierPacketHeader =
      (HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE) |
      (HSA_FENCE_SCOPE_NONE << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) |
      (HSA_FENCE_SCOPE_NONE << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);

  hsa_barrier_and_packet_t barrier_packet = {0};
  barrier_packet.header = HSA_PACKET_TYPE_INVALID;

  hsa_barrier_and_packet_t* queue_buffer =
      reinterpret_cast<hsa_barrier_and_packet_t*>(queue_->public_handle()->base_address);

  const size_t dep_signal_count = dep_signals.size();
  for (size_t i = 0; i < dep_signal_count; ++i) {
    const size_t idx = i % 5;
    barrier_packet.dep_signal[idx] = core::Signal::Convert(dep_signals[i]);
    if (i == (dep_signal_count - 1) || idx == 4) {
      std::atomic_thread_fence(std::memory_order_acquire);
      queue_buffer[(write_index)&queue_bitmask_] = barrier_packet;
      std::atomic_thread_fence(std::memory_order_release);
      queue_buffer[(write_index)&queue_bitmask_].header = kBarrierPacketHeader;

      ++write_index;

      memset(&barrier_packet, 0, sizeof(hsa_barrier_and_packet_t));
      barrier_packet.header = HSA_PACKET_TYPE_INVALID;
    }
  }

  // Insert dispatch packet for copy kernel.
  KernelArgs* args = ObtainAsyncKernelCopyArg();
  KernelCode* kernel_code = nullptr;
  int num_workitems = 0;

  bool aligned = ((uintptr_t(src) & 0x3) == (uintptr_t(dst) & 0x3));

  if (aligned) {
    // Use dword-based aligned kernel.
    kernel_code = &kernels_[KernelType::CopyAligned];

    // Compute the size of each copy phase.
    num_workitems = 64 * 4 * num_cus_;

    // Phase 1 (byte copy) ends when destination is 0x100-aligned.
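    // Worked example (illustrative addresses): for dst_start = 0x1003,
    // (0x100 - 0x03) & 0xFF = 0xFD. Phase 1 therefore copies 0xFD bytes (fewer if
    // size is smaller), and phase 2 starts at the 0x100-aligned address 0x1100. An
    // already aligned destination gives (0x100 - 0x00) & 0xFF = 0, so phase 1 is empty.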
    uintptr_t src_start = uintptr_t(src);
    uintptr_t dst_start = uintptr_t(dst);
    uint64_t phase1_size = std::min(size, uint64_t(0x100 - (dst_start & 0xFF)) & 0xFF);

    // Phase 2 (unrolled dwordx4 copy) ends when last whole block fits.
    uint64_t phase2_block =
        num_workitems * sizeof(uint32_t) * kCopyAlignedUnroll * kCopyAlignedVecWidth;
    uint64_t phase2_size = ((size - phase1_size) / phase2_block) * phase2_block;

    // Phase 3 (dword copy) ends when last whole dword fits.
    uint64_t phase3_size =
        ((size - phase1_size - phase2_size) / sizeof(uint32_t)) * sizeof(uint32_t);

    args->copy_aligned.phase1_src_start = src_start;
    args->copy_aligned.phase1_dst_start = dst_start;
    args->copy_aligned.phase2_src_start = src_start + phase1_size;
    args->copy_aligned.phase2_dst_start = dst_start + phase1_size;
    args->copy_aligned.phase3_src_start = src_start + phase1_size + phase2_size;
    args->copy_aligned.phase3_dst_start = dst_start + phase1_size + phase2_size;
    args->copy_aligned.phase4_src_start = src_start + phase1_size + phase2_size + phase3_size;
    args->copy_aligned.phase4_dst_start = dst_start + phase1_size + phase2_size + phase3_size;
    args->copy_aligned.phase4_src_end = src_start + size;
    args->copy_aligned.phase4_dst_end = dst_start + size;
    args->copy_aligned.num_workitems = num_workitems;
  } else {
    // Use byte-based misaligned kernel.
    kernel_code = &kernels_[KernelType::CopyMisaligned];

    // Compute the size of each copy phase.
    num_workitems = 64 * 4 * num_cus_;

    // Phase 1 (unrolled byte copy) ends when last whole block fits.
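    // Worked example (num_cus_ = 60 is a hypothetical value): num_workitems =
    // 64 * 4 * 60 = 15360. With the shader's unroll of 4, phase1_block is
    // 15360 * 1 * 4 = 61440 bytes, so a 100000-byte copy puts 61440 bytes in
    // phase 1 and leaves 38560 bytes for the byte-by-byte phase 2.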
    uintptr_t src_start = uintptr_t(src);
    uintptr_t dst_start = uintptr_t(dst);
    uint64_t phase1_block = num_workitems * sizeof(uint8_t) * kCopyMisalignedUnroll;
    uint64_t phase1_size = (size / phase1_block) * phase1_block;

    args->copy_misaligned.phase1_src_start = src_start;
    args->copy_misaligned.phase1_dst_start = dst_start;
    args->copy_misaligned.phase2_src_start = src_start + phase1_size;
    args->copy_misaligned.phase2_dst_start = dst_start + phase1_size;
    args->copy_misaligned.phase2_src_end = src_start + size;
    args->copy_misaligned.phase2_dst_end = dst_start + size;
    args->copy_misaligned.num_workitems = num_workitems;
  }

  hsa_signal_t signal = {(core::Signal::Convert(&out_signal)).handle};
  PopulateQueue(write_index, uintptr_t(kernel_code->code_buf_), args, num_workitems, signal);

  // Submit barrier(s) and dispatch packets.
  ReleaseWriteIndex(write_index_temp, total_num_packet);

  return HSA_STATUS_SUCCESS;
}

hsa_status_t BlitKernel::SubmitLinearFillCommand(void* ptr, uint32_t value, size_t count) {
  std::lock_guard<std::mutex> guard(lock_);

  // Reject misaligned base address.
  if ((uintptr_t(ptr) & 0x3) != 0) {
    return HSA_STATUS_ERROR;
  }

  // Compute the size of each fill phase.
  int num_workitems = 64 * num_cus_;

  // Phase 1 (unrolled dwordx4 fill) ends when last whole block fits.
  uintptr_t dst_start = uintptr_t(ptr);
  uint64_t fill_size = count * sizeof(uint32_t);

  uint64_t phase1_block = num_workitems * sizeof(uint32_t) * kFillUnroll * kFillVecWidth;
  uint64_t phase1_size = (fill_size / phase1_block) * phase1_block;

  KernelArgs* args = ObtainAsyncKernelCopyArg();
  args->fill.phase1_dst_start = dst_start;
  args->fill.phase2_dst_start = dst_start + phase1_size;
  args->fill.phase2_dst_end = dst_start + fill_size;
  args->fill.fill_value = value;
  args->fill.num_workitems = num_workitems;

  // Submit dispatch packet.
  HSA::hsa_signal_store_relaxed(completion_signal_, 1);

  uint64_t write_index = AcquireWriteIndex(1);
  PopulateQueue(write_index, uintptr_t(kernels_[KernelType::Fill].code_buf_), args, num_workitems,
                completion_signal_);
  ReleaseWriteIndex(write_index, 1);

  // Wait for the packet to finish.
  if (HSA::hsa_signal_wait_scacquire(completion_signal_, HSA_SIGNAL_CONDITION_LT, 1, uint64_t(-1),
                                     HSA_WAIT_STATE_ACTIVE) != 0) {
    // Signal wait returned unexpected value.
    return HSA_STATUS_ERROR;
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t BlitKernel::EnableProfiling(bool enable) {
  queue_->SetProfiling(enable);
  return HSA_STATUS_SUCCESS;
}

uint64_t BlitKernel::AcquireWriteIndex(uint32_t num_packet) {
  assert(queue_->public_handle()->size >= num_packet);

  uint64_t write_index = queue_->AddWriteIndexAcqRel(num_packet);

  while (write_index + num_packet - queue_->LoadReadIndexRelaxed() >
         queue_->public_handle()->size) {
    os::YieldThread();
  }

  return write_index;
}

void BlitKernel::ReleaseWriteIndex(uint64_t write_index, uint32_t num_packet) {
  // Update doorbell register with last packet id.
  core::Signal* doorbell = core::Signal::Convert(queue_->public_handle()->doorbell_signal);
  doorbell->StoreRelease(write_index + num_packet - 1);
}

void BlitKernel::PopulateQueue(uint64_t index, uint64_t code_handle, void* args,
                               uint32_t grid_size_x, hsa_signal_t completion_signal) {
  assert(IsMultipleOf(args, 16));

  hsa_kernel_dispatch_packet_t packet = {0};

  static const uint16_t kDispatchPacketHeader =
      (HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) |
      (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) |
      (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);

  packet.header = kInvalidPacketHeader;
  packet.kernel_object = code_handle;
  packet.kernarg_address = args;

  // Set up the dispatch dimensions.
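  // The grid is rounded up to whole 64-lane workgroups; for example,
  // AlignUp(100, 64) yields 128. Both callers in this file already pass a
  // multiple of 64 (64 * 4 * num_cus_ or 64 * num_cus_), so for them the
  // rounding leaves grid_size_x unchanged.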
  const int kNumDimension = 1;
  packet.setup = kNumDimension << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
  packet.grid_size_x = AlignUp(static_cast(grid_size_x), 64);
  packet.grid_size_y = packet.grid_size_z = 1;
  packet.workgroup_size_x = 64;
  packet.workgroup_size_y = packet.workgroup_size_z = 1;

  packet.completion_signal = completion_signal;

  // Populate queue buffer with AQL packet.
  hsa_kernel_dispatch_packet_t* queue_buffer =
      reinterpret_cast(queue_->public_handle()->base_address);
  std::atomic_thread_fence(std::memory_order_acquire);
  queue_buffer[index & queue_bitmask_] = packet;
  std::atomic_thread_fence(std::memory_order_release);
  queue_buffer[index & queue_bitmask_].header = kDispatchPacketHeader;
}

BlitKernel::KernelArgs* BlitKernel::ObtainAsyncKernelCopyArg() {
  const uint32_t index =
      atomic::Add(&kernarg_async_counter_, 1U, std::memory_order_acquire) &
      kernarg_async_mask_;
  KernelArgs* arg = &kernarg_async_[index];
  assert(IsMultipleOf(arg, 16));
  return arg;
}

}  // namespace AMD
}  // namespace rocr

ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_blit_sdma.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "core/inc/amd_blit_sdma.h"

#include
#include
#include
#include
#include

#include "core/inc/amd_gpu_agent.h"
#include "core/inc/amd_memory_region.h"
#include "core/inc/runtime.h"
#include "core/inc/sdma_registers.h"
#include "core/inc/signal.h"
#include "core/inc/interrupt_signal.h"

namespace rocr {
namespace AMD {

inline uint32_t ptrlow32(const void* p) {
  return static_cast(reinterpret_cast(p));
}

inline uint32_t ptrhigh32(const void* p) {
#if defined(HSA_LARGE_MODEL)
  return static_cast(reinterpret_cast(p) >> 32);
#else
  return 0;
#endif
}

const size_t BlitSdmaBase::kQueueSize = 1024 * 1024;
const size_t BlitSdmaBase::kCopyPacketSize = sizeof(SDMA_PKT_COPY_LINEAR);
const size_t BlitSdmaBase::kMaxSingleCopySize = SDMA_PKT_COPY_LINEAR::kMaxSize_;
const size_t BlitSdmaBase::kMaxSingleFillSize = SDMA_PKT_CONSTANT_FILL::kMaxSize_;

// Initialize the sizes of the various SDMA commands used by this module.
template const uint32_t BlitSdma::linear_copy_command_size_ =
    sizeof(SDMA_PKT_COPY_LINEAR);

template const uint32_t BlitSdma::fill_command_size_ =
    sizeof(SDMA_PKT_CONSTANT_FILL);

template const uint32_t BlitSdma::fence_command_size_ =
    sizeof(SDMA_PKT_FENCE);

template const uint32_t BlitSdma::poll_command_size_ =
    sizeof(SDMA_PKT_POLL_REGMEM);

template const uint32_t BlitSdma::flush_command_size_ =
    sizeof(SDMA_PKT_POLL_REGMEM);

template const uint32_t BlitSdma::atomic_command_size_ =
    sizeof(SDMA_PKT_ATOMIC);

template const uint32_t BlitSdma::timestamp_command_size_ =
    sizeof(SDMA_PKT_TIMESTAMP);

template const uint32_t BlitSdma::trap_command_size_ =
    sizeof(SDMA_PKT_TRAP);

template const uint32_t BlitSdma::gcr_command_size_ =
    sizeof(SDMA_PKT_GCR);

template BlitSdma::BlitSdma()
    : agent_(NULL),
      queue_start_addr_(NULL),
      parity_(false),
      cached_reserve_index_(0),
      cached_commit_index_(0),
      platform_atomic_support_(true),
      hdp_flush_support_(false) {
  std::memset(&queue_resource_, 0, sizeof(queue_resource_));
}

template BlitSdma::~BlitSdma() {}

template hsa_status_t BlitSdma::Initialize(
    const core::Agent& agent, bool use_xgmi) {
  if (queue_start_addr_ != NULL) {
    // Already initialized.
    return HSA_STATUS_SUCCESS;
  }

  if (agent.device_type() != core::Agent::kAmdGpuDevice) {
    return HSA_STATUS_ERROR;
  }

  agent_ = reinterpret_cast(&const_cast(agent));

  if (HSA_PROFILE_FULL == agent_->profile()) {
    assert(false && "Only support SDMA for dgpu currently");
    return HSA_STATUS_ERROR;
  }

  if (agent_->isa()->GetVersion() == core::Isa::Version(7, 0, 1)) {
    platform_atomic_support_ = false;
  } else {
    const core::Runtime::LinkInfo& link =
        core::Runtime::runtime_singleton_->GetLinkInfo(
            agent_->node_id(),
            core::Runtime::runtime_singleton_->cpu_agents()[0]->node_id());
    platform_atomic_support_ = link.info.atomic_support_64bit;
  }

  // HDP flush supported on gfx900 and forward.
  // FIXME: Not working on gfx10, raises SRBM write protection interrupt.
  if (agent_->isa()->GetMajorVersion() == 9) {
    hdp_flush_support_ = true;
  }

  // Allocate queue buffer.
  queue_start_addr_ = (char*)agent_->system_allocator()(
      kQueueSize, 0x1000, core::MemoryRegion::AllocateExecutable);

  if (queue_start_addr_ == NULL) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }
  MAKE_NAMED_SCOPE_GUARD(cleanupOnException, [&]() { Destroy(agent); };);
  std::memset(queue_start_addr_, 0, kQueueSize);

  // Access the kernel driver to initialize the queue control block.
  // This call binds the user-mode queue object to the underlying compute
  // device. ROCr creates two kinds of queues, PCIe optimized and xGMI
  // optimized; the use_xgmi input flag selects which kind to create.
  const HSA_QUEUE_TYPE kQueueType_ = use_xgmi ?
      HSA_QUEUE_SDMA_XGMI : HSA_QUEUE_SDMA;
  if (HSAKMT_STATUS_SUCCESS !=
      hsaKmtCreateQueue(agent_->node_id(), kQueueType_, 100,
                        HSA_QUEUE_PRIORITY_MAXIMUM, queue_start_addr_,
                        kQueueSize, NULL, &queue_resource_)) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }

  cached_reserve_index_ =
      *reinterpret_cast(queue_resource_.Queue_write_ptr);
  cached_commit_index_ = cached_reserve_index_;

  signals_[0].reset(new core::InterruptSignal(0));
  signals_[1].reset(new core::InterruptSignal(0));

  cleanupOnException.Dismiss();
  return HSA_STATUS_SUCCESS;
}

template hsa_status_t BlitSdma::Destroy(
    const core::Agent& agent) {
  // Release all allocated resources and reset them to zero.

  if (queue_resource_.QueueId != 0) {
    // Release queue resources from the kernel
    auto err = hsaKmtDestroyQueue(queue_resource_.QueueId);
    assert(err == HSAKMT_STATUS_SUCCESS);
    memset(&queue_resource_, 0, sizeof(queue_resource_));
  }

  if (queue_start_addr_ != NULL) {
    // Release queue buffer.
    agent_->system_deallocator()(queue_start_addr_);
  }

  queue_start_addr_ = NULL;
  cached_reserve_index_ = 0;
  cached_commit_index_ = 0;

  signals_[0].reset();
  signals_[1].reset();

  return HSA_STATUS_SUCCESS;
}

template hsa_status_t BlitSdma::SubmitBlockingCommand(
    const void* cmd, size_t cmd_size) {
  ScopedAcquire lock(&lock_);

  // Alternate between completion signals
  // Using two allows overlapping command writing and copies
  core::Signal* completionSignal;
  if (parity_)
    completionSignal = signals_[0].get();
  else
    completionSignal = signals_[1].get();
  parity_ ^= true;

  // Wait for prior operation with this signal to complete
  completionSignal->WaitRelaxed(HSA_SIGNAL_CONDITION_EQ, 0, -1,
                                HSA_WAIT_STATE_BLOCKED);

  // Mark signal as in use, guard against exception leaving the signal in an
  // unusable state.
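  // Signal value protocol used by this function: 0 means idle and 2 means in
  // use. SubmitCommand() appends a packet that lowers the value by one (an
  // atomic decrement, or a fence writing value - 1 without platform atomics),
  // so completion is observed as the value becoming 1; the scope guard then
  // stores 0 so the signal can be reused.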
  completionSignal->StoreRelaxed(2);
  MAKE_SCOPE_GUARD([&]() { completionSignal->StoreRelaxed(0); });
  lock.Release();

  // Submit command and wait for completion
  hsa_status_t ret =
      SubmitCommand(cmd, cmd_size, std::vector(), *completionSignal);
  completionSignal->WaitRelaxed(HSA_SIGNAL_CONDITION_EQ, 1, -1,
                                HSA_WAIT_STATE_BLOCKED);
  return ret;
}

template hsa_status_t BlitSdma::SubmitCommand(
    const void* cmd, size_t cmd_size, const std::vector& dep_signals,
    core::Signal& out_signal) {
  // The signal is a 64-bit value and a poll checks a 32-bit value, so we
  // need two poll operations per dependent signal.
  const uint32_t num_poll_command =
      static_cast(2 * dep_signals.size());
  const uint32_t total_poll_command_size =
      (num_poll_command * poll_command_size_);

  // Load the profiling state early in case the user disables or enables
  // profiling in the middle of the call.
  const bool profiling_enabled = agent_->profiling_enabled();

  uint64_t* start_ts_addr = nullptr;
  uint64_t* end_ts_addr = nullptr;
  uint32_t total_timestamp_command_size = 0;

  if (profiling_enabled) {
    out_signal.GetSdmaTsAddresses(start_ts_addr, end_ts_addr);
    total_timestamp_command_size = 2 * timestamp_command_size_;
  }

  // On an agent that does not support platform atomics, the atomic is replaced
  // by one or two fence packets that update the signal value. A fence is used
  // instead of a write packet because the SDMA engine may overlap serial
  // copy/write packets.
  const uint64_t completion_signal_value =
      static_cast(out_signal.LoadRelaxed() - 1);
  const size_t sync_command_size = (platform_atomic_support_)
      ? atomic_command_size_
      : (completion_signal_value > UINT32_MAX) ? 2 * fence_command_size_
                                               : fence_command_size_;

  // If the signal is an interrupt signal, the SDMA engine must also send an
  // interrupt packet to the IH.
  const size_t interrupt_command_size =
      (out_signal.signal_.event_mailbox_ptr != 0)
          ? (fence_command_size_ + trap_command_size_)
          : 0;

  // Add space for an acquire or release HDP flush command.
  uint32_t flush_cmd_size = 0;
  if (core::Runtime::runtime_singleton_->flag().enable_sdma_hdp_flush()) {
    if ((HwIndexMonotonic) && (hdp_flush_support_)) {
      flush_cmd_size = flush_command_size_;
    }
  }

  // Add space for cache flush.
  if (useGCR) flush_cmd_size += gcr_command_size_ * 2;

  const uint32_t total_command_size = total_poll_command_size + cmd_size +
      sync_command_size + total_timestamp_command_size +
      interrupt_command_size + flush_cmd_size;

  RingIndexTy curr_index;
  char* command_addr = AcquireWriteAddress(total_command_size, curr_index);

  if (command_addr == NULL) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }

  for (size_t i = 0; i < dep_signals.size(); ++i) {
    uint32_t* signal_addr =
        reinterpret_cast(dep_signals[i]->ValueLocation());
    // Wait for the upper 32 bits to become 0.
    BuildPollCommand(command_addr, &signal_addr[1], 0);
    command_addr += poll_command_size_;

    // Then wait for the lower 32 bits to become 0.
    BuildPollCommand(command_addr, &signal_addr[0], 0);
    command_addr += poll_command_size_;
  }

  if (profiling_enabled) {
    BuildGetGlobalTimestampCommand(command_addr,
                                   reinterpret_cast(start_ts_addr));
    command_addr += timestamp_command_size_;
  }

  // Issue an HDP flush command.
  if (core::Runtime::runtime_singleton_->flag().enable_sdma_hdp_flush()) {
    if ((HwIndexMonotonic) && (hdp_flush_support_)) {
      BuildHdpFlushCommand(command_addr);
      command_addr += flush_command_size_;
    }
  }

  // Issue cache invalidate
  if (useGCR) {
    BuildGCRCommand(command_addr, true);
    command_addr += gcr_command_size_;
  }

  // Do the command after all polls are satisfied.
  memcpy(command_addr, cmd, cmd_size);
  command_addr += cmd_size;

  // Issue cache writeback
  if (useGCR) {
    BuildGCRCommand(command_addr, false);
    command_addr += gcr_command_size_;
  }

  if (profiling_enabled) {
    assert(IsMultipleOf(end_ts_addr, 32));
    BuildGetGlobalTimestampCommand(command_addr,
                                   reinterpret_cast(end_ts_addr));
    command_addr += timestamp_command_size_;
  }

  // After transfer is completed, decrement the signal value.
  if (platform_atomic_support_) {
    BuildAtomicDecrementCommand(command_addr, out_signal.ValueLocation());
    command_addr += atomic_command_size_;
  } else {
    uint32_t* signal_value_location =
        reinterpret_cast(out_signal.ValueLocation());
    if (completion_signal_value > UINT32_MAX) {
      BuildFenceCommand(command_addr, signal_value_location + 1,
                        static_cast(completion_signal_value >> 32));
      command_addr += fence_command_size_;
    }

    BuildFenceCommand(command_addr, signal_value_location,
                      static_cast(completion_signal_value));
    command_addr += fence_command_size_;
  }

  // Update mailbox event and send interrupt to IH.
  if (out_signal.signal_.event_mailbox_ptr != 0) {
    BuildFenceCommand(command_addr,
                      reinterpret_cast(out_signal.signal_.event_mailbox_ptr),
                      static_cast(out_signal.signal_.event_id));
    command_addr += fence_command_size_;

    BuildTrapCommand(command_addr, out_signal.signal_.event_id);
  }

  ReleaseWriteAddress(curr_index, total_command_size);

  return HSA_STATUS_SUCCESS;
}

template hsa_status_t BlitSdma::SubmitLinearCopyCommand(
    void* dst, const void* src, size_t size) {
  // Break the copy into multiple copy operations in case the copy size exceeds
  // the SDMA linear copy limit.
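  // Ceiling division, e.g. (illustrative) size == 2 * kMaxSingleCopySize + 1
  // gives (3 * kMaxSingleCopySize) / kMaxSingleCopySize == 3 packets, the last
  // of which copies the remaining single byte.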
  const uint32_t num_copy_command =
      (size + kMaxSingleCopySize - 1) / kMaxSingleCopySize;

  std::vector buff(num_copy_command);

  BuildCopyCommand(reinterpret_cast(&buff[0]), num_copy_command, dst, src,
                   size);

  return SubmitBlockingCommand(&buff[0],
                               buff.size() * sizeof(SDMA_PKT_COPY_LINEAR));
}

template hsa_status_t BlitSdma::SubmitLinearCopyCommand(
    void* dst, const void* src, size_t size,
    std::vector& dep_signals, core::Signal& out_signal) {
  // Break the copy into multiple copy operations when the copy size exceeds
  // the SDMA linear copy limit.
  const uint32_t num_copy_command =
      (size + kMaxSingleCopySize - 1) / kMaxSingleCopySize;

  // Assemble copy packets.
  std::vector buff(num_copy_command);
  BuildCopyCommand(reinterpret_cast(&buff[0]), num_copy_command, dst, src,
                   size);

  return SubmitCommand(&buff[0], buff.size() * sizeof(SDMA_PKT_COPY_LINEAR),
                       dep_signals, out_signal);
}

template hsa_status_t BlitSdma::SubmitCopyRectCommand(
    const hsa_pitched_ptr_t* dst, const hsa_dim3_t* dst_offset,
    const hsa_pitched_ptr_t* src, const hsa_dim3_t* src_offset,
    const hsa_dim3_t* range, std::vector& dep_signals,
    core::Signal& out_signal) {
  // Hardware requires DWORD alignment for base addresses, pitches and slices.
  // Also confirm that we have a geometric rect (copied block does not wrap an
  // edge).
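  // Illustrative example of the height checks below (values assumed): with
  // pitch == 1024 and slice == 65536 bytes, one slice holds 65536 / 1024 == 64
  // rows, so offset->y + range->y must not exceed 64 on that side of the copy.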
  if (((uintptr_t)dst->base) % 4 != 0 || ((uintptr_t)src->base) % 4 != 0)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                             "Copy rect base address not aligned.");
  if (((uintptr_t)dst->pitch) % 4 != 0 || ((uintptr_t)src->pitch) % 4 != 0)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                             "Copy rect pitch not aligned.");
  if (((uintptr_t)dst->slice) % 4 != 0 || ((uintptr_t)src->slice) % 4 != 0)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                             "Copy rect slice not aligned.");
  if (uint64_t(src_offset->x) + range->x > src->pitch ||
      uint64_t(dst_offset->x) + range->x > dst->pitch)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                             "Copy rect width out of range.");
  if ((src->slice != 0) &&
      (uint64_t(src_offset->y) + range->y) > src->slice / src->pitch)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                             "Copy rect height out of range.");
  if ((dst->slice != 0) &&
      (uint64_t(dst_offset->y) + range->y) > dst->slice / dst->pitch)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                             "Copy rect height out of range.");
  if (range->z > 1 && (src->slice == 0 || dst->slice == 0))
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                             "Copy rect slice needed.");

  const uint max_pitch = 1 << SDMA_PKT_COPY_LINEAR_RECT::pitch_bits;

  std::vector pkts;
  auto append = [&](size_t size) {
    assert(size == sizeof(SDMA_PKT_COPY_LINEAR_RECT) &&
           "SDMA packet size mismatch");
    pkts.emplace_back(SDMA_PKT_COPY_LINEAR_RECT());
    return &pkts.back();
  };

  // Do wide pitch 2D copies along X-Z
  if (range->z == 1 && (src->pitch > max_pitch || dst->pitch > max_pitch)) {
    hsa_pitched_ptr_t Src = *src;
    hsa_pitched_ptr_t Dst = *dst;
    hsa_dim3_t Soff = *src_offset;
    hsa_dim3_t Doff = *dst_offset;
    hsa_dim3_t Range = *range;

    Src.base = static_cast(Src.base) + Soff.z * Src.slice + Soff.y * Src.pitch;
    Dst.base = static_cast(Dst.base) + Doff.z * Dst.slice + Doff.y * Dst.pitch;
    Soff.y = Soff.z = 0;
    Doff.y = Doff.z = 0;

    Src.slice = Src.pitch;
    Src.pitch = 0;
    Dst.slice = Dst.pitch;
    Dst.pitch = 0;

    Range.z = Range.y;
    Range.y = 1;

    BuildCopyRectCommand(append, &Dst, &Doff, &Src, &Soff, &Range);
  } else {
    BuildCopyRectCommand(append, dst, dst_offset, src, src_offset, range);
  }

  return SubmitCommand(&pkts[0], pkts.size() * sizeof(SDMA_PKT_COPY_LINEAR_RECT),
                       dep_signals, out_signal);
}

template hsa_status_t BlitSdma::SubmitLinearFillCommand(
    void* ptr, uint32_t value, size_t count) {
  const size_t size = count * sizeof(uint32_t);

  const uint32_t num_fill_command =
      (size + kMaxSingleFillSize - 1) / kMaxSingleFillSize;

  std::vector buff(num_fill_command);

  BuildFillCommand(reinterpret_cast(&buff[0]), num_fill_command, ptr, value,
                   count);

  return SubmitBlockingCommand(&buff[0],
                               buff.size() * sizeof(SDMA_PKT_CONSTANT_FILL));
}

template hsa_status_t BlitSdma::EnableProfiling(
    bool enable) {
  return HSA_STATUS_SUCCESS;
}

template char* BlitSdma::AcquireWriteAddress(
    uint32_t cmd_size, RingIndexTy& curr_index) {
  // Ring is full when all but one byte is written.
  if (cmd_size >= kQueueSize) {
    return NULL;
  }

  while (true) {
    curr_index = atomic::Load(&cached_reserve_index_, std::memory_order_acquire);

    // Check whether a linear region of the requested size is available.
    // If == cmd_size: region is at beginning of ring.
    // If < cmd_size: region intersects end of ring, pad with no-ops and retry.
    if (WrapIntoRing(curr_index + cmd_size) < cmd_size) {
      PadRingToEnd(curr_index);
      continue;
    }

    // Check whether the engine has finished using this region.
    const RingIndexTy new_index = curr_index + cmd_size;

    if (CanWriteUpto(new_index) == false) {
      // Wait for read index to move and try again.
      os::YieldThread();
      continue;
    }

    // Try to reserve this part of the ring.
    if (atomic::Cas(&cached_reserve_index_, new_index, curr_index,
                    std::memory_order_release) == curr_index) {
      return queue_start_addr_ + WrapIntoRing(curr_index);
    }

    // Another thread reserved curr_index, try again.
    os::YieldThread();
  }

  return NULL;
}

template void BlitSdma::UpdateWriteAndDoorbellRegister(
    RingIndexTy curr_index, RingIndexTy new_index) {
  while (true) {
    // Make sure that the address before ::curr_index is already released.
    // Otherwise the CP may read invalid packets.
    if (atomic::Load(&cached_commit_index_, std::memory_order_acquire) ==
        curr_index) {
      if (core::Runtime::runtime_singleton_->flag().sdma_wait_idle()) {
        // TODO: remove when sdma wpointer issue is resolved.
        // Wait until the SDMA engine finishes processing all packets before
        // updating the wptr and doorbell.
        while (WrapIntoRing(*reinterpret_cast(queue_resource_.Queue_read_ptr)) !=
               WrapIntoRing(curr_index)) {
          os::YieldThread();
        }
      }

      // Update write pointer and doorbell register.
      *reinterpret_cast(queue_resource_.Queue_write_ptr) =
          (HwIndexMonotonic ? new_index : WrapIntoRing(new_index));

      // Ensure write pointer is visible to GPU before doorbell.
      std::atomic_thread_fence(std::memory_order_release);

      *reinterpret_cast(queue_resource_.Queue_DoorBell) =
          (HwIndexMonotonic ? new_index : WrapIntoRing(new_index));

      atomic::Store(&cached_commit_index_, new_index, std::memory_order_release);
      break;
    }

    // Waiting for another thread to submit preceding commands first.
    os::YieldThread();
  }
}

template void BlitSdma::ReleaseWriteAddress(
    RingIndexTy curr_index, uint32_t cmd_size) {
  if (cmd_size > kQueueSize) {
    assert(false && "cmd_addr is outside the queue buffer range");
    return;
  }

  UpdateWriteAndDoorbellRegister(curr_index, curr_index + cmd_size);
}

template void BlitSdma::PadRingToEnd(
    RingIndexTy curr_index) {
  // Reserve region from here to the end of the ring.
  RingIndexTy new_index = curr_index + (kQueueSize - WrapIntoRing(curr_index));

  // Check whether the engine has finished using this region.
  if (CanWriteUpto(new_index) == false) {
    // Wait for read index to move and try again.
    return;
  }

  if (atomic::Cas(&cached_reserve_index_, new_index, curr_index,
                  std::memory_order_release) == curr_index) {
    // Write and submit NOP commands in reserved region.
    char* nop_address = queue_start_addr_ + WrapIntoRing(curr_index);
    memset(nop_address, 0, new_index - curr_index);

    UpdateWriteAndDoorbellRegister(curr_index, new_index);
  }
}

template uint32_t BlitSdma::WrapIntoRing(
    RingIndexTy index) {
  return index & (kQueueSize - 1);
}

template bool BlitSdma::CanWriteUpto(
    RingIndexTy upto_index) {
  // Get/calculate the monotonic read index.
  RingIndexTy hw_read_index =
      *reinterpret_cast(queue_resource_.Queue_read_ptr);
  RingIndexTy read_index;

  if (HwIndexMonotonic) {
    read_index = hw_read_index;
  } else {
    // Calculate distance from commit index to HW read index.
    // Commit index is always < kQueueSize away from HW read index.
    RingIndexTy commit_index =
        atomic::Load(&cached_commit_index_, std::memory_order_relaxed);
    RingIndexTy dist_to_read_index = WrapIntoRing(commit_index - hw_read_index);
    read_index = commit_index - dist_to_read_index;
  }

  // Check whether the read pointer has passed the given index.
  // At most we can submit (kQueueSize - 1) bytes at a time.
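  // Illustrative arithmetic with kQueueSize == 1 MiB (0x100000): if
  // read_index == 0x2000, upto_index == 0x101FFF gives a distance of 0xFFFFF
  // and may be written, while upto_index == 0x102000 gives 0x100000 and must
  // wait, since it would overwrite bytes the engine has not consumed yet.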
  return (upto_index - read_index) < kQueueSize;
}

template void BlitSdma::BuildFenceCommand(
    char* fence_command_addr, uint32_t* fence, uint32_t fence_value) {
  assert(fence_command_addr != NULL);
  SDMA_PKT_FENCE* packet_addr =
      reinterpret_cast(fence_command_addr);

  memset(packet_addr, 0, sizeof(SDMA_PKT_FENCE));

  packet_addr->HEADER_UNION.op = SDMA_OP_FENCE;

  if (agent_->isa()->GetMajorVersion() >= 10) {
    packet_addr->HEADER_UNION.mtype = 3;
  }

  packet_addr->ADDR_LO_UNION.addr_31_0 = ptrlow32(fence);

  packet_addr->ADDR_HI_UNION.addr_63_32 = ptrhigh32(fence);

  packet_addr->DATA_UNION.data = fence_value;
}

template void BlitSdma::BuildCopyCommand(
    char* cmd_addr, uint32_t num_copy_command, void* dst, const void* src,
    size_t size) {
  size_t cur_size = 0;
  for (uint32_t i = 0; i < num_copy_command; ++i) {
    const uint32_t copy_size = static_cast(
        std::min((size - cur_size), kMaxSingleCopySize));

    void* cur_dst = static_cast(dst) + cur_size;
    const void* cur_src = static_cast(src) + cur_size;

    SDMA_PKT_COPY_LINEAR* packet_addr =
        reinterpret_cast(cmd_addr);

    memset(packet_addr, 0, sizeof(SDMA_PKT_COPY_LINEAR));

    packet_addr->HEADER_UNION.op = SDMA_OP_COPY;
    packet_addr->HEADER_UNION.sub_op = SDMA_SUBOP_COPY_LINEAR;

    packet_addr->COUNT_UNION.count = copy_size + SizeToCountOffset;

    packet_addr->SRC_ADDR_LO_UNION.src_addr_31_0 = ptrlow32(cur_src);
    packet_addr->SRC_ADDR_HI_UNION.src_addr_63_32 = ptrhigh32(cur_src);

    packet_addr->DST_ADDR_LO_UNION.dst_addr_31_0 = ptrlow32(cur_dst);
    packet_addr->DST_ADDR_HI_UNION.dst_addr_63_32 = ptrhigh32(cur_dst);

    cmd_addr += linear_copy_command_size_;
    cur_size += copy_size;
  }

  assert(cur_size == size);
}

/*
Copies are done in terms of elements (1, 2, 4, 8, or 16 bytes) and have
alignment restrictions. Elements are coded by the log2 of the element size in
bytes (i.e. element 0 = 1 byte, element 4 = 16 bytes). This routine breaks a
large rect into tiles that can be handled by hardware. Pitches and offsets must
be representable in terms of elements in all tiles of the copy.
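
Illustrative sketch only, not part of the runtime: the standalone C++ below
mirrors the X-axis tiling that BuildCopyRectCommand performs for a single row.
The names MaxAlignedElement, Tile and SplitRow are hypothetical; Y/Z iteration,
address computation and packet emission are omitted. It assumes max_x >= 16
(the largest element size in bytes), which the tiling loop below also relies
on so that every tile makes progress.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// log2 of the largest element size (1, 2, 4, 8 or 16 bytes) that evenly
// divides v. OR-ing with 16 caps the result at 4 and makes v == 0 safe.
static int MaxAlignedElement(uint64_t v) {
  return __builtin_ctz(static_cast<unsigned>(v | 16));
}

struct Tile {
  uint32_t start;  // byte offset of the tile within the row
  int element;     // log2 of the element size in bytes
  uint32_t count;  // number of elements in the tile
};

// Splits one row of width_total bytes into tiles. src_off/dst_off are the
// row start offsets in bytes, max_ele is the cap imposed by the pitches and
// max_x is the per-packet element-count limit (assumed >= 16).
static std::vector<Tile> SplitRow(uint32_t width_total, uint32_t src_off,
                                  uint32_t dst_off, int max_ele,
                                  uint32_t max_x) {
  std::vector<Tile> tiles;
  uint32_t x = 0;
  while (x < width_total) {
    const uint32_t width = width_total - x;
    // Largest element allowed by the misalignment of both tile starts.
    const int aligned_ele =
        std::min({MaxAlignedElement((src_off + x) % 4),
                  MaxAlignedElement((dst_off + x) % 4), max_ele});
    // Largest permissible element that exactly covers the remaining width.
    int element = std::min(MaxAlignedElement(width), aligned_ele);
    uint32_t xcount = width >> element;
    if (xcount > max_x) {
      // Too many elements for one packet: use the aligned element and clip.
      element = aligned_ele;
      xcount = std::min(width >> element, max_x);
    }
    assert(xcount > 0 && "max_x must be at least 16");
    tiles.push_back({x, element, xcount});
    x += xcount << element;
  }
  return tiles;
}
```

For a 71-byte row with DWORD-aligned starts, max_ele = 4 and max_x = 16,
SplitRow yields a 64-byte tile of four 16-byte elements followed by a 7-byte
tile of seven 1-byte elements, matching the worked example further below.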
*/
template void BlitSdma::BuildCopyRectCommand(
    const std::function& append, const hsa_pitched_ptr_t* dst,
    const hsa_dim3_t* dst_offset, const hsa_pitched_ptr_t* src,
    const hsa_dim3_t* src_offset, const hsa_dim3_t* range) {
  // Returns the index of the first set bit (i.e. log2 of the largest power of
  // 2 that evenly divides width), the largest element that perfectly covers
  // width. width | 16 ensures that we don't return a higher element than is
  // supported and avoids issues with 0.
  auto maxAlignedElement = [](size_t width) { return __builtin_ctz(width | 16); };

  // Limits in terms of element count
  const uint32_t max_pitch = 1 << SDMA_PKT_COPY_LINEAR_RECT::pitch_bits;
  const uint32_t max_slice = 1 << SDMA_PKT_COPY_LINEAR_RECT::slice_bits;
  const uint32_t max_x = 1 << SDMA_PKT_COPY_LINEAR_RECT::rect_xy_bits;
  const uint32_t max_y = 1 << SDMA_PKT_COPY_LINEAR_RECT::rect_xy_bits;
  const uint32_t max_z = 1 << SDMA_PKT_COPY_LINEAR_RECT::rect_z_bits;

  // Find maximum element that describes the pitch and slice.
  // Pitch and slice must both be represented in units of elements. No element
  // larger than this may be used in any tile as the pitches would not be
  // exactly represented.
  int max_ele = Min(maxAlignedElement(src->pitch), maxAlignedElement(dst->pitch));
  if (range->z != 1)  // Only need to consider slice if HW will copy along Z.
    max_ele = Min(max_ele, maxAlignedElement(src->slice),
                  maxAlignedElement(dst->slice));

  /*
    Find the minimum element size that will be needed for any tile.
    No subdivision of a range admits a larger element size for the smallest
    element in any subdivision than the element size that covers the whole
    range, though some can be worse (this is easily model checked).
    Subdividing with any element larger than the covering element won't change
    the covering element of the remainder
    ( Range%Element = (Range-N*LargerElement)%Element since
      LargerElement%Element=0 ).

    Ex.
    range->x=71, assume the max range is 16 elements:
    We can break at 64, giving tiles [0,63] and [64,70] (widths 64 and 7).
    64 is covered by element 4 (16B) and 7 is covered by element 0 (1B).
    Exactly covering 71 requires using element 0.

    Base addresses in each tile must be DWORD aligned; if not, the offset from
    an aligned address must be represented in elements. This may reduce the
    size of the element, but since elements are integer multiples of each other
    this is harmless. The src and dst bases have already been checked for DWORD
    alignment, so we only need to consider the offset here.
  */
  int min_ele = Min(max_ele, maxAlignedElement(range->x),
                    maxAlignedElement(src_offset->x % 4),
                    maxAlignedElement(dst_offset->x % 4));

  // Check that pitch and slice can be represented in the tile with the
  // smallest element.
  if ((src->pitch >> min_ele) > max_pitch || (dst->pitch >> min_ele) > max_pitch)
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                             "Copy rect pitch out of limits.\n");
  if (range->z != 1) {  // Only need to consider slice if HW will copy along Z.
    if ((src->slice >> min_ele) > max_slice || (dst->slice >> min_ele) > max_slice)
      throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                               "Copy rect slice out of limits.\n");
  }

  // Break copy into tiles
  for (uint32_t z = 0; z < range->z; z += max_z) {
    for (uint32_t y = 0; y < range->y; y += max_y) {
      uint32_t x = 0;
      while (x < range->x) {
        uint32_t width = range->x - x;

        // Get largest element which describes the start of this tile after its
        // base address has been aligned. Base addresses must be DWORD (4 byte)
        // aligned.
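        // Worked example (illustrative): if (src_offset->x + x) % 4 == 2, then
        // maxAlignedElement(2) == ctz(2 | 16) == ctz(0b10010) == 1, so at most
        // 2-byte elements fit; when both starts are DWORD aligned,
        // maxAlignedElement(0) == ctz(16) == 4 and max_ele is the only limit.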
        int aligned_ele = Min(maxAlignedElement((src_offset->x + x) % 4),
                              maxAlignedElement((dst_offset->x + x) % 4),
                              max_ele);

        // Get largest permissible element which exactly covers width
        int element = Min(maxAlignedElement(width), aligned_ele);
        int xcount = width >> element;

        // If width is too large then width is at least max_x bytes (bigger
        // than any element) so drop the width restriction and clip element
        // count to max_x.
        if (xcount > max_x) {
          element = aligned_ele;
          xcount = Min(width >> element, max_x);
        }

        // Get base addresses and offsets for this tile.
        uintptr_t sbase = (uintptr_t)src->base + src_offset->x + x +
            (src_offset->y + y) * src->pitch + (src_offset->z + z) * src->slice;
        uintptr_t dbase = (uintptr_t)dst->base + dst_offset->x + x +
            (dst_offset->y + y) * dst->pitch + (dst_offset->z + z) * dst->slice;
        uint soff = (sbase % 4) >> element;
        uint doff = (dbase % 4) >> element;
        sbase &= ~3ull;
        dbase &= ~3ull;

        x += xcount << element;

        SDMA_PKT_COPY_LINEAR_RECT* pkt =
            (SDMA_PKT_COPY_LINEAR_RECT*)append(sizeof(SDMA_PKT_COPY_LINEAR_RECT));

        *pkt = {};
        pkt->HEADER_UNION.op = SDMA_OP_COPY;
        pkt->HEADER_UNION.sub_op = SDMA_SUBOP_COPY_LINEAR_RECT;
        pkt->HEADER_UNION.element = element;
        pkt->SRC_ADDR_LO_UNION.src_addr_31_0 = sbase;
        pkt->SRC_ADDR_HI_UNION.src_addr_63_32 = sbase >> 32;
        pkt->SRC_PARAMETER_1_UNION.src_offset_x = soff;
        pkt->SRC_PARAMETER_2_UNION.src_pitch = (src->pitch >> element) - 1;
        pkt->SRC_PARAMETER_3_UNION.src_slice_pitch =
            (range->z == 1) ? 0 : (src->slice >> element) - 1;
        pkt->DST_ADDR_LO_UNION.dst_addr_31_0 = dbase;
        pkt->DST_ADDR_HI_UNION.dst_addr_63_32 = dbase >> 32;
        pkt->DST_PARAMETER_1_UNION.dst_offset_x = doff;
        pkt->DST_PARAMETER_2_UNION.dst_pitch = (dst->pitch >> element) - 1;
        pkt->DST_PARAMETER_3_UNION.dst_slice_pitch =
            (range->z == 1) ? 0 : (dst->slice >> element) - 1;
        pkt->RECT_PARAMETER_1_UNION.rect_x = xcount - 1;
        pkt->RECT_PARAMETER_1_UNION.rect_y = Min(range->y - y, max_y) - 1;
        pkt->RECT_PARAMETER_2_UNION.rect_z = Min(range->z - z, max_z) - 1;
      }
    }
  }
}

template void BlitSdma::BuildFillCommand(
    char* cmd_addr, uint32_t num_fill_command, void* ptr, uint32_t value,
    size_t count) {
  char* cur_ptr = reinterpret_cast(ptr);
  const uint32_t maxDwordCount = kMaxSingleFillSize / sizeof(uint32_t);
  SDMA_PKT_CONSTANT_FILL* packet_addr =
      reinterpret_cast(cmd_addr);

  for (uint32_t i = 0; i < num_fill_command; i++) {
    assert(count != 0 && "SDMA fill command count error.");
    const uint32_t fill_count = Min(count, size_t(maxDwordCount));

    memset(packet_addr, 0, sizeof(SDMA_PKT_CONSTANT_FILL));

    packet_addr->HEADER_UNION.op = SDMA_OP_CONST_FILL;
    packet_addr->HEADER_UNION.fillsize = 2;  // DW fill

    packet_addr->DST_ADDR_LO_UNION.dst_addr_31_0 = ptrlow32(cur_ptr);
    packet_addr->DST_ADDR_HI_UNION.dst_addr_63_32 = ptrhigh32(cur_ptr);

    packet_addr->DATA_UNION.src_data_31_0 = value;

    packet_addr->COUNT_UNION.count =
        (fill_count + SizeToCountOffset) * sizeof(uint32_t);

    packet_addr++;
    cur_ptr += fill_count * sizeof(uint32_t);
    count -= fill_count;
  }

  assert(count == 0 && "SDMA fill command count error.");
}

template void BlitSdma::BuildPollCommand(
    char* cmd_addr, void* addr, uint32_t reference) {
  SDMA_PKT_POLL_REGMEM* packet_addr =
      reinterpret_cast(cmd_addr);

  memset(packet_addr, 0, sizeof(SDMA_PKT_POLL_REGMEM));

  packet_addr->HEADER_UNION.op = SDMA_OP_POLL_REGMEM;
  packet_addr->HEADER_UNION.mem_poll = 1;
  packet_addr->HEADER_UNION.func = 0x3;  // IsEqual.
  packet_addr->ADDR_LO_UNION.addr_31_0 = ptrlow32(addr);
  packet_addr->ADDR_HI_UNION.addr_63_32 = ptrhigh32(addr);

  packet_addr->VALUE_UNION.value = reference;

  packet_addr->MASK_UNION.mask = 0xffffffff;  // Compare the whole content.

  packet_addr->DW5_UNION.interval = 0x04;
  packet_addr->DW5_UNION.retry_count = 0xfff;  // Retry forever.
}

template void BlitSdma::BuildAtomicDecrementCommand(char* cmd_addr, void* addr) {
  SDMA_PKT_ATOMIC* packet_addr = reinterpret_cast(cmd_addr);

  memset(packet_addr, 0, sizeof(SDMA_PKT_ATOMIC));

  packet_addr->HEADER_UNION.op = SDMA_OP_ATOMIC;
  packet_addr->HEADER_UNION.operation = SDMA_ATOMIC_ADD64;

  packet_addr->ADDR_LO_UNION.addr_31_0 = ptrlow32(addr);
  packet_addr->ADDR_HI_UNION.addr_63_32 = ptrhigh32(addr);

  // Adding 0xFFFFFFFFFFFFFFFF (-1 in two's complement) decrements the 64-bit
  // value at addr by one.
  packet_addr->SRC_DATA_LO_UNION.src_data_31_0 = 0xffffffff;
  packet_addr->SRC_DATA_HI_UNION.src_data_63_32 = 0xffffffff;
}

template void BlitSdma::BuildGetGlobalTimestampCommand(char* cmd_addr, void* write_address) {
  SDMA_PKT_TIMESTAMP* packet_addr = reinterpret_cast(cmd_addr);

  memset(packet_addr, 0, sizeof(SDMA_PKT_TIMESTAMP));

  packet_addr->HEADER_UNION.op = SDMA_OP_TIMESTAMP;
  packet_addr->HEADER_UNION.sub_op = SDMA_SUBOP_TIMESTAMP_GET_GLOBAL;

  packet_addr->ADDR_LO_UNION.addr_31_0 = ptrlow32(write_address);
  packet_addr->ADDR_HI_UNION.addr_63_32 = ptrhigh32(write_address);
}

template void BlitSdma::BuildTrapCommand(
    char* cmd_addr, uint32_t event_id) {
  SDMA_PKT_TRAP* packet_addr = reinterpret_cast(cmd_addr);

  memset(packet_addr, 0, sizeof(SDMA_PKT_TRAP));

  packet_addr->HEADER_UNION.op = SDMA_OP_TRAP;
  packet_addr->INT_CONTEXT_UNION.int_ctx = event_id;
}

template void BlitSdma::BuildHdpFlushCommand(
    char* cmd_addr) {
  assert(cmd_addr != NULL);
  SDMA_PKT_POLL_REGMEM* addr = reinterpret_cast(cmd_addr);
  memcpy(addr, &hdp_flush_cmd, flush_command_size_);
}

template void BlitSdma::BuildGCRCommand(
    char* cmd_addr, bool invalidate) {
  assert(cmd_addr != NULL);
  assert(useGCR && "Unsupported SDMA command - GCR.");
  SDMA_PKT_GCR* addr = reinterpret_cast(cmd_addr);
  memset(addr, 0, sizeof(SDMA_PKT_GCR));
  addr->HEADER_UNION.op = SDMA_OP_GCR;
  addr->HEADER_UNION.sub_op = SDMA_SUBOP_USER_GCR;
  addr->WORD2_UNION.GCR_CONTROL_GL2_WB = 1;
  addr->WORD2_UNION.GCR_CONTROL_GLK_WB = 1;
  if (invalidate) {
    addr->WORD2_UNION.GCR_CONTROL_GL2_INV = 1;
    addr->WORD2_UNION.GCR_CONTROL_GL1_INV = 1;
    addr->WORD2_UNION.GCR_CONTROL_GLV_INV = 1;
    addr->WORD2_UNION.GCR_CONTROL_GLK_INV = 1;
  }
  // Discarding all lines for now.
  addr->WORD2_UNION.GCR_CONTROL_GL2_RANGE = 0;
}

template class BlitSdma;
template class BlitSdma;
template class BlitSdma;

}  // namespace amd
}  // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_cpu_agent.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
// // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/amd_cpu_agent.h" #include #include #include "core/inc/amd_memory_region.h" #include "core/inc/host_queue.h" #include "inc/hsa_ext_image.h" namespace rocr { namespace AMD { CpuAgent::CpuAgent(HSAuint32 node, const HsaNodeProperties& node_props) : core::Agent(node, kAmdCpuDevice), properties_(node_props) { InitRegionList(); InitCacheList(); } CpuAgent::~CpuAgent() { std::for_each(regions_.begin(), regions_.end(), DeleteObject()); regions_.clear(); } void CpuAgent::InitRegionList() { const bool is_apu_node = (properties_.NumFComputeCores > 0); std::vector mem_props(properties_.NumMemoryBanks); if (HSAKMT_STATUS_SUCCESS == hsaKmtGetNodeMemoryProperties(node_id(), properties_.NumMemoryBanks, &mem_props[0])) { std::vector::iterator system_prop = std::find_if(mem_props.begin(), mem_props.end(), [](HsaMemoryProperties prop) -> bool { return (prop.SizeInBytes > 0 && prop.HeapType == HSA_HEAPTYPE_SYSTEM); }); HsaMemoryProperties system_props; std::memset(&system_props, 0, sizeof(HsaMemoryProperties)); system_props.HeapType = HSA_HEAPTYPE_SYSTEM; system_props.SizeInBytes = 0; system_props.VirtualBaseAddress = 0; if (system_prop != mem_props.end()) system_props = *system_prop; MemoryRegion* system_region_fine = new MemoryRegion(true, false, is_apu_node, this, system_props); regions_.push_back(system_region_fine); MemoryRegion* system_region_kernarg = new MemoryRegion(true, true, is_apu_node, 
this, system_props); regions_.push_back(system_region_kernarg); if (!is_apu_node) { MemoryRegion* system_region_coarse = new MemoryRegion(false, false, is_apu_node, this, system_props); regions_.push_back(system_region_coarse); } } } void CpuAgent::InitCacheList() { // Get CPU cache information. cache_props_.resize(properties_.NumCaches); if (HSAKMT_STATUS_SUCCESS != hsaKmtGetNodeCacheProperties(node_id(), properties_.CComputeIdLo, properties_.NumCaches, &cache_props_[0])) { cache_props_.clear(); } else { // Only store CPU D-cache. for (size_t cache_id = 0; cache_id < cache_props_.size(); ++cache_id) { const HsaCacheType type = cache_props_[cache_id].CacheType; if (type.ui32.CPU != 1 || type.ui32.Instruction == 1) { cache_props_.erase(cache_props_.begin() + cache_id); --cache_id; } } } // Update cache objects caches_.clear(); caches_.resize(cache_props_.size()); char name[64]; GetInfo(HSA_AGENT_INFO_NAME, name); std::string deviceName = name; for (size_t i = 0; i < caches_.size(); i++) caches_[i].reset(new core::Cache(deviceName + " L" + std::to_string(cache_props_[i].CacheLevel), cache_props_[i].CacheLevel, cache_props_[i].CacheSize)); } hsa_status_t CpuAgent::VisitRegion(bool include_peer, hsa_status_t (*callback)(hsa_region_t region, void* data), void* data) const { if (!include_peer) { return VisitRegion(regions_, callback, data); } // Expose all system regions in the system. 
  hsa_status_t stat = VisitRegion(
      core::Runtime::runtime_singleton_->system_regions_fine(), callback, data);
  if (stat != HSA_STATUS_SUCCESS) {
    return stat;
  }

  return VisitRegion(core::Runtime::runtime_singleton_->system_regions_coarse(),
                     callback, data);
}

hsa_status_t CpuAgent::VisitRegion(
    const std::vector& regions,
    hsa_status_t (*callback)(hsa_region_t region, void* data),
    void* data) const {
  for (const core::MemoryRegion* region : regions) {
    hsa_region_t region_handle = core::MemoryRegion::Convert(region);
    hsa_status_t status = callback(region_handle, data);
    if (status != HSA_STATUS_SUCCESS) {
      return status;
    }
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t CpuAgent::IterateRegion(
    hsa_status_t (*callback)(hsa_region_t region, void* data),
    void* data) const {
  return VisitRegion(true, callback, data);
}

hsa_status_t CpuAgent::IterateCache(hsa_status_t (*callback)(hsa_cache_t cache, void* data),
                                    void* data) const {
  for (size_t i = 0; i < caches_.size(); i++) {
    hsa_status_t stat = callback(core::Cache::Convert(caches_[i].get()), data);
    if (stat != HSA_STATUS_SUCCESS) return stat;
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t CpuAgent::GetInfo(hsa_agent_info_t attribute, void* value) const {
  // Switch on an integer so that AMD and extension attributes, which are
  // defined outside hsa_agent_info_t, can appear as case labels.
  const size_t attribute_u = static_cast<size_t>(attribute);

  switch (attribute_u) {
    // Copy HsaNodeProperties.MarketingName, a UTF-16 string whose characters
    // are all 7-bit ASCII, into the caller's narrow-character buffer.
    // The value of HsaNodeProperties.MarketingName is obtained from the
    // "model name" field of /proc/cpuinfo.
    case HSA_AGENT_INFO_NAME:
    case HSA_AMD_AGENT_INFO_PRODUCT_NAME: {
      std::memset(value, 0, HSA_PUBLIC_NAME_SIZE);
      char* temp = reinterpret_cast<char*>(value);
      // Copy at most HSA_PUBLIC_NAME_SIZE - 1 characters so the buffer stays
      // NUL-terminated; the bound is checked before MarketingName is read.
      for (uint32_t idx = 0;
           idx < HSA_PUBLIC_NAME_SIZE - 1 && properties_.MarketingName[idx] != 0; idx++) {
        temp[idx] = (uint8_t)properties_.MarketingName[idx];
      }
      break;
    }
    case HSA_AGENT_INFO_VENDOR_NAME:
      // TODO: hardcoded for now; wait until SWDEV-88894 is implemented.
      std::memset(value, 0, HSA_PUBLIC_NAME_SIZE);
      std::memcpy(value, "CPU", sizeof("CPU"));
      break;
    case HSA_AGENT_INFO_FEATURE:
      *((hsa_agent_feature_t*)value) = static_cast<hsa_agent_feature_t>(0);
      break;
    case HSA_AGENT_INFO_MACHINE_MODEL:
#if defined(HSA_LARGE_MODEL)
      *((hsa_machine_model_t*)value) = HSA_MACHINE_MODEL_LARGE;
#else
      *((hsa_machine_model_t*)value) = HSA_MACHINE_MODEL_SMALL;
#endif
      break;
    case HSA_AGENT_INFO_BASE_PROFILE_DEFAULT_FLOAT_ROUNDING_MODES:
    case HSA_AGENT_INFO_DEFAULT_FLOAT_ROUNDING_MODE:
      // TODO: validate if this is true.
      *((hsa_default_float_rounding_mode_t*)value) = HSA_DEFAULT_FLOAT_ROUNDING_MODE_NEAR;
      break;
    case HSA_AGENT_INFO_FAST_F16_OPERATION:
      // TODO: validate if this is true.
      *((bool*)value) = false;
      break;
    case HSA_AGENT_INFO_PROFILE:
      *((hsa_profile_t*)value) = HSA_PROFILE_FULL;
      break;
    case HSA_AGENT_INFO_WAVEFRONT_SIZE:
      *((uint32_t*)value) = 0;
      break;
    case HSA_AGENT_INFO_WORKGROUP_MAX_DIM:
      std::memset(value, 0, sizeof(uint16_t) * 3);
      break;
    case HSA_AGENT_INFO_WORKGROUP_MAX_SIZE:
      *((uint32_t*)value) = 0;
      break;
    case HSA_AGENT_INFO_GRID_MAX_DIM:
      std::memset(value, 0, sizeof(hsa_dim3_t));
      break;
    case HSA_AGENT_INFO_GRID_MAX_SIZE:
      *((uint32_t*)value) = 0;
      break;
    case HSA_AGENT_INFO_FBARRIER_MAX_SIZE:
      // TODO: ?
*((uint32_t*)value) = 0; break; case HSA_AGENT_INFO_QUEUES_MAX: *((uint32_t*)value) = 0; break; case HSA_AGENT_INFO_QUEUE_MIN_SIZE: *((uint32_t*)value) = 0; break; case HSA_AGENT_INFO_QUEUE_MAX_SIZE: *((uint32_t*)value) = 0; break; case HSA_AGENT_INFO_QUEUE_TYPE: *((hsa_queue_type32_t*)value) = HSA_QUEUE_TYPE_MULTI; break; case HSA_AGENT_INFO_NODE: // TODO: associate with OS NUMA support (numactl / GetNumaProcessorNode). *((uint32_t*)value) = node_id(); break; case HSA_AGENT_INFO_DEVICE: *((hsa_device_type_t*)value) = HSA_DEVICE_TYPE_CPU; break; case HSA_AGENT_INFO_CACHE_SIZE: { std::memset(value, 0, sizeof(uint32_t) * 4); assert(cache_props_.size() > 0 && "CPU cache info missing."); const size_t num_cache = cache_props_.size(); for (size_t i = 0; i < num_cache; ++i) { const uint32_t line_level = cache_props_[i].CacheLevel; ((uint32_t*)value)[line_level - 1] = cache_props_[i].CacheSize * 1024; } } break; case HSA_AGENT_INFO_ISA: ((hsa_isa_t*)value)->handle = 0; break; case HSA_AGENT_INFO_EXTENSIONS: memset(value, 0, sizeof(uint8_t) * 128); break; case HSA_AGENT_INFO_VERSION_MAJOR: *((uint16_t*)value) = 1; break; case HSA_AGENT_INFO_VERSION_MINOR: *((uint16_t*)value) = 1; break; case HSA_EXT_AGENT_INFO_IMAGE_1D_MAX_ELEMENTS: case HSA_EXT_AGENT_INFO_IMAGE_1DA_MAX_ELEMENTS: case HSA_EXT_AGENT_INFO_IMAGE_1DB_MAX_ELEMENTS: *((uint32_t*)value) = 0; break; case HSA_EXT_AGENT_INFO_IMAGE_2D_MAX_ELEMENTS: case HSA_EXT_AGENT_INFO_IMAGE_2DA_MAX_ELEMENTS: case HSA_EXT_AGENT_INFO_IMAGE_2DDEPTH_MAX_ELEMENTS: case HSA_EXT_AGENT_INFO_IMAGE_2DADEPTH_MAX_ELEMENTS: memset(value, 0, sizeof(uint32_t) * 2); break; case HSA_EXT_AGENT_INFO_IMAGE_3D_MAX_ELEMENTS: memset(value, 0, sizeof(uint32_t) * 3); break; case HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS: *((uint32_t*)value) = 0; break; case HSA_EXT_AGENT_INFO_MAX_IMAGE_RD_HANDLES: case HSA_EXT_AGENT_INFO_MAX_IMAGE_RORW_HANDLES: case HSA_EXT_AGENT_INFO_MAX_SAMPLER_HANDLERS: *((uint32_t*)value) = 0; break; case HSA_AMD_AGENT_INFO_CHIP_ID: 
*((uint32_t*)value) = properties_.DeviceId; break; case HSA_AMD_AGENT_INFO_CACHELINE_SIZE: // TODO: hardcode for now. *((uint32_t*)value) = 64; break; case HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT: *((uint32_t*)value) = properties_.NumCPUCores; break; case HSA_AMD_AGENT_INFO_MAX_CLOCK_FREQUENCY: *((uint32_t*)value) = properties_.MaxEngineClockMhzCCompute; break; case HSA_AMD_AGENT_INFO_DRIVER_NODE_ID: *((uint32_t*)value) = node_id(); break; case HSA_AMD_AGENT_INFO_MAX_ADDRESS_WATCH_POINTS: *((uint32_t*)value) = static_cast( 1 << properties_.Capability.ui32.WatchPointsTotalBits); break; case HSA_AMD_AGENT_INFO_BDFID: *((uint32_t*)value) = static_cast(properties_.LocationId); break; case HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU: *((uint32_t*)value) = static_cast( properties_.NumSIMDPerCU * properties_.MaxWavesPerSIMD); break; case HSA_AMD_AGENT_INFO_NUM_SIMDS_PER_CU: *((uint32_t*)value) = properties_.NumSIMDPerCU; break; case HSA_AMD_AGENT_INFO_NUM_SHADER_ENGINES: *((uint32_t*)value) = properties_.NumShaderBanks; break; case HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE: *((uint32_t*)value) = properties_.NumArrays; break; case HSA_AMD_AGENT_INFO_HDP_FLUSH: *((hsa_amd_hdp_flush_t*)value) = {nullptr, nullptr}; break; case HSA_AMD_AGENT_INFO_DOMAIN: *((uint32_t*)value) = static_cast(properties_.Domain); break; case HSA_AMD_AGENT_INFO_UUID: { // At this point CPU devices do not support UUID's. char uuid_tmp[] = "CPU-XX"; snprintf((char*)value, sizeof(uuid_tmp), "%s", uuid_tmp); break; } case HSA_AMD_AGENT_INFO_ASIC_REVISION: *((uint32_t*)value) = static_cast(properties_.Capability.ui32.ASICRevision); break; default: return HSA_STATUS_ERROR_INVALID_ARGUMENT; break; } return HSA_STATUS_SUCCESS; } hsa_status_t CpuAgent::QueueCreate(size_t size, hsa_queue_type32_t queue_type, core::HsaEventCallback event_callback, void* data, uint32_t private_segment_size, uint32_t group_segment_size, core::Queue** queue) { // No HW AQL packet processor on CPU device. 
  return HSA_STATUS_ERROR;
}

}  // namespace amd
}  // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_filter_device.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/amd_filter_device.h" #include #include #include #include #include #include #include #include #include #include "hsakmt.h" #include "core/util/utils.h" #include "core/inc/runtime.h" #include "core/inc/amd_cpu_agent.h" #include "core/inc/amd_gpu_agent.h" #include "core/inc/amd_memory_region.h" namespace rocr { namespace AMD { bool RvdFilter::FilterDevices() { return core::Runtime::runtime_singleton_->flag().filter_visible_gpus(); } bool RvdFilter::SelectZeroDevices() { const std::string& envVal = core::Runtime::runtime_singleton_->flag().visible_gpus(); return envVal.empty(); } void RvdFilter::BuildRvdTokenList() { // Determine if user has chosen ZERO devices to be surfaced const std::string& envVal = core::Runtime::runtime_singleton_->flag().visible_gpus(); if (envVal.empty()) { return; } // Parse env value into tokens separated by comma (',') delimiter std::string token; char separator = ','; std::stringstream stream(envVal); while (getline(stream, token, separator)) { std::transform(token.begin(), token.end(), token.begin(), ::toupper); token = trim(token); rvdTokenList_.push_back(token); } } void RvdFilter::BuildDeviceUuidList(uint32_t numNodes) { HSAKMT_STATUS status; HsaNodeProperties props = {0}; for (HSAuint32 idx = 0; idx < numNodes; idx++) { // Query for node properties and ignore Cpu devices status = hsaKmtGetNodeProperties(idx, &props); if (status != HSAKMT_STATUS_SUCCESS) { continue; } if (props.NumFComputeCores == 0) { continue; } // For devices whose UUID is zero build a string that // will not match user provided value if (props.UniqueID == 0) { 
      devUuidList_.push_back("Invalid-UUID");
      continue;
    }

    // For devices that support valid UUID values, capture the UUID as
    // "GPU-" followed by an upper-case hex string of length 16,
    // including leading zeros if necessary.
    std::stringstream stream;
    stream << "GPU-" << std::setfill('0') << std::setw(sizeof(uint64_t) * 2) << std::hex
           << props.UniqueID;
    std::string uuidVal(stream.str());
    std::transform(uuidVal.begin(), uuidVal.end(), uuidVal.begin(), ::toupper);
    devUuidList_.push_back(uuidVal);
  }
}

int32_t RvdFilter::ProcessUuidToken(const std::string& token) {
  // Reject tokens shorter than "GPU-" plus one hex digit (5 chars) or
  // longer than a full UUID string, "GPU-" plus 16 hex digits (20 chars).
  uint32_t tokenLen = token.length();
  if ((tokenLen < 5) || (tokenLen > 20)) {
    return -1;
  }

  // Index of the device whose UUID the token matches; -1 if none so far.
  int32_t devIdx = -1;
  int32_t compareVal = -1;
  uint32_t numGpus = devUuidList_.size();
  for (uint32_t idx = 0; idx < numGpus; idx++) {
    uint32_t uuidLen = devUuidList_[idx].length();

    // A token longer than this UUID cannot match it; try the next device.
    if (tokenLen > uuidLen) {
      compareVal = -1;
      continue;
    }

    // The token matches if it is a prefix of the device UUID.
    compareVal = token.compare(0, tokenLen, devUuidList_[idx], 0, tokenLen);

    // A prefix that matches more than one device is ambiguous; reject it.
    if (compareVal == 0) {
      if (devIdx != -1) {
        return -1;
      }
      devIdx = idx;
    }
  }

  // Return the matching device index, or -1 if no device matched.
  return devIdx;
}

uint32_t RvdFilter::BuildUsrDeviceList() {
  // Get the number of GPU devices and user-specified tokens
  uint32_t numGpus = devUuidList_.size();
  uint32_t loopCnt = std::min(numGpus, uint32_t(rvdTokenList_.size()));

  // Evaluate tokens into device indices or UUID values
  int32_t usrIdx = 0;
  int32_t devIdx = -1;
  for (uint32_t idx = 0; idx < loopCnt; idx++) {
    // User token to be evaluated as UUID or device index
    std::string& token = rvdTokenList_[idx];

    // An empty token (e.g. from "0,,1") terminates the list; this also keeps
    // token.at(0) below from throwing std::out_of_range.
    if (token.empty()) {
      return usrDeviceList_.size();
    }

    // Token encodes a UUID value
    if (token.at(0) == 'G') {
      devIdx = ProcessUuidToken(token);
      if (devIdx == -1) {
        return usrDeviceList_.size();
      }

    // Token encodes a device index
    } else {
      char* end =
          nullptr;
      const char* tmp = token.c_str();
      long parsed = std::strtol(tmp, &end, 0);
      // Reject trailing characters, and reject out-of-range values before
      // narrowing to int32_t so they cannot wrap into a valid index.
      if ((*end != '\0') || (parsed < 0) || (parsed >= static_cast<long>(numGpus))) {
        return usrDeviceList_.size();
      }
      devIdx = static_cast<int32_t>(parsed);
    }

    // Reject a token that evaluates to an out-of-range device index
    if ((devIdx < 0) || (static_cast<uint32_t>(devIdx) >= numGpus)) {
      return usrDeviceList_.size();
    }

    // A device index that was seen before is interpreted as a terminator
    bool exists = (usrDeviceList_.find(devIdx) != usrDeviceList_.end());
    if (exists) {
      return usrDeviceList_.size();
    }

    // Add the index to the list of devices that will be
    // surfaced upon device enumeration
    usrDeviceList_[devIdx] = usrIdx++;
  }

  return usrDeviceList_.size();
}

uint32_t RvdFilter::GetUsrDeviceListSize() {
  return usrDeviceList_.size();
}

int32_t RvdFilter::GetUsrDeviceRank(uint32_t roctIdx) {
  const auto& it = usrDeviceList_.find(roctIdx);
  if (it != usrDeviceList_.end()) {
    return it->second;
  }
  return -1;
}

#ifndef NDEBUG
void RvdFilter::SetDeviceUuidList() {
  uint64_t dbgUuid[] = {0xBABABABABABABABA, 0xBABABABABABAABBA, 0xBABABABAABBAABBA,
                        0xBABAABBAABBAABBA, 0xABBAABBAABBAABBA, 0xABBAABBAABBABABA,
                        0xABBAABBABABABABA, 0xABBABABABABABABA};

  // Override or set UUID values for the first numUuids devices,
  // one per entry in dbgUuid
  uint32_t numGpus = devUuidList_.size();
  uint32_t numUuids = (sizeof(dbgUuid) / sizeof(uint64_t));
  for (uint32_t idx = 0; (idx < numGpus && (idx < numUuids)); idx++) {
    std::stringstream stream;

    // For devices whose UUID is zero, store the same non-matching
    // placeholder that BuildDeviceUuidList() uses
    if (dbgUuid[idx] == 0) {
      devUuidList_[idx] = "Invalid-UUID";
      continue;
    }

    // For devices that support valid UUID values
    stream << "GPU-" << std::setfill('0') << std::setw(sizeof(uint64_t) * 2) << std::hex
           << dbgUuid[idx];
    std::string uuidVal(stream.str());
    std::transform(uuidVal.begin(), uuidVal.end(), uuidVal.begin(), ::toupper);
    devUuidList_[idx] = uuidVal;
  }
}

void RvdFilter::PrintDeviceUuidList() {
  uint32_t numGpus = devUuidList_.size();
  for (uint32_t idx = 0; idx < numGpus; idx++) {
    std::cout << "Dev[" << idx << "]: " << devUuidList_[idx];
    std::cout << std::endl << std::flush;
  }
}

void RvdFilter::PrintUsrDeviceList() {
  // Map entries are (device index -> surface rank); print them as rank: index
  for (auto const& elem : usrDeviceList_) {
    std::cout << "UsrDev[" << elem.second << "]: " << elem.first;
    std::cout << std::endl << std::flush;
  }
}

void RvdFilter::PrintRvdTokenList() {
  uint32_t numTokens = rvdTokenList_.size();
  for (uint32_t idx = 0; idx < numTokens; idx++) {
    std::cout << "Token[" << idx << "]: " << rvdTokenList_[idx];
    std::cout << std::endl << std::flush;
  }
}
#endif

}  // namespace amd
}  // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_gpu_agent.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
// // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/amd_gpu_agent.h" #include #include #include #include #include #include #include #include #include #include #include "core/inc/amd_aql_queue.h" #include "core/inc/amd_blit_kernel.h" #include "core/inc/amd_blit_sdma.h" #include "core/inc/amd_gpu_pm4.h" #include "core/inc/amd_gpu_shaders.h" #include "core/inc/amd_memory_region.h" #include "core/inc/interrupt_signal.h" #include "core/inc/isa.h" #include "core/inc/runtime.h" #include "core/util/os.h" #include "inc/hsa_ext_image.h" #include "inc/hsa_ven_amd_aqlprofile.h" // Size of scratch (private) segment pre-allocated per thread, in bytes. 
#define DEFAULT_SCRATCH_BYTES_PER_THREAD 2048
#define MAX_WAVE_SCRATCH 8387584  // See COMPUTE_TMPRING_SIZE.WAVESIZE
#define MAX_NUM_DOORBELLS 0x400

namespace rocr {
namespace core {
extern HsaApiTable hsa_internal_api_table_;
}  // namespace core

namespace AMD {
GpuAgent::GpuAgent(HSAuint32 node, const HsaNodeProperties& node_props, bool xnack_mode,
                   uint32_t index)
    : GpuAgentInt(node),
      properties_(node_props),
      current_coherency_type_(HSA_AMD_COHERENCY_TYPE_COHERENT),
      scratch_used_large_(0),
      queues_(),
      is_kv_device_(false),
      trap_code_buf_(NULL),
      trap_code_buf_size_(0),
      doorbell_queue_map_(NULL),
      memory_bus_width_(0),
      memory_max_frequency_(0),
      enum_index_(index),
      ape1_base_(0),
      ape1_size_(0),
      scratch_cache_(
          [this](void* base, size_t size, bool large) { ReleaseScratch(base, size, large); }) {
  const bool is_apu_node = (properties_.NumCPUCores > 0);
  profile_ = (is_apu_node) ? HSA_PROFILE_FULL : HSA_PROFILE_BASE;

  HSAKMT_STATUS err = hsaKmtGetClockCounters(node_id(), &t0_);
  t1_ = t0_;
  historical_clock_ratio_ = 0.0;
  assert(err == HSAKMT_STATUS_SUCCESS && "hsaKmtGetClockCounters error");

  const core::Isa *isa_base = core::IsaRegistry::GetIsa(
      core::Isa::Version(node_props.EngineId.ui32.Major, node_props.EngineId.ui32.Minor,
                         node_props.EngineId.ui32.Stepping));
  if (!isa_base) {
    throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ISA,
                             "Agent creation failed.\nThe GPU node has an unrecognized id.\n");
  }

  rocr::core::IsaFeature sramecc = rocr::core::IsaFeature::Unsupported;
  if (isa_base->IsSrameccSupported()) {
    sramecc = node_props.Capability.ui32.SRAM_EDCSupport == 1 ? core::IsaFeature::Enabled
                                                              : core::IsaFeature::Disabled;
  }

  rocr::core::IsaFeature xnack = rocr::core::IsaFeature::Unsupported;
  if (isa_base->IsXnackSupported()) {
    // TODO: This needs to be obtained from KFD once HMM is implemented.
    xnack = xnack_mode ? core::IsaFeature::Enabled : core::IsaFeature::Disabled;
  }

  // Set instruction set architecture via node property, only on GPU device.
isa_ = (core::Isa*)core::IsaRegistry::GetIsa( core::Isa::Version(node_props.EngineId.ui32.Major, node_props.EngineId.ui32.Minor, node_props.EngineId.ui32.Stepping), sramecc, xnack); assert(isa_ != nullptr && "ISA registry inconsistency."); // Check if the device is Kaveri, only on GPU device. if (isa_->GetMajorVersion() == 7 && isa_->GetMinorVersion() == 0 && isa_->GetStepping() == 0) { is_kv_device_ = true; } current_coherency_type((profile_ == HSA_PROFILE_FULL) ? HSA_AMD_COHERENCY_TYPE_COHERENT : HSA_AMD_COHERENCY_TYPE_NONCOHERENT); max_queues_ = core::Runtime::runtime_singleton_->flag().max_queues(); #if !defined(HSA_LARGE_MODEL) || !defined(__linux__) if (max_queues_ == 0) { max_queues_ = 10; } max_queues_ = std::min(10U, max_queues_); #else if (max_queues_ == 0) { max_queues_ = 128; } max_queues_ = std::min(128U, max_queues_); #endif // Populate region list. InitRegionList(); // Populate cache list. InitCacheList(); } GpuAgent::~GpuAgent() { for (auto& blit : blits_) { if (!blit.empty()) { hsa_status_t status = blit->Destroy(*this); assert(status == HSA_STATUS_SUCCESS); } } if (ape1_base_ != 0) { _aligned_free(reinterpret_cast(ape1_base_)); } scratch_cache_.trim(true); if (scratch_pool_.base() != NULL) { hsaKmtFreeMemory(scratch_pool_.base(), scratch_pool_.size()); } system_deallocator()(doorbell_queue_map_); if (trap_code_buf_ != NULL) { ReleaseShader(trap_code_buf_, trap_code_buf_size_); } std::for_each(regions_.begin(), regions_.end(), DeleteObject()); regions_.clear(); } void GpuAgent::AssembleShader(const char* func_name, AssembleTarget assemble_target, void*& code_buf, size_t& code_buf_size) const { // Select precompiled shader implementation from name/target. 
struct ASICShader { const void* code; size_t size; int num_sgprs; int num_vgprs; }; struct CompiledShader { ASICShader compute_7; ASICShader compute_8; ASICShader compute_9; ASICShader compute_90a; ASICShader compute_1010; ASICShader compute_10; }; std::map compiled_shaders = { {"TrapHandler", { {NULL, 0, 0, 0}, {kCodeTrapHandler8, sizeof(kCodeTrapHandler8), 2, 4}, {kCodeTrapHandler9, sizeof(kCodeTrapHandler9), 2, 4}, {kCodeTrapHandler90a, sizeof(kCodeTrapHandler90a), 2, 4}, {kCodeTrapHandler1010, sizeof(kCodeTrapHandler1010), 2, 4}, {kCodeTrapHandler10, sizeof(kCodeTrapHandler10), 2, 4}, }}, {"TrapHandlerKfdExceptions", { {NULL, 0, 0, 0}, {kCodeTrapHandler8, sizeof(kCodeTrapHandler8), 2, 4}, {kCodeTrapHandlerV2_9, sizeof(kCodeTrapHandlerV2_9), 2, 4}, {kCodeTrapHandlerV2_9, sizeof(kCodeTrapHandlerV2_9), 2, 4}, {kCodeTrapHandlerV2_1010, sizeof(kCodeTrapHandlerV2_1010), 2, 4}, {kCodeTrapHandlerV2_10, sizeof(kCodeTrapHandlerV2_10), 2, 4}, }}, {"CopyAligned", { {kCodeCopyAligned7, sizeof(kCodeCopyAligned7), 32, 12}, {kCodeCopyAligned8, sizeof(kCodeCopyAligned8), 32, 12}, {kCodeCopyAligned8, sizeof(kCodeCopyAligned8), 32, 12}, {kCodeCopyAligned8, sizeof(kCodeCopyAligned8), 32, 12}, {kCodeCopyAligned10, sizeof(kCodeCopyAligned10), 32, 12}, {kCodeCopyAligned10, sizeof(kCodeCopyAligned10), 32, 12}, }}, {"CopyMisaligned", { {kCodeCopyMisaligned7, sizeof(kCodeCopyMisaligned7), 23, 10}, {kCodeCopyMisaligned8, sizeof(kCodeCopyMisaligned8), 23, 10}, {kCodeCopyMisaligned8, sizeof(kCodeCopyMisaligned8), 23, 10}, {kCodeCopyMisaligned8, sizeof(kCodeCopyMisaligned8), 23, 10}, {kCodeCopyMisaligned10, sizeof(kCodeCopyMisaligned10), 23, 10}, {kCodeCopyMisaligned10, sizeof(kCodeCopyMisaligned10), 23, 10}, }}, {"Fill", { {kCodeFill7, sizeof(kCodeFill7), 19, 8}, {kCodeFill8, sizeof(kCodeFill8), 19, 8}, {kCodeFill8, sizeof(kCodeFill8), 19, 8}, {kCodeFill8, sizeof(kCodeFill8), 19, 8}, {kCodeFill10, sizeof(kCodeFill10), 19, 8}, {kCodeFill10, sizeof(kCodeFill10), 19, 8}, }}}; auto 
compiled_shader_it = compiled_shaders.find(func_name); assert(compiled_shader_it != compiled_shaders.end() && "Precompiled shader unavailable"); ASICShader* asic_shader = NULL; switch (isa_->GetMajorVersion()) { case 7: asic_shader = &compiled_shader_it->second.compute_7; break; case 8: asic_shader = &compiled_shader_it->second.compute_8; break; case 9: if((isa_->GetMinorVersion() == 0) && (isa_->GetStepping() == 10)) asic_shader = &compiled_shader_it->second.compute_90a; else asic_shader = &compiled_shader_it->second.compute_9; break; case 10: if(isa_->GetMinorVersion() == 1) asic_shader = &compiled_shader_it->second.compute_1010; else asic_shader = &compiled_shader_it->second.compute_10; break; default: assert(false && "Precompiled shader unavailable for target"); } // Allocate a GPU-visible buffer for the shader. size_t header_size = (assemble_target == AssembleTarget::AQL ? sizeof(amd_kernel_code_t) : 0); code_buf_size = AlignUp(header_size + asic_shader->size, 0x1000); code_buf = system_allocator()(code_buf_size, 0x1000, core::MemoryRegion::AllocateExecutable); assert(code_buf != NULL && "Code buffer allocation failed"); memset(code_buf, 0, code_buf_size); // Populate optional code object header. 
if (assemble_target == AssembleTarget::AQL) { amd_kernel_code_t* header = reinterpret_cast(code_buf); int gran_sgprs = std::max(0, (int(asic_shader->num_sgprs) - 1) / 8); int gran_vgprs = std::max(0, (int(asic_shader->num_vgprs) - 1) / 4); header->kernel_code_entry_byte_offset = sizeof(amd_kernel_code_t); AMD_HSA_BITS_SET(header->kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_KERNARG_SEGMENT_PTR, 1); AMD_HSA_BITS_SET(header->compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_GRANULATED_WAVEFRONT_SGPR_COUNT, gran_sgprs); AMD_HSA_BITS_SET(header->compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_GRANULATED_WORKITEM_VGPR_COUNT, gran_vgprs); AMD_HSA_BITS_SET(header->compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_FLOAT_DENORM_MODE_16_64, 3); AMD_HSA_BITS_SET(header->compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_ENABLE_IEEE_MODE, 1); AMD_HSA_BITS_SET(header->compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_USER_SGPR_COUNT, 2); AMD_HSA_BITS_SET(header->compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_WORKGROUP_ID_X, 1); if ((isa_->GetMajorVersion() == 9) && (isa_->GetMinorVersion() == 0) && (isa_->GetStepping() == 10)) { // Program COMPUTE_PGM_RSRC3.ACCUM_OFFSET for 0 ACC VGPRs on gfx90a. // FIXME: Assemble code objects from source at build time int gran_accvgprs = ((gran_vgprs + 1) * 8) / 4 - 1; header->max_scratch_backing_memory_byte_size = uint64_t(gran_accvgprs) << 32; } } // Copy shader code into the GPU-visible buffer. 
memcpy((void*)(uintptr_t(code_buf) + header_size), asic_shader->code, asic_shader->size); } void GpuAgent::ReleaseShader(void* code_buf, size_t code_buf_size) const { system_deallocator()(code_buf); } void GpuAgent::InitRegionList() { const bool is_apu_node = (properties_.NumCPUCores > 0); std::vector mem_props(properties_.NumMemoryBanks); if (HSAKMT_STATUS_SUCCESS == hsaKmtGetNodeMemoryProperties(node_id(), properties_.NumMemoryBanks, &mem_props[0])) { for (uint32_t mem_idx = 0; mem_idx < properties_.NumMemoryBanks; ++mem_idx) { // Ignore the one(s) with unknown size. if (mem_props[mem_idx].SizeInBytes == 0) { continue; } switch (mem_props[mem_idx].HeapType) { case HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE: case HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC: if (!is_apu_node) { mem_props[mem_idx].VirtualBaseAddress = 0; } memory_bus_width_ = mem_props[mem_idx].Width; memory_max_frequency_ = mem_props[mem_idx].MemoryClockMax; case HSA_HEAPTYPE_GPU_LDS: case HSA_HEAPTYPE_GPU_SCRATCH: { MemoryRegion* region = new MemoryRegion(false, false, false, this, mem_props[mem_idx]); regions_.push_back(region); if (region->IsLocalMemory()) { // Expose VRAM as uncached/fine grain over PCIe (if enabled) or XGMI. 
            if ((properties_.HiveID != 0) ||
                (core::Runtime::runtime_singleton_->flag().fine_grain_pcie())) {
              regions_.push_back(new MemoryRegion(true, false, false, this, mem_props[mem_idx]));
            }
          }
          break;
        }
        case HSA_HEAPTYPE_SYSTEM:
          if (is_apu_node) {
            memory_bus_width_ = mem_props[mem_idx].Width;
            memory_max_frequency_ = mem_props[mem_idx].MemoryClockMax;
          }
          break;
        case HSA_HEAPTYPE_MMIO_REMAP:
          // Remap offsets defined in kfd_ioctl.h
          HDP_flush_.HDP_MEM_FLUSH_CNTL = (uint32_t*)mem_props[mem_idx].VirtualBaseAddress;
          HDP_flush_.HDP_REG_FLUSH_CNTL = HDP_flush_.HDP_MEM_FLUSH_CNTL + 1;
          break;
        default:
          continue;
      }
    }
  }
}

void GpuAgent::InitScratchPool() {
  HsaMemFlags flags;
  flags.Value = 0;
  flags.ui32.Scratch = 1;
  flags.ui32.HostAccess = 1;

  scratch_per_thread_ = core::Runtime::runtime_singleton_->flag().scratch_mem_size();
  if (scratch_per_thread_ == 0) scratch_per_thread_ = DEFAULT_SCRATCH_BYTES_PER_THREAD;

  // Per-queue scratch length is: waves/CU (32) * threads/wave (64) * #CUs * scratch/thread.
  // The pool is then sized for max_queues_ such queues.
  const uint32_t num_cu = properties_.NumFComputeCores / properties_.NumSIMDPerCU;
  queue_scratch_len_ = AlignUp(32 * 64 * num_cu * scratch_per_thread_, 65536);
  size_t max_scratch_len = queue_scratch_len_ * max_queues_;

#if defined(HSA_LARGE_MODEL) && defined(__linux__)
  // For 64-bit linux use max queues unless otherwise specified
  if ((max_scratch_len == 0) || (max_scratch_len > 4294967296)) {
    max_scratch_len = 4294967296;  // 4GB aperture max
  }
#endif

  void* scratch_base;
  HSAKMT_STATUS err = hsaKmtAllocMemory(node_id(), max_scratch_len, flags, &scratch_base);
  assert(err == HSAKMT_STATUS_SUCCESS && "hsaKmtAllocMemory(Scratch) failed");
  assert(IsMultipleOf(scratch_base, 0x1000) && "Scratch base is not page aligned!");

  scratch_pool_.~SmallHeap();
  if (HSAKMT_STATUS_SUCCESS == err) {
    new (&scratch_pool_) SmallHeap(scratch_base, max_scratch_len);
  } else {
    new (&scratch_pool_) SmallHeap();
  }
}

void GpuAgent::InitCacheList() {
  // Get GPU cache information.
  // Similar to getting CPU cache but here we use FComputeIdLo.
  cache_props_.resize(properties_.NumCaches);
  if (HSAKMT_STATUS_SUCCESS !=
      hsaKmtGetNodeCacheProperties(node_id(), properties_.FComputeIdLo, properties_.NumCaches,
                                   &cache_props_[0])) {
    cache_props_.clear();
  } else {
    // Only store GPU D-cache.
    for (size_t cache_id = 0; cache_id < cache_props_.size(); ++cache_id) {
      const HsaCacheType type = cache_props_[cache_id].CacheType;
      if (type.ui32.HSACU != 1 || type.ui32.Instruction == 1) {
        cache_props_.erase(cache_props_.begin() + cache_id);
        --cache_id;
      }
    }
  }

  // Update cache objects
  caches_.clear();
  caches_.resize(cache_props_.size());
  char name[64];
  GetInfo(HSA_AGENT_INFO_NAME, name);
  std::string deviceName = name;
  for (size_t i = 0; i < caches_.size(); i++)
    caches_[i].reset(new core::Cache(deviceName + " L" + std::to_string(cache_props_[i].CacheLevel),
                                     cache_props_[i].CacheLevel, cache_props_[i].CacheSize));
}

hsa_status_t GpuAgent::IterateRegion(
    hsa_status_t (*callback)(hsa_region_t region, void* data), void* data) const {
  return VisitRegion(true, callback, data);
}

hsa_status_t GpuAgent::IterateCache(hsa_status_t (*callback)(hsa_cache_t cache, void* data),
                                    void* data) const {
  AMD::callback_t<decltype(callback)> call(callback);
  for (size_t i = 0; i < caches_.size(); i++) {
    hsa_status_t stat = call(core::Cache::Convert(caches_[i].get()), data);
    if (stat != HSA_STATUS_SUCCESS) return stat;
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t GpuAgent::VisitRegion(bool include_peer,
                                   hsa_status_t (*callback)(hsa_region_t region, void* data),
                                   void* data) const {
  if (include_peer) {
    // Only expose system, local, and LDS memory of the blit agent.
    if (this->node_id() == core::Runtime::runtime_singleton_->region_gpu()->node_id()) {
      hsa_status_t stat = VisitRegion(regions_, callback, data);
      if (stat != HSA_STATUS_SUCCESS) {
        return stat;
      }
    }

    // Also expose system regions accessible by this agent.
    hsa_status_t stat =
        VisitRegion(core::Runtime::runtime_singleton_->system_regions_fine(), callback, data);
    if (stat != HSA_STATUS_SUCCESS) {
      return stat;
    }

    return VisitRegion(core::Runtime::runtime_singleton_->system_regions_coarse(), callback, data);
  }

  // Only expose system, local, and LDS memory of this agent.
  return VisitRegion(regions_, callback, data);
}

hsa_status_t GpuAgent::VisitRegion(const std::vector<const core::MemoryRegion*>& regions,
                                   hsa_status_t (*callback)(hsa_region_t region, void* data),
                                   void* data) const {
  AMD::callback_t<decltype(callback)> call(callback);
  for (const core::MemoryRegion* region : regions) {
    const AMD::MemoryRegion* amd_region = reinterpret_cast<const AMD::MemoryRegion*>(region);

    // Only expose system, local, and LDS memory.
    if (amd_region->IsSystem() || amd_region->IsLocalMemory() || amd_region->IsLDS()) {
      hsa_region_t region_handle = core::MemoryRegion::Convert(region);
      hsa_status_t status = call(region_handle, data);
      if (status != HSA_STATUS_SUCCESS) {
        return status;
      }
    }
  }

  return HSA_STATUS_SUCCESS;
}

core::Queue* GpuAgent::CreateInterceptibleQueue(void (*callback)(hsa_status_t status,
                                                                 hsa_queue_t* source, void* data),
                                                void* data) {
  // Disabled intercept of internal queues pending tools updates.
  core::Queue* queue = nullptr;
  QueueCreate(minAqlSize_, HSA_QUEUE_TYPE_MULTI, callback, data, 0, 0, &queue);
  if (queue != nullptr)
    core::Runtime::runtime_singleton_->InternalQueueCreateNotify(core::Queue::Convert(queue),
                                                                 this->public_handle());
  return queue;
}

core::Blit* GpuAgent::CreateBlitSdma(bool use_xgmi) {
  AMD::BlitSdmaBase* sdma;

  switch (isa_->GetMajorVersion()) {
    case 7:
    case 8:
      sdma = new BlitSdmaV2V3();
      break;
    case 9:
      sdma = new BlitSdmaV4();
      break;
    case 10:
      sdma = new BlitSdmaV5();
      break;
    default:
      assert(false && "Unexpected device major version.");
      return nullptr;
  }

  if (sdma->Initialize(*this, use_xgmi) != HSA_STATUS_SUCCESS) {
    sdma->Destroy(*this);
    delete sdma;
    sdma = nullptr;
  }

  return sdma;
}

core::Blit* GpuAgent::CreateBlitKernel(core::Queue* queue) {
  AMD::BlitKernel* kernl = new AMD::BlitKernel(queue);

  if (kernl->Initialize(*this) != HSA_STATUS_SUCCESS) {
    kernl->Destroy(*this);
    delete kernl;
    kernl = NULL;
  }

  return kernl;
}

void GpuAgent::InitDma() {
  // Setup lazy init pointers on queues and blits.
  auto queue_lambda = [this]() {
    auto ret = CreateInterceptibleQueue();
    if (ret == nullptr)
      throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES,
                               "Internal queue creation failed.");
    return ret;
  };

  // Dedicated compute queue for host-to-device blits.
  queues_[QueueBlitOnly].reset(queue_lambda);
  // Share utility queue with device-to-host blits.
  queues_[QueueUtility].reset(queue_lambda);

  // Decide which engine to use for blits.
  auto blit_lambda = [this](bool use_xgmi, lazy_ptr<core::Queue>& queue, bool isHostToDev) {
    Flag::SDMA_OVERRIDE sdma_override = core::Runtime::runtime_singleton_->flag().enable_sdma();

    // User SDMA queues are unstable on gfx8 and unsupported on gfx1013.
    bool use_sdma =
        ((isa_->GetMajorVersion() != 8) && (isa_->GetVersion() != std::make_tuple(10, 1, 3)));
    if (sdma_override != Flag::SDMA_DEFAULT) use_sdma = (sdma_override == Flag::SDMA_ENABLE);

    if (use_sdma && (HSA_PROFILE_BASE == profile_)) {
      // On gfx90a ensure that HostToDevice queue is created first and so is placed on SDMA0.
      if ((!use_xgmi) && (!isHostToDev) && (isa_->GetMajorVersion() == 9) &&
          (isa_->GetMinorVersion() == 0) && (isa_->GetStepping() == 10)) {
        *blits_[BlitHostToDev];
      }

      auto ret = CreateBlitSdma(use_xgmi);
      if (ret != nullptr) return ret;
    }

    auto ret = CreateBlitKernel((*queue).get());
    if (ret == nullptr)
      throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES, "Blit creation failed.");
    return ret;
  };

  // Determine and instantiate the number of blit objects to
  // engage. The total is DefaultBlitCount (three) plus the
  // number of SDMA-XGMI engines.
  uint32_t blit_cnt_ = DefaultBlitCount + properties_.NumSdmaXgmiEngines;
  blits_.resize(blit_cnt_);

  // Initialize blit objects used for D2D, H2D, D2H, and
  // P2P copy operations.
  // -- Blit at index BlitDevToDev(0) deals with copies within
  //    local framebuffer and always engages a Blit Kernel
  // -- Blit at index BlitHostToDev(1) deals with copies from
  //    Host to Device (H2D) and could engage either a Blit
  //    Kernel or sDMA
  // -- Blit at index BlitDevToHost(2) deals with copies from
  //    Device to Host (D2H) and Peer to Peer (P2P) over PCIe.
  //    It could engage either a Blit Kernel or sDMA
  // -- Blit at index DefaultBlitCount(3) and beyond deal
  //    exclusively with P2P over xGMI links
  blits_[BlitDevToDev].reset([this]() {
    auto ret = CreateBlitKernel((*queues_[QueueUtility]).get());
    if (ret == nullptr)
      throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES, "Blit creation failed.");
    return ret;
  });
  blits_[BlitHostToDev].reset(
      [blit_lambda, this]() { return blit_lambda(false, queues_[QueueBlitOnly], true); });
  blits_[BlitDevToHost].reset(
      [blit_lambda, this]() { return blit_lambda(false, queues_[QueueUtility], false); });

  // XGMI engines.
  for (uint32_t idx = DefaultBlitCount; idx < blit_cnt_; idx++) {
    blits_[idx].reset(
        [blit_lambda, this]() { return blit_lambda(true, queues_[QueueUtility], false); });
  }

  // GWS queues.
  InitGWS();
}

void GpuAgent::InitGWS() {
  gws_queue_.queue_.reset([this]() {
    if (properties_.NumGws == 0) return (core::Queue*)nullptr;
    std::unique_ptr<core::Queue> queue(CreateInterceptibleQueue());
    if (queue == nullptr)
      throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES,
                               "Internal queue creation failed.");

    auto err = static_cast<AqlQueue*>(queue.get())->EnableGWS(1);
    if (err != HSA_STATUS_SUCCESS) throw AMD::hsa_exception(err, "GWS allocation failed.");

    gws_queue_.ref_ct_ = 0;
    return queue.release();
  });
}

void GpuAgent::GWSRelease() {
  ScopedAcquire<KernelMutex> lock(&gws_queue_.lock_);
  gws_queue_.ref_ct_--;
  if (gws_queue_.ref_ct_ != 0) return;
  InitGWS();
}

void GpuAgent::PreloadBlits() {
  for (auto& blit : blits_) {
    blit.touch();
  }
}

hsa_status_t GpuAgent::PostToolsInit() {
  // Defer memory allocation until agents have been discovered.
  InitNumaAllocator();

  InitScratchPool();

  BindTrapHandler();

  InitDma();

  return HSA_STATUS_SUCCESS;
}

hsa_status_t GpuAgent::DmaCopy(void* dst, const void* src, size_t size) {
  return blits_[BlitDevToDev]->SubmitLinearCopyCommand(dst, src, size);
}

hsa_status_t GpuAgent::DmaCopy(void* dst, core::Agent& dst_agent, const void* src,
                               core::Agent& src_agent, size_t size,
                               std::vector<core::Signal*>& dep_signals,
                               core::Signal& out_signal) {
  // Bind the Blit object that will drive this copy operation
  lazy_ptr<core::Blit>& blit = GetBlitObject(dst_agent, src_agent, size);

  if (profiling_enabled()) {
    // Track the agent so we could translate the resulting timestamp to system
    // domain correctly.
    out_signal.async_copy_agent(core::Agent::Convert(this->public_handle()));
  }

  hsa_status_t stat = blit->SubmitLinearCopyCommand(dst, src, size, dep_signals, out_signal);

  return stat;
}

hsa_status_t GpuAgent::DmaCopyRect(const hsa_pitched_ptr_t* dst, const hsa_dim3_t* dst_offset,
                                   const hsa_pitched_ptr_t* src, const hsa_dim3_t* src_offset,
                                   const hsa_dim3_t* range, hsa_amd_copy_direction_t dir,
                                   std::vector<core::Signal*>& dep_signals,
                                   core::Signal& out_signal) {
  if (isa_->GetMajorVersion() < 9) return HSA_STATUS_ERROR_INVALID_AGENT;

  lazy_ptr<core::Blit>& blit =
      (dir == hsaHostToDevice) ? blits_[BlitHostToDev] : blits_[BlitDevToHost];

  if (!blit->isSDMA()) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;

  if (profiling_enabled()) {
    // Track the agent so we could translate the resulting timestamp to system
    // domain correctly.
    out_signal.async_copy_agent(core::Agent::Convert(this->public_handle()));
  }

  BlitSdmaBase* sdmaBlit = static_cast<BlitSdmaBase*>((*blit).get());
  hsa_status_t stat = sdmaBlit->SubmitCopyRectCommand(dst, dst_offset, src, src_offset, range,
                                                      dep_signals, out_signal);

  return stat;
}

hsa_status_t GpuAgent::DmaFill(void* ptr, uint32_t value, size_t count) {
  return blits_[BlitDevToDev]->SubmitLinearFillCommand(ptr, value, count);
}

hsa_status_t GpuAgent::EnableDmaProfiling(bool enable) {
  for (auto& blit : blits_) {
    if (!blit.empty()) {
      const hsa_status_t stat = blit->EnableProfiling(enable);
      if (stat != HSA_STATUS_SUCCESS) {
        return stat;
      }
    }
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t GpuAgent::GetInfo(hsa_agent_info_t attribute, void* value) const {
  // agent, and vendor name size limit
  const size_t attribute_u = static_cast<size_t>(attribute);

  // agent, and vendor name length limit excluding terminating nul character.
  constexpr size_t hsa_name_size = 63;

  switch (attribute_u) {
    case HSA_AGENT_INFO_NAME: {
      std::string name = isa_->GetProcessorName();
      assert(name.size() <= hsa_name_size);
      std::memset(value, 0, hsa_name_size);
      char* temp = reinterpret_cast<char*>(value);
      std::strcpy(temp, name.c_str());
      break;
    }
    case HSA_AGENT_INFO_VENDOR_NAME:
      std::memset(value, 0, hsa_name_size);
      std::memcpy(value, "AMD", sizeof("AMD"));
      break;
    case HSA_AGENT_INFO_FEATURE:
      *((hsa_agent_feature_t*)value) = HSA_AGENT_FEATURE_KERNEL_DISPATCH;
      break;
    case HSA_AGENT_INFO_MACHINE_MODEL:
#if defined(HSA_LARGE_MODEL)
      *((hsa_machine_model_t*)value) = HSA_MACHINE_MODEL_LARGE;
#else
      *((hsa_machine_model_t*)value) = HSA_MACHINE_MODEL_SMALL;
#endif
      break;
    case HSA_AGENT_INFO_BASE_PROFILE_DEFAULT_FLOAT_ROUNDING_MODES:
    case HSA_AGENT_INFO_DEFAULT_FLOAT_ROUNDING_MODE:
      *((hsa_default_float_rounding_mode_t*)value) = HSA_DEFAULT_FLOAT_ROUNDING_MODE_NEAR;
      break;
    case HSA_AGENT_INFO_FAST_F16_OPERATION:
      if (isa_->GetMajorVersion() >= 8) {
        *((bool*)value) = true;
      } else {
        *((bool*)value) = false;
      }
      break;
    case HSA_AGENT_INFO_PROFILE:
      *((hsa_profile_t*)value) = profile_;
      break;
    case HSA_AGENT_INFO_WAVEFRONT_SIZE:
      *((uint32_t*)value) = properties_.WaveFrontSize;
      break;
    case HSA_AGENT_INFO_WORKGROUP_MAX_DIM: {
      // TODO: must be per-device
      const uint16_t group_size[3] = {1024, 1024, 1024};
      std::memcpy(value, group_size, sizeof(group_size));
    } break;
    case HSA_AGENT_INFO_WORKGROUP_MAX_SIZE:
      // TODO: must be per-device
      *((uint32_t*)value) = 1024;
      break;
    case HSA_AGENT_INFO_GRID_MAX_DIM: {
      const hsa_dim3_t grid_size = {UINT32_MAX, UINT32_MAX, UINT32_MAX};
      std::memcpy(value, &grid_size, sizeof(hsa_dim3_t));
    } break;
    case HSA_AGENT_INFO_GRID_MAX_SIZE:
      *((uint32_t*)value) = UINT32_MAX;
      break;
    case HSA_AGENT_INFO_FBARRIER_MAX_SIZE:
      // TODO: to confirm
      *((uint32_t*)value) = 32;
      break;
    case HSA_AGENT_INFO_QUEUES_MAX:
      *((uint32_t*)value) = max_queues_;
      break;
    case HSA_AGENT_INFO_QUEUE_MIN_SIZE:
      *((uint32_t*)value) = minAqlSize_;
      break;
    case HSA_AGENT_INFO_QUEUE_MAX_SIZE:
      *((uint32_t*)value) = maxAqlSize_;
      break;
    case HSA_AGENT_INFO_QUEUE_TYPE:
      *((hsa_queue_type32_t*)value) = HSA_QUEUE_TYPE_MULTI;
      break;
    case HSA_AGENT_INFO_NODE:
      // TODO: associate with OS NUMA support (numactl / GetNumaProcessorNode).
      *((uint32_t*)value) = node_id();
      break;
    case HSA_AGENT_INFO_DEVICE:
      *((hsa_device_type_t*)value) = HSA_DEVICE_TYPE_GPU;
      break;
    case HSA_AGENT_INFO_CACHE_SIZE: {
      std::memset(value, 0, sizeof(uint32_t) * 4);

      assert(cache_props_.size() > 0 && "GPU cache info missing.");
      const size_t num_cache = cache_props_.size();
      for (size_t i = 0; i < num_cache; ++i) {
        const uint32_t line_level = cache_props_[i].CacheLevel;
        if (reinterpret_cast<uint32_t*>(value)[line_level - 1] == 0)
          reinterpret_cast<uint32_t*>(value)[line_level - 1] = cache_props_[i].CacheSize * 1024;
      }
    } break;
    case HSA_AGENT_INFO_ISA:
      *((hsa_isa_t*)value) = core::Isa::Handle(isa_);
      break;
    case HSA_AGENT_INFO_EXTENSIONS: {
      memset(value, 0, sizeof(uint8_t) * 128);

      auto setFlag = [&](uint32_t bit) {
        assert(bit < 128 * 8 && "Extension value exceeds extension bitmask");
        uint index = bit / 8;
        uint subBit = bit % 8;
        ((uint8_t*)value)[index] |= 1 << subBit;
      };

      if (core::hsa_internal_api_table_.finalizer_api.hsa_ext_program_finalize_fn != NULL) {
        setFlag(HSA_EXTENSION_FINALIZER);
      }

      if (core::hsa_internal_api_table_.image_api.hsa_ext_image_create_fn != NULL) {
        setFlag(HSA_EXTENSION_IMAGES);
      }

      if (os::LibHandle lib = os::LoadLib(kAqlProfileLib)) {
        os::CloseLib(lib);
        setFlag(HSA_EXTENSION_AMD_AQLPROFILE);
      }

      setFlag(HSA_EXTENSION_AMD_PROFILER);

      break;
    }
    case HSA_AGENT_INFO_VERSION_MAJOR:
      *((uint16_t*)value) = 1;
      break;
    case HSA_AGENT_INFO_VERSION_MINOR:
      *((uint16_t*)value) = 1;
      break;
    case HSA_EXT_AGENT_INFO_IMAGE_1D_MAX_ELEMENTS:
    case HSA_EXT_AGENT_INFO_IMAGE_1DA_MAX_ELEMENTS:
    case HSA_EXT_AGENT_INFO_IMAGE_1DB_MAX_ELEMENTS:
    case HSA_EXT_AGENT_INFO_IMAGE_2D_MAX_ELEMENTS:
    case HSA_EXT_AGENT_INFO_IMAGE_2DA_MAX_ELEMENTS:
    case HSA_EXT_AGENT_INFO_IMAGE_2DDEPTH_MAX_ELEMENTS:
    case HSA_EXT_AGENT_INFO_IMAGE_2DADEPTH_MAX_ELEMENTS:
    case HSA_EXT_AGENT_INFO_IMAGE_3D_MAX_ELEMENTS:
    case HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS:
      return hsa_amd_image_get_info_max_dim(public_handle(), attribute, value);
    case HSA_EXT_AGENT_INFO_MAX_IMAGE_RD_HANDLES:
      // TODO: hardcode based on OCL constants.
      *((uint32_t*)value) = 128;
      break;
    case HSA_EXT_AGENT_INFO_MAX_IMAGE_RORW_HANDLES:
      // TODO: hardcode based on OCL constants.
      *((uint32_t*)value) = 64;
      break;
    case HSA_EXT_AGENT_INFO_MAX_SAMPLER_HANDLERS:
      // TODO: hardcode based on OCL constants.
      *((uint32_t*)value) = 16;
      break;
    case HSA_AMD_AGENT_INFO_CHIP_ID:
      *((uint32_t*)value) = properties_.DeviceId;
      break;
    case HSA_AMD_AGENT_INFO_CACHELINE_SIZE:
      // TODO: hardcode for now.
      // GCN whitepaper: cache line size is 64 bytes long.
      *((uint32_t*)value) = 64;
      break;
    case HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT:
      *((uint32_t*)value) = (properties_.NumFComputeCores / properties_.NumSIMDPerCU);
      break;
    case HSA_AMD_AGENT_INFO_MAX_CLOCK_FREQUENCY:
      *((uint32_t*)value) = properties_.MaxEngineClockMhzFCompute;
      break;
    case HSA_AMD_AGENT_INFO_DRIVER_NODE_ID:
      *((uint32_t*)value) = node_id();
      break;
    case HSA_AMD_AGENT_INFO_MAX_ADDRESS_WATCH_POINTS:
      *((uint32_t*)value) =
          static_cast<uint32_t>(1 << properties_.Capability.ui32.WatchPointsTotalBits);
      break;
    case HSA_AMD_AGENT_INFO_BDFID:
      *((uint32_t*)value) = static_cast<uint32_t>(properties_.LocationId);
      break;
    case HSA_AMD_AGENT_INFO_MEMORY_WIDTH:
      *((uint32_t*)value) = memory_bus_width_;
      break;
    case HSA_AMD_AGENT_INFO_MEMORY_MAX_FREQUENCY:
      *((uint32_t*)value) = memory_max_frequency_;
      break;

    // Copy HsaNodeProperties.MarketingName, a UTF-16 encoded string,
    // into the output buffer as a 7-bit ASCII string.
    case HSA_AMD_AGENT_INFO_PRODUCT_NAME: {
      std::memset(value, 0, HSA_PUBLIC_NAME_SIZE);
      char* temp = reinterpret_cast<char*>(value);
      for (uint32_t idx = 0;
           properties_.MarketingName[idx] != 0 && idx < HSA_PUBLIC_NAME_SIZE - 1; idx++) {
        temp[idx] = (uint8_t)properties_.MarketingName[idx];
      }
      break;
    }
    case HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU:
      *((uint32_t*)value) =
          static_cast<uint32_t>(properties_.NumSIMDPerCU * properties_.MaxWavesPerSIMD);
      break;
    case HSA_AMD_AGENT_INFO_NUM_SIMDS_PER_CU:
      *((uint32_t*)value) = properties_.NumSIMDPerCU;
      break;
    case HSA_AMD_AGENT_INFO_NUM_SHADER_ENGINES:
      *((uint32_t*)value) = properties_.NumShaderBanks;
      break;
    case HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE:
      *((uint32_t*)value) = properties_.NumArrays;
      break;
    case HSA_AMD_AGENT_INFO_HDP_FLUSH:
      *((hsa_amd_hdp_flush_t*)value) = HDP_flush_;
      break;
    case HSA_AMD_AGENT_INFO_DOMAIN:
      *((uint32_t*)value) = static_cast<uint32_t>(properties_.Domain);
      break;
    case HSA_AMD_AGENT_INFO_COOPERATIVE_QUEUES:
      *((bool*)value) = properties_.NumGws != 0;
      break;
    case HSA_AMD_AGENT_INFO_UUID: {
      uint64_t uuid_value = static_cast<uint64_t>(properties_.UniqueID);

      // Either the device does not support UUID (e.g. a Gfx8 device),
      // or the runtime is using an older thunk library that does not
      // support UUIDs.
      if (uuid_value == 0) {
        char uuid_tmp[] = "GPU-XX";
        snprintf((char*)value, sizeof(uuid_tmp), "%s", uuid_tmp);
        break;
      }

      // Device supports UUID, build UUID string to return
      std::stringstream ss;
      ss << "GPU-" << std::setfill('0') << std::setw(sizeof(uint64_t) * 2) << std::hex
         << uuid_value;
      snprintf((char*)value, (ss.str().length() + 1), "%s", (char*)ss.str().c_str());
      break;
    }
    case HSA_AMD_AGENT_INFO_ASIC_REVISION:
      *((uint32_t*)value) = static_cast<uint32_t>(properties_.Capability.ui32.ASICRevision);
      break;
    case HSA_AMD_AGENT_INFO_SVM_DIRECT_HOST_ACCESS:
      assert(regions_.size() != 0 && "No device local memory found!");
      *((bool*)value) = properties_.Capability.ui32.CoherentHostAccess == 1;
      break;
    case HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT:
      if (core::Runtime::runtime_singleton_->flag().coop_cu_count() &&
          (isa_->GetMajorVersion() == 9) && (isa_->GetMinorVersion() == 0) &&
          (isa_->GetStepping() == 10)) {
        uint32_t count = 0;
        hsa_status_t err = GetInfo((hsa_agent_info_t)HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, &count);
        assert(err == HSA_STATUS_SUCCESS && "CU count query failed.");
        *((uint32_t*)value) = (count & 0xFFFFFFF8) - 8;  // value = floor(count/8)*8-8
        break;
      }
      return GetInfo((hsa_agent_info_t)HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, value);
    default:
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
      break;
  }
  return HSA_STATUS_SUCCESS;
}
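
// GetInfo() above packs three values in ways that are easy to get subtly wrong:
// - the HSA_AGENT_INFO_EXTENSIONS bitmask (byte bit / 8, bit position bit % 8 in a 128-byte mask);
// - the zero-padded "GPU-" UUID string;
// - the gfx90a cooperative CU count, floor(count / 8) * 8 - 8.
// The block below is a minimal standalone sketch that reproduces each packing in isolation,
// assuming only the C++ standard library. SetExtensionBit, HasExtensionBit, FormatGpuUuid and
// CooperativeCuCount are hypothetical helpers, not ROCR or HSA API. Unlike GetInfo(), they
// ignore out-of-range bits instead of asserting and avoid unsigned wrap-around for small CU counts.
#if 0  // Illustrative sketch only; not compiled into the runtime.
```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical: size of the extension mask written by HSA_AGENT_INFO_EXTENSIONS.
constexpr uint32_t kExtensionMaskBytes = 128;

// Set extension `bit` in the mask: byte bit / 8, bit position bit % 8.
inline void SetExtensionBit(uint8_t (&mask)[kExtensionMaskBytes], uint32_t bit) {
  if (bit >= kExtensionMaskBytes * 8) return;  // out of range: ignored in this sketch
  mask[bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
}

// Return true if extension `bit` is set in the mask.
inline bool HasExtensionBit(const uint8_t (&mask)[kExtensionMaskBytes], uint32_t bit) {
  if (bit >= kExtensionMaskBytes * 8) return false;
  return ((mask[bit / 8] >> (bit % 8)) & 1u) != 0;
}

// Format a 64-bit unique ID as "GPU-" plus 16 zero-padded lowercase hex digits,
// or "GPU-XX" when the ID is zero (no UUID support).
inline std::string FormatGpuUuid(uint64_t uuid_value) {
  if (uuid_value == 0) return "GPU-XX";
  std::stringstream ss;
  // setw applies to the next formatted output only, i.e. to uuid_value.
  ss << "GPU-" << std::setfill('0') << std::setw(sizeof(uint64_t) * 2) << std::hex << uuid_value;
  return ss.str();
}

// floor(count / 8) * 8 - 8; returns 0 instead of wrapping when fewer than 8 CUs remain.
inline uint32_t CooperativeCuCount(uint32_t count) {
  const uint32_t rounded = count & 0xFFFFFFF8u;
  return rounded >= 8 ? rounded - 8 : 0;
}
```
#endif
// Usage (illustrative): after SetExtensionBit(mask, 9), mask[1] == 0x02 and
// HasExtensionBit(mask, 9) is true; FormatGpuUuid(0x1234) yields "GPU-0000000000001234";
// CooperativeCuCount(110) yields 96.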

hsa_status_t GpuAgent::QueueCreate(size_t size, hsa_queue_type32_t queue_type,
                                   core::HsaEventCallback event_callback, void* data,
                                   uint32_t private_segment_size, uint32_t group_segment_size,
                                   core::Queue** queue) {
  // Handle GWS queues.
  if (queue_type == HSA_QUEUE_TYPE_COOPERATIVE) {
    ScopedAcquire<KernelMutex> lock(&gws_queue_.lock_);
    auto ret = (*gws_queue_.queue_).get();
    if (ret != nullptr) {
      gws_queue_.ref_ct_++;
      *queue = ret;
      return HSA_STATUS_SUCCESS;
    }
    return HSA_STATUS_ERROR_INVALID_QUEUE_CREATION;
  }

  // AQL queues must be a power of two in length.
  if (!IsPowerOfTwo(size)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  // Enforce max size
  if (size > maxAqlSize_) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }

  // Allocate scratch memory
  ScratchInfo scratch = {0};
  if (private_segment_size == UINT_MAX) {
    private_segment_size = (profile_ == HSA_PROFILE_BASE) ? 0 : scratch_per_thread_;
  }

  if (private_segment_size > 262128) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }

  scratch.lanes_per_wave = 64;
  scratch.size_per_thread = AlignUp(private_segment_size, 1024 / scratch.lanes_per_wave);
  if (scratch.size_per_thread > 262128) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }
  scratch.size_per_thread = private_segment_size;

  const uint32_t num_cu = properties_.NumFComputeCores / properties_.NumSIMDPerCU;
  scratch.size =
      scratch.size_per_thread * properties_.MaxSlotsScratchCU * scratch.lanes_per_wave * num_cu;
  scratch.queue_base = nullptr;
  scratch.queue_process_offset = 0;

  MAKE_NAMED_SCOPE_GUARD(scratchGuard, [&]() { ReleaseQueueScratch(scratch); });

  if (scratch.size != 0) {
    AcquireQueueScratch(scratch);
    if (scratch.queue_base == nullptr) {
      return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
    }
  }

  // Ensure utility queue has been created.
  // Deferring longer risks exhausting queue count before ISA upload and invalidation capability is
  // ensured.
  queues_[QueueUtility].touch();

  // Create an HW AQL queue
  auto aql_queue =
      new AqlQueue(this, size, node_id(), scratch, event_callback, data, is_kv_device_);
  *queue = aql_queue;

  if (doorbell_queue_map_) {
    // Calculate index of the queue doorbell within the doorbell aperture.
    auto doorbell_addr = uintptr_t(aql_queue->signal_.hardware_doorbell_ptr);
    auto doorbell_idx = (doorbell_addr >> 3) & (MAX_NUM_DOORBELLS - 1);
    doorbell_queue_map_[doorbell_idx] = &aql_queue->amd_queue_;
  }

  scratchGuard.Dismiss();
  return HSA_STATUS_SUCCESS;
}

void GpuAgent::AcquireQueueScratch(ScratchInfo& scratch) {
  assert(scratch.queue_base == nullptr && "AcquireQueueScratch called while holding scratch.");
  bool need_queue_scratch_base = (isa_->GetMajorVersion() > 8);

  if (scratch.size == 0) {
    scratch.size = queue_scratch_len_;
    scratch.size_per_thread = scratch_per_thread_;
  }
  scratch.retry = false;

  // Fail scratch allocation if per wave limits are exceeded.
  uint64_t size_per_wave = AlignUp(scratch.size_per_thread * properties_.WaveFrontSize, 1024);
  if (size_per_wave > MAX_WAVE_SCRATCH) return;

  /*
  Determine size class needed.

  Scratch allocations come in two flavors based on how they are retired. Small allocations may be
  kept bound to a queue and reused by firmware. This memory cannot be reclaimed by the runtime
  on demand, so it must be kept small to avoid egregious OOM conditions. Other allocations, aka
  large, may be used by firmware only for one dispatch and are then surrendered to the runtime.
  This has significant latency, so we don't want to make all scratch allocations large (i.e.
  single use).

  Note that the designation "large" is for contrast with "small", which must really be small
  amounts of memory, and does not always imply a large quantity of memory is needed. Other
  properties of the allocation may require single use and so qualify the allocation or use as
  "large".

  Here we decide on the boundaries for small scratch allocations.
  Both the largest small single allocation and the maximum amount of memory bound by small
  allocations are limited.

  Additionally some legacy devices do not support large scratch.

  For small scratch we must allocate enough memory for every physical scratch slot.
  For large scratch compute the minimum memory needed to run the dispatch without limiting
  occupancy.

  Limit total bound small scratch allocations to 1/8th of scratch pool and 1/4 of that for a
  single allocation.
  */
  ScopedAcquire<KernelMutex> lock(&scratch_lock_);
  size_t small_limit = scratch_pool_.size() >> 3;
  // Lift limit for 2.10 release RCCL workaround.
  size_t single_limit = 146800640;  // small_limit >> 2;
  bool use_reclaim = true;
  bool large = (scratch.size > single_limit) ||
               (scratch_pool_.size() - scratch_pool_.remaining() - scratch_cache_.free_bytes() +
                    scratch.size >
                small_limit);
  if ((isa_->GetMajorVersion() < 8) ||
      core::Runtime::runtime_singleton_->flag().no_scratch_reclaim()) {
    large = false;
    use_reclaim = false;
  }

  // If large is selected then the scratch will not be retained.
  // In that case allocate the minimum necessary for the dispatch since we don't need all slots.
  if (large) scratch.size = scratch.dispatch_size;

  // Ensure mapping will be in whole pages.
  scratch.size = AlignUp(scratch.size, 4096);

  /*
  Sequence of attempts is:
    check cache
    attempt a new allocation
    trim unused blocks from cache
    attempt a new allocation
    check cache for sufficient used block, steal and wait (not implemented)
    trim used blocks from cache, evaluate retry
    reduce occupancy
  */

  // Lambda called in place.
  // Used to allow exit from nested loops.
  [&]() {
    // Check scratch cache
    scratch.large = large;
    if (scratch_cache_.alloc(scratch)) return;

    // Attempt new allocation.
    for (int i = 0; i < 2; i++) {
      if (large)
        scratch.queue_base = scratch_pool_.alloc_high(scratch.size);
      else
        scratch.queue_base = scratch_pool_.alloc(scratch.size);
      scratch.large = large | (scratch.queue_base > scratch_pool_.high_split());
      assert(((!scratch.large) | use_reclaim) && "Large scratch used with reclaim disabled.");

      if (scratch.queue_base != nullptr) {
        HSAuint64 alternate_va;
        if ((profile_ == HSA_PROFILE_FULL) ||
            (hsaKmtMapMemoryToGPU(scratch.queue_base, scratch.size, &alternate_va) ==
             HSAKMT_STATUS_SUCCESS)) {
          if (scratch.large) scratch_used_large_ += scratch.size;
          scratch_cache_.insert(scratch);
          return;
        }
      }

      // Scratch request failed allocation or mapping.
      scratch_pool_.free(scratch.queue_base);
      scratch.queue_base = nullptr;

      // Release cached scratch and retry.
      // First iteration trims unused blocks, second trims all.
      scratch_cache_.trim(i == 1);
    }

    // Retry if large may yield needed space.
    if (scratch_used_large_ != 0) {
      if (AddScratchNotifier(scratch.queue_retry, 0x8000000000000000ull)) scratch.retry = true;
      return;
    }

    // Fail scratch allocation if reducing occupancy is disabled.
    if ((!use_reclaim) || core::Runtime::runtime_singleton_->flag().no_scratch_thread_limiter())
      return;

    // Attempt to trim the maximum number of concurrent waves to allow scratch to fit.
    if (core::Runtime::runtime_singleton_->flag().enable_queue_fault_message())
      debug_print("Failed to map requested scratch (%ld) - reducing queue occupancy.\n",
                  scratch.size);
    const uint64_t num_cus = properties_.NumFComputeCores / properties_.NumSIMDPerCU;
    const uint64_t total_waves = scratch.size / size_per_wave;
    uint64_t waves_per_cu = total_waves / num_cus;
    while (waves_per_cu != 0) {
      size_t size = waves_per_cu * num_cus * size_per_wave;
      void* base = scratch_pool_.alloc_high(size);
      HSAuint64 alternate_va;
      if ((base != nullptr) &&
          ((profile_ == HSA_PROFILE_FULL) ||
           (hsaKmtMapMemoryToGPU(base, size, &alternate_va) == HSAKMT_STATUS_SUCCESS))) {
        // Scratch allocated and either full profile or map succeeded.
        scratch.queue_base = base;
        scratch.size = size;
        scratch.large = true;
        scratch_used_large_ += scratch.size;
        scratch_cache_.insert(scratch);
        if (core::Runtime::runtime_singleton_->flag().enable_queue_fault_message())
          debug_print("  %ld scratch mapped, %.2f%% occupancy.\n", scratch.size,
                      float(waves_per_cu * num_cus) / scratch.wanted_slots * 100.0f);
        return;
      }
      scratch_pool_.free(base);
      waves_per_cu = waves_per_cu - scratch.waves_per_group;
    }

    // Failed to allocate minimal scratch
    assert(scratch.queue_base == nullptr && "bad scratch data");
    if (core::Runtime::runtime_singleton_->flag().enable_queue_fault_message())
      debug_print("  Could not allocate scratch for one wave per CU.\n");
    return;
  }();

  scratch.queue_process_offset = need_queue_scratch_base ?
      uintptr_t(scratch.queue_base) :
      uintptr_t(scratch.queue_base) - uintptr_t(scratch_pool_.base());
}

void GpuAgent::ReleaseQueueScratch(ScratchInfo& scratch) {
  if (scratch.queue_base == nullptr) return;

  ScopedAcquire<KernelMutex> lock(&scratch_lock_);
  scratch_cache_.free(scratch);
  scratch.queue_base = nullptr;
}

void GpuAgent::ReleaseScratch(void* base, size_t size, bool large) {
  if (profile_ == HSA_PROFILE_BASE) {
    if (HSAKMT_STATUS_SUCCESS != hsaKmtUnmapMemoryToGPU(base)) {
      assert(false && "Unmap scratch subrange failed!");
    }
  }
  scratch_pool_.free(base);

  if (large) scratch_used_large_ -= size;

  // Notify waiters that additional scratch may be available.
  for (auto notifier : scratch_notifiers_) {
    HSA::hsa_signal_or_relaxed(notifier.first, notifier.second);
  }
  ClearScratchNotifiers();
}

void GpuAgent::TranslateTime(core::Signal* signal, hsa_amd_profiling_dispatch_time_t& time) {
  uint64_t start, end;
  signal->GetRawTs(false, start, end);
  // Order is important: translate the end time first to ensure that packet duration is
  // not impacted by clock measurement latency jitter.
  time.end = TranslateTime(end);
  time.start = TranslateTime(start);
  if ((start == 0) || (end == 0) || (start < t0_.GPUClockCounter) || (end < t0_.GPUClockCounter))
    debug_print("Signal %p time stamps may be invalid.\n", &signal->signal_);
}

void GpuAgent::TranslateTime(core::Signal* signal, hsa_amd_profiling_async_copy_time_t& time) {
  uint64_t start, end;
  signal->GetRawTs(true, start, end);
  // Order is important: translate the end time first to ensure that packet duration is
  // not impacted by clock measurement latency jitter.
  time.end = TranslateTime(end);
  time.start = TranslateTime(start);
  if ((start == 0) || (end == 0) || (start < t0_.GPUClockCounter) || (end < t0_.GPUClockCounter))
    debug_print("Signal %p time stamps may be invalid.\n", &signal->signal_);
}

/*
Times during program execution are interpolated to adjust for relative clock drift.
Interval timing may appear as ticks well before process start, leading to large errors due to
frequency adjustment (i.e. the profiling with NTP problem). This is fixed by using a fixed
frequency for early times.
Intervals larger than t0_ will be frequency adjusted. This admits a numerical error of not more
than twice the frequency stability (~10^-5).
*/
uint64_t GpuAgent::TranslateTime(uint64_t tick) {
  // Only allow short (error bounded) extrapolation for times during program execution.
  // Limit errors due to relative frequency drift to ~0.5us. Sync clocks at 16Hz.
  const int64_t max_extrapolation = core::Runtime::runtime_singleton_->sys_clock_freq() >> 4;

  ScopedAcquire<KernelMutex> lock(&t1_lock_);
  // Limit errors due to correlated pair certainty to ~0.5us.
  // extrapolated time < (0.5us / half clock read certainty) * delay between clock measures
  // clock read certainty is <4us.
  if (((t1_.GPUClockCounter - t0_.GPUClockCounter) >> 2) + t1_.GPUClockCounter < tick)
    SyncClocks();

  // Good for ~300 yrs
  // uint64_t sysdelta = t1_.SystemClockCounter - t0_.SystemClockCounter;
  // uint64_t gpudelta = t1_.GPUClockCounter - t0_.GPUClockCounter;
  // int64_t offtick = int64_t(tick - t1_.GPUClockCounter);
  //__int128 num = __int128(sysdelta)*__int128(offtick) +
  //__int128(gpudelta)*__int128(t1_.SystemClockCounter);
  //__int128 sysLarge = num / __int128(gpudelta);
  // return sysLarge;

  // Good for ~3.5 months.
  uint64_t system_tick = 0;
  int64_t elapsed = 0;
  double ratio;
  // Valid ticks only need at most one SyncClocks.
  for (int i = 0; i < 2; i++) {
    ratio = double(t1_.SystemClockCounter - t0_.SystemClockCounter) /
        double(t1_.GPUClockCounter - t0_.GPUClockCounter);
    elapsed = int64_t(ratio * double(int64_t(tick - t1_.GPUClockCounter)));
    // Skip clock sync if under the extrapolation limit.
if (elapsed < max_extrapolation) break; SyncClocks(); } system_tick = uint64_t(elapsed) + t1_.SystemClockCounter; // tick predates HSA startup - extrapolate with fixed clock ratio if (tick < t0_.GPUClockCounter) { if (historical_clock_ratio_ == 0.0) historical_clock_ratio_ = ratio; system_tick = uint64_t(historical_clock_ratio_ * double(int64_t(tick - t0_.GPUClockCounter))) + t0_.SystemClockCounter; } return system_tick; } bool GpuAgent::current_coherency_type(hsa_amd_coherency_type_t type) { if (!is_kv_device_) { current_coherency_type_ = type; return true; } ScopedAcquire Lock(&coherency_lock_); if (ape1_base_ == 0 && ape1_size_ == 0) { static const size_t kApe1Alignment = 64 * 1024; ape1_size_ = kApe1Alignment; ape1_base_ = reinterpret_cast<uintptr_t>(_aligned_malloc(ape1_size_, kApe1Alignment)); assert((ape1_base_ != 0) && ("APE1 allocation failed")); } else if (type == current_coherency_type_) { return true; } HSA_CACHING_TYPE type0, type1; if (type == HSA_AMD_COHERENCY_TYPE_COHERENT) { type0 = HSA_CACHING_CACHED; type1 = HSA_CACHING_NONCACHED; } else { type0 = HSA_CACHING_NONCACHED; type1 = HSA_CACHING_CACHED; } if (hsaKmtSetMemoryPolicy(node_id(), type0, type1, reinterpret_cast<void*>(ape1_base_), ape1_size_) != HSAKMT_STATUS_SUCCESS) { return false; } current_coherency_type_ = type; return true; } uint16_t GpuAgent::GetMicrocodeVersion() const { return (properties_.EngineId.ui32.uCode); } uint16_t GpuAgent::GetSdmaMicrocodeVersion() const { return (properties_.uCodeEngineVersions.uCodeSDMA); } void GpuAgent::SyncClocks() { HSAKMT_STATUS err = hsaKmtGetClockCounters(node_id(), &t1_); assert(err == HSAKMT_STATUS_SUCCESS && "hsaKmtGetClockCounters error"); } void GpuAgent::BindTrapHandler() { if (isa_->GetMajorVersion() == 7) { // No trap handler support on Gfx7, soft error. return; } // Assemble the trap handler source code.
void* tma_addr = nullptr; uint64_t tma_size = 0; if (core::Runtime::runtime_singleton_->KfdVersion().supports_exception_debugging) { AssembleShader("TrapHandlerKfdExceptions", AssembleTarget::ISA, trap_code_buf_, trap_code_buf_size_); } else { AssembleShader("TrapHandler", AssembleTarget::ISA, trap_code_buf_, trap_code_buf_size_); // Make an empty map from doorbell index to queue. // The trap handler uses this to retrieve a wave's amd_queue_t*. auto doorbell_queue_map_size = MAX_NUM_DOORBELLS * sizeof(amd_queue_t*); doorbell_queue_map_ = (amd_queue_t**)system_allocator()(doorbell_queue_map_size, 0x1000, 0); assert(doorbell_queue_map_ != NULL && "Doorbell queue map allocation failed"); memset(doorbell_queue_map_, 0, doorbell_queue_map_size); tma_addr = doorbell_queue_map_; tma_size = doorbell_queue_map_size; } // Bind the trap handler to this node. HSAKMT_STATUS err = hsaKmtSetTrapHandler(node_id(), trap_code_buf_, trap_code_buf_size_, tma_addr, tma_size); assert(err == HSAKMT_STATUS_SUCCESS && "hsaKmtSetTrapHandler() failed"); } void GpuAgent::InvalidateCodeCaches() { // Check for microcode cache invalidation support. // This is deprecated in later microcode builds. if (isa_->GetMajorVersion() == 7) { if (properties_.EngineId.ui32.uCode < 420) { // Microcode is handling code cache invalidation. return; } } else if (isa_->GetMajorVersion() == 8 && isa_->GetMinorVersion() == 0) { if (properties_.EngineId.ui32.uCode < 685) { // Microcode is handling code cache invalidation. return; } } else if (isa_->GetMajorVersion() > 10) { assert(false && "Code cache invalidation not implemented for this agent"); } // Invalidate caches which may hold lines of code object allocation. 
uint32_t cache_inv[8] = {0}; uint32_t cache_inv_size_dw; if (isa_->GetMajorVersion() < 10) { cache_inv[1] = PM4_ACQUIRE_MEM_DW1_COHER_CNTL( PM4_ACQUIRE_MEM_COHER_CNTL_SH_ICACHE_ACTION_ENA | PM4_ACQUIRE_MEM_COHER_CNTL_SH_KCACHE_ACTION_ENA | PM4_ACQUIRE_MEM_COHER_CNTL_TC_ACTION_ENA | PM4_ACQUIRE_MEM_COHER_CNTL_TC_WB_ACTION_ENA); cache_inv_size_dw = 7; } else { cache_inv[7] = PM4_ACQUIRE_MEM_DW7_GCR_CNTL( PM4_ACQUIRE_MEM_GCR_CNTL_GLI_INV(1) | PM4_ACQUIRE_MEM_GCR_CNTL_GLK_INV | PM4_ACQUIRE_MEM_GCR_CNTL_GLV_INV | PM4_ACQUIRE_MEM_GCR_CNTL_GL1_INV | PM4_ACQUIRE_MEM_GCR_CNTL_GL2_INV); cache_inv_size_dw = 8; } cache_inv[0] = PM4_HDR(PM4_HDR_IT_OPCODE_ACQUIRE_MEM, cache_inv_size_dw, isa_->GetMajorVersion()); cache_inv[2] = PM4_ACQUIRE_MEM_DW2_COHER_SIZE(0xFFFFFFFF); cache_inv[3] = PM4_ACQUIRE_MEM_DW3_COHER_SIZE_HI(0xFF); // Submit the command to the utility queue and wait for it to complete. queues_[QueueUtility]->ExecutePM4(cache_inv, cache_inv_size_dw * sizeof(uint32_t)); } lazy_ptr<core::Blit>& GpuAgent::GetXgmiBlit(const core::Agent& dst_agent) { // Determine if destination is a member of the xGMI peer list uint32_t xgmi_engine_cnt = properties_.NumSdmaXgmiEngines; assert((xgmi_engine_cnt > 0) && ("Illegal condition, should not happen")); ScopedAcquire lock(&xgmi_peer_list_lock_); for (uint32_t idx = 0; idx < xgmi_peer_list_.size(); idx++) { uint64_t dst_handle = dst_agent.public_handle().handle; uint64_t peer_handle = xgmi_peer_list_[idx]->public_handle().handle; if (peer_handle == dst_handle) { return blits_[(idx % xgmi_engine_cnt) + DefaultBlitCount]; } } // Add agent to the xGMI neighbours list xgmi_peer_list_.push_back(&dst_agent); return blits_[((xgmi_peer_list_.size() - 1) % xgmi_engine_cnt) + DefaultBlitCount]; } lazy_ptr<core::Blit>& GpuAgent::GetPcieBlit(const core::Agent& dst_agent, const core::Agent& src_agent) { lazy_ptr<core::Blit>& blit = (src_agent.device_type() == core::Agent::kAmdCpuDevice && dst_agent.device_type() == core::Agent::kAmdGpuDevice) ? blits_[BlitHostToDev] // CPU->GPU transfer.
: (src_agent.device_type() == core::Agent::kAmdGpuDevice && dst_agent.device_type() == core::Agent::kAmdCpuDevice) ? blits_[BlitDevToHost] // GPU->CPU transfer. : blits_[BlitDevToHost]; // GPU->GPU transfer. return blit; } lazy_ptr<core::Blit>& GpuAgent::GetBlitObject(const core::Agent& dst_agent, const core::Agent& src_agent, const size_t size) { // At this point it is guaranteed that one of // the two devices is a GPU, potentially both. assert(((src_agent.device_type() == core::Agent::kAmdGpuDevice) || (dst_agent.device_type() == core::Agent::kAmdGpuDevice)) && ("Both devices are CPU agents which is not expected")); // Determine if Src and Dst devices are the same if ((src_agent.public_handle().handle) == (dst_agent.public_handle().handle)) { // If the copy is very small then cache flush overheads can dominate. // Choose a (potentially) SDMA enabled engine to avoid cache flushing. if (size < core::Runtime::runtime_singleton_->flag().force_sdma_size()) { return blits_[BlitDevToHost]; } return blits_[BlitDevToDev]; } // Acquire Hive Id of Src and Dst devices uint64_t src_hive_id = src_agent.HiveId(); uint64_t dst_hive_id = dst_agent.HiveId(); // Bind to a PCIe-facing Blit object if the two // devices have different Hive Ids or do not belong // to a Hive. This can occur in the following scenarios: // // Neither device claims membership in a Hive // srcId = 0 <-> dstId = 0; // // Src device claims membership in a Hive // srcId = 0x1926 <-> dstId = 0; // // Dst device claims membership in a Hive // srcId = 0 <-> dstId = 0x1123; // // Both devices claim membership in a Hive // and the Hives are different // srcId = 0x1926 <-> dstId = 0x1123; // if ((dst_hive_id != src_hive_id) || (dst_hive_id == 0)) { return GetPcieBlit(dst_agent, src_agent); } // Accommodates platforms where devices have xGMI // links but without sdmaXgmiEngines e.g.
Vega 20 if (properties_.NumSdmaXgmiEngines == 0) { return GetPcieBlit(dst_agent, src_agent); } return GetXgmiBlit(dst_agent); } void GpuAgent::Trim() { Agent::Trim(); ScopedAcquire lock(&scratch_lock_); scratch_cache_.trim(false); } void GpuAgent::InitNumaAllocator() { Agent* nearCpu = nullptr; uint32_t dist = -1u; for (auto cpu : core::Runtime::runtime_singleton_->cpu_agents()) { const core::Runtime::LinkInfo link_info = core::Runtime::runtime_singleton_->GetLinkInfo(node_id(), cpu->node_id()); if (link_info.info.numa_distance < dist) { dist = link_info.info.numa_distance; nearCpu = cpu; } } for (auto pool : nearCpu->regions()) { if (pool->kernarg()) { system_allocator_ = [pool](size_t size, size_t alignment, MemoryRegion::AllocateFlags alloc_flags) -> void* { assert(alignment <= 4096); void* ptr = nullptr; return (HSA_STATUS_SUCCESS == core::Runtime::runtime_singleton_->AllocateMemory(pool, size, alloc_flags, &ptr)) ? ptr : nullptr; }; system_deallocator_ = [](void* ptr) { core::Runtime::runtime_singleton_->FreeMemory(ptr); }; return; } } assert(false && "Nearest NUMA node did not have a kernarg pool."); } } // namespace amd } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_hsa_loader.cpp000066400000000000000000000265471420110115200234710ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/amd_hsa_loader.hpp" #include "core/inc/runtime.h" #include #include #include #include #include #include #include #include #include #include #include namespace { #if !defined(_WIN32) && !defined(_WIN64) uintptr_t PAGE_SIZE_MASK{ [] () { uintptr_t page_size = sysconf(_SC_PAGE_SIZE); if (page_size == -1) { page_size = 1 << 12; // Default page size to 4KiB. 
} return ~(page_size - 1); } () }; #endif std::string EncodePathname(const char *file_path) { std::ostringstream ss; unsigned char c; ss.fill('0'); ss << "file://"; while ((c = *file_path++) != '\0') { if (isalnum(c) || c == '/' || c == '-' || c == '_' || c == '.' || c == '~') { ss << c; } else { ss << std::uppercase; ss << '%' << std::hex << std::setw(2) << static_cast<int>(c); ss << std::nouppercase; } } return ss.str(); } std::string GetUriFromMemoryAddress(const void *memory, size_t size) { pid_t pid = getpid(); std::ostringstream uri_stream; uri_stream << "memory://" << pid << "#offset=0x" << std::hex << (uintptr_t)memory << std::dec << "&size=" << size; return uri_stream.str(); } std::string GetUriFromMemoryInExecutableFile(const void *memory, size_t size) { #if !defined(_WIN32) && !defined(_WIN64) uintptr_t address = reinterpret_cast<uintptr_t>(memory); struct callback_data_s { ElfW(Addr) address; size_t callback_num; const char *file_path; size_t file_offset; } callback_data{address, 0, nullptr, 0}; // Iterate over the program headers of the loaded shared objects to see if the ELF binary // is allocated in a mapped file. if (dl_iterate_phdr([](struct dl_phdr_info *info, size_t size, void *ptr) -> int { struct callback_data_s *callback_data = (struct callback_data_s *) ptr; const ElfW(Addr) elf_address = callback_data->address - info->dlpi_addr; int n = info->dlpi_phnum; while (--n >= 0) { if (info->dlpi_phdr[n].p_type == PT_LOAD && elf_address >= info->dlpi_phdr[n].p_vaddr && elf_address - info->dlpi_phdr[n].p_vaddr < info->dlpi_phdr[n].p_memsz) { // The first callback is always the program executable.
if (!info->dlpi_name[0] && callback_data->callback_num == 0) { static char argv0[PATH_MAX] = {0}; /* readlink() does not NUL-terminate; reserve the last byte. */ if (!argv0[0] && readlink("/proc/self/exe", argv0, sizeof(argv0) - 1) == -1) return 0; callback_data->file_path = argv0; } else { callback_data->file_path = info->dlpi_name; } callback_data->file_offset = elf_address - info->dlpi_phdr[n].p_vaddr + info->dlpi_phdr[n].p_offset; return 1; } } ++callback_data->callback_num; return 0; }, &callback_data)) { if (!callback_data.file_path || callback_data.file_path[0] == '\0') { return GetUriFromMemoryAddress(memory, size); } std::ostringstream uri_stream; uri_stream << EncodePathname(callback_data.file_path); uri_stream << "#offset=" << callback_data.file_offset; uri_stream << "&size=" << size; return uri_stream.str(); } #endif // !defined(_WIN32) && !defined(_WIN64) return GetUriFromMemoryAddress(memory, size); } std::string GetUriFromMemoryInMmapedFile(const void *memory, size_t size) { #if !defined(_WIN32) && !defined(_WIN64) std::ifstream proc_maps; proc_maps.open("/proc/self/maps", std::ifstream::in); if (!proc_maps.is_open() || !proc_maps.good()) { return GetUriFromMemoryAddress(memory, size); } std::string line; while (std::getline(proc_maps, line)) { std::stringstream tokens(line); uintptr_t low_address, high_address; char dash = '\0'; tokens >> std::hex >> low_address >> std::dec >> dash >> std::hex >> high_address >> std::dec; if (dash != '-') { continue; } uintptr_t address = reinterpret_cast<uintptr_t>(memory); if (!(address >= low_address && (address + size) <= high_address)) { continue; } std::string permissions, device, uri_file_path; size_t offset; uint64_t inode; tokens >> permissions >> std::hex >> offset >> std::dec >> device >> inode >> uri_file_path; if (inode == 0 || uri_file_path.empty()) { return GetUriFromMemoryAddress(memory, size); } size_t uri_offset = offset + address - low_address; bool is_complete_file = false; if (uri_offset == 0) { std::ifstream uri_file(uri_file_path, std::ios::binary); if (uri_file) {
uri_file.seekg(0, std::ios::end); is_complete_file = uri_file.tellg() == size; } } std::ostringstream uri_stream; uri_stream << EncodePathname(uri_file_path.c_str()); if (!is_complete_file) { uri_stream << "#offset=" << uri_offset; uri_stream << "&size=" << size; } return uri_stream.str(); } #endif // !defined(_WIN32) && !defined(_WIN64) return GetUriFromMemoryAddress(memory, size); } std::string GetUriFromFile(int file_descriptor, size_t offset, size_t size, bool is_complete_file, const void *memory) { #if !defined(_WIN32) && !defined(_WIN64) std::ostringstream proc_fd_path; proc_fd_path << "/proc/self/fd/" << file_descriptor; char uri_file_path[PATH_MAX]; memset(uri_file_path, 0, PATH_MAX); /* readlink() does not NUL-terminate; reserve the last byte. */ if (readlink(proc_fd_path.str().c_str(), uri_file_path, PATH_MAX - 1) == -1) { return GetUriFromMemoryAddress(memory, size); } if (uri_file_path[0] == '\0') { return GetUriFromMemoryAddress(memory, size); } std::ostringstream uri_stream; uri_stream << EncodePathname(uri_file_path); if (!is_complete_file) { uri_stream << "#offset=" << offset; uri_stream << "&size=" << size; } return uri_stream.str(); #else return GetUriFromMemoryAddress(memory, size); #endif // !defined(_WIN32) && !defined(_WIN64) } } // namespace namespace rocr { namespace amd { namespace hsa { namespace loader { /// @brief Default destructor.
CodeObjectReaderImpl::~CodeObjectReaderImpl() { if (is_mmap) { #if !defined(_WIN32) && !defined(_WIN64) uintptr_t address = reinterpret_cast<uintptr_t>(code_object_memory); uintptr_t adjusted_address = address & PAGE_SIZE_MASK; size_t adjusted_size = code_object_size + (address - adjusted_address); munmap(reinterpret_cast<void*>(adjusted_address), adjusted_size); #else delete [] code_object_memory; #endif // !defined(_WIN32) && !defined(_WIN64) } } hsa_status_t CodeObjectReaderImpl::SetFile( hsa_file_t _code_object_file_descriptor, size_t _code_object_offset, size_t _code_object_size) { assert(!code_object_memory && "Code object reader wrapper is already set"); if (_code_object_file_descriptor == -1) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } off_t file_size = __lseek__(_code_object_file_descriptor, 0, SEEK_END); if (file_size == (off_t)-1) { return HSA_STATUS_ERROR_INVALID_FILE; } if (file_size <= _code_object_offset) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } if (_code_object_size == 0) { _code_object_size = file_size - _code_object_offset; } bool is_complete_file = _code_object_offset == 0 && _code_object_size == file_size; #if !defined(_WIN32) && !defined(_WIN64) off_t adjusted_offset = _code_object_offset & PAGE_SIZE_MASK; size_t adjusted_size = _code_object_size + (_code_object_offset - adjusted_offset); void *memory = mmap(nullptr, adjusted_size, PROT_READ, MAP_PRIVATE, _code_object_file_descriptor, adjusted_offset); if (memory == MAP_FAILED) { return HSA_STATUS_ERROR_INVALID_FILE; } code_object_memory = reinterpret_cast<char*>(memory) + (_code_object_offset & ~PAGE_SIZE_MASK); code_object_size = _code_object_size; is_mmap = true; #else /* Seek to the start of the code object, not the start of the file. */ if (__lseek__(_code_object_file_descriptor, _code_object_offset, SEEK_SET) == (off_t)-1) { return HSA_STATUS_ERROR_INVALID_FILE; } std::unique_ptr<unsigned char[]> memory(new (std::nothrow) unsigned char[_code_object_size]); if (!memory) { return HSA_STATUS_ERROR_OUT_OF_RESOURCES; } if (__read__(_code_object_file_descriptor, memory.get(), _code_object_size) != _code_object_size) { return HSA_STATUS_ERROR_INVALID_FILE; } mmap_memory = memory.release(); mmap_size = _code_object_size; code_object_memory = mmap_memory; code_object_size = _code_object_size; #endif // !defined(_WIN32) && !defined(_WIN64) uri = GetUriFromFile(_code_object_file_descriptor, _code_object_offset, _code_object_size, is_complete_file, code_object_memory); return HSA_STATUS_SUCCESS; } hsa_status_t CodeObjectReaderImpl::SetMemory( const void *_code_object_memory, size_t _code_object_size) { assert(!code_object_memory && "Code object reader wrapper is already set"); if (!_code_object_memory || _code_object_size == 0) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } code_object_memory = _code_object_memory; code_object_size = _code_object_size; bool loader_enable_mmap_uri = core::Runtime::runtime_singleton_->flag().loader_enable_mmap_uri(); if (loader_enable_mmap_uri) { uri = GetUriFromMemoryInMmapedFile(_code_object_memory, _code_object_size); } else { uri = GetUriFromMemoryInExecutableFile(_code_object_memory, _code_object_size); } return HSA_STATUS_SUCCESS; } } // namespace loader } // namespace hsa } // namespace amd } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_loader_context.cpp000066400000000000000000000431451420110115200243730ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #include "core/inc/amd_loader_context.hpp" #include #include #include #include "core/inc/amd_gpu_agent.h" #include "core/inc/amd_memory_region.h" #include "core/util/os.h" #include #include #include "core/inc/hsa_internal.h" #include "core/util/utils.h" #include "inc/hsa_ext_amd.h" #if defined(_WIN32) || defined(_WIN64) #include #else #include #endif namespace rocr { namespace { bool IsLocalRegion(const core::MemoryRegion *region) { const AMD::MemoryRegion *amd_region = (AMD::MemoryRegion*)region; if (nullptr == amd_region || !amd_region->IsLocalMemory()) { return false; } return true; } bool IsDebuggerRegistered() { return false; // Leaving code commented as it will be used later on //return ((core::Runtime::runtime_singleton_->flag().emulate_aql()) && // (0 != // core::Runtime::runtime_singleton_->flag().tools_lib_names().size())); } class SegmentMemory { public: virtual ~SegmentMemory() {} virtual void* Address(size_t offset = 0) const = 0; virtual void* HostAddress(size_t offset = 0) const = 0; virtual bool Allocated() const = 0; virtual bool Allocate(size_t size, size_t align, bool zero) = 0; virtual bool Copy(size_t offset, const void *src, size_t size) = 0; virtual void Free() = 0; virtual bool Freeze() = 0; protected: SegmentMemory() {} private: SegmentMemory(const SegmentMemory&); SegmentMemory& operator=(const SegmentMemory&); }; class MallocedMemory final: public SegmentMemory { public: MallocedMemory(): SegmentMemory(), ptr_(nullptr), size_(0) {} ~MallocedMemory() {} void* Address(size_t offset = 0) const override { assert(this->Allocated()); return (char*)ptr_ + offset; } void* HostAddress(size_t offset = 0) const override { return this->Address(offset); } bool Allocated() const override { return nullptr != ptr_; } bool Allocate(size_t size, size_t align, bool zero) override; bool Copy(size_t offset, const void *src, size_t size) override; void Free() override; bool 
Freeze() override; private: MallocedMemory(const MallocedMemory&); MallocedMemory& operator=(const MallocedMemory&); void *ptr_; size_t size_; }; bool MallocedMemory::Allocate(size_t size, size_t align, bool zero) { assert(!this->Allocated()); assert(0 < size); assert(0 < align && 0 == (align & (align - 1))); ptr_ = _aligned_malloc(size, align); if (nullptr == ptr_) { return false; } if (HSA_STATUS_SUCCESS != HSA::hsa_memory_register(ptr_, size)) { _aligned_free(ptr_); ptr_ = nullptr; return false; } if (zero) { memset(ptr_, 0x0, size); } size_ = size; return true; } bool MallocedMemory::Copy(size_t offset, const void *src, size_t size) { assert(this->Allocated()); assert(nullptr != src); assert(0 < size); memcpy(this->Address(offset), src, size); return true; } void MallocedMemory::Free() { assert(this->Allocated()); HSA::hsa_memory_deregister(ptr_, size_); _aligned_free(ptr_); ptr_ = nullptr; size_ = 0; } bool MallocedMemory::Freeze() { assert(this->Allocated()); return true; } class MappedMemory final: public SegmentMemory { public: MappedMemory(bool is_kv = false): SegmentMemory(), is_kv_(is_kv), ptr_(nullptr), size_(0) {} ~MappedMemory() {} void* Address(size_t offset = 0) const override { assert(this->Allocated()); return (char*)ptr_ + offset; } void* HostAddress(size_t offset = 0) const override { return this->Address(offset); } bool Allocated() const override { return nullptr != ptr_; } bool Allocate(size_t size, size_t align, bool zero) override; bool Copy(size_t offset, const void *src, size_t size) override; void Free() override; bool Freeze() override; private: MappedMemory(const MappedMemory&); MappedMemory& operator=(const MappedMemory&); bool is_kv_; void *ptr_; size_t size_; }; bool MappedMemory::Allocate(size_t size, size_t align, bool zero) { assert(!this->Allocated()); assert(0 < size); assert(0 < align && 0 == (align & (align - 1))); #if defined(_WIN32) || defined(_WIN64) ptr_ = (void*)VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE, 
PAGE_EXECUTE_READWRITE); #else ptr_ = is_kv_ ? mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0) : mmap(nullptr, size, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE, -1, 0); /* mmap() reports failure with MAP_FAILED, not nullptr. */ if (MAP_FAILED == ptr_) { ptr_ = nullptr; } #endif // _WIN32 || _WIN64 if (nullptr == ptr_) { return false; } assert(0 == ((uintptr_t)ptr_) % align); if (HSA_STATUS_SUCCESS != HSA::hsa_memory_register(ptr_, size)) { #if defined(_WIN32) || defined(_WIN64) VirtualFree(ptr_, size, MEM_DECOMMIT); VirtualFree(ptr_, 0, MEM_RELEASE); #else munmap(ptr_, size); #endif // _WIN32 || _WIN64 ptr_ = nullptr; return false; } if (zero) { memset(ptr_, 0x0, size); } size_ = size; return true; } bool MappedMemory::Copy(size_t offset, const void *src, size_t size) { assert(this->Allocated()); assert(nullptr != src); assert(0 < size); memcpy(this->Address(offset), src, size); return true; } void MappedMemory::Free() { assert(this->Allocated()); HSA::hsa_memory_deregister(ptr_, size_); #if defined(_WIN32) || defined(_WIN64) VirtualFree(ptr_, size_, MEM_DECOMMIT); VirtualFree(ptr_, 0, MEM_RELEASE); #else munmap(ptr_, size_); #endif // _WIN32 || _WIN64 ptr_ = nullptr; size_ = 0; } bool MappedMemory::Freeze() { assert(this->Allocated()); return true; } class RegionMemory final: public SegmentMemory { public: static hsa_region_t AgentLocal(hsa_agent_t agent); static hsa_region_t System(); RegionMemory(hsa_region_t region): SegmentMemory(), region_(region), ptr_(nullptr), host_ptr_(nullptr), size_(0) {} ~RegionMemory() {} void* Address(size_t offset = 0) const override { assert(this->Allocated()); return (char*)ptr_ + offset; } void* HostAddress(size_t offset = 0) const override { assert(this->Allocated()); return (char*)host_ptr_ + offset; } bool Allocated() const override { return nullptr != ptr_; } bool Allocate(size_t size, size_t align, bool zero) override; bool Copy(size_t offset, const void *src, size_t size) override; void Free() override; bool Freeze() override; private: RegionMemory(const
RegionMemory&); RegionMemory& operator=(const RegionMemory&); hsa_region_t region_; void *ptr_; void *host_ptr_; size_t size_; }; hsa_region_t RegionMemory::AgentLocal(hsa_agent_t agent) { hsa_region_t invalid_region; invalid_region.handle = 0; AMD::GpuAgent *amd_agent = (AMD::GpuAgent*)core::Agent::Convert(agent); if (nullptr == amd_agent) { return invalid_region; } auto agent_local_region = std::find_if(amd_agent->regions().begin(), amd_agent->regions().end(), IsLocalRegion); return agent_local_region == amd_agent->regions().end() ? invalid_region : core::MemoryRegion::Convert(*agent_local_region); } hsa_region_t RegionMemory::System() { const core::MemoryRegion* default_system_region = core::Runtime::runtime_singleton_->system_regions_fine()[0]; assert(default_system_region != NULL); return core::MemoryRegion::Convert(default_system_region); } bool RegionMemory::Allocate(size_t size, size_t align, bool zero) { assert(!this->Allocated()); assert(0 < size); assert(0 < align && 0 == (align & (align - 1))); if (HSA_STATUS_SUCCESS != HSA::hsa_memory_allocate(region_, size, &ptr_)) { ptr_ = nullptr; return false; } assert(0 == ((uintptr_t)ptr_) % align); if (HSA_STATUS_SUCCESS != HSA::hsa_memory_allocate(RegionMemory::System(), size, &host_ptr_)) { HSA::hsa_memory_free(ptr_); ptr_ = nullptr; host_ptr_ = nullptr; return false; } if (zero) { memset(host_ptr_, 0x0, size); } size_ = size; return true; } bool RegionMemory::Copy(size_t offset, const void *src, size_t size) { assert(this->Allocated() && nullptr != host_ptr_); assert(nullptr != src); assert(0 < size); memcpy((char*)host_ptr_ + offset, src, size); return true; } void RegionMemory::Free() { assert(this->Allocated()); HSA::hsa_memory_free(ptr_); if (nullptr != host_ptr_) { HSA::hsa_memory_free(host_ptr_); } ptr_ = nullptr; host_ptr_ = nullptr; size_ = 0; } bool RegionMemory::Freeze() { assert(this->Allocated() && nullptr != host_ptr_); core::Agent* agent = reinterpret_cast<AMD::MemoryRegion*>(
core::MemoryRegion::Convert(region_))->owner(); if (agent != NULL && agent->device_type() == core::Agent::kAmdGpuDevice) { if (HSA_STATUS_SUCCESS != agent->DmaCopy(ptr_, host_ptr_, size_)) { return false; } } else { memcpy(ptr_, host_ptr_, size_); } return true; } } // namespace anonymous namespace amd { hsa_isa_t LoaderContext::IsaFromName(const char *name) { assert(name); hsa_status_t hsa_status = HSA_STATUS_SUCCESS; hsa_isa_t isa_handle; isa_handle.handle = 0; hsa_status = HSA::hsa_isa_from_name(name, &isa_handle); if (HSA_STATUS_SUCCESS != hsa_status) { isa_handle.handle = 0; return isa_handle; } return isa_handle; } bool LoaderContext::IsaSupportedByAgent(hsa_agent_t agent, hsa_isa_t code_object_isa) { std::pair<hsa_isa_t, bool> comparison_data(code_object_isa, false); auto IsIsaEquivalent = [](hsa_isa_t agent_isa_h, void *data) { assert(data); std::pair<hsa_isa_t, bool> *data_pair = reinterpret_cast<std::pair<hsa_isa_t, bool>*>(data); assert(data_pair); assert(data_pair->second != true); const core::Isa *agent_isa = core::Isa::Object(agent_isa_h); assert(agent_isa); const core::Isa *code_object_isa = core::Isa::Object(data_pair->first); assert(code_object_isa); data_pair->second = core::Isa::IsCompatible(*code_object_isa, *agent_isa); return data_pair->second ?
HSA_STATUS_INFO_BREAK : HSA_STATUS_SUCCESS; }; hsa_status_t status = HSA::hsa_agent_iterate_isas(agent, IsIsaEquivalent, &comparison_data); if (status != HSA_STATUS_SUCCESS && status != HSA_STATUS_INFO_BREAK) { return false; } return comparison_data.second; } void* LoaderContext::SegmentAlloc(amdgpu_hsa_elf_segment_t segment, hsa_agent_t agent, size_t size, size_t align, bool zero) { assert(0 < size); assert(0 < align && 0 == (align & (align - 1))); SegmentMemory *mem = nullptr; switch (segment) { case AMDGPU_HSA_SEGMENT_GLOBAL_AGENT: case AMDGPU_HSA_SEGMENT_READONLY_AGENT: { hsa_profile_t agent_profile; if (HSA_STATUS_SUCCESS != HSA::hsa_agent_get_info(agent, HSA_AGENT_INFO_PROFILE, &agent_profile)) { return nullptr; } switch (agent_profile) { case HSA_PROFILE_BASE: mem = new (std::nothrow) RegionMemory(RegionMemory::AgentLocal(agent)); break; case HSA_PROFILE_FULL: mem = new (std::nothrow) RegionMemory(RegionMemory::System()); break; default: assert(false); } break; } case AMDGPU_HSA_SEGMENT_GLOBAL_PROGRAM: { mem = new (std::nothrow) RegionMemory(RegionMemory::System()); break; } case AMDGPU_HSA_SEGMENT_CODE_AGENT: { hsa_profile_t agent_profile; if (HSA_STATUS_SUCCESS != HSA::hsa_agent_get_info(agent, HSA_AGENT_INFO_PROFILE, &agent_profile)) { return nullptr; } switch (agent_profile) { case HSA_PROFILE_BASE: mem = new (std::nothrow) RegionMemory(IsDebuggerRegistered() ? RegionMemory::System() : RegionMemory::AgentLocal(agent)); break; case HSA_PROFILE_FULL: mem = new (std::nothrow) MappedMemory(((AMD::GpuAgentInt*)core::Agent::Convert(agent))->is_kv_device()); break; default: assert(false); } // Invalidate agent caches which may hold lines of the new allocation. 
((AMD::GpuAgentInt*)core::Agent::Convert(agent))->InvalidateCodeCaches(); break; } default: assert(false); } if (nullptr == mem) { return nullptr; } if (!mem->Allocate(size, align, zero)) { delete mem; return nullptr; } return mem; } bool LoaderContext::SegmentCopy(amdgpu_hsa_elf_segment_t segment, // not used. hsa_agent_t agent, // not used. void* dst, size_t offset, const void* src, size_t size) { assert(nullptr != dst); return ((SegmentMemory*)dst)->Copy(offset, src, size); } void LoaderContext::SegmentFree(amdgpu_hsa_elf_segment_t segment, // not used. hsa_agent_t agent, // not used. void* seg, size_t size) // not used. { assert(nullptr != seg); SegmentMemory *mem = (SegmentMemory*)seg; mem->Free(); delete mem; mem = nullptr; } void* LoaderContext::SegmentAddress(amdgpu_hsa_elf_segment_t segment, // not used. hsa_agent_t agent, // not used. void* seg, size_t offset) { assert(nullptr != seg); return ((SegmentMemory*)seg)->Address(offset); } void* LoaderContext::SegmentHostAddress(amdgpu_hsa_elf_segment_t segment, // not used. hsa_agent_t agent, // not used. void* seg, size_t offset) { assert(nullptr != seg); return ((SegmentMemory*)seg)->HostAddress(offset); } bool LoaderContext::SegmentFreeze(amdgpu_hsa_elf_segment_t segment, // not used. hsa_agent_t agent, // not used. void* seg, size_t size) // not used. 
{ assert(nullptr != seg); return ((SegmentMemory*)seg)->Freeze(); } bool LoaderContext::ImageExtensionSupported() { hsa_status_t hsa_status = HSA_STATUS_SUCCESS; bool result = false; hsa_status = HSA::hsa_system_extension_supported(HSA_EXTENSION_IMAGES, 1, 0, &result); if (HSA_STATUS_SUCCESS != hsa_status) { return false; } return result; } hsa_status_t LoaderContext::ImageCreate( hsa_agent_t agent, hsa_access_permission_t image_permission, const hsa_ext_image_descriptor_t *image_descriptor, const void *image_data, hsa_ext_image_t *image_handle) { assert(agent.handle); assert(image_descriptor); assert(image_data); assert(image_handle); assert(ImageExtensionSupported()); return hsa_ext_image_create(agent, image_descriptor, image_data, image_permission, image_handle); } hsa_status_t LoaderContext::ImageDestroy(hsa_agent_t agent, hsa_ext_image_t image_handle) { assert(agent.handle); assert(image_handle.handle); assert(ImageExtensionSupported()); return hsa_ext_image_destroy(agent, image_handle); } hsa_status_t LoaderContext::SamplerCreate( hsa_agent_t agent, const hsa_ext_sampler_descriptor_t *sampler_descriptor, hsa_ext_sampler_t *sampler_handle) { assert(agent.handle); assert(sampler_descriptor); assert(sampler_handle); assert(ImageExtensionSupported()); return hsa_ext_sampler_create(agent, sampler_descriptor, sampler_handle); } hsa_status_t LoaderContext::SamplerDestroy(hsa_agent_t agent, hsa_ext_sampler_t sampler_handle) { assert(agent.handle); assert(sampler_handle.handle); assert(ImageExtensionSupported()); return hsa_ext_sampler_destroy(agent, sampler_handle); } } // namespace amd } // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_memory_region.cpp
//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc.
All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #include "core/inc/amd_memory_region.h" #include #include #include "core/inc/runtime.h" #include "core/inc/amd_cpu_agent.h" #include "core/inc/amd_gpu_agent.h" #include "core/util/utils.h" #include "core/inc/exceptions.h" namespace rocr { namespace AMD { // Tracks aggregate size of system memory available on platform size_t MemoryRegion::max_sysmem_alloc_size_ = 0; void* MemoryRegion::AllocateKfdMemory(const HsaMemFlags& flag, HSAuint32 node_id, size_t size) { void* ret = NULL; const HSAKMT_STATUS status = hsaKmtAllocMemory(node_id, size, flag, &ret); return (status == HSAKMT_STATUS_SUCCESS) ? ret : NULL; } void MemoryRegion::FreeKfdMemory(void* ptr, size_t size) { if (ptr == NULL || size == 0) { return; } HSAKMT_STATUS status = hsaKmtFreeMemory(ptr, size); assert(status == HSAKMT_STATUS_SUCCESS); } bool MemoryRegion::RegisterMemory(void* ptr, size_t size, const HsaMemFlags& MemFlags) { assert(ptr != NULL); assert(size != 0); const HSAKMT_STATUS status = hsaKmtRegisterMemoryWithFlags(ptr, size, MemFlags); return (status == HSAKMT_STATUS_SUCCESS); } void MemoryRegion::DeregisterMemory(void* ptr) { hsaKmtDeregisterMemory(ptr); } bool MemoryRegion::MakeKfdMemoryResident(size_t num_node, const uint32_t* nodes, const void* ptr, size_t size, uint64_t* alternate_va, HsaMemMapFlags map_flag) { assert(num_node > 0); assert(nodes != NULL); *alternate_va = 0; const HSAKMT_STATUS status = hsaKmtMapMemoryToGPUNodes( const_cast(ptr), size, alternate_va, map_flag, num_node, const_cast(nodes)); return (status == HSAKMT_STATUS_SUCCESS); } void MemoryRegion::MakeKfdMemoryUnresident(const void* ptr) { hsaKmtUnmapMemoryToGPU(const_cast(ptr)); } MemoryRegion::MemoryRegion(bool fine_grain, bool kernarg, bool full_profile, core::Agent* owner, const HsaMemoryProperties& mem_props) : core::MemoryRegion(fine_grain, kernarg, full_profile, owner), mem_props_(mem_props), max_single_alloc_size_(0), 
virtual_size_(0), fragment_allocator_(BlockAllocator(*this)) { virtual_size_ = GetPhysicalSize(); mem_flag_.Value = 0; map_flag_.Value = 0; static const HSAuint64 kGpuVmSize = (1ULL << 40); if (IsLocalMemory()) { mem_flag_.ui32.PageSize = HSA_PAGE_SIZE_4KB; mem_flag_.ui32.NoSubstitute = 1; mem_flag_.ui32.HostAccess = (mem_props_.HeapType == HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE) ? 0 : 1; mem_flag_.ui32.NonPaged = 1; virtual_size_ = kGpuVmSize; } else if (IsSystem()) { mem_flag_.ui32.PageSize = HSA_PAGE_SIZE_4KB; mem_flag_.ui32.NoSubstitute = 0; mem_flag_.ui32.HostAccess = 1; mem_flag_.ui32.CachePolicy = HSA_CACHING_CACHED; if (kernarg) mem_flag_.ui32.Uncached = 1; virtual_size_ = (full_profile) ? os::GetUserModeVirtualMemorySize() : kGpuVmSize; } // Bind if memory region is coarse or fine grain mem_flag_.ui32.CoarseGrain = (fine_grain) ? 0 : 1; // Adjust allocatable size per page align max_single_alloc_size_ = AlignDown(static_cast(GetPhysicalSize()), kPageSize_); // Keep track of total system memory available // @note: System memory is surfaced as both coarse // and fine grain memory regions. 
To track total system // memory only fine grain is considered as it avoids // double counting if (IsSystem() && (fine_grain)) { max_sysmem_alloc_size_ += max_single_alloc_size_; } assert(GetVirtualSize() != 0); assert(GetPhysicalSize() <= GetVirtualSize()); assert(IsMultipleOf(max_single_alloc_size_, kPageSize_)); } MemoryRegion::~MemoryRegion() {} hsa_status_t MemoryRegion::Allocate(size_t& size, AllocateFlags alloc_flags, void** address) const { ScopedAcquire lock(&owner()->agent_memory_lock_); return AllocateImpl(size, alloc_flags, address); } hsa_status_t MemoryRegion::AllocateImpl(size_t& size, AllocateFlags alloc_flags, void** address) const { if (address == NULL) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } if (!IsSystem() && !IsLocalMemory()) { return HSA_STATUS_ERROR_INVALID_ALLOCATION; } // Allocation requests for system memory consider the aggregate // memory available on all CPU devices if (size > ((IsSystem() ? max_sysmem_alloc_size_ : max_single_alloc_size_))) { return HSA_STATUS_ERROR_INVALID_ALLOCATION; } size = AlignUp(size, kPageSize_); HsaMemFlags kmt_alloc_flags(mem_flag_); kmt_alloc_flags.ui32.ExecuteAccess = (alloc_flags & AllocateExecutable ? 1 : 0); kmt_alloc_flags.ui32.AQLQueueMemory = (alloc_flags & AllocateDoubleMap ? 1 : 0); if (IsSystem() && (alloc_flags & AllocateIPC)) kmt_alloc_flags.ui32.NonPaged = 1; // Only allow using the suballocator for ordinary VRAM. if (IsLocalMemory()) { bool subAllocEnabled = !core::Runtime::runtime_singleton_->flag().disable_fragment_alloc(); // Avoid modifying executable or queue allocations. bool useSubAlloc = subAllocEnabled; useSubAlloc &= ((alloc_flags & (~AllocateRestrict)) == 0); if (useSubAlloc) { *address = fragment_allocator_.alloc(size); return HSA_STATUS_SUCCESS; } } // Allocate memory. // If it fails, attempt to release memory from the block allocator and retry.
*address = AllocateKfdMemory(kmt_alloc_flags, owner()->node_id(), size); if (*address == nullptr) { owner()->Trim(); *address = AllocateKfdMemory(kmt_alloc_flags, owner()->node_id(), size); } if (*address != nullptr) { // Commit the memory. // For system memory, on non-restricted allocation, map it to all GPUs. On // restricted allocation, only CPU is allowed to access by default, so // no need to map // For local memory, only map it to the owning GPU. Mapping to other GPU, // if the access is allowed, is performed on AllowAccess. HsaMemMapFlags map_flag = map_flag_; size_t map_node_count = 1; const uint32_t owner_node_id = owner()->node_id(); const uint32_t* map_node_id = &owner_node_id; if (IsSystem()) { if ((alloc_flags & AllocateRestrict) == 0) { // Map to all GPU agents. map_node_count = core::Runtime::runtime_singleton_->gpu_ids().size(); if (map_node_count == 0) { // No need to pin since no GPU in the platform. return HSA_STATUS_SUCCESS; } map_node_id = &core::Runtime::runtime_singleton_->gpu_ids()[0]; } else { // No need to pin it for CPU exclusive access. 
return HSA_STATUS_SUCCESS; } } uint64_t alternate_va = 0; const bool is_resident = MakeKfdMemoryResident( map_node_count, map_node_id, *address, size, &alternate_va, map_flag); const bool require_pinning = (!full_profile() || IsLocalMemory() || IsScratch()); if (require_pinning && !is_resident) { FreeKfdMemory(*address, size); *address = NULL; return HSA_STATUS_ERROR_OUT_OF_RESOURCES; } return HSA_STATUS_SUCCESS; } return HSA_STATUS_ERROR_OUT_OF_RESOURCES; } hsa_status_t MemoryRegion::Free(void* address, size_t size) const { ScopedAcquire lock(&owner()->agent_memory_lock_); return FreeImpl(address, size); } hsa_status_t MemoryRegion::FreeImpl(void* address, size_t size) const { if (fragment_allocator_.free(address)) return HSA_STATUS_SUCCESS; MakeKfdMemoryUnresident(address); FreeKfdMemory(address, size); return HSA_STATUS_SUCCESS; } // TODO: Look into a better name and/or making this process transparent to exporting. hsa_status_t MemoryRegion::IPCFragmentExport(void* address) const { ScopedAcquire lock(&owner()->agent_memory_lock_); if (!fragment_allocator_.discardBlock(address)) return HSA_STATUS_ERROR_INVALID_ALLOCATION; return HSA_STATUS_SUCCESS; } hsa_status_t MemoryRegion::GetInfo(hsa_region_info_t attribute, void* value) const { switch (attribute) { case HSA_REGION_INFO_SEGMENT: switch (mem_props_.HeapType) { case HSA_HEAPTYPE_SYSTEM: case HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE: case HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC: *((hsa_region_segment_t*)value) = HSA_REGION_SEGMENT_GLOBAL; break; case HSA_HEAPTYPE_GPU_LDS: *((hsa_region_segment_t*)value) = HSA_REGION_SEGMENT_GROUP; break; default: assert(false && "Memory region should only be global, group"); break; } break; case HSA_REGION_INFO_GLOBAL_FLAGS: switch (mem_props_.HeapType) { case HSA_HEAPTYPE_SYSTEM: case HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC: case HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE: { uint32_t ret = fine_grain() ? 
HSA_REGION_GLOBAL_FLAG_FINE_GRAINED : HSA_REGION_GLOBAL_FLAG_COARSE_GRAINED; if (kernarg()) ret |= HSA_REGION_GLOBAL_FLAG_KERNARG; *((uint32_t*)value) = ret; break; } default: *((uint32_t*)value) = 0; break; } break; case HSA_REGION_INFO_SIZE: *((size_t*)value) = static_cast(GetPhysicalSize()); break; case HSA_REGION_INFO_ALLOC_MAX_SIZE: switch (mem_props_.HeapType) { case HSA_HEAPTYPE_SYSTEM: *((size_t*)value) = max_sysmem_alloc_size_; break; case HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE: case HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC: case HSA_HEAPTYPE_GPU_SCRATCH: *((size_t*)value) = max_single_alloc_size_; break; default: *((size_t*)value) = 0; } break; case HSA_REGION_INFO_RUNTIME_ALLOC_ALLOWED: switch (mem_props_.HeapType) { case HSA_HEAPTYPE_SYSTEM: case HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE: case HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC: *((bool*)value) = true; break; default: *((bool*)value) = false; break; } break; case HSA_REGION_INFO_RUNTIME_ALLOC_GRANULE: switch (mem_props_.HeapType) { case HSA_HEAPTYPE_SYSTEM: case HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE: case HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC: *((size_t*)value) = kPageSize_; break; default: *((size_t*)value) = 0; break; } break; case HSA_REGION_INFO_RUNTIME_ALLOC_ALIGNMENT: switch (mem_props_.HeapType) { case HSA_HEAPTYPE_SYSTEM: case HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE: case HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC: *((size_t*)value) = kPageSize_; break; default: *((size_t*)value) = 0; break; } break; default: switch ((hsa_amd_region_info_t)attribute) { case HSA_AMD_REGION_INFO_HOST_ACCESSIBLE: *((bool*)value) = (mem_props_.HeapType == HSA_HEAPTYPE_SYSTEM) ? 
true : false; break; case HSA_AMD_REGION_INFO_BASE: *((void**)value) = reinterpret_cast(GetBaseAddress()); break; case HSA_AMD_REGION_INFO_BUS_WIDTH: *((uint32_t*)value) = BusWidth(); break; case HSA_AMD_REGION_INFO_MAX_CLOCK_FREQUENCY: *((uint32_t*)value) = MaxMemCloc(); break; default: return HSA_STATUS_ERROR_INVALID_ARGUMENT; break; } break; } return HSA_STATUS_SUCCESS; } hsa_status_t MemoryRegion::GetPoolInfo(hsa_amd_memory_pool_info_t attribute, void* value) const { switch (attribute) { case HSA_AMD_MEMORY_POOL_INFO_SEGMENT: case HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS: case HSA_AMD_MEMORY_POOL_INFO_SIZE: case HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALLOWED: case HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_GRANULE: case HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALIGNMENT: return GetInfo(static_cast(attribute), value); case HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL: *((bool*)value) = IsSystem() ? true : false; break; case HSA_AMD_MEMORY_POOL_INFO_ALLOC_MAX_SIZE: switch (mem_props_.HeapType) { case HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE: case HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC: case HSA_HEAPTYPE_GPU_SCRATCH: return GetInfo(HSA_REGION_INFO_ALLOC_MAX_SIZE, value); case HSA_HEAPTYPE_SYSTEM: // Aggregate size available for allocation *((size_t*)value) = max_sysmem_alloc_size_; break; default: *((size_t*)value) = 0; } break; default: return HSA_STATUS_ERROR_INVALID_ARGUMENT; } return HSA_STATUS_SUCCESS; } hsa_amd_memory_pool_access_t MemoryRegion::GetAccessInfo( const core::Agent& agent, const core::Runtime::LinkInfo& link_info) const { // Return allowed by default if memory pool is owned by requesting device if (agent.public_handle().handle == owner()->public_handle().handle) { return HSA_AMD_MEMORY_POOL_ACCESS_ALLOWED_BY_DEFAULT; } // Requesting device does not have a link if (link_info.num_hop < 1) { return HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED; } // Determine access to fine and coarse grained system memory // Return allowed by default if requesting device is a CPU // Return 
disallowed by default if requesting device is not a CPU if (IsSystem()) { return (agent.device_type() == core::Agent::kAmdCpuDevice) ? (HSA_AMD_MEMORY_POOL_ACCESS_ALLOWED_BY_DEFAULT) : (HSA_AMD_MEMORY_POOL_ACCESS_DISALLOWED_BY_DEFAULT); } // Determine access type for device local memory which is // guaranteed to be HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC // Return disallowed by default if framebuffer is coarse grained // without regard to type of requesting device (CPU / GPU) // Return disallowed by default if framebuffer is fine grained // and requesting device is connected via xGMI link // Return never allowed if framebuffer is fine grained and // requesting device is connected via PCIe link if (IsLocalMemory()) { // Return disallowed by default if memory is coarse // grained without regard to link type if (fine_grain() == false) { return HSA_AMD_MEMORY_POOL_ACCESS_DISALLOWED_BY_DEFAULT; } // Determine if pool is pseudo fine-grained due to env flag // Return disallowed by default if (core::Runtime::runtime_singleton_->flag().fine_grain_pcie()) { return HSA_AMD_MEMORY_POOL_ACCESS_DISALLOWED_BY_DEFAULT; } // Return disallowed by default if memory is fine // grained and link type is xGMI. if (agent.HiveId() == owner()->HiveId()) { return HSA_AMD_MEMORY_POOL_ACCESS_DISALLOWED_BY_DEFAULT; } // Return never allowed if memory is fine grained // link type is not xGMI i.e. 
link is PCIe return HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED; } // Return never allowed if the above conditions are not satisfied // This can happen when the memory pool references neither system // nor device local memory return HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED; } hsa_status_t MemoryRegion::GetAgentPoolInfo( const core::Agent& agent, hsa_amd_agent_memory_pool_info_t attribute, void* value) const { const uint32_t node_id_from = agent.node_id(); const uint32_t node_id_to = owner()->node_id(); const core::Runtime::LinkInfo link_info = core::Runtime::runtime_singleton_->GetLinkInfo(node_id_from, node_id_to); const hsa_amd_memory_pool_access_t access_type = GetAccessInfo(agent, link_info); switch (attribute) { case HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS: *((hsa_amd_memory_pool_access_t*)value) = access_type; break; case HSA_AMD_AGENT_MEMORY_POOL_INFO_NUM_LINK_HOPS: *((uint32_t*)value) = (access_type != HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED) ? link_info.num_hop : 0; break; case HSA_AMD_AGENT_MEMORY_POOL_INFO_LINK_INFO: memset(value, 0, sizeof(hsa_amd_memory_pool_link_info_t)); if ((access_type != HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED) && (link_info.num_hop > 0)) { memcpy(value, &link_info.info, sizeof(hsa_amd_memory_pool_link_info_t)); } break; default: return HSA_STATUS_ERROR_INVALID_ARGUMENT; } return HSA_STATUS_SUCCESS; } hsa_status_t MemoryRegion::AllowAccess(uint32_t num_agents, const hsa_agent_t* agents, const void* ptr, size_t size) const { if (num_agents == 0 || agents == NULL || ptr == NULL || size == 0) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } if (!IsSystem() && !IsLocalMemory()) { return HSA_STATUS_ERROR; } // Adjust for fragments. Make accessibility sticky for fragments since this will satisfy the // union of accessible agents between the fragments in the block.
hsa_amd_pointer_info_t info; uint32_t agent_count = 0; hsa_agent_t* accessible = nullptr; MAKE_SCOPE_GUARD([&]() { free(accessible); }); core::Runtime::PtrInfoBlockData blockInfo; std::vector union_agents; info.size = sizeof(info); ScopedAcquire lock(&access_lock_); if (core::Runtime::runtime_singleton_->PtrInfo(const_cast(ptr), &info, malloc, &agent_count, &accessible, &blockInfo) == HSA_STATUS_SUCCESS) { if (blockInfo.length != size || info.sizeInBytes != size) { for (int i = 0; i < num_agents; i++) union_agents.push_back(agents[i].handle); for (int i = 0; i < agent_count; i++) union_agents.push_back(accessible[i].handle); std::sort(union_agents.begin(), union_agents.end()); const auto& last = std::unique(union_agents.begin(), union_agents.end()); union_agents.erase(last, union_agents.end()); agents = reinterpret_cast(&union_agents[0]); num_agents = union_agents.size(); size = blockInfo.length; ptr = blockInfo.base; } } bool cpu_in_list = false; std::set whitelist_gpus; std::vector whitelist_nodes; for (uint32_t i = 0; i < num_agents; ++i) { core::Agent* agent = core::Agent::Convert(agents[i]); if (agent == NULL || !agent->IsValid()) { return HSA_STATUS_ERROR_INVALID_AGENT; } if (agent->device_type() == core::Agent::kAmdGpuDevice) { whitelist_nodes.push_back(agent->node_id()); whitelist_gpus.insert(reinterpret_cast(agent)); } else { cpu_in_list = true; } } if (whitelist_nodes.size() == 0 && IsSystem()) { assert(cpu_in_list); // This is a system region and only CPU agents in the whitelist. // Remove old mappings. AMD::MemoryRegion::MakeKfdMemoryUnresident(ptr); return HSA_STATUS_SUCCESS; } // If this is a local memory region, the owning gpu always needs to be in // the whitelist. 
if (IsLocalMemory() && std::find(whitelist_nodes.begin(), whitelist_nodes.end(), owner()->node_id()) == whitelist_nodes.end()) { whitelist_nodes.push_back(owner()->node_id()); whitelist_gpus.insert(reinterpret_cast(owner())); } HsaMemMapFlags map_flag = map_flag_; map_flag.ui32.HostAccess |= (cpu_in_list) ? 1 : 0; { // Sequence with pointer info since queries to other fragments of the block may be adjusted by // this call. ScopedAcquire lock( core::Runtime::runtime_singleton_->memory_lock_.shared()); uint64_t alternate_va = 0; if (!AMD::MemoryRegion::MakeKfdMemoryResident( whitelist_nodes.size(), &whitelist_nodes[0], ptr, size, &alternate_va, map_flag)) { return HSA_STATUS_ERROR_OUT_OF_RESOURCES; } } return HSA_STATUS_SUCCESS; } hsa_status_t MemoryRegion::CanMigrate(const MemoryRegion& dst, bool& result) const { // TODO: not implemented yet. result = false; return HSA_STATUS_ERROR_OUT_OF_RESOURCES; } hsa_status_t MemoryRegion::Migrate(uint32_t flag, const void* ptr) const { // TODO: not implemented yet. return HSA_STATUS_ERROR_OUT_OF_RESOURCES; } hsa_status_t MemoryRegion::Lock(uint32_t num_agents, const hsa_agent_t* agents, void* host_ptr, size_t size, void** agent_ptr) const { if (!IsSystem()) { return HSA_STATUS_ERROR; } if (full_profile()) { // For APU, any host pointer is always accessible by the gpu. *agent_ptr = host_ptr; return HSA_STATUS_SUCCESS; } std::set whitelist_gpus; std::vector whitelist_nodes; if (num_agents == 0 || agents == NULL) { // Map to all GPU agents. 
whitelist_nodes = core::Runtime::runtime_singleton_->gpu_ids(); whitelist_gpus.insert( core::Runtime::runtime_singleton_->gpu_agents().begin(), core::Runtime::runtime_singleton_->gpu_agents().end()); } else { for (uint32_t i = 0; i < num_agents; ++i) { core::Agent* agent = core::Agent::Convert(agents[i]); if (agent == NULL || !agent->IsValid()) { return HSA_STATUS_ERROR_INVALID_AGENT; } if (agent->device_type() == core::Agent::kAmdGpuDevice) { whitelist_nodes.push_back(agent->node_id()); whitelist_gpus.insert(agent); } } } if (whitelist_nodes.size() == 0) { // No GPU agents in the whitelist. So no need to register and map since the // platform only has CPUs. *agent_ptr = host_ptr; return HSA_STATUS_SUCCESS; } // Call kernel driver to register and pin the memory. if (RegisterMemory(host_ptr, size, mem_flag_)) { uint64_t alternate_va = 0; if (MakeKfdMemoryResident(whitelist_nodes.size(), &whitelist_nodes[0], host_ptr, size, &alternate_va, map_flag_)) { if (alternate_va != 0) { *agent_ptr = reinterpret_cast(alternate_va); } else { *agent_ptr = host_ptr; } return HSA_STATUS_SUCCESS; } AMD::MemoryRegion::DeregisterMemory(host_ptr); return HSA_STATUS_ERROR_OUT_OF_RESOURCES; } return HSA_STATUS_ERROR; } hsa_status_t MemoryRegion::Unlock(void* host_ptr) const { if (!IsSystem()) { return HSA_STATUS_ERROR; } if (full_profile()) { return HSA_STATUS_SUCCESS; } MakeKfdMemoryUnresident(host_ptr); DeregisterMemory(host_ptr); return HSA_STATUS_SUCCESS; } hsa_status_t MemoryRegion::AssignAgent(void* ptr, size_t size, const core::Agent& agent, hsa_access_permission_t access) const { return HSA_STATUS_SUCCESS; } void MemoryRegion::Trim() const { fragment_allocator_.trim(); } void* MemoryRegion::BlockAllocator::alloc(size_t request_size, size_t& allocated_size) const { void* ret; size_t bsize = AlignUp(request_size, block_size()); hsa_status_t err = region_.AllocateImpl( bsize, core::MemoryRegion::AllocateRestrict | core::MemoryRegion::AllocateDirect, &ret); if (err != 
HSA_STATUS_SUCCESS) throw AMD::hsa_exception(err, "MemoryRegion::BlockAllocator::alloc failed."); assert(ret != nullptr && "Region returned nullptr on success."); allocated_size = bsize; return ret; } } // namespace amd } // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/core/runtime/amd_topology.cpp
//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/amd_topology.h" #include "core/inc/amd_filter_device.h" #include #include #include #include #include #include #include #ifndef NDBEUG #include #endif #include "hsakmt.h" #include "core/inc/runtime.h" #include "core/inc/amd_cpu_agent.h" #include "core/inc/amd_gpu_agent.h" #include "core/inc/amd_memory_region.h" #include "core/util/utils.h" extern r_debug _amdgpu_r_debug; namespace rocr { namespace AMD { // Minimum acceptable KFD version numbers static const uint kKfdVersionMajor = 0; static const uint kKfdVersionMinor = 99; // Query for user preference and use that to determine Xnack mode of ROCm system. // Return true if Xnack mode is ON or false if OFF. Xnack mode of a system is // orthogonal to devices that do not support Xnack mode. It is legal for a // system with Xnack ON to have devices that do not support Xnack functionality. bool BindXnackMode() { // Get the user's preference for the Xnack mode of the ROCm platform HSAint32 mode; mode = core::Runtime::runtime_singleton_->flag().xnack(); bool config_xnack = (core::Runtime::runtime_singleton_->flag().xnack() != Flag::XNACK_REQUEST::XNACK_UNCHANGED); // Pass the user's Xnack mode preference to the driver // The driver call may fail; that case is supported and handled below HSAKMT_STATUS status = HSAKMT_STATUS_ERROR; if (config_xnack) { status = hsaKmtSetXNACKMode(mode); if (status == HSAKMT_STATUS_SUCCESS) { return mode; } } // Get Xnack mode of devices bound by driver.
This could happen // when a call to SET Xnack mode fails or user has no particular // preference status = hsaKmtGetXNACKMode((HSAint32*)&mode); if(status != HSAKMT_STATUS_SUCCESS) { debug_print("KFD does not support xnack mode query.\nROCr must assume xnack is disabled.\n"); return false; } return mode; } CpuAgent* DiscoverCpu(HSAuint32 node_id, HsaNodeProperties& node_prop) { if (node_prop.NumCPUCores == 0) { return nullptr; } CpuAgent* cpu = new CpuAgent(node_id, node_prop); core::Runtime::runtime_singleton_->RegisterAgent(cpu); return cpu; } GpuAgent* DiscoverGpu(HSAuint32 node_id, HsaNodeProperties& node_prop, bool xnack_mode) { GpuAgent* gpu = nullptr; if (node_prop.NumFComputeCores == 0) { // Ignore non GPUs. return nullptr; } try { gpu = new GpuAgent(node_id, node_prop, xnack_mode, core::Runtime::runtime_singleton_->gpu_agents().size()); const HsaVersionInfo& kfd_version = core::Runtime::runtime_singleton_->KfdVersion().version; // Check for sramecc incompatibility due to sramecc not being reported correctly in kfd before // 1.4. if (gpu->isa()->IsSrameccSupported() && (kfd_version.KernelInterfaceMajorVersion <= 1 && kfd_version.KernelInterfaceMinorVersion < 4)) { // gfx906 has both sramecc modes in use. Suppress the device. if ((gpu->isa()->GetProcessorName() == "gfx906") && core::Runtime::runtime_singleton_->flag().check_sramecc_validity()) { char name[64]; gpu->GetInfo((hsa_agent_info_t)HSA_AMD_AGENT_INFO_PRODUCT_NAME, name); name[63] = '\0'; fprintf(stderr, "HSA Error: Incompatible kernel and userspace, %s disabled. Upgrade amdgpu.\n", name); delete gpu; return nullptr; } // gfx908 always has sramecc set to on in vbios. Set mode bit to on and recreate the device. 
if (gpu->isa()->GetProcessorName() == "gfx908") { node_prop.Capability.ui32.SRAM_EDCSupport = 1; delete gpu; gpu = new GpuAgent(node_id, node_prop, xnack_mode, core::Runtime::runtime_singleton_->gpu_agents().size()); } } } catch (const hsa_exception& e) { if(e.error_code() == HSA_STATUS_ERROR_INVALID_ISA) { ifdebug { if (!strIsEmpty(e.what())) debug_print("Warning: %s\n", e.what()); } // Ignore unrecognized GPUs. return nullptr; } else { // Rethrow remaining exceptions. throw; } } core::Runtime::runtime_singleton_->RegisterAgent(gpu); return gpu; } void RegisterLinkInfo(uint32_t node_id, uint32_t num_link) { // Register connectivity links for this agent to the runtime. if (num_link == 0) { return; } std::vector links(num_link); if (HSAKMT_STATUS_SUCCESS != hsaKmtGetNodeIoLinkProperties(node_id, num_link, &links[0])) { return; } for (HsaIoLinkProperties io_link : links) { // Populate link info with thunk property. hsa_amd_memory_pool_link_info_t link_info = {0}; switch (io_link.IoLinkType) { case HSA_IOLINKTYPE_HYPERTRANSPORT: link_info.link_type = HSA_AMD_LINK_INFO_TYPE_HYPERTRANSPORT; link_info.atomic_support_32bit = true; link_info.atomic_support_64bit = true; link_info.coherent_support = true; break; case HSA_IOLINKTYPE_PCIEXPRESS: link_info.link_type = HSA_AMD_LINK_INFO_TYPE_PCIE; link_info.atomic_support_32bit = true; link_info.atomic_support_64bit = true; link_info.coherent_support = true; break; case HSA_IOLINK_TYPE_QPI_1_1: link_info.link_type = HSA_AMD_LINK_INFO_TYPE_QPI; link_info.atomic_support_32bit = true; link_info.atomic_support_64bit = true; link_info.coherent_support = true; break; case HSA_IOLINK_TYPE_INFINIBAND: link_info.link_type = HSA_AMD_LINK_INFO_TYPE_INFINBAND; debug_print("IOLINK is missing atomic and coherency defaults.\n"); break; case HSA_IOLINK_TYPE_XGMI: link_info.link_type = HSA_AMD_LINK_INFO_TYPE_XGMI; link_info.atomic_support_32bit = true; link_info.atomic_support_64bit = true; link_info.coherent_support = true; break; default: 
debug_print("Unrecognized IOLINK type.\n"); break; } // KFD is reporting wrong override status for XGMI. Disallow override for bringup. if (io_link.Flags.ui32.Override == 1) { if (io_link.Flags.ui32.NoPeerToPeerDMA == 1) { // Ignore this link since peer to peer is not allowed. continue; } link_info.atomic_support_32bit = (io_link.Flags.ui32.NoAtomics32bit == 0); link_info.atomic_support_64bit = (io_link.Flags.ui32.NoAtomics64bit == 0); link_info.coherent_support = (io_link.Flags.ui32.NonCoherent == 0); } link_info.max_bandwidth = io_link.MaximumBandwidth; link_info.max_latency = io_link.MaximumLatency; link_info.min_bandwidth = io_link.MinimumBandwidth; link_info.min_latency = io_link.MinimumLatency; link_info.numa_distance = io_link.Weight; core::Runtime::runtime_singleton_->RegisterLinkInfo( io_link.NodeFrom, io_link.NodeTo, io_link.Weight, link_info); } } /** * Process the list of Gpus that are surfaced to user */ static void SurfaceGpuList(std::vector& gpu_list, bool xnack_mode) { // Process user visible Gpu devices int32_t invalidIdx = -1; int32_t list_sz = gpu_list.size(); HsaNodeProperties node_prop = {0}; for (int32_t idx = 0; idx < list_sz; idx++) { if (gpu_list[idx] == invalidIdx) { break; } // Obtain properties of the node HSAKMT_STATUS err_val = hsaKmtGetNodeProperties(gpu_list[idx], &node_prop); assert(err_val == HSAKMT_STATUS_SUCCESS && "Error in getting Node Properties"); // Instantiate a Gpu device. The IO links // of this node have already been registered assert((node_prop.NumFComputeCores != 0) && "Improper node used for GPU device discovery."); DiscoverGpu(gpu_list[idx], node_prop, xnack_mode); } } /// @brief Calls Kfd thunk to get the snapshot of the topology of the system, /// which includes associations between, node, devices, memory and caches. 
void BuildTopology() { HsaVersionInfo kfd_version; if (hsaKmtGetVersion(&kfd_version) != HSAKMT_STATUS_SUCCESS) { return; } if (kfd_version.KernelInterfaceMajorVersion == kKfdVersionMajor && kfd_version.KernelInterfaceMinorVersion < kKfdVersionMinor) { return; } // Disable KFD event support when using open source KFD if (kfd_version.KernelInterfaceMajorVersion == 1 && kfd_version.KernelInterfaceMinorVersion == 0) { core::g_use_interrupt_wait = false; } core::Runtime::runtime_singleton_->KfdVersion(kfd_version); HsaSystemProperties props; hsaKmtReleaseSystemProperties(); if (hsaKmtAcquireSystemProperties(&props) != HSAKMT_STATUS_SUCCESS) { return; } core::Runtime::runtime_singleton_->SetLinkCount(props.NumNodes); // Query if env ROCR_VISIBLE_DEVICES is defined. If defined // determine number and order of GPU devices to be surfaced RvdFilter rvdFilter; int32_t invalidIdx = -1; uint32_t visibleCnt = 0; std::vector<int32_t> gpu_usr_list; bool filter = RvdFilter::FilterDevices(); if (filter) { rvdFilter.BuildRvdTokenList(); rvdFilter.BuildDeviceUuidList(props.NumNodes); visibleCnt = rvdFilter.BuildUsrDeviceList(); for (int32_t idx = 0; idx < visibleCnt; idx++) { gpu_usr_list.push_back(invalidIdx); } } // Discover agents on every node in the platform. int32_t kfdIdx = 0; for (HSAuint32 node_id = 0; node_id < props.NumNodes; node_id++) { HsaNodeProperties node_prop = {0}; if (hsaKmtGetNodeProperties(node_id, &node_prop) != HSAKMT_STATUS_SUCCESS) { continue; } // Instantiate a Cpu device const CpuAgent* cpu = DiscoverCpu(node_id, node_prop); assert(((node_prop.NumCPUCores == 0) || (cpu != nullptr)) && "CPU device failed discovery."); // Current node is either a dGpu or Apu and might belong // to user visible list.
Process node if present in user // visible list, continue if not found if (node_prop.NumFComputeCores != 0) { if (filter) { int32_t devRank = rvdFilter.GetUsrDeviceRank(kfdIdx); if (devRank != (-1)) { gpu_usr_list[devRank] = node_id; } } else { gpu_usr_list.push_back(node_id); } kfdIdx++; } // Register IO links of node without regard to // it being visible to user or not. It is not // possible to access links of nodes that are // not visible RegisterLinkInfo(node_id, node_prop.NumIOLinks); } // Determine the Xnack mode to be bound for system bool xnack_mode = BindXnackMode(); // Instantiate ROCr objects to encapsulate Gpu devices SurfaceGpuList(gpu_usr_list, xnack_mode); // Parse HSA_CU_MASK with GPU and CU count limits. uint32_t maxGpu = core::Runtime::runtime_singleton_->gpu_agents().size(); uint32_t maxCu = 0; uint32_t cus; for (auto& gpu : core::Runtime::runtime_singleton_->gpu_agents()) { gpu->GetInfo((hsa_agent_info_t)HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, &cus); maxCu = Max(maxCu, cus); } const_cast<Flag&>(core::Runtime::runtime_singleton_->flag()).parse_masks(maxGpu, maxCu); } bool Load() { // Open connection to kernel driver. if (hsaKmtOpenKFD() != HSAKMT_STATUS_SUCCESS) { return false; } MAKE_NAMED_SCOPE_GUARD(kfd, [&]() { hsaKmtCloseKFD(); }); // Register runtime and optionally enable the debugger HSAKMT_STATUS err = hsaKmtRuntimeEnable(&_amdgpu_r_debug, core::Runtime::runtime_singleton_->flag().debug()); if ((err != HSAKMT_STATUS_SUCCESS) && (err != HSAKMT_STATUS_NOT_SUPPORTED)) return false; core::Runtime::runtime_singleton_->KfdVersion(err != HSAKMT_STATUS_NOT_SUPPORTED); // Build topology table. BuildTopology(); kfd.Dismiss(); return true; } bool Unload() { hsaKmtRuntimeDisable(); hsaKmtReleaseSystemProperties(); // Close connection to kernel driver.
hsaKmtCloseKFD(); return true; } } // namespace amd } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/cache.cpp000066400000000000000000000051521420110115200215770ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/cache.h" #include "assert.h" namespace rocr { namespace core { hsa_status_t Cache::GetInfo(hsa_cache_info_t attribute, void* value) { switch (attribute) { case HSA_CACHE_INFO_NAME_LENGTH: *(uint32_t*)value = name_.size(); break; case HSA_CACHE_INFO_NAME: *(const char**)value = name_.c_str(); break; case HSA_CACHE_INFO_LEVEL: *(uint8_t*)value = level_; break; case HSA_CACHE_INFO_SIZE: *(uint32_t*)value = size_; break; default: return HSA_STATUS_ERROR_INVALID_ARGUMENT; } return HSA_STATUS_SUCCESS; } } // namespace core } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/default_signal.cpp000066400000000000000000000244041420110115200235160ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #include "core/inc/default_signal.h" #include "core/util/timer.h" namespace rocr { namespace core { int DefaultSignal::rtti_id_ = 0; int BusyWaitSignal::rtti_id_ = 0; BusyWaitSignal::BusyWaitSignal(SharedSignal* abi_block, bool enableIPC) : Signal(abi_block, enableIPC) { signal_.kind = AMD_SIGNAL_KIND_USER; signal_.event_mailbox_ptr = NULL; } hsa_signal_value_t BusyWaitSignal::LoadRelaxed() { return hsa_signal_value_t( atomic::Load(&signal_.value, std::memory_order_relaxed)); } hsa_signal_value_t BusyWaitSignal::LoadAcquire() { return hsa_signal_value_t( atomic::Load(&signal_.value, std::memory_order_acquire)); } void BusyWaitSignal::StoreRelaxed(hsa_signal_value_t value) { atomic::Store(&signal_.value, int64_t(value), std::memory_order_relaxed); } void BusyWaitSignal::StoreRelease(hsa_signal_value_t value) { atomic::Store(&signal_.value, int64_t(value), std::memory_order_release); } hsa_signal_value_t BusyWaitSignal::WaitRelaxed(hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint) { Retain(); MAKE_SCOPE_GUARD([&]() { Release(); }); waiting_++; MAKE_SCOPE_GUARD([&]() { waiting_--; }); bool condition_met = false; int64_t value; debug_warning_n((!g_use_interrupt_wait || isIPC()) && "Use of non-host signal in host signal wait API.", 10); timer::fast_clock::time_point start_time, time; start_time = timer::fast_clock::now(); // Set a polling timeout value // Should be a few times bigger than null kernel latency const timer::fast_clock::duration kMaxElapsed = std::chrono::microseconds(200); uint64_t hsa_freq; HSA::hsa_system_get_info(HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY, &hsa_freq); const timer::fast_clock::duration fast_timeout = timer::duration_from_seconds( double(timeout) / double(hsa_freq)); while (true) { if (!IsValid()) return 0; value = atomic::Load(&signal_.value, std::memory_order_relaxed); switch (condition) { case 
HSA_SIGNAL_CONDITION_EQ: { condition_met = (value == compare_value); break; } case HSA_SIGNAL_CONDITION_NE: { condition_met = (value != compare_value); break; } case HSA_SIGNAL_CONDITION_GTE: { condition_met = (value >= compare_value); break; } case HSA_SIGNAL_CONDITION_LT: { condition_met = (value < compare_value); break; } default: return 0; } if (condition_met) return hsa_signal_value_t(value); time = timer::fast_clock::now(); if (time - start_time > fast_timeout) { value = atomic::Load(&signal_.value, std::memory_order_relaxed); return hsa_signal_value_t(value); } if (time - start_time > kMaxElapsed) { os::uSleep(20); } } } hsa_signal_value_t BusyWaitSignal::WaitAcquire(hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint) { hsa_signal_value_t ret = WaitRelaxed(condition, compare_value, timeout, wait_hint); std::atomic_thread_fence(std::memory_order_acquire); return ret; } void BusyWaitSignal::AndRelaxed(hsa_signal_value_t value) { atomic::And(&signal_.value, int64_t(value), std::memory_order_relaxed); } void BusyWaitSignal::AndAcquire(hsa_signal_value_t value) { atomic::And(&signal_.value, int64_t(value), std::memory_order_acquire); } void BusyWaitSignal::AndRelease(hsa_signal_value_t value) { atomic::And(&signal_.value, int64_t(value), std::memory_order_release); } void BusyWaitSignal::AndAcqRel(hsa_signal_value_t value) { atomic::And(&signal_.value, int64_t(value), std::memory_order_acq_rel); } void BusyWaitSignal::OrRelaxed(hsa_signal_value_t value) { atomic::Or(&signal_.value, int64_t(value), std::memory_order_relaxed); } void BusyWaitSignal::OrAcquire(hsa_signal_value_t value) { atomic::Or(&signal_.value, int64_t(value), std::memory_order_acquire); } void BusyWaitSignal::OrRelease(hsa_signal_value_t value) { atomic::Or(&signal_.value, int64_t(value), std::memory_order_release); } void BusyWaitSignal::OrAcqRel(hsa_signal_value_t value) { atomic::Or(&signal_.value, int64_t(value), 
std::memory_order_acq_rel); } void BusyWaitSignal::XorRelaxed(hsa_signal_value_t value) { atomic::Xor(&signal_.value, int64_t(value), std::memory_order_relaxed); } void BusyWaitSignal::XorAcquire(hsa_signal_value_t value) { atomic::Xor(&signal_.value, int64_t(value), std::memory_order_acquire); } void BusyWaitSignal::XorRelease(hsa_signal_value_t value) { atomic::Xor(&signal_.value, int64_t(value), std::memory_order_release); } void BusyWaitSignal::XorAcqRel(hsa_signal_value_t value) { atomic::Xor(&signal_.value, int64_t(value), std::memory_order_acq_rel); } void BusyWaitSignal::AddRelaxed(hsa_signal_value_t value) { atomic::Add(&signal_.value, int64_t(value), std::memory_order_relaxed); } void BusyWaitSignal::AddAcquire(hsa_signal_value_t value) { atomic::Add(&signal_.value, int64_t(value), std::memory_order_acquire); } void BusyWaitSignal::AddRelease(hsa_signal_value_t value) { atomic::Add(&signal_.value, int64_t(value), std::memory_order_release); } void BusyWaitSignal::AddAcqRel(hsa_signal_value_t value) { atomic::Add(&signal_.value, int64_t(value), std::memory_order_acq_rel); } void BusyWaitSignal::SubRelaxed(hsa_signal_value_t value) { atomic::Sub(&signal_.value, int64_t(value), std::memory_order_relaxed); } void BusyWaitSignal::SubAcquire(hsa_signal_value_t value) { atomic::Sub(&signal_.value, int64_t(value), std::memory_order_acquire); } void BusyWaitSignal::SubRelease(hsa_signal_value_t value) { atomic::Sub(&signal_.value, int64_t(value), std::memory_order_release); } void BusyWaitSignal::SubAcqRel(hsa_signal_value_t value) { atomic::Sub(&signal_.value, int64_t(value), std::memory_order_acq_rel); } hsa_signal_value_t BusyWaitSignal::ExchRelaxed(hsa_signal_value_t value) { return hsa_signal_value_t(atomic::Exchange(&signal_.value, int64_t(value), std::memory_order_relaxed)); } hsa_signal_value_t BusyWaitSignal::ExchAcquire(hsa_signal_value_t value) { return hsa_signal_value_t(atomic::Exchange(&signal_.value, int64_t(value), std::memory_order_acquire)); } 
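`BusyWaitSignal::WaitRelaxed` above polls with relaxed loads, spins for the first 200 microseconds, then sleeps 20 microseconds between polls, and returns the current value rather than an error once the timeout expires. A minimal sketch of that spin-then-sleep strategy using only the standard library follows. `Cond`, `Satisfied`, and `SpinThenSleepWait` are hypothetical names, `std::chrono::steady_clock` stands in for the runtime's `timer::fast_clock`, and the timeout is a `std::chrono` duration instead of a count of `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY` ticks.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical mirror of HSA_SIGNAL_CONDITION_{EQ,NE,LT,GTE}.
enum class Cond { kEq, kNe, kLt, kGte };

bool Satisfied(Cond c, int64_t value, int64_t compare) {
  switch (c) {
    case Cond::kEq:  return value == compare;
    case Cond::kNe:  return value != compare;
    case Cond::kLt:  return value < compare;
    case Cond::kGte: return value >= compare;
  }
  return false;
}

// Polls `signal` until `c` holds or `timeout` elapses. Busy-spins during the
// first `spin_window`, then sleeps 20 us between polls. On timeout it returns
// the last observed value instead of an error, as the runtime loop does.
int64_t SpinThenSleepWait(const std::atomic<int64_t>& signal, Cond c,
                          int64_t compare,
                          std::chrono::steady_clock::duration timeout,
                          std::chrono::steady_clock::duration spin_window =
                              std::chrono::microseconds(200)) {
  const auto start = std::chrono::steady_clock::now();
  while (true) {
    const int64_t value = signal.load(std::memory_order_relaxed);
    if (Satisfied(c, value, compare)) return value;
    const auto elapsed = std::chrono::steady_clock::now() - start;
    if (elapsed > timeout) return signal.load(std::memory_order_relaxed);
    if (elapsed > spin_window) {
      std::this_thread::sleep_for(std::chrono::microseconds(20));
    }
  }
}
```

A caller that needs acquire ordering can follow the wait with `std::atomic_thread_fence(std::memory_order_acquire)`, which is how `WaitAcquire` is layered on `WaitRelaxed` in the source above.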
hsa_signal_value_t BusyWaitSignal::ExchRelease(hsa_signal_value_t value) { return hsa_signal_value_t(atomic::Exchange(&signal_.value, int64_t(value), std::memory_order_release)); } hsa_signal_value_t BusyWaitSignal::ExchAcqRel(hsa_signal_value_t value) { return hsa_signal_value_t(atomic::Exchange(&signal_.value, int64_t(value), std::memory_order_acq_rel)); } hsa_signal_value_t BusyWaitSignal::CasRelaxed(hsa_signal_value_t expected, hsa_signal_value_t value) { return hsa_signal_value_t(atomic::Cas(&signal_.value, int64_t(value), int64_t(expected), std::memory_order_relaxed)); } hsa_signal_value_t BusyWaitSignal::CasAcquire(hsa_signal_value_t expected, hsa_signal_value_t value) { return hsa_signal_value_t(atomic::Cas(&signal_.value, int64_t(value), int64_t(expected), std::memory_order_acquire)); } hsa_signal_value_t BusyWaitSignal::CasRelease(hsa_signal_value_t expected, hsa_signal_value_t value) { return hsa_signal_value_t(atomic::Cas(&signal_.value, int64_t(value), int64_t(expected), std::memory_order_release)); } hsa_signal_value_t BusyWaitSignal::CasAcqRel(hsa_signal_value_t expected, hsa_signal_value_t value) { return hsa_signal_value_t(atomic::Cas(&signal_.value, int64_t(value), int64_t(expected), std::memory_order_acq_rel)); } } // namespace core } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/host_queue.cpp000066400000000000000000000100631420110115200227120ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #include "core/inc/host_queue.h" #include "core/inc/runtime.h" #include "core/util/utils.h" namespace rocr { namespace core { int HostQueue::rtti_id_ = 0; std::atomic<uint32_t> HostQueue::queue_count_(0x80000000); HostQueue::HostQueue(hsa_region_t region, uint32_t ring_size, hsa_queue_type32_t type, uint32_t features, hsa_signal_t doorbell_signal) : Queue(), size_(ring_size) { HSA::hsa_memory_register(this, sizeof(HostQueue)); MAKE_NAMED_SCOPE_GUARD(registerGuard, [&]() { HSA::hsa_memory_deregister(this, sizeof(HostQueue)); }); const size_t queue_buffer_size = size_ * sizeof(AqlPacket); if (HSA_STATUS_SUCCESS != HSA::hsa_memory_allocate(region, queue_buffer_size, &ring_)) { throw AMD::hsa_exception(HSA_STATUS_ERROR_OUT_OF_RESOURCES, "Host queue buffer alloc failed\n"); } MAKE_NAMED_SCOPE_GUARD(bufferGuard, [&]() { HSA::hsa_memory_free(ring_); }); assert(IsMultipleOf(ring_, kRingAlignment)); assert(ring_ != NULL); // Fill the ring buffer with invalid packet headers. // Leave packet content uninitialized to help track errors.
for (uint32_t pkt_id = 0; pkt_id < size_; pkt_id++) { (((AqlPacket*)ring_)[pkt_id]).dispatch.header = HSA_PACKET_TYPE_INVALID; } amd_queue_.hsa_queue.base_address = ring_; amd_queue_.hsa_queue.size = size_; amd_queue_.hsa_queue.doorbell_signal = doorbell_signal; amd_queue_.hsa_queue.id = this->GetQueueId(); amd_queue_.hsa_queue.type = type; amd_queue_.hsa_queue.features = features; #ifdef HSA_LARGE_MODEL AMD_HSA_BITS_SET( amd_queue_.queue_properties, AMD_QUEUE_PROPERTIES_IS_PTR64, 1); #else AMD_HSA_BITS_SET( amd_queue_.queue_properties, AMD_QUEUE_PROPERTIES_IS_PTR64, 0); #endif amd_queue_.write_dispatch_id = amd_queue_.read_dispatch_id = 0; AMD_HSA_BITS_SET( amd_queue_.queue_properties, AMD_QUEUE_PROPERTIES_ENABLE_PROFILING, 0); bufferGuard.Dismiss(); registerGuard.Dismiss(); } HostQueue::~HostQueue() { HSA::hsa_memory_free(ring_); HSA::hsa_memory_deregister(this, sizeof(HostQueue)); } } // namespace core } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/hsa.cpp000066400000000000000000002556761420110115200213310ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA C to C++ interface implementation. // This file does argument checking and conversion to C++. 
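The argument checks in this file are built from `do { ... } while (false)` macros plus a `ValidityError` trait that maps each object pointer type to the status returned when the object is invalid. The reduced sketch below reproduces that pattern with hypothetical stand-ins (`Status`, `Signal`, `Queue`, `CHECK_PTR`, `CHECK_VALID`, `SignalStore`) in place of `hsa_status_t`, the `core` classes, and the runtime's `IS_BAD_PTR` and `IS_VALID` macros.

```cpp
#include <cassert>

// Hypothetical stand-ins for hsa_status_t and two core object types.
enum Status { kSuccess = 0, kInvalidArgument, kInvalidSignal, kInvalidQueue };

struct Signal {
  bool valid = true;
  bool IsValid() const { return valid; }
};
struct Queue {
  bool valid = true;
  bool IsValid() const { return valid; }
};

// Trait mapping a pointer type to the status reported when it is invalid.
template <class T> struct ValidityError;
template <> struct ValidityError<Signal*> { enum { kValue = kInvalidSignal }; };
template <> struct ValidityError<Queue*> { enum { kValue = kInvalidQueue }; };
// const-qualified pointers reuse the non-const mapping.
template <class T> struct ValidityError<const T*> {
  enum { kValue = ValidityError<T*>::kValue };
};

// do { ... } while (false) turns each check into one statement that needs a
// trailing semicolon and is safe inside an unbraced if/else.
#define CHECK_PTR(ptr) \
  do { if ((ptr) == nullptr) return kInvalidArgument; } while (false)

#define CHECK_VALID(ptr)                                   \
  do {                                                     \
    if ((ptr) == nullptr || !(ptr)->IsValid())             \
      return Status(ValidityError<decltype(ptr)>::kValue); \
  } while (false)

// Example entry point: validates arguments before touching them.
Status SignalStore(const Signal* signal, long* out) {
  CHECK_VALID(signal);
  CHECK_PTR(out);
  *out = 1;
  return kSuccess;
}
```

Because `decltype(ptr)` is resolved at compile time, the error code follows the parameter's declared type, so a new object kind only needs one more trait specialization.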
#include #include #include #include #include "core/inc/runtime.h" #include "core/inc/agent.h" #include "core/inc/host_queue.h" #include "core/inc/isa.h" #include "core/inc/memory_region.h" #include "core/inc/queue.h" #include "core/inc/signal.h" #include "core/inc/cache.h" #include "core/inc/amd_elf_image.hpp" #include "core/inc/amd_hsa_loader.hpp" #include "core/inc/amd_loader_context.hpp" #include "core/inc/hsa_ven_amd_loader_impl.h" #include "inc/hsa_ven_amd_aqlprofile.h" #include "core/inc/hsa_ext_amd_impl.h" namespace rocr { using namespace amd::hsa; template <class T> struct ValidityError; template <> struct ValidityError<core::Signal*> { enum { kValue = HSA_STATUS_ERROR_INVALID_SIGNAL }; }; template <> struct ValidityError<core::SignalGroup*> { enum { kValue = HSA_STATUS_ERROR_INVALID_SIGNAL_GROUP }; }; template <> struct ValidityError<core::Agent*> { enum { kValue = HSA_STATUS_ERROR_INVALID_AGENT }; }; template <> struct ValidityError<core::MemoryRegion*> { enum { kValue = HSA_STATUS_ERROR_INVALID_REGION }; }; template <> struct ValidityError<core::Queue*> { enum { kValue = HSA_STATUS_ERROR_INVALID_QUEUE }; }; template <> struct ValidityError<core::Cache*> { enum { kValue = HSA_STATUS_ERROR_INVALID_CACHE }; }; template <> struct ValidityError<core::Isa*> { enum { kValue = HSA_STATUS_ERROR_INVALID_ISA }; }; template <class T> struct ValidityError<const T*> { enum { kValue = ValidityError<T*>::kValue }; }; #define IS_BAD_PTR(ptr) \ do { \ if ((ptr) == nullptr) return HSA_STATUS_ERROR_INVALID_ARGUMENT; \ } while (false) #define IS_BAD_PROFILE(profile) \ do { \ if (profile != HSA_PROFILE_BASE && \ profile != HSA_PROFILE_FULL) { \ return HSA_STATUS_ERROR_INVALID_ARGUMENT; \ } \ } while (false) #define IS_BAD_EXECUTABLE_STATE(executable_state) \ do { \ if (executable_state != HSA_EXECUTABLE_STATE_FROZEN && \ executable_state != HSA_EXECUTABLE_STATE_UNFROZEN) { \ return HSA_STATUS_ERROR_INVALID_ARGUMENT; \ } \ } while (false) #define IS_BAD_ROUNDING_MODE(rounding_mode) \ do { \ if (rounding_mode != HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT && \ rounding_mode != HSA_DEFAULT_FLOAT_ROUNDING_MODE_ZERO && \
rounding_mode != HSA_DEFAULT_FLOAT_ROUNDING_MODE_NEAR) { \ return HSA_STATUS_ERROR_INVALID_ARGUMENT; \ } \ } while (false) #define IS_BAD_FP_TYPE(fp_type) \ do { \ if (fp_type != HSA_FP_TYPE_16 && \ fp_type != HSA_FP_TYPE_32 && \ fp_type != HSA_FP_TYPE_64) { \ return HSA_STATUS_ERROR_INVALID_ARGUMENT; \ } \ } while (false) #define IS_BAD_FLUSH_MODE(flush_mode) \ do { \ if (flush_mode != HSA_FLUSH_MODE_FTZ && \ flush_mode != HSA_FLUSH_MODE_NON_FTZ) { \ return HSA_STATUS_ERROR_INVALID_ARGUMENT; \ } \ } while (false) #define IS_VALID(ptr) \ do { \ if (((ptr) == NULL) || !((ptr)->IsValid())) \ return hsa_status_t(ValidityError<decltype(ptr)>::kValue); \ } while (false) #define CHECK_STATUS(status) \ do { \ if ((status) != HSA_STATUS_SUCCESS) return status; \ } while (false) #define CHECK_ALLOC(ptr) \ do { \ if ((ptr) == nullptr) return HSA_STATUS_ERROR_OUT_OF_RESOURCES; \ } while (false) #define IS_OPEN() \ do { \ if (!core::Runtime::runtime_singleton_->IsOpen()) \ return HSA_STATUS_ERROR_NOT_INITIALIZED; \ } while (false) template <class T> static __forceinline bool IsValid(T* ptr) { return (ptr == NULL) ? false : ptr->IsValid(); } namespace AMD { hsa_status_t handleException(); template <class T> static __forceinline T handleExceptionT() { handleException(); abort(); return T(); } } // namespace AMD #define TRY try { #define CATCH } catch(...) { return AMD::handleException(); } #define CATCHRET(RETURN_TYPE) } catch(...)
{ return AMD::handleExceptionT<RETURN_TYPE>(); } //----------------------------------------------------------------------------- // Basic Checks //----------------------------------------------------------------------------- static_assert(sizeof(hsa_barrier_and_packet_t) == sizeof(hsa_kernel_dispatch_packet_t), "AQL packet definitions have wrong sizes!"); static_assert(sizeof(hsa_barrier_and_packet_t) == sizeof(hsa_agent_dispatch_packet_t), "AQL packet definitions have wrong sizes!"); static_assert(sizeof(hsa_barrier_and_packet_t) == 64, "AQL packet definitions have wrong sizes!"); static_assert(sizeof(hsa_barrier_and_packet_t) == sizeof(hsa_barrier_or_packet_t), "AQL packet definitions have wrong sizes!"); #ifdef HSA_LARGE_MODEL static_assert(sizeof(void*) == 8, "HSA_LARGE_MODEL is set incorrectly!"); #else static_assert(sizeof(void*) == 4, "HSA_LARGE_MODEL is set incorrectly!"); #endif #if !defined(HSA_LARGE_MODEL) || !defined(__linux__) // static_assert(false, "Only HSA_LARGE_MODEL (64bit mode) and Linux supported."); #endif namespace HSA { //---------------------------------------------------------------------------// // Init/Shutdown routines //---------------------------------------------------------------------------// hsa_status_t hsa_init() { TRY; return core::Runtime::runtime_singleton_->Acquire(); CATCH; } hsa_status_t hsa_shut_down() { TRY; IS_OPEN(); return core::Runtime::runtime_singleton_->Release(); CATCH; } //---------------------------------------------------------------------------// // System //---------------------------------------------------------------------------// hsa_status_t hsa_system_get_info(hsa_system_info_t attribute, void* value) { TRY; IS_OPEN(); IS_BAD_PTR(value); return core::Runtime::runtime_singleton_->GetSystemInfo(attribute, value); CATCH; } hsa_status_t hsa_extension_get_name(uint16_t extension, const char** name) { TRY; IS_OPEN(); IS_BAD_PTR(name); switch (extension) { case HSA_EXTENSION_FINALIZER: *name = "HSA_EXTENSION_FINALIZER";
break; case HSA_EXTENSION_IMAGES: *name = "HSA_EXTENSION_IMAGES"; break; case HSA_EXTENSION_PERFORMANCE_COUNTERS: *name = "HSA_EXTENSION_PERFORMANCE_COUNTERS"; break; case HSA_EXTENSION_PROFILING_EVENTS: *name = "HSA_EXTENSION_PROFILING_EVENTS"; break; case HSA_EXTENSION_AMD_PROFILER: *name = "HSA_EXTENSION_AMD_PROFILER"; break; case HSA_EXTENSION_AMD_LOADER: *name = "HSA_EXTENSION_AMD_LOADER"; break; case HSA_EXTENSION_AMD_AQLPROFILE: *name = "HSA_EXTENSION_AMD_AQLPROFILE"; break; default: *name = "HSA_EXTENSION_INVALID"; return HSA_STATUS_ERROR_INVALID_ARGUMENT; } return HSA_STATUS_SUCCESS; CATCH; } hsa_status_t hsa_system_extension_supported(uint16_t extension, uint16_t version_major, uint16_t version_minor, bool* result) { TRY; IS_OPEN(); if ((extension > HSA_EXTENSION_STD_LAST && (extension < HSA_AMD_FIRST_EXTENSION || extension > HSA_AMD_LAST_EXTENSION)) || result == NULL) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } *result = false; if (extension == HSA_EXTENSION_PERFORMANCE_COUNTERS || extension == HSA_EXTENSION_PROFILING_EVENTS) return HSA_STATUS_SUCCESS; uint16_t system_version_major = 0; hsa_status_t status = core::Runtime::runtime_singleton_->GetSystemInfo( HSA_SYSTEM_INFO_VERSION_MAJOR, &system_version_major); assert(status == HSA_STATUS_SUCCESS); if (version_major <= system_version_major) { uint16_t system_version_minor = 0; status = core::Runtime::runtime_singleton_->GetSystemInfo( HSA_SYSTEM_INFO_VERSION_MINOR, &system_version_minor); assert(status == HSA_STATUS_SUCCESS); if (version_minor <= system_version_minor) { *result = true; } } return HSA_STATUS_SUCCESS; CATCH; } hsa_status_t hsa_system_major_extension_supported(uint16_t extension, uint16_t version_major, uint16_t* version_minor, bool* result) { TRY; IS_OPEN(); IS_BAD_PTR(version_minor); IS_BAD_PTR(result); if ((extension == HSA_EXTENSION_IMAGES) && (version_major == 1)) { *version_minor = 0; *result = true; return HSA_STATUS_SUCCESS; } if ((extension == HSA_EXTENSION_FINALIZER) && (version_major == 1)) { *version_minor = 0; *result = true; return HSA_STATUS_SUCCESS; } if ((extension == HSA_EXTENSION_AMD_LOADER) && (version_major == 1)) {
*version_minor = 0; *result = true; return HSA_STATUS_SUCCESS; } if ((extension == HSA_EXTENSION_AMD_AQLPROFILE) && (version_major == 1)) { *version_minor = 0; *result = true; return HSA_STATUS_SUCCESS; } *result = false; return HSA_STATUS_SUCCESS; CATCH; } static size_t get_extension_table_length(uint16_t extension, uint16_t major, uint16_t minor) { // Table to convert from major/minor to major/length struct sizes_t { std::string name; size_t size; }; static sizes_t sizes[] = { {"hsa_ext_images_1_00_pfn_t", sizeof(hsa_ext_images_1_00_pfn_t)}, {"hsa_ext_finalizer_1_00_pfn_t", sizeof(hsa_ext_finalizer_1_00_pfn_t)}, {"hsa_ven_amd_loader_1_00_pfn_t", sizeof(hsa_ven_amd_loader_1_00_pfn_t)}, {"hsa_ven_amd_loader_1_01_pfn_t", sizeof(hsa_ven_amd_loader_1_01_pfn_t)}, {"hsa_ven_amd_loader_1_02_pfn_t", sizeof(hsa_ven_amd_loader_1_02_pfn_t)}, {"hsa_ven_amd_loader_1_03_pfn_t", sizeof(hsa_ven_amd_loader_1_03_pfn_t)}, {"hsa_ven_amd_aqlprofile_1_00_pfn_t", sizeof(hsa_ven_amd_aqlprofile_1_00_pfn_t)}}; static const size_t num_tables = sizeof(sizes) / sizeof(sizes_t); if (minor > 99) return 0; std::string name; switch (extension) { case HSA_EXTENSION_FINALIZER: name = "hsa_ext_finalizer_"; break; case HSA_EXTENSION_IMAGES: name = "hsa_ext_images_"; break; // case HSA_EXTENSION_PERFORMANCE_COUNTERS: // name = "hsa_ext_perf_counter_"; // break; // case HSA_EXTENSION_PROFILING_EVENTS: // name = "hsa_ext_profiling_event_"; // break; // case HSA_EXTENSION_AMD_PROFILER: // name = "hsa_ven_amd_profiler_"; // break; case HSA_EXTENSION_AMD_LOADER: name = "hsa_ven_amd_loader_"; break; case HSA_EXTENSION_AMD_AQLPROFILE: name = "hsa_ven_amd_aqlprofile_"; break; default: return 0; } char buff[6]; sprintf(buff, "%02u", minor); name += std::to_string(major) + "_" + buff + "_pfn_t"; for (size_t i = 0; i < num_tables; i++) { if (sizes[i].name == name) return sizes[i].size; } return 0; } hsa_status_t hsa_system_get_extension_table(uint16_t extension, uint16_t version_major, uint16_t version_minor, 
void* table) { TRY; return HSA::hsa_system_get_major_extension_table( extension, version_major, get_extension_table_length(extension, version_major, version_minor), table); CATCH; } hsa_status_t hsa_system_get_major_extension_table(uint16_t extension, uint16_t version_major, size_t table_length, void* table) { TRY; IS_OPEN(); IS_BAD_PTR(table); if (table_length == 0) return HSA_STATUS_ERROR_INVALID_ARGUMENT; if (extension == HSA_EXTENSION_IMAGES) { if (version_major != core::Runtime::runtime_singleton_->extensions_.image_api.version.major_id) { return HSA_STATUS_ERROR; } hsa_ext_images_1_pfn_t ext_table; ext_table.hsa_ext_image_clear = hsa_ext_image_clear; ext_table.hsa_ext_image_copy = hsa_ext_image_copy; ext_table.hsa_ext_image_create = hsa_ext_image_create; ext_table.hsa_ext_image_data_get_info = hsa_ext_image_data_get_info; ext_table.hsa_ext_image_destroy = hsa_ext_image_destroy; ext_table.hsa_ext_image_export = hsa_ext_image_export; ext_table.hsa_ext_image_get_capability = hsa_ext_image_get_capability; ext_table.hsa_ext_image_import = hsa_ext_image_import; ext_table.hsa_ext_sampler_create = hsa_ext_sampler_create; ext_table.hsa_ext_sampler_destroy = hsa_ext_sampler_destroy; ext_table.hsa_ext_image_get_capability_with_layout = hsa_ext_image_get_capability_with_layout; ext_table.hsa_ext_image_data_get_info_with_layout = hsa_ext_image_data_get_info_with_layout; ext_table.hsa_ext_image_create_with_layout = hsa_ext_image_create_with_layout; memcpy(table, &ext_table, Min(sizeof(ext_table), table_length)); return HSA_STATUS_SUCCESS; } if (extension == HSA_EXTENSION_FINALIZER) { if (version_major != core::Runtime::runtime_singleton_->extensions_.finalizer_api.version.major_id) { return HSA_STATUS_ERROR; } hsa_ext_finalizer_1_00_pfn_t ext_table; ext_table.hsa_ext_program_add_module = hsa_ext_program_add_module; ext_table.hsa_ext_program_create = hsa_ext_program_create; ext_table.hsa_ext_program_destroy = hsa_ext_program_destroy; ext_table.hsa_ext_program_finalize = 
        hsa_ext_program_finalize;
    ext_table.hsa_ext_program_get_info = hsa_ext_program_get_info;
    ext_table.hsa_ext_program_iterate_modules = hsa_ext_program_iterate_modules;

    memcpy(table, &ext_table, Min(sizeof(ext_table), table_length));

    return HSA_STATUS_SUCCESS;
  }

  if (extension == HSA_EXTENSION_AMD_LOADER) {
    if (version_major != 1) return HSA_STATUS_ERROR;
    hsa_ven_amd_loader_1_03_pfn_t ext_table;
    ext_table.hsa_ven_amd_loader_query_host_address =
        hsa_ven_amd_loader_query_host_address;
    ext_table.hsa_ven_amd_loader_query_segment_descriptors =
        hsa_ven_amd_loader_query_segment_descriptors;
    ext_table.hsa_ven_amd_loader_query_executable =
        hsa_ven_amd_loader_query_executable;
    ext_table.hsa_ven_amd_loader_executable_iterate_loaded_code_objects =
        hsa_ven_amd_loader_executable_iterate_loaded_code_objects;
    ext_table.hsa_ven_amd_loader_loaded_code_object_get_info =
        hsa_ven_amd_loader_loaded_code_object_get_info;
    ext_table.hsa_ven_amd_loader_code_object_reader_create_from_file_with_offset_size =
        hsa_ven_amd_loader_code_object_reader_create_from_file_with_offset_size;
    ext_table.hsa_ven_amd_loader_iterate_executables =
        hsa_ven_amd_loader_iterate_executables;

    memcpy(table, &ext_table, Min(sizeof(ext_table), table_length));

    return HSA_STATUS_SUCCESS;
  }

  if (extension == HSA_EXTENSION_AMD_AQLPROFILE) {
    if (version_major != hsa_ven_amd_aqlprofile_VERSION_MAJOR) {
      debug_print("aqlprofile API incompatible ver %d, current ver %d\n",
                  version_major, hsa_ven_amd_aqlprofile_VERSION_MAJOR);
      return HSA_STATUS_ERROR;
    }

    os::LibHandle lib = os::LoadLib(kAqlProfileLib);
    if (lib == NULL) {
      debug_print("Loading '%s' failed\n", kAqlProfileLib);
      return HSA_STATUS_ERROR;
    }

    hsa_ven_amd_aqlprofile_pfn_t ext_table;
    ext_table.hsa_ven_amd_aqlprofile_version_major =
        (decltype(::hsa_ven_amd_aqlprofile_version_major)*)
        os::GetExportAddress(lib, "hsa_ven_amd_aqlprofile_version_major");
    ext_table.hsa_ven_amd_aqlprofile_version_minor =
        (decltype(::hsa_ven_amd_aqlprofile_version_minor)*)
        os::GetExportAddress(lib,
                             "hsa_ven_amd_aqlprofile_version_minor");
    ext_table.hsa_ven_amd_aqlprofile_error_string =
        (decltype(::hsa_ven_amd_aqlprofile_error_string)*)
        os::GetExportAddress(lib, "hsa_ven_amd_aqlprofile_error_string");
    ext_table.hsa_ven_amd_aqlprofile_validate_event =
        (decltype(::hsa_ven_amd_aqlprofile_validate_event)*)
        os::GetExportAddress(lib, "hsa_ven_amd_aqlprofile_validate_event");
    ext_table.hsa_ven_amd_aqlprofile_start =
        (decltype(::hsa_ven_amd_aqlprofile_start)*)
        os::GetExportAddress(lib, "hsa_ven_amd_aqlprofile_start");
    ext_table.hsa_ven_amd_aqlprofile_stop =
        (decltype(::hsa_ven_amd_aqlprofile_stop)*)
        os::GetExportAddress(lib, "hsa_ven_amd_aqlprofile_stop");
    ext_table.hsa_ven_amd_aqlprofile_read =
        (decltype(::hsa_ven_amd_aqlprofile_read)*)
        os::GetExportAddress(lib, "hsa_ven_amd_aqlprofile_read");
    ext_table.hsa_ven_amd_aqlprofile_legacy_get_pm4 =
        (decltype(::hsa_ven_amd_aqlprofile_legacy_get_pm4)*)
        os::GetExportAddress(lib, "hsa_ven_amd_aqlprofile_legacy_get_pm4");
    ext_table.hsa_ven_amd_aqlprofile_get_info =
        (decltype(::hsa_ven_amd_aqlprofile_get_info)*)
        os::GetExportAddress(lib, "hsa_ven_amd_aqlprofile_get_info");
    ext_table.hsa_ven_amd_aqlprofile_iterate_data =
        (decltype(::hsa_ven_amd_aqlprofile_iterate_data)*)
        os::GetExportAddress(lib, "hsa_ven_amd_aqlprofile_iterate_data");

    bool version_incompatible = true;
    uint32_t version_curr = 0;
    version_major = HSA_AQLPROFILE_VERSION_MAJOR;
    if (ext_table.hsa_ven_amd_aqlprofile_version_major != NULL) {
      version_curr = ext_table.hsa_ven_amd_aqlprofile_version_major();
      version_incompatible = (version_major != version_curr);
    }
    if (version_incompatible == true) {
      debug_print("Loading '%s' failed, incompatible ver %d, current ver %d\n",
                  kAqlProfileLib, version_major, version_curr);
      return HSA_STATUS_ERROR;
    }

    memcpy(table, &ext_table, Min(sizeof(ext_table), table_length));

    return HSA_STATUS_SUCCESS;
  }

  return HSA_STATUS_ERROR;
  CATCH;
}

//---------------------------------------------------------------------------//
// Agent
//---------------------------------------------------------------------------//
hsa_status_t hsa_iterate_agents(hsa_status_t (*callback)(hsa_agent_t agent, void* data),
                                void* data) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(callback);
  return core::Runtime::runtime_singleton_->IterateAgent(callback, data);
  CATCH;
}

hsa_status_t hsa_agent_get_info(hsa_agent_t agent_handle, hsa_agent_info_t attribute,
                                void* value) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(value);
  const core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);
  return agent->GetInfo(attribute, value);
  CATCH;
}

hsa_status_t hsa_agent_get_exception_policies(hsa_agent_t agent_handle, hsa_profile_t profile,
                                              uint16_t* mask) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(mask);
  IS_BAD_PROFILE(profile);
  const core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);

  *mask = 0;

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_cache_get_info(hsa_cache_t cache, hsa_cache_info_t attribute, void* value) {
  TRY;
  IS_OPEN();
  core::Cache* Cache = core::Cache::Convert(cache);
  IS_VALID(Cache);
  IS_BAD_PTR(value);
  return Cache->GetInfo(attribute, value);
  CATCH;
}

hsa_status_t hsa_agent_iterate_caches(hsa_agent_t agent_handle,
                                      hsa_status_t (*callback)(hsa_cache_t cache, void* data),
                                      void* data) {
  TRY;
  IS_OPEN();
  const core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);
  IS_BAD_PTR(callback);
  return agent->IterateCache(callback, data);
  CATCH;
}

hsa_status_t hsa_agent_extension_supported(uint16_t extension, hsa_agent_t agent_handle,
                                           uint16_t version_major, uint16_t version_minor,
                                           bool* result) {
  TRY;
  IS_OPEN();

  if ((extension > HSA_EXTENSION_STD_LAST &&
       (extension < HSA_AMD_FIRST_EXTENSION || extension > HSA_AMD_LAST_EXTENSION)) ||
      result == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  *result = false;

  const core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);

  if (agent->device_type() == core::Agent::kAmdGpuDevice) {
    uint16_t agent_version_major = 0;
    hsa_status_t status =
        agent->GetInfo(HSA_AGENT_INFO_VERSION_MAJOR, &agent_version_major);
    assert(status == HSA_STATUS_SUCCESS);

    if (version_major <= agent_version_major) {
      uint16_t agent_version_minor = 0;
      if (version_minor <= agent_version_minor) {
        *result = true;
      }
    }
  }

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_agent_major_extension_supported(uint16_t extension, hsa_agent_t agent_handle,
                                                 uint16_t version_major, uint16_t* version_minor,
                                                 bool* result) {
  TRY;
  IS_OPEN();

  if ((extension > HSA_EXTENSION_STD_LAST &&
       (extension < HSA_AMD_FIRST_EXTENSION || extension > HSA_AMD_LAST_EXTENSION)) ||
      result == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  *result = false;

  const core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);

  if (agent->device_type() == core::Agent::kAmdGpuDevice) {
    uint16_t agent_version_major = 0;
    hsa_status_t status =
        agent->GetInfo(HSA_AGENT_INFO_VERSION_MAJOR, &agent_version_major);
    assert(status == HSA_STATUS_SUCCESS);

    if (version_major <= agent_version_major) {
      *version_minor = 0;
      *result = true;
    }
  }

  return HSA_STATUS_SUCCESS;
  CATCH;
}

/// @brief Api to create a user mode queue.
///
/// @param agent_handle HSA agent that will execute the AQL packets
///
/// @param size Size of the queue, in AQL packets; must be a power of two
///
/// @param type Queue type (single writer, multiple writer, or cooperative)
///
/// @param callback Callback function invoked if the queue encounters an
/// error; core::Queue::DefaultErrorHandler is used when it is nullptr
///
/// @param data Application data passed to @p callback
///
/// @param private_segment_size Hint for the maximum expected private
/// segment usage per work-item, in bytes
///
/// @param group_segment_size Hint for the maximum expected group segment
/// usage per work-group, in bytes
///
/// @param queue Output parameter updated with a pointer to the
/// queue being created
///
/// @return hsa_status
hsa_status_t hsa_queue_create(
    hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type,
    void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data),
    void* data, uint32_t private_segment_size, uint32_t group_segment_size,
    hsa_queue_t** queue) {
  TRY;
  IS_OPEN();

  if ((queue == nullptr) || (size == 0) || (!IsPowerOfTwo(size)) ||
      (type > HSA_QUEUE_TYPE_COOPERATIVE)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);

  hsa_queue_type32_t agent_queue_type = HSA_QUEUE_TYPE_MULTI;
  hsa_status_t status =
      agent->GetInfo(HSA_AGENT_INFO_QUEUE_TYPE, &agent_queue_type);
  assert(HSA_STATUS_SUCCESS == status);

  if ((agent_queue_type == HSA_QUEUE_TYPE_SINGLE) &&
      (type != HSA_QUEUE_TYPE_SINGLE)) {
    return HSA_STATUS_ERROR_INVALID_QUEUE_CREATION;
  }

  if (callback == nullptr) callback = core::Queue::DefaultErrorHandler;

  core::Queue* cmd_queue = nullptr;
  status = agent->QueueCreate(size, type, callback, data, private_segment_size,
                              group_segment_size, &cmd_queue);
  if (status != HSA_STATUS_SUCCESS) return status;

  assert(cmd_queue != nullptr && "Queue not returned but status was success.\n");
  *queue = core::Queue::Convert(cmd_queue);
  return status;
  CATCH;
}

hsa_status_t hsa_soft_queue_create(hsa_region_t region, uint32_t size,
                                   hsa_queue_type32_t type, uint32_t features,
                                   hsa_signal_t doorbell_signal,
                                   hsa_queue_t** queue) {
  TRY;
  IS_OPEN();
  if ((queue == NULL) || (region.handle == 0) ||
      (doorbell_signal.handle == 0) || (size == 0) || (!IsPowerOfTwo(size)) ||
      (type <
       HSA_QUEUE_TYPE_MULTI) || (type > HSA_QUEUE_TYPE_SINGLE) ||
      (features == 0)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  const core::MemoryRegion* mem_region = core::MemoryRegion::Convert(region);
  IS_VALID(mem_region);

  const core::Signal* signal = core::Signal::Convert(doorbell_signal);
  IS_VALID(signal);

  core::HostQueue* host_queue =
      new core::HostQueue(region, size, type, features, doorbell_signal);

  *queue = core::Queue::Convert(host_queue);

  return HSA_STATUS_SUCCESS;
  CATCH;
}

/// @brief Api to destroy a user mode queue
///
/// @param queue Pointer to the queue being destroyed
///
/// @return hsa_status
hsa_status_t hsa_queue_destroy(hsa_queue_t* queue) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(queue);
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  IS_VALID(cmd_queue);
  cmd_queue->Destroy();
  return HSA_STATUS_SUCCESS;
  CATCH;
}

/// @brief Api to inactivate a user mode queue
///
/// @param queue Pointer to the queue being inactivated
///
/// @return hsa_status
hsa_status_t hsa_queue_inactivate(hsa_queue_t* queue) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(queue);
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  IS_VALID(cmd_queue);
  cmd_queue->Inactivate();
  return HSA_STATUS_SUCCESS;
  CATCH;
}

/// @brief Api to read the Read Index of Queue using Acquire semantics
///
/// @param queue Pointer to the queue whose read index is being read
///
/// @return uint64_t Value of Read index
uint64_t hsa_queue_load_read_index_scacquire(const hsa_queue_t* queue) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->LoadReadIndexAcquire();
  CATCHRET(uint64_t);
}

/// @brief Api to read the Read Index of Queue using Relaxed semantics
///
/// @param queue Pointer to the queue whose read index is being read
///
/// @return uint64_t Value of Read index
uint64_t hsa_queue_load_read_index_relaxed(const hsa_queue_t* queue) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->LoadReadIndexRelaxed();
  CATCHRET(uint64_t);
}

/// @brief Api to read the Write Index of Queue using Acquire semantics
///
/// @param queue Pointer to the queue whose write index is being read
///
/// @return uint64_t Value of Write index
uint64_t hsa_queue_load_write_index_scacquire(const hsa_queue_t* queue) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->LoadWriteIndexAcquire();
  CATCHRET(uint64_t);
}

/// @brief Api to read the Write Index of Queue using Relaxed semantics
///
/// @param queue Pointer to the queue whose write index is being read
///
/// @return uint64_t Value of Write index
uint64_t hsa_queue_load_write_index_relaxed(const hsa_queue_t* queue) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->LoadWriteIndexRelaxed();
  CATCHRET(uint64_t);
}

/// @brief Api to store the Read Index of Queue using Relaxed semantics
///
/// @param queue Pointer to the queue whose read index is being updated
///
/// @param value Value of new read index
void hsa_queue_store_read_index_relaxed(const hsa_queue_t* queue, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  cmd_queue->StoreReadIndexRelaxed(value);
  CATCHRET(void);
}

/// @brief Api to store the Read Index of Queue using Release semantics
///
/// @param queue Pointer to the queue whose read index is being updated
///
/// @param value Value of new read index
void hsa_queue_store_read_index_screlease(const hsa_queue_t* queue, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  cmd_queue->StoreReadIndexRelease(value);
  CATCHRET(void);
}

/// @brief Api to store the Write Index of Queue using Relaxed semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param value Value of new write index
void hsa_queue_store_write_index_relaxed(const hsa_queue_t* queue, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  cmd_queue->StoreWriteIndexRelaxed(value);
  CATCHRET(void);
}

/// @brief Api to store the Write Index of Queue using Release semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param value Value of new write index
void hsa_queue_store_write_index_screlease(const hsa_queue_t* queue, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  cmd_queue->StoreWriteIndexRelease(value);
  CATCHRET(void);
}

/// @brief Api to compare and swap the Write Index of Queue using Acquire and
/// Release semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param expected Current value of write index
///
/// @param value Value of new write index
///
/// @return uint64_t Value of write index before the update
uint64_t hsa_queue_cas_write_index_scacq_screl(const hsa_queue_t* queue,
                                               uint64_t expected, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->CasWriteIndexAcqRel(expected, value);
  CATCHRET(uint64_t);
}

/// @brief Api to compare and swap the Write Index of Queue using Acquire
/// Semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param expected Current value of write index
///
/// @param value Value of new write index
///
/// @return uint64_t Value of write index before the update
uint64_t hsa_queue_cas_write_index_scacquire(const hsa_queue_t* queue,
                                             uint64_t expected, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->CasWriteIndexAcquire(expected, value);
  CATCHRET(uint64_t);
}

/// @brief Api to compare and swap the Write Index of Queue using Relaxed
/// Semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param expected Current value of write index
///
/// @param value Value of new write index
///
/// @return uint64_t Value of write index before the update
uint64_t hsa_queue_cas_write_index_relaxed(const hsa_queue_t* queue,
                                           uint64_t expected, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->CasWriteIndexRelaxed(expected, value);
  CATCHRET(uint64_t);
}

/// @brief Api to compare and swap the Write Index of Queue using Release
/// Semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param expected Current value of write index
///
/// @param value Value of new write index
///
/// @return uint64_t Value of write index before the update
uint64_t hsa_queue_cas_write_index_screlease(const hsa_queue_t* queue,
                                             uint64_t expected, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->CasWriteIndexRelease(expected, value);
  CATCHRET(uint64_t);
}

/// @brief Api to Add to the Write Index of Queue using Acquire and Release
/// Semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param value Value to add to write index
///
/// @return uint64_t Value of write index before the update
uint64_t hsa_queue_add_write_index_scacq_screl(const hsa_queue_t* queue, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->AddWriteIndexAcqRel(value);
  CATCHRET(uint64_t);
}

/// @brief Api to Add to the Write Index of Queue using Acquire Semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param value Value to add to write index
///
/// @return uint64_t Value of write index before the update
uint64_t hsa_queue_add_write_index_scacquire(const hsa_queue_t* queue, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return
      cmd_queue->AddWriteIndexAcquire(value);
  CATCHRET(uint64_t);
}

/// @brief Api to Add to the Write Index of Queue using Relaxed Semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param value Value to add to write index
///
/// @return uint64_t Value of write index before the update
uint64_t hsa_queue_add_write_index_relaxed(const hsa_queue_t* queue, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->AddWriteIndexRelaxed(value);
  CATCHRET(uint64_t);
}

/// @brief Api to Add to the Write Index of Queue using Release Semantics
///
/// @param queue Pointer to the queue whose write index is being updated
///
/// @param value Value to add to write index
///
/// @return uint64_t Value of write index before the update
uint64_t hsa_queue_add_write_index_screlease(const hsa_queue_t* queue, uint64_t value) {
  TRY;
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  assert(IsValid(cmd_queue));
  return cmd_queue->AddWriteIndexRelease(value);
  CATCHRET(uint64_t);
}

//-----------------------------------------------------------------------------
// Memory
//-----------------------------------------------------------------------------
hsa_status_t hsa_agent_iterate_regions(
    hsa_agent_t agent_handle,
    hsa_status_t (*callback)(hsa_region_t region, void* data), void* data) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(callback);
  const core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);
  return agent->IterateRegion(callback, data);
  CATCH;
}

hsa_status_t hsa_region_get_info(hsa_region_t region,
                                 hsa_region_info_t attribute, void* value) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(value);

  const core::MemoryRegion* mem_region = core::MemoryRegion::Convert(region);
  IS_VALID(mem_region);

  return mem_region->GetInfo(attribute, value);
  CATCH;
}

hsa_status_t hsa_memory_register(void* address, size_t size) {
  TRY;
  IS_OPEN();

  if (size == 0 && address != NULL) {
    return
        HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_memory_deregister(void* address, size_t size) {
  TRY;
  IS_OPEN();
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_memory_allocate(hsa_region_t region, size_t size, void** ptr) {
  TRY;
  IS_OPEN();

  if (size == 0 || ptr == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  const core::MemoryRegion* mem_region = core::MemoryRegion::Convert(region);
  IS_VALID(mem_region);

  return core::Runtime::runtime_singleton_->AllocateMemory(
      mem_region, size, core::MemoryRegion::AllocateNoFlags, ptr);
  CATCH;
}

hsa_status_t hsa_memory_free(void* ptr) {
  TRY;
  IS_OPEN();

  if (ptr == NULL) {
    return HSA_STATUS_SUCCESS;
  }

  return core::Runtime::runtime_singleton_->FreeMemory(ptr);
  CATCH;
}

hsa_status_t hsa_memory_assign_agent(void* ptr, hsa_agent_t agent_handle,
                                     hsa_access_permission_t access) {
  TRY;
  IS_OPEN();

  if ((ptr == NULL) || (access < HSA_ACCESS_PERMISSION_RO) ||
      (access > HSA_ACCESS_PERMISSION_RW)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  const core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_memory_copy(void* dst, const void* src, size_t size) {
  TRY;
  IS_OPEN();

  if (dst == NULL || src == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  if (size == 0) {
    return HSA_STATUS_SUCCESS;
  }

  return core::Runtime::runtime_singleton_->CopyMemory(dst, src, size);
  CATCH;
}

//-----------------------------------------------------------------------------
// Signals
//-----------------------------------------------------------------------------

hsa_status_t hsa_signal_create(hsa_signal_value_t initial_value, uint32_t num_consumers,
                               const hsa_agent_t* consumers, hsa_signal_t* hsa_signal) {
  return AMD::hsa_amd_signal_create(initial_value, num_consumers, consumers, 0, hsa_signal);
}

hsa_status_t hsa_signal_destroy(hsa_signal_t hsa_signal) {
  TRY;
  IS_OPEN();
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  signal->DestroySignal();
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_signal_value_t hsa_signal_load_relaxed(hsa_signal_t hsa_signal) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->LoadRelaxed();
  CATCHRET(hsa_signal_value_t);
}

hsa_signal_value_t hsa_signal_load_scacquire(hsa_signal_t hsa_signal) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->LoadAcquire();
  CATCHRET(hsa_signal_value_t);
}

void hsa_signal_store_relaxed(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->StoreRelaxed(value);
  CATCHRET(void);
}

void hsa_signal_store_screlease(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->StoreRelease(value);
  CATCHRET(void);
}

hsa_signal_value_t hsa_signal_wait_relaxed(hsa_signal_t hsa_signal,
                                           hsa_signal_condition_t condition,
                                           hsa_signal_value_t compare_value,
                                           uint64_t timeout_hint,
                                           hsa_wait_state_t wait_state_hint) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->WaitRelaxed(condition, compare_value, timeout_hint, wait_state_hint);
  CATCHRET(hsa_signal_value_t);
}

hsa_signal_value_t hsa_signal_wait_scacquire(hsa_signal_t hsa_signal,
                                             hsa_signal_condition_t condition,
                                             hsa_signal_value_t compare_value,
                                             uint64_t timeout_hint,
                                             hsa_wait_state_t wait_state_hint) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->WaitAcquire(condition, compare_value, timeout_hint, wait_state_hint);
  CATCHRET(hsa_signal_value_t);
}

hsa_status_t hsa_signal_group_create(uint32_t num_signals, const hsa_signal_t* signals,
                                     uint32_t num_consumers, const hsa_agent_t* consumers,
                                     hsa_signal_group_t* signal_group) {
  TRY;
  IS_OPEN();
  if (num_signals == 0) return
      HSA_STATUS_ERROR_INVALID_ARGUMENT;
  for (uint i = 0; i < num_signals; i++)
    IS_VALID(core::Signal::Convert(signals[i]));
  for (uint i = 0; i < num_consumers; i++)
    IS_VALID(core::Agent::Convert(consumers[i]));

  core::SignalGroup* group = new core::SignalGroup(num_signals, signals);
  CHECK_ALLOC(group);
  if (!group->IsValid()) {
    delete group;
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }
  *signal_group = core::SignalGroup::Convert(group);
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_signal_group_destroy(hsa_signal_group_t signal_group) {
  TRY;
  IS_OPEN();
  core::SignalGroup* group = core::SignalGroup::Convert(signal_group);
  IS_VALID(group);
  delete group;
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_signal_group_wait_any_relaxed(hsa_signal_group_t signal_group,
                                               const hsa_signal_condition_t* conditions,
                                               const hsa_signal_value_t* compare_values,
                                               hsa_wait_state_t wait_state_hint,
                                               hsa_signal_t* signal, hsa_signal_value_t* value) {
  TRY;
  IS_OPEN();
  const core::SignalGroup* group = core::SignalGroup::Convert(signal_group);
  IS_VALID(group);
  // hsa_amd_signal_wait_any takes non-const arrays; it does not modify them.
  const uint32_t index = AMD::hsa_amd_signal_wait_any(
      group->Count(), const_cast<hsa_signal_t*>(group->List()),
      const_cast<hsa_signal_condition_t*>(conditions),
      const_cast<hsa_signal_value_t*>(compare_values), uint64_t(-1), wait_state_hint,
      value);
  if (index >= group->Count()) return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  *signal = group->List()[index];
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_signal_group_wait_any_scacquire(hsa_signal_group_t signal_group,
                                                 const hsa_signal_condition_t* conditions,
                                                 const hsa_signal_value_t* compare_values,
                                                 hsa_wait_state_t wait_state_hint,
                                                 hsa_signal_t* signal, hsa_signal_value_t* value) {
  TRY;
  hsa_status_t ret = HSA::hsa_signal_group_wait_any_relaxed(
      signal_group, conditions, compare_values, wait_state_hint, signal, value);
  std::atomic_thread_fence(std::memory_order_acquire);
  return ret;
  CATCH;
}

void hsa_signal_and_relaxed(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->AndRelaxed(value);
  CATCHRET(void);
}

void hsa_signal_and_scacquire(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->AndAcquire(value);
  CATCHRET(void);
}

void hsa_signal_and_screlease(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->AndRelease(value);
  CATCHRET(void);
}

void hsa_signal_and_scacq_screl(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->AndAcqRel(value);
  CATCHRET(void);
}

void hsa_signal_or_relaxed(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->OrRelaxed(value);
  CATCHRET(void);
}

void hsa_signal_or_scacquire(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->OrAcquire(value);
  CATCHRET(void);
}

void hsa_signal_or_screlease(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->OrRelease(value);
  CATCHRET(void);
}

void hsa_signal_or_scacq_screl(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->OrAcqRel(value);
  CATCHRET(void);
}

void hsa_signal_xor_relaxed(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->XorRelaxed(value);
  CATCHRET(void);
}

void hsa_signal_xor_scacquire(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->XorAcquire(value);
  CATCHRET(void);
}

void hsa_signal_xor_screlease(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->XorRelease(value);
  CATCHRET(void);
}

void hsa_signal_xor_scacq_screl(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->XorAcqRel(value);
  CATCHRET(void);
}

void hsa_signal_add_relaxed(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->AddRelaxed(value);
  CATCHRET(void);
}

void hsa_signal_add_scacquire(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->AddAcquire(value);
  CATCHRET(void);
}

void hsa_signal_add_screlease(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->AddRelease(value);
  CATCHRET(void);
}

void hsa_signal_add_scacq_screl(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->AddAcqRel(value);
  CATCHRET(void);
}

void hsa_signal_subtract_relaxed(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->SubRelaxed(value);
  CATCHRET(void);
}

void hsa_signal_subtract_scacquire(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->SubAcquire(value);
  CATCHRET(void);
}

void hsa_signal_subtract_screlease(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->SubRelease(value);
  CATCHRET(void);
}

void
hsa_signal_subtract_scacq_screl(hsa_signal_t hsa_signal, hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  signal->SubAcqRel(value);
  CATCHRET(void);
}

hsa_signal_value_t hsa_signal_exchange_relaxed(hsa_signal_t hsa_signal,
                                               hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->ExchRelaxed(value);
  CATCHRET(hsa_signal_value_t);
}

hsa_signal_value_t hsa_signal_exchange_scacquire(hsa_signal_t hsa_signal,
                                                 hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->ExchAcquire(value);
  CATCHRET(hsa_signal_value_t);
}

hsa_signal_value_t hsa_signal_exchange_screlease(hsa_signal_t hsa_signal,
                                                 hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->ExchRelease(value);
  CATCHRET(hsa_signal_value_t);
}

hsa_signal_value_t hsa_signal_exchange_scacq_screl(hsa_signal_t hsa_signal,
                                                   hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->ExchAcqRel(value);
  CATCHRET(hsa_signal_value_t);
}

hsa_signal_value_t hsa_signal_cas_relaxed(hsa_signal_t hsa_signal,
                                          hsa_signal_value_t expected,
                                          hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->CasRelaxed(expected, value);
  CATCHRET(hsa_signal_value_t);
}

hsa_signal_value_t hsa_signal_cas_scacquire(hsa_signal_t hsa_signal,
                                            hsa_signal_value_t expected,
                                            hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->CasAcquire(expected, value);
  CATCHRET(hsa_signal_value_t);
}

hsa_signal_value_t hsa_signal_cas_screlease(hsa_signal_t hsa_signal,
                                            hsa_signal_value_t expected,
                                            hsa_signal_value_t value) {
  TRY;
  core::Signal* signal =
      core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->CasRelease(expected, value);
  CATCHRET(hsa_signal_value_t);
}

hsa_signal_value_t hsa_signal_cas_scacq_screl(hsa_signal_t hsa_signal,
                                              hsa_signal_value_t expected,
                                              hsa_signal_value_t value) {
  TRY;
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  assert(IsValid(signal));
  return signal->CasAcqRel(expected, value);
  CATCHRET(hsa_signal_value_t);
}

//===--- Instruction Set Architecture -------------------------------------===//

using core::Isa;
using core::IsaRegistry;
using core::Wavefront;

hsa_status_t hsa_isa_from_name(
    const char *name,
    hsa_isa_t *isa) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(name);
  IS_BAD_PTR(isa);

  const Isa *isa_object = IsaRegistry::GetIsa(name);
  if (!isa_object) {
    return HSA_STATUS_ERROR_INVALID_ISA_NAME;
  }

  *isa = Isa::Handle(isa_object);
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_agent_iterate_isas(
    hsa_agent_t agent,
    hsa_status_t (*callback)(hsa_isa_t isa, void *data),
    void *data) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(callback);

  const core::Agent *agent_object = core::Agent::Convert(agent);
  IS_VALID(agent_object);

  const Isa *isa_object = agent_object->isa();
  if (!isa_object) {
    return HSA_STATUS_SUCCESS;
  }

  return callback(Isa::Handle(isa_object), data);
  CATCH;
}

/* deprecated */
hsa_status_t hsa_isa_get_info(
    hsa_isa_t isa,
    hsa_isa_info_t attribute,
    uint32_t index,
    void *value) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(value);

  if (index != 0) {
    return HSA_STATUS_ERROR_INVALID_INDEX;
  }

  const Isa *isa_object = Isa::Object(isa);
  IS_VALID(isa_object);

  return isa_object->GetInfo(attribute, value) ?
      HSA_STATUS_SUCCESS : HSA_STATUS_ERROR_INVALID_ARGUMENT;
  CATCH;
}

hsa_status_t hsa_isa_get_info_alt(
    hsa_isa_t isa,
    hsa_isa_info_t attribute,
    void *value) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(value);

  const Isa *isa_object = Isa::Object(isa);
  IS_VALID(isa_object);

  return isa_object->GetInfo(attribute, value) ?
HSA_STATUS_SUCCESS : HSA_STATUS_ERROR_INVALID_ARGUMENT; CATCH; } hsa_status_t hsa_isa_get_exception_policies( hsa_isa_t isa, hsa_profile_t profile, uint16_t *mask) { TRY; IS_OPEN(); IS_BAD_PROFILE(profile); IS_BAD_PTR(mask); const Isa *isa_object = Isa::Object(isa); IS_VALID(isa_object); // FIXME: update when exception policies are supported. *mask = 0; return HSA_STATUS_SUCCESS; CATCH; } hsa_status_t hsa_isa_get_round_method( hsa_isa_t isa, hsa_fp_type_t fp_type, hsa_flush_mode_t flush_mode, hsa_round_method_t *round_method) { TRY; IS_OPEN(); IS_BAD_FP_TYPE(fp_type); IS_BAD_FLUSH_MODE(flush_mode); IS_BAD_PTR(round_method); const Isa *isa_object = Isa::Object(isa); IS_VALID(isa_object); *round_method = isa_object->GetRoundMethod(fp_type, flush_mode); return HSA_STATUS_SUCCESS; CATCH; } hsa_status_t hsa_wavefront_get_info( hsa_wavefront_t wavefront, hsa_wavefront_info_t attribute, void *value) { TRY; IS_OPEN(); IS_BAD_PTR(value); const Wavefront *wavefront_object = Wavefront::Object(wavefront); if (!wavefront_object) { return HSA_STATUS_ERROR_INVALID_WAVEFRONT; } return wavefront_object->GetInfo(attribute, value) ? 
HSA_STATUS_SUCCESS : HSA_STATUS_ERROR_INVALID_ARGUMENT; CATCH; } hsa_status_t hsa_isa_iterate_wavefronts( hsa_isa_t isa, hsa_status_t (*callback)(hsa_wavefront_t wavefront, void *data), void *data) { TRY; IS_OPEN(); IS_BAD_PTR(callback); const Isa *isa_object = Isa::Object(isa); IS_VALID(isa_object); const Wavefront &wavefront_object = isa_object->GetWavefront(); return callback(Wavefront::Handle(&wavefront_object), data); CATCH; } /* deprecated */ hsa_status_t hsa_isa_compatible( hsa_isa_t code_object_isa, hsa_isa_t agent_isa, bool *result) { TRY; IS_OPEN(); IS_BAD_PTR(result); const Isa *code_object_isa_object = Isa::Object(code_object_isa); IS_VALID(code_object_isa_object); const Isa *agent_isa_object = Isa::Object(agent_isa); IS_VALID(agent_isa_object); *result = Isa::IsCompatible(*code_object_isa_object, *agent_isa_object); return HSA_STATUS_SUCCESS; CATCH; } //===--- Code Objects (deprecated) ----------------------------------------===// namespace { hsa_status_t IsCodeObjectAllocRegion( hsa_region_t region, void *data) { assert(data); assert(((hsa_region_t*)data)->handle == 0); bool runtime_alloc_allowed = false; hsa_status_t status = HSA::hsa_region_get_info( region, HSA_REGION_INFO_RUNTIME_ALLOC_ALLOWED, &runtime_alloc_allowed); if (status != HSA_STATUS_SUCCESS) { return status; } if (runtime_alloc_allowed) { ((hsa_region_t*)data)->handle = region.handle; return HSA_STATUS_INFO_BREAK; } return HSA_STATUS_SUCCESS; } hsa_status_t FindCodeObjectAllocRegionForAgent( hsa_agent_t agent, void *data) { assert(data); assert(((hsa_region_t*)data)->handle == 0); hsa_device_type_t device = HSA_DEVICE_TYPE_CPU; hsa_status_t status = HSA::hsa_agent_get_info( agent, HSA_AGENT_INFO_DEVICE, &device); if (status != HSA_STATUS_SUCCESS) { return status; } if (device == HSA_DEVICE_TYPE_CPU) { return HSA::hsa_agent_iterate_regions(agent, IsCodeObjectAllocRegion, data); } return HSA_STATUS_SUCCESS; } hsa_status_t FindCodeObjectAllocRegion( void *data) { assert(data); 
assert(((hsa_region_t*)data)->handle == 0); return HSA::hsa_iterate_agents(FindCodeObjectAllocRegionForAgent, data); } amd::hsa::code::AmdHsaCodeManager *GetCodeManager() { return core::Runtime::runtime_singleton_->code_manager(); } } // namespace anonymous /* deprecated */ hsa_status_t hsa_code_object_serialize( hsa_code_object_t code_object, hsa_status_t (*alloc_callback)(size_t size, hsa_callback_data_t data, void **address), hsa_callback_data_t callback_data, const char *options, void **serialized_code_object, size_t *serialized_code_object_size) { TRY; IS_OPEN(); IS_BAD_PTR(alloc_callback); IS_BAD_PTR(serialized_code_object); IS_BAD_PTR(serialized_code_object_size); amd::hsa::code::AmdHsaCode *code = GetCodeManager()->FromHandle(code_object); if (!code) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } hsa_status_t status = alloc_callback( code->ElfSize(), callback_data, serialized_code_object); if (status != HSA_STATUS_SUCCESS) { return status; } assert(*serialized_code_object); memcpy(*serialized_code_object, code->ElfData(), code->ElfSize()); *serialized_code_object_size = code->ElfSize(); return HSA_STATUS_SUCCESS; CATCH; } /* deprecated */ hsa_status_t hsa_code_object_deserialize( void *serialized_code_object, size_t serialized_code_object_size, const char *options, hsa_code_object_t *code_object) { TRY; IS_OPEN(); IS_BAD_PTR(serialized_code_object); IS_BAD_PTR(code_object); if (serialized_code_object_size == 0) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } hsa_region_t code_object_alloc_region = {0}; hsa_status_t status = FindCodeObjectAllocRegion(&code_object_alloc_region); if (status != HSA_STATUS_SUCCESS && status != HSA_STATUS_INFO_BREAK) { return status; } assert(code_object_alloc_region.handle != 0); void *code_object_alloc_data = nullptr; status = HSA::hsa_memory_allocate( code_object_alloc_region, serialized_code_object_size, &code_object_alloc_data); if (status != HSA_STATUS_SUCCESS) { return status; } assert(code_object_alloc_data); memcpy( 
code_object_alloc_data, serialized_code_object, serialized_code_object_size); code_object->handle = reinterpret_cast<uint64_t>(code_object_alloc_data); return HSA_STATUS_SUCCESS; CATCH; } /* deprecated */ hsa_status_t hsa_code_object_destroy( hsa_code_object_t code_object) { TRY; IS_OPEN(); void *code_object_data = reinterpret_cast<void*>(code_object.handle); if (!code_object_data) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } if (!GetCodeManager()->Destroy(code_object)) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } HSA::hsa_memory_free(code_object_data); return HSA_STATUS_SUCCESS; CATCH; } static std::string ConvertOldTargetNameToNew( const std::string &OldName, bool IsFinalizer, uint32_t EFlags) { std::string NewName = ""; bool xnack_supported = false; // FIXME #1: Should 9:0:3 be completely (loader, sc, etc.) removed? // FIXME #2: What does PAL do with respect to boltzmann/usual fiji/tonga? if (OldName == "AMD:AMDGPU:6:0:0") NewName = "amdgcn-amd-amdhsa--gfx600"; else if (OldName == "AMD:AMDGPU:6:0:1") NewName = "amdgcn-amd-amdhsa--gfx601"; else if (OldName == "AMD:AMDGPU:6:0:2") NewName = "amdgcn-amd-amdhsa--gfx602"; else if (OldName == "AMD:AMDGPU:7:0:0") NewName = "amdgcn-amd-amdhsa--gfx700"; else if (OldName == "AMD:AMDGPU:7:0:1") NewName = "amdgcn-amd-amdhsa--gfx701"; else if (OldName == "AMD:AMDGPU:7:0:2") NewName = "amdgcn-amd-amdhsa--gfx702"; else if (OldName == "AMD:AMDGPU:7:0:3") NewName = "amdgcn-amd-amdhsa--gfx703"; else if (OldName == "AMD:AMDGPU:7:0:4") NewName = "amdgcn-amd-amdhsa--gfx704"; else if (OldName == "AMD:AMDGPU:7:0:5") NewName = "amdgcn-amd-amdhsa--gfx705"; else if (OldName == "AMD:AMDGPU:8:0:1") { NewName = "amdgcn-amd-amdhsa--gfx801"; xnack_supported = true; } else if (OldName == "AMD:AMDGPU:8:0:0" || OldName == "AMD:AMDGPU:8:0:2") NewName = "amdgcn-amd-amdhsa--gfx802"; else if (OldName == "AMD:AMDGPU:8:0:3" || OldName == "AMD:AMDGPU:8:0:4") NewName = "amdgcn-amd-amdhsa--gfx803"; else if (OldName == "AMD:AMDGPU:8:0:5") NewName =
"amdgcn-amd-amdhsa--gfx805"; else if (OldName == "AMD:AMDGPU:8:1:0") { NewName = "amdgcn-amd-amdhsa--gfx810"; xnack_supported = true; } else if (OldName == "AMD:AMDGPU:9:0:0" || OldName == "AMD:AMDGPU:9:0:1") { NewName = "amdgcn-amd-amdhsa--gfx900"; xnack_supported = true; } else if (OldName == "AMD:AMDGPU:9:0:2" || OldName == "AMD:AMDGPU:9:0:3") { NewName = "amdgcn-amd-amdhsa--gfx902"; xnack_supported = true; } else if (OldName == "AMD:AMDGPU:9:0:4" || OldName == "AMD:AMDGPU:9:0:5") { NewName = "amdgcn-amd-amdhsa--gfx904"; xnack_supported = true; } else if (OldName == "AMD:AMDGPU:9:0:6" || OldName == "AMD:AMDGPU:9:0:7") { NewName = "amdgcn-amd-amdhsa--gfx906"; xnack_supported = true; } else if (OldName == "AMD:AMDGPU:9:0:12") { NewName = "amdgcn-amd-amdhsa--gfx90c"; xnack_supported = true; } else { // Code object v2 only supports asics up to gfx906 plus gfx90c. Do NOT add // handling of new asics into this if-else-if* block. return ""; } if (IsFinalizer) { if (EFlags & ELF::EF_AMDGPU_FEATURE_XNACK_V2) NewName = NewName + ":xnack+"; else if (xnack_supported) NewName = NewName + ":xnack-"; } else { if (OldName == "AMD:AMDGPU:8:0:1") NewName = NewName + ":xnack+"; else if (OldName == "AMD:AMDGPU:8:1:0") NewName = NewName + ":xnack+"; else if (OldName == "AMD:AMDGPU:9:0:1") NewName = NewName + ":xnack+"; else if (OldName == "AMD:AMDGPU:9:0:3") NewName = NewName + ":xnack+"; else if (OldName == "AMD:AMDGPU:9:0:5") NewName = NewName + ":xnack+"; else if (OldName == "AMD:AMDGPU:9:0:7") NewName = NewName + ":xnack+"; else if (xnack_supported) NewName = NewName + ":xnack-"; } return NewName; } /* deprecated */ hsa_status_t hsa_code_object_get_info( hsa_code_object_t code_object, hsa_code_object_info_t attribute, void *value) { TRY; IS_OPEN(); IS_BAD_PTR(value); amd::hsa::code::AmdHsaCode *code = GetCodeManager()->FromHandle(code_object); if (!code) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } switch (attribute) { case HSA_CODE_OBJECT_INFO_ISA: { char isa_name[64]; 
hsa_status_t status = code->GetInfo(attribute, &isa_name); if (status != HSA_STATUS_SUCCESS) { return status; } std::string isa_name_str(isa_name); bool IsFinalizer = true; uint32_t codeHsailMajor; uint32_t codeHsailMinor; hsa_profile_t codeProfile; hsa_machine_model_t codeMachineModel; hsa_default_float_rounding_mode_t codeRoundingMode; if (!code->GetNoteHsail(&codeHsailMajor, &codeHsailMinor, &codeProfile, &codeMachineModel, &codeRoundingMode)) { // Only finalizer generated the "HSAIL" note. IsFinalizer = false; } std::string new_isa_name_str = ConvertOldTargetNameToNew(isa_name_str, IsFinalizer, code->EFlags()); hsa_isa_t isa_handle = {0}; status = HSA::hsa_isa_from_name(new_isa_name_str.c_str(), &isa_handle); if (status != HSA_STATUS_SUCCESS) { return status; } *((hsa_isa_t*)value) = isa_handle; return HSA_STATUS_SUCCESS; } default: { return code->GetInfo(attribute, value); } } CATCH; } /* deprecated */ hsa_status_t hsa_code_object_get_symbol( hsa_code_object_t code_object, const char *symbol_name, hsa_code_symbol_t *symbol) { TRY; IS_OPEN(); IS_BAD_PTR(symbol_name); IS_BAD_PTR(symbol); amd::hsa::code::AmdHsaCode *code = GetCodeManager()->FromHandle(code_object); if (!code) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } return code->GetSymbol(nullptr, symbol_name, symbol); CATCH; } /* deprecated */ hsa_status_t hsa_code_object_get_symbol_from_name( hsa_code_object_t code_object, const char *module_name, const char *symbol_name, hsa_code_symbol_t *symbol) { TRY; IS_OPEN(); IS_BAD_PTR(symbol_name); IS_BAD_PTR(symbol); amd::hsa::code::AmdHsaCode *code = GetCodeManager()->FromHandle(code_object); if (!code) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } return code->GetSymbol(module_name, symbol_name, symbol); CATCH; } /* deprecated */ hsa_status_t hsa_code_symbol_get_info( hsa_code_symbol_t code_symbol, hsa_code_symbol_info_t attribute, void *value) { TRY; IS_OPEN(); IS_BAD_PTR(value); code::Symbol *symbol = code::Symbol::FromHandle(code_symbol); if (!symbol) 
{ return HSA_STATUS_ERROR_INVALID_CODE_SYMBOL; } return symbol->GetInfo(attribute, value); CATCH; } /* deprecated */ hsa_status_t hsa_code_object_iterate_symbols( hsa_code_object_t code_object, hsa_status_t (*callback)(hsa_code_object_t code_object, hsa_code_symbol_t symbol, void *data), void *data) { TRY; IS_OPEN(); IS_BAD_PTR(callback); amd::hsa::code::AmdHsaCode *code = GetCodeManager()->FromHandle(code_object); if (!code) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } return code->IterateSymbols(code_object, callback, data); CATCH; } //===--- Executable -------------------------------------------------------===// using amd::hsa::common::Signed; using amd::hsa::loader::Loader; using amd::hsa::loader::Executable; using amd::hsa::loader::CodeObjectReaderImpl; namespace { Loader *GetLoader() { return core::Runtime::runtime_singleton_->loader(); } } // namespace anonymous hsa_status_t hsa_code_object_reader_create_from_file( hsa_file_t file, hsa_code_object_reader_t *code_object_reader) { TRY; IS_OPEN(); IS_BAD_PTR(code_object_reader); std::unique_ptr<CodeObjectReaderImpl> reader( new (std::nothrow) CodeObjectReaderImpl()); CHECK_ALLOC(reader); hsa_status_t status = reader->SetFile(file); CHECK_STATUS(status); *code_object_reader = CodeObjectReaderImpl::Handle(reader.release()); return HSA_STATUS_SUCCESS; CATCH; } hsa_status_t hsa_code_object_reader_create_from_memory( const void *code_object, size_t size, hsa_code_object_reader_t *code_object_reader) { TRY; IS_OPEN(); IS_BAD_PTR(code_object); IS_BAD_PTR(code_object_reader); if (size == 0) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } std::unique_ptr<CodeObjectReaderImpl> reader( new (std::nothrow) CodeObjectReaderImpl()); CHECK_ALLOC(reader); hsa_status_t status = reader->SetMemory(code_object, size); CHECK_STATUS(status); *code_object_reader = CodeObjectReaderImpl::Handle(reader.release()); return HSA_STATUS_SUCCESS; CATCH; } hsa_status_t hsa_code_object_reader_destroy( hsa_code_object_reader_t code_object_reader) { TRY; IS_OPEN(); CodeObjectReaderImpl
*reader = CodeObjectReaderImpl::Object(code_object_reader); if (!reader) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER; } delete reader; return HSA_STATUS_SUCCESS; CATCH; } /* deprecated */ hsa_status_t hsa_executable_create( hsa_profile_t profile, hsa_executable_state_t executable_state, const char *options, hsa_executable_t *executable) { TRY; IS_OPEN(); IS_BAD_PROFILE(profile); IS_BAD_EXECUTABLE_STATE(executable_state); IS_BAD_PTR(executable); // Invoke non-deprecated API. hsa_status_t status = HSA::hsa_executable_create_alt( profile, HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT, options, executable); if (status != HSA_STATUS_SUCCESS) { return status; } Executable *exec = Executable::Object(*executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } if (executable_state == HSA_EXECUTABLE_STATE_FROZEN) { exec->Freeze(nullptr); } return HSA_STATUS_SUCCESS; CATCH; } hsa_status_t hsa_executable_create_alt( hsa_profile_t profile, hsa_default_float_rounding_mode_t default_float_rounding_mode, const char *options, hsa_executable_t *executable) { TRY; IS_OPEN(); IS_BAD_PROFILE(profile); IS_BAD_ROUNDING_MODE(default_float_rounding_mode); // NOTES: should we check // if default float // rounding mode is valid? // spec does not say so. 
IS_BAD_PTR(executable); Executable *exec = GetLoader()->CreateExecutable( profile, options, default_float_rounding_mode); CHECK_ALLOC(exec); *executable = Executable::Handle(exec); return HSA_STATUS_SUCCESS; CATCH; } hsa_status_t hsa_executable_destroy( hsa_executable_t executable) { TRY; IS_OPEN(); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } GetLoader()->DestroyExecutable(exec); return HSA_STATUS_SUCCESS; CATCH; } /* deprecated */ hsa_status_t hsa_executable_load_code_object( hsa_executable_t executable, hsa_agent_t agent, hsa_code_object_t code_object, const char *options) { TRY; IS_OPEN(); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } void *code_object_p = reinterpret_cast<void*>(code_object.handle); if (!code_object_p) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } CodeObjectReaderImpl reader; reader.SetMemory(code_object_p, amd::elf::ElfSize(code_object_p)); return exec->LoadCodeObject(agent, code_object, options, reader.GetUri()); CATCH; } hsa_status_t hsa_executable_load_program_code_object( hsa_executable_t executable, hsa_code_object_reader_t code_object_reader, const char *options, hsa_loaded_code_object_t *loaded_code_object) { TRY; IS_OPEN(); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } CodeObjectReaderImpl *reader = CodeObjectReaderImpl::Object( code_object_reader); if (!reader) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER; } hsa_code_object_t code_object = {reinterpret_cast<uint64_t>(reader->GetCodeObjectMemory())}; return exec->LoadCodeObject( {0}, code_object, options, reader->GetUri(), loaded_code_object); CATCH; } hsa_status_t hsa_executable_load_agent_code_object( hsa_executable_t executable, hsa_agent_t agent, hsa_code_object_reader_t code_object_reader, const char *options, hsa_loaded_code_object_t *loaded_code_object) { TRY; IS_OPEN(); Executable *exec =
Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } CodeObjectReaderImpl *reader = CodeObjectReaderImpl::Object( code_object_reader); if (!reader) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER; } hsa_code_object_t code_object = {reinterpret_cast<uint64_t>(reader->GetCodeObjectMemory())}; return exec->LoadCodeObject( agent, code_object, options, reader->GetUri(), loaded_code_object); CATCH; } hsa_status_t hsa_executable_freeze( hsa_executable_t executable, const char *options) { TRY; IS_OPEN(); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return GetLoader()->FreezeExecutable(exec, options); CATCH; } hsa_status_t hsa_executable_get_info( hsa_executable_t executable, hsa_executable_info_t attribute, void *value) { TRY; IS_OPEN(); IS_BAD_PTR(value); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return exec->GetInfo(attribute, value); CATCH; } hsa_status_t hsa_executable_global_variable_define( hsa_executable_t executable, const char *variable_name, void *address) { TRY; IS_OPEN(); IS_BAD_PTR(variable_name); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return exec->DefineProgramExternalVariable(variable_name, address); CATCH; } hsa_status_t hsa_executable_agent_global_variable_define( hsa_executable_t executable, hsa_agent_t agent, const char *variable_name, void *address) { TRY; IS_OPEN(); IS_BAD_PTR(variable_name); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return exec->DefineAgentExternalVariable( variable_name, agent, HSA_VARIABLE_SEGMENT_GLOBAL, address); CATCH; } hsa_status_t hsa_executable_readonly_variable_define( hsa_executable_t executable, hsa_agent_t agent, const char *variable_name, void *address) { TRY; IS_OPEN(); IS_BAD_PTR(variable_name); Executable *exec =
Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return exec->DefineAgentExternalVariable( variable_name, agent, HSA_VARIABLE_SEGMENT_READONLY, address); CATCH; } hsa_status_t hsa_executable_validate( hsa_executable_t executable, uint32_t *result) { TRY; IS_OPEN(); IS_BAD_PTR(result); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return exec->Validate(result); CATCH; } hsa_status_t hsa_executable_validate_alt( hsa_executable_t executable, const char *options, uint32_t *result) { TRY; IS_OPEN(); IS_BAD_PTR(result); return HSA::hsa_executable_validate(executable, result); CATCH; } /* deprecated */ hsa_status_t hsa_executable_get_symbol( hsa_executable_t executable, const char *module_name, const char *symbol_name, hsa_agent_t agent, int32_t call_convention, hsa_executable_symbol_t *symbol) { TRY; IS_OPEN(); IS_BAD_PTR(symbol_name); IS_BAD_PTR(symbol); std::string mangled_name(symbol_name); if (mangled_name.empty()) { return HSA_STATUS_ERROR_INVALID_SYMBOL_NAME; } if (module_name && !std::string(module_name).empty()) { mangled_name.insert(0, "::"); mangled_name.insert(0, std::string(module_name)); } Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } // Invoke non-deprecated API. return HSA::hsa_executable_get_symbol_by_name( executable, mangled_name.c_str(), exec->IsProgramSymbol(mangled_name.c_str()) ? nullptr : &agent, symbol); CATCH; } hsa_status_t hsa_executable_get_symbol_by_name( hsa_executable_t executable, const char *symbol_name, const hsa_agent_t *agent, // NOTES: this is not consistent with the rest // of the specification, but seems like a better // approach to distinguish program/agent symbols.
hsa_executable_symbol_t *symbol) { TRY; IS_OPEN(); IS_BAD_PTR(symbol_name); IS_BAD_PTR(symbol); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } loader::Symbol *sym = exec->GetSymbol(symbol_name, agent); if (!sym) { return HSA_STATUS_ERROR_INVALID_SYMBOL_NAME; } *symbol = loader::Symbol::Handle(sym); return HSA_STATUS_SUCCESS; CATCH; } hsa_status_t hsa_executable_symbol_get_info( hsa_executable_symbol_t executable_symbol, hsa_executable_symbol_info_t attribute, void *value) { TRY; IS_OPEN(); IS_BAD_PTR(value); loader::Symbol *sym = loader::Symbol::Object(executable_symbol); if (!sym) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE_SYMBOL; } return sym->GetInfo(attribute, value) ? HSA_STATUS_SUCCESS : HSA_STATUS_ERROR_INVALID_ARGUMENT; CATCH; } /* deprecated */ hsa_status_t hsa_executable_iterate_symbols( hsa_executable_t executable, hsa_status_t (*callback)(hsa_executable_t executable, hsa_executable_symbol_t symbol, void *data), void *data) { TRY; IS_OPEN(); IS_BAD_PTR(callback); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return exec->IterateSymbols(callback, data); CATCH; } hsa_status_t hsa_executable_iterate_agent_symbols( hsa_executable_t executable, hsa_agent_t agent, hsa_status_t (*callback)(hsa_executable_t exec, hsa_agent_t agent, hsa_executable_symbol_t symbol, void *data), void *data) { TRY; IS_OPEN(); IS_BAD_PTR(callback); // NOTES: should we check if agent is valid? spec does not say so. 
const core::Agent *agent_object = core::Agent::Convert(agent); IS_VALID(agent_object); Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return exec->IterateAgentSymbols(agent, callback, data); CATCH; } hsa_status_t hsa_executable_iterate_program_symbols( hsa_executable_t executable, hsa_status_t (*callback)(hsa_executable_t exec, hsa_executable_symbol_t symbol, void *data), void *data) { TRY; IS_OPEN(); IS_BAD_PTR(callback); amd::hsa::loader::Executable *exec = amd::hsa::loader::Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return exec->IterateProgramSymbols(callback, data); CATCH; } //===--- Runtime Notifications --------------------------------------------===// hsa_status_t hsa_status_string( hsa_status_t status, const char **status_string) { IS_BAD_PTR(status_string); const size_t status_u = static_cast<size_t>(status); switch (status_u) { case HSA_STATUS_SUCCESS: *status_string = "HSA_STATUS_SUCCESS: The function has been executed successfully."; break; case HSA_STATUS_INFO_BREAK: *status_string = "HSA_STATUS_INFO_BREAK: A traversal over a list of elements has been interrupted by the " "application before completing."; break; case HSA_STATUS_ERROR: *status_string = "HSA_STATUS_ERROR: A generic error has occurred."; break; case HSA_STATUS_ERROR_INVALID_ARGUMENT: *status_string = "HSA_STATUS_ERROR_INVALID_ARGUMENT: One of the actual arguments does not meet a " "precondition stated in the documentation of the corresponding formal argument."; break; case HSA_STATUS_ERROR_INVALID_QUEUE_CREATION: *status_string = "HSA_STATUS_ERROR_INVALID_QUEUE_CREATION: The requested queue creation is not valid."; break; case HSA_STATUS_ERROR_INVALID_ALLOCATION: *status_string = "HSA_STATUS_ERROR_INVALID_ALLOCATION: The requested allocation is not valid."; break; case HSA_STATUS_ERROR_INVALID_AGENT: *status_string = "HSA_STATUS_ERROR_INVALID_AGENT: The agent is invalid."; break; case
HSA_STATUS_ERROR_INVALID_REGION: *status_string = "HSA_STATUS_ERROR_INVALID_REGION: The memory region is invalid."; break; case HSA_STATUS_ERROR_INVALID_SIGNAL: *status_string = "HSA_STATUS_ERROR_INVALID_SIGNAL: The signal is invalid."; break; case HSA_STATUS_ERROR_INVALID_QUEUE: *status_string = "HSA_STATUS_ERROR_INVALID_QUEUE: The queue is invalid."; break; case HSA_STATUS_ERROR_OUT_OF_RESOURCES: *status_string = "HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary " "resources. This error may also occur when the core runtime library needs to spawn " "threads or create internal OS-specific events."; break; case HSA_STATUS_ERROR_INVALID_PACKET_FORMAT: *status_string = "HSA_STATUS_ERROR_INVALID_PACKET_FORMAT: The AQL packet is malformed."; break; case HSA_STATUS_ERROR_RESOURCE_FREE: *status_string = "HSA_STATUS_ERROR_RESOURCE_FREE: An error has been detected while releasing a resource."; break; case HSA_STATUS_ERROR_NOT_INITIALIZED: *status_string = "HSA_STATUS_ERROR_NOT_INITIALIZED: An API other than hsa_init has been invoked while the " "reference count of the HSA runtime is zero."; break; case HSA_STATUS_ERROR_REFCOUNT_OVERFLOW: *status_string = "HSA_STATUS_ERROR_REFCOUNT_OVERFLOW: The maximum reference count for the object has been " "reached."; break; case HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS: *status_string = "HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS: The arguments passed to a function are not " "compatible."; break; case HSA_STATUS_ERROR_INVALID_INDEX: *status_string = "HSA_STATUS_ERROR_INVALID_INDEX: The index is invalid."; break; case HSA_STATUS_ERROR_INVALID_ISA: *status_string = "HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid."; break; case HSA_STATUS_ERROR_INVALID_ISA_NAME: *status_string = "HSA_STATUS_ERROR_INVALID_ISA_NAME: The instruction set architecture name is invalid."; break; case HSA_STATUS_ERROR_INVALID_CODE_OBJECT: *status_string = "HSA_STATUS_ERROR_INVALID_CODE_OBJECT: The code object
is invalid."; break; case HSA_STATUS_ERROR_INVALID_EXECUTABLE: *status_string = "HSA_STATUS_ERROR_INVALID_EXECUTABLE: The executable is invalid."; break; case HSA_STATUS_ERROR_FROZEN_EXECUTABLE: *status_string = "HSA_STATUS_ERROR_FROZEN_EXECUTABLE: The executable is frozen."; break; case HSA_STATUS_ERROR_INVALID_SYMBOL_NAME: *status_string = "HSA_STATUS_ERROR_INVALID_SYMBOL_NAME: There is no symbol with the given name."; break; case HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED: *status_string = "HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED: The variable is already defined."; break; case HSA_STATUS_ERROR_VARIABLE_UNDEFINED: *status_string = "HSA_STATUS_ERROR_VARIABLE_UNDEFINED: The variable is undefined."; break; case HSA_STATUS_ERROR_EXCEPTION: *status_string = "HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception."; break; case HSA_STATUS_ERROR_INVALID_CODE_SYMBOL: *status_string = "HSA_STATUS_ERROR_INVALID_CODE_SYMBOL: The code object symbol is invalid."; break; case HSA_STATUS_ERROR_INVALID_EXECUTABLE_SYMBOL: *status_string = "HSA_STATUS_ERROR_INVALID_EXECUTABLE_SYMBOL: The executable symbol is invalid."; break; case HSA_STATUS_ERROR_INVALID_FILE: *status_string = "HSA_STATUS_ERROR_INVALID_FILE: The file descriptor is invalid."; break; case HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER: *status_string = "HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER: The code object reader is invalid."; break; case HSA_STATUS_ERROR_INVALID_CACHE: *status_string = "HSA_STATUS_ERROR_INVALID_CACHE: The cache is invalid."; break; case HSA_STATUS_ERROR_INVALID_WAVEFRONT: *status_string = "HSA_STATUS_ERROR_INVALID_WAVEFRONT: The wavefront is invalid."; break; case HSA_STATUS_ERROR_INVALID_SIGNAL_GROUP: *status_string = "HSA_STATUS_ERROR_INVALID_SIGNAL_GROUP: The signal group is invalid."; break; case HSA_STATUS_ERROR_INVALID_RUNTIME_STATE: *status_string = "HSA_STATUS_ERROR_INVALID_RUNTIME_STATE: The HSA runtime is not in the configuration " "state."; break; case 
HSA_STATUS_ERROR_FATAL: *status_string = "HSA_STATUS_ERROR_FATAL: The queue received an error that may require process " "termination."; break; case HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: *status_string = "HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond " "the largest legal address."; break; case HSA_STATUS_ERROR_ILLEGAL_INSTRUCTION: *status_string = "HSA_STATUS_ERROR_ILLEGAL_INSTRUCTION: The agent attempted to execute an illegal shader " "instruction."; break; case HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED: *status_string = "HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED: Image format is not supported."; break; case HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED: *status_string = "HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED: Image size is not supported."; break; case HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED: *status_string = "HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED: Image pitch is not supported or invalid."; break; case HSA_EXT_STATUS_ERROR_SAMPLER_DESCRIPTOR_UNSUPPORTED: *status_string = "HSA_EXT_STATUS_ERROR_SAMPLER_DESCRIPTOR_UNSUPPORTED: Sampler descriptor is not " "supported or invalid."; break; case HSA_EXT_STATUS_ERROR_INVALID_PROGRAM: *status_string = "HSA_EXT_STATUS_ERROR_INVALID_PROGRAM: Invalid program"; break; case HSA_EXT_STATUS_ERROR_INVALID_MODULE: *status_string = "HSA_EXT_STATUS_ERROR_INVALID_MODULE: Invalid module"; break; case HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE: *status_string = "HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE: Incompatible module"; break; case HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED: *status_string = "HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED: Module already included"; break; case HSA_EXT_STATUS_ERROR_SYMBOL_MISMATCH: *status_string = "HSA_EXT_STATUS_ERROR_SYMBOL_MISMATCH: Symbol mismatch"; break; case HSA_EXT_STATUS_ERROR_FINALIZATION_FAILED: *status_string = "HSA_EXT_STATUS_ERROR_FINALIZATION_FAILED: Finalization failed"; break; case 
HSA_EXT_STATUS_ERROR_DIRECTIVE_MISMATCH: *status_string = "HSA_EXT_STATUS_ERROR_DIRECTIVE_MISMATCH: Directive mismatch"; break; case HSA_STATUS_ERROR_MEMORY_FAULT: *status_string = "HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address."; break; case HSA_STATUS_ERROR_INVALID_MEMORY_POOL: *status_string = "HSA_STATUS_ERROR_INVALID_MEMORY_POOL: The memory pool is invalid."; break; case HSA_STATUS_CU_MASK_REDUCED: *status_string = "HSA_STATUS_CU_MASK_REDUCED: The CU mask was successfully set but the mask attempted to " "enable a CU which was disabled for the process. CUs disabled for the process remain " "disabled."; break; default: return HSA_STATUS_ERROR_INVALID_ARGUMENT; } return HSA_STATUS_SUCCESS; } } // namespace HSA } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/hsa_api_trace.cpp000066400000000000000000000510071420110115200233160ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. 
// - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/hsa_api_trace_int.h" #include "core/inc/runtime.h" #include "core/inc/hsa_ext_amd_impl.h" #include "core/inc/hsa_table_interface.h" #include // Tools only APIs. 
namespace rocr {
namespace AMD {
hsa_status_t hsa_amd_queue_intercept_register(hsa_queue_t* queue,
                                              hsa_amd_queue_intercept_handler callback,
                                              void* user_data);
hsa_status_t hsa_amd_queue_intercept_create(
    hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type,
    void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
    uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue);
hsa_status_t hsa_amd_runtime_queue_create_register(hsa_amd_runtime_queue_notifier callback,
                                                   void* user_data);
}  // namespace AMD

namespace core {

HsaApiTable hsa_api_table_;
HsaApiTable hsa_internal_api_table_;

HsaApiTable::HsaApiTable() { Init(); }

// Initialize member fields for Hsa Core and Amd Extension APIs
// Member fields for Finalizer and Image extensions will be
// updated as part of Hsa Runtime initialization.
void HsaApiTable::Init() {
  // Initialize Version of Api Table
  hsa_api.version.major_id = HSA_API_TABLE_MAJOR_VERSION;
  hsa_api.version.minor_id = sizeof(::HsaApiTable);
  hsa_api.version.step_id = HSA_API_TABLE_STEP_VERSION;

  // Update Api table for Core and its major id
  UpdateCore();
  hsa_api.core_ = &core_api;

  // Update Api table for Amd Extensions and its major id
  UpdateAmdExts();
  hsa_api.amd_ext_ = &amd_ext_api;

  // Initialize Api tables for Finalizer, Image to NULL
  // The tables are initialized as part
  // of Hsa Runtime initialization, including their major ids
  hsa_api.finalizer_ext_ = NULL;
  hsa_api.image_ext_ = NULL;
}

void HsaApiTable::Reset() { Init(); }

void HsaApiTable::CloneExts(void* ext_table, uint32_t table_id) {
  assert(ext_table != NULL && "Invalid extension table linked.");

  // Update HSA Extension Finalizer Api table
  if (table_id == HSA_EXT_FINALIZER_API_TABLE_ID) {
    finalizer_api = *reinterpret_cast(ext_table);
    hsa_api.finalizer_ext_ = &finalizer_api;
    return;
  }

  // Update HSA Extension Image Api table
  if (table_id == HSA_EXT_IMAGE_API_TABLE_ID) {
    image_api = *reinterpret_cast(ext_table);
    hsa_api.image_ext_ =
        &image_api;
    return;
  }
}

void HsaApiTable::LinkExts(void* ext_table, uint32_t table_id) {
  assert(ext_table != NULL && "Invalid extension table linked.");

  // Update HSA Extension Finalizer Api table
  if (table_id == HSA_EXT_FINALIZER_API_TABLE_ID) {
    finalizer_api = *reinterpret_cast(ext_table);
    hsa_api.finalizer_ext_ = reinterpret_cast(ext_table);
    return;
  }

  // Update HSA Extension Image Api table
  if (table_id == HSA_EXT_IMAGE_API_TABLE_ID) {
    image_api = *reinterpret_cast(ext_table);
    hsa_api.image_ext_ = reinterpret_cast(ext_table);
    return;
  }
}

// Update Api table for Hsa Core Runtime
void HsaApiTable::UpdateCore() {
  // Initialize Version of Api Table
  core_api.version.major_id = HSA_CORE_API_TABLE_MAJOR_VERSION;
  core_api.version.minor_id = sizeof(::CoreApiTable);
  core_api.version.step_id = HSA_CORE_API_TABLE_STEP_VERSION;

  // Initialize function pointers for Hsa Core Runtime APIs
  core_api.hsa_init_fn = HSA::hsa_init;
  core_api.hsa_shut_down_fn = HSA::hsa_shut_down;
  core_api.hsa_system_get_info_fn = HSA::hsa_system_get_info;
  core_api.hsa_system_extension_supported_fn = HSA::hsa_system_extension_supported;
  core_api.hsa_system_get_extension_table_fn = HSA::hsa_system_get_extension_table;
  core_api.hsa_iterate_agents_fn = HSA::hsa_iterate_agents;
  core_api.hsa_agent_get_info_fn = HSA::hsa_agent_get_info;
  core_api.hsa_agent_get_exception_policies_fn = HSA::hsa_agent_get_exception_policies;
  core_api.hsa_agent_extension_supported_fn = HSA::hsa_agent_extension_supported;
  core_api.hsa_queue_create_fn = HSA::hsa_queue_create;
  core_api.hsa_soft_queue_create_fn = HSA::hsa_soft_queue_create;
  core_api.hsa_queue_destroy_fn = HSA::hsa_queue_destroy;
  core_api.hsa_queue_inactivate_fn = HSA::hsa_queue_inactivate;
  core_api.hsa_queue_load_read_index_scacquire_fn = HSA::hsa_queue_load_read_index_scacquire;
  core_api.hsa_queue_load_read_index_relaxed_fn = HSA::hsa_queue_load_read_index_relaxed;
  core_api.hsa_queue_load_write_index_scacquire_fn = HSA::hsa_queue_load_write_index_scacquire;
  core_api.hsa_queue_load_write_index_relaxed_fn = HSA::hsa_queue_load_write_index_relaxed;
  core_api.hsa_queue_store_write_index_relaxed_fn = HSA::hsa_queue_store_write_index_relaxed;
  core_api.hsa_queue_store_write_index_screlease_fn = HSA::hsa_queue_store_write_index_screlease;
  core_api.hsa_queue_cas_write_index_scacq_screl_fn = HSA::hsa_queue_cas_write_index_scacq_screl;
  core_api.hsa_queue_cas_write_index_scacquire_fn = HSA::hsa_queue_cas_write_index_scacquire;
  core_api.hsa_queue_cas_write_index_relaxed_fn = HSA::hsa_queue_cas_write_index_relaxed;
  core_api.hsa_queue_cas_write_index_screlease_fn = HSA::hsa_queue_cas_write_index_screlease;
  core_api.hsa_queue_add_write_index_scacq_screl_fn = HSA::hsa_queue_add_write_index_scacq_screl;
  core_api.hsa_queue_add_write_index_scacquire_fn = HSA::hsa_queue_add_write_index_scacquire;
  core_api.hsa_queue_add_write_index_relaxed_fn = HSA::hsa_queue_add_write_index_relaxed;
  core_api.hsa_queue_add_write_index_screlease_fn = HSA::hsa_queue_add_write_index_screlease;
  core_api.hsa_queue_store_read_index_relaxed_fn = HSA::hsa_queue_store_read_index_relaxed;
  core_api.hsa_queue_store_read_index_screlease_fn = HSA::hsa_queue_store_read_index_screlease;
  core_api.hsa_agent_iterate_regions_fn = HSA::hsa_agent_iterate_regions;
  core_api.hsa_region_get_info_fn = HSA::hsa_region_get_info;
  core_api.hsa_memory_register_fn = HSA::hsa_memory_register;
  core_api.hsa_memory_deregister_fn = HSA::hsa_memory_deregister;
  core_api.hsa_memory_allocate_fn = HSA::hsa_memory_allocate;
  core_api.hsa_memory_free_fn = HSA::hsa_memory_free;
  core_api.hsa_memory_copy_fn = HSA::hsa_memory_copy;
  core_api.hsa_memory_assign_agent_fn = HSA::hsa_memory_assign_agent;
  core_api.hsa_signal_create_fn = HSA::hsa_signal_create;
  core_api.hsa_signal_destroy_fn = HSA::hsa_signal_destroy;
  core_api.hsa_signal_load_relaxed_fn = HSA::hsa_signal_load_relaxed;
  core_api.hsa_signal_load_scacquire_fn = HSA::hsa_signal_load_scacquire;
  core_api.hsa_signal_store_relaxed_fn =
      HSA::hsa_signal_store_relaxed;
  core_api.hsa_signal_store_screlease_fn = HSA::hsa_signal_store_screlease;
  core_api.hsa_signal_wait_relaxed_fn = HSA::hsa_signal_wait_relaxed;
  core_api.hsa_signal_wait_scacquire_fn = HSA::hsa_signal_wait_scacquire;
  core_api.hsa_signal_and_relaxed_fn = HSA::hsa_signal_and_relaxed;
  core_api.hsa_signal_and_scacquire_fn = HSA::hsa_signal_and_scacquire;
  core_api.hsa_signal_and_screlease_fn = HSA::hsa_signal_and_screlease;
  core_api.hsa_signal_and_scacq_screl_fn = HSA::hsa_signal_and_scacq_screl;
  core_api.hsa_signal_or_relaxed_fn = HSA::hsa_signal_or_relaxed;
  core_api.hsa_signal_or_scacquire_fn = HSA::hsa_signal_or_scacquire;
  core_api.hsa_signal_or_screlease_fn = HSA::hsa_signal_or_screlease;
  core_api.hsa_signal_or_scacq_screl_fn = HSA::hsa_signal_or_scacq_screl;
  core_api.hsa_signal_xor_relaxed_fn = HSA::hsa_signal_xor_relaxed;
  core_api.hsa_signal_xor_scacquire_fn = HSA::hsa_signal_xor_scacquire;
  core_api.hsa_signal_xor_screlease_fn = HSA::hsa_signal_xor_screlease;
  core_api.hsa_signal_xor_scacq_screl_fn = HSA::hsa_signal_xor_scacq_screl;
  core_api.hsa_signal_exchange_relaxed_fn = HSA::hsa_signal_exchange_relaxed;
  core_api.hsa_signal_exchange_scacquire_fn = HSA::hsa_signal_exchange_scacquire;
  core_api.hsa_signal_exchange_screlease_fn = HSA::hsa_signal_exchange_screlease;
  core_api.hsa_signal_exchange_scacq_screl_fn = HSA::hsa_signal_exchange_scacq_screl;
  core_api.hsa_signal_add_relaxed_fn = HSA::hsa_signal_add_relaxed;
  core_api.hsa_signal_add_scacquire_fn = HSA::hsa_signal_add_scacquire;
  core_api.hsa_signal_add_screlease_fn = HSA::hsa_signal_add_screlease;
  core_api.hsa_signal_add_scacq_screl_fn = HSA::hsa_signal_add_scacq_screl;
  core_api.hsa_signal_subtract_relaxed_fn = HSA::hsa_signal_subtract_relaxed;
  core_api.hsa_signal_subtract_scacquire_fn = HSA::hsa_signal_subtract_scacquire;
  core_api.hsa_signal_subtract_screlease_fn = HSA::hsa_signal_subtract_screlease;
  core_api.hsa_signal_subtract_scacq_screl_fn = HSA::hsa_signal_subtract_scacq_screl;
  core_api.hsa_signal_cas_relaxed_fn = HSA::hsa_signal_cas_relaxed;
  core_api.hsa_signal_cas_scacquire_fn = HSA::hsa_signal_cas_scacquire;
  core_api.hsa_signal_cas_screlease_fn = HSA::hsa_signal_cas_screlease;
  core_api.hsa_signal_cas_scacq_screl_fn = HSA::hsa_signal_cas_scacq_screl;

  //===--- Instruction Set Architecture -----------------------------------===//

  core_api.hsa_isa_from_name_fn = HSA::hsa_isa_from_name;
  // Deprecated since v1.1.
  core_api.hsa_isa_get_info_fn = HSA::hsa_isa_get_info;
  // Deprecated since v1.1.
  core_api.hsa_isa_compatible_fn = HSA::hsa_isa_compatible;

  //===--- Code Objects (deprecated) --------------------------------------===//

  // Deprecated since v1.1.
  core_api.hsa_code_object_serialize_fn = HSA::hsa_code_object_serialize;
  // Deprecated since v1.1.
  core_api.hsa_code_object_deserialize_fn = HSA::hsa_code_object_deserialize;
  // Deprecated since v1.1.
  core_api.hsa_code_object_destroy_fn = HSA::hsa_code_object_destroy;
  // Deprecated since v1.1.
  core_api.hsa_code_object_get_info_fn = HSA::hsa_code_object_get_info;
  // Deprecated since v1.1.
  core_api.hsa_code_object_get_symbol_fn = HSA::hsa_code_object_get_symbol;
  // Deprecated since v1.1.
  core_api.hsa_code_symbol_get_info_fn = HSA::hsa_code_symbol_get_info;
  // Deprecated since v1.1.
  core_api.hsa_code_object_iterate_symbols_fn = HSA::hsa_code_object_iterate_symbols;

  //===--- Executable -----------------------------------------------------===//

  // Deprecated since v1.1.
  core_api.hsa_executable_create_fn = HSA::hsa_executable_create;
  core_api.hsa_executable_destroy_fn = HSA::hsa_executable_destroy;
  // Deprecated since v1.1.
  core_api.hsa_executable_load_code_object_fn = HSA::hsa_executable_load_code_object;
  core_api.hsa_executable_freeze_fn = HSA::hsa_executable_freeze;
  core_api.hsa_executable_get_info_fn = HSA::hsa_executable_get_info;
  core_api.hsa_executable_global_variable_define_fn = HSA::hsa_executable_global_variable_define;
  core_api.hsa_executable_agent_global_variable_define_fn =
      HSA::hsa_executable_agent_global_variable_define;
  core_api.hsa_executable_readonly_variable_define_fn =
      HSA::hsa_executable_readonly_variable_define;
  core_api.hsa_executable_validate_fn = HSA::hsa_executable_validate;
  // Deprecated since v1.1.
  core_api.hsa_executable_get_symbol_fn = HSA::hsa_executable_get_symbol;
  core_api.hsa_executable_symbol_get_info_fn = HSA::hsa_executable_symbol_get_info;
  // Deprecated since v1.1.
  core_api.hsa_executable_iterate_symbols_fn = HSA::hsa_executable_iterate_symbols;

  //===--- Runtime Notifications ------------------------------------------===//

  core_api.hsa_status_string_fn = HSA::hsa_status_string;

  // Start HSA v1.1 additions
  core_api.hsa_extension_get_name_fn = HSA::hsa_extension_get_name;
  core_api.hsa_system_major_extension_supported_fn = HSA::hsa_system_major_extension_supported;
  core_api.hsa_system_get_major_extension_table_fn = HSA::hsa_system_get_major_extension_table;
  core_api.hsa_agent_major_extension_supported_fn = HSA::hsa_agent_major_extension_supported;
  core_api.hsa_cache_get_info_fn = HSA::hsa_cache_get_info;
  core_api.hsa_agent_iterate_caches_fn = HSA::hsa_agent_iterate_caches;
  // Silent store optimization is present in all signal ops when no agents are sleeping.
  core_api.hsa_signal_silent_store_relaxed_fn = HSA::hsa_signal_store_relaxed;
  core_api.hsa_signal_silent_store_screlease_fn = HSA::hsa_signal_store_screlease;
  core_api.hsa_signal_group_create_fn = HSA::hsa_signal_group_create;
  core_api.hsa_signal_group_destroy_fn = HSA::hsa_signal_group_destroy;
  core_api.hsa_signal_group_wait_any_scacquire_fn = HSA::hsa_signal_group_wait_any_scacquire;
  core_api.hsa_signal_group_wait_any_relaxed_fn = HSA::hsa_signal_group_wait_any_relaxed;

  //===--- Instruction Set Architecture - HSA v1.1 additions --------------===//

  core_api.hsa_agent_iterate_isas_fn = HSA::hsa_agent_iterate_isas;
  core_api.hsa_isa_get_info_alt_fn = HSA::hsa_isa_get_info_alt;
  core_api.hsa_isa_get_exception_policies_fn = HSA::hsa_isa_get_exception_policies;
  core_api.hsa_isa_get_round_method_fn = HSA::hsa_isa_get_round_method;
  core_api.hsa_wavefront_get_info_fn = HSA::hsa_wavefront_get_info;
  core_api.hsa_isa_iterate_wavefronts_fn = HSA::hsa_isa_iterate_wavefronts;

  //===--- Code Objects (deprecated) - HSA v1.1 additions -----------------===//

  // Deprecated since v1.1.
  core_api.hsa_code_object_get_symbol_from_name_fn =
      HSA::hsa_code_object_get_symbol_from_name;

  //===--- Executable - HSA v1.1 additions --------------------------------===//

  core_api.hsa_code_object_reader_create_from_file_fn =
      HSA::hsa_code_object_reader_create_from_file;
  core_api.hsa_code_object_reader_create_from_memory_fn =
      HSA::hsa_code_object_reader_create_from_memory;
  core_api.hsa_code_object_reader_destroy_fn = HSA::hsa_code_object_reader_destroy;
  core_api.hsa_executable_create_alt_fn = HSA::hsa_executable_create_alt;
  core_api.hsa_executable_load_program_code_object_fn =
      HSA::hsa_executable_load_program_code_object;
  core_api.hsa_executable_load_agent_code_object_fn = HSA::hsa_executable_load_agent_code_object;
  core_api.hsa_executable_validate_alt_fn = HSA::hsa_executable_validate_alt;
  core_api.hsa_executable_get_symbol_by_name_fn = HSA::hsa_executable_get_symbol_by_name;
  core_api.hsa_executable_iterate_agent_symbols_fn = HSA::hsa_executable_iterate_agent_symbols;
  core_api.hsa_executable_iterate_program_symbols_fn =
      HSA::hsa_executable_iterate_program_symbols;
}

// Update Api table for Amd Extensions.
// @note: Current implementation will initialize the
// member variable hsa_amd_image_create_fn while loading
// Image extension library
void HsaApiTable::UpdateAmdExts() {
  // Initialize Version of Api Table
  amd_ext_api.version.major_id = HSA_AMD_EXT_API_TABLE_MAJOR_VERSION;
  amd_ext_api.version.minor_id = sizeof(::AmdExtTable);
  amd_ext_api.version.step_id = HSA_AMD_EXT_API_TABLE_STEP_VERSION;

  // Initialize function pointers for Amd Extension APIs
  amd_ext_api.hsa_amd_coherency_get_type_fn = AMD::hsa_amd_coherency_get_type;
  amd_ext_api.hsa_amd_coherency_set_type_fn = AMD::hsa_amd_coherency_set_type;
  amd_ext_api.hsa_amd_profiling_set_profiler_enabled_fn =
      AMD::hsa_amd_profiling_set_profiler_enabled;
  amd_ext_api.hsa_amd_profiling_async_copy_enable_fn = AMD::hsa_amd_profiling_async_copy_enable;
  amd_ext_api.hsa_amd_profiling_get_dispatch_time_fn = AMD::hsa_amd_profiling_get_dispatch_time;
  amd_ext_api.hsa_amd_profiling_get_async_copy_time_fn =
      AMD::hsa_amd_profiling_get_async_copy_time;
  amd_ext_api.hsa_amd_profiling_convert_tick_to_system_domain_fn =
      AMD::hsa_amd_profiling_convert_tick_to_system_domain;
  amd_ext_api.hsa_amd_signal_async_handler_fn = AMD::hsa_amd_signal_async_handler;
  amd_ext_api.hsa_amd_async_function_fn = AMD::hsa_amd_async_function;
  amd_ext_api.hsa_amd_signal_wait_any_fn = AMD::hsa_amd_signal_wait_any;
  amd_ext_api.hsa_amd_queue_cu_set_mask_fn = AMD::hsa_amd_queue_cu_set_mask;
  amd_ext_api.hsa_amd_queue_cu_get_mask_fn = AMD::hsa_amd_queue_cu_get_mask;
  amd_ext_api.hsa_amd_memory_pool_get_info_fn = AMD::hsa_amd_memory_pool_get_info;
  amd_ext_api.hsa_amd_agent_iterate_memory_pools_fn = AMD::hsa_amd_agent_iterate_memory_pools;
  amd_ext_api.hsa_amd_memory_pool_allocate_fn = AMD::hsa_amd_memory_pool_allocate;
  amd_ext_api.hsa_amd_memory_pool_free_fn = AMD::hsa_amd_memory_pool_free;
  amd_ext_api.hsa_amd_memory_async_copy_fn = AMD::hsa_amd_memory_async_copy;
  amd_ext_api.hsa_amd_agent_memory_pool_get_info_fn = AMD::hsa_amd_agent_memory_pool_get_info;
  amd_ext_api.hsa_amd_agents_allow_access_fn = AMD::hsa_amd_agents_allow_access;
  amd_ext_api.hsa_amd_memory_pool_can_migrate_fn = AMD::hsa_amd_memory_pool_can_migrate;
  amd_ext_api.hsa_amd_memory_migrate_fn = AMD::hsa_amd_memory_migrate;
  amd_ext_api.hsa_amd_memory_lock_fn = AMD::hsa_amd_memory_lock;
  amd_ext_api.hsa_amd_memory_unlock_fn = AMD::hsa_amd_memory_unlock;
  amd_ext_api.hsa_amd_memory_fill_fn = AMD::hsa_amd_memory_fill;
  amd_ext_api.hsa_amd_interop_map_buffer_fn = AMD::hsa_amd_interop_map_buffer;
  amd_ext_api.hsa_amd_interop_unmap_buffer_fn = AMD::hsa_amd_interop_unmap_buffer;
  amd_ext_api.hsa_amd_pointer_info_fn = AMD::hsa_amd_pointer_info;
  amd_ext_api.hsa_amd_pointer_info_set_userdata_fn = AMD::hsa_amd_pointer_info_set_userdata;
  amd_ext_api.hsa_amd_ipc_memory_create_fn = AMD::hsa_amd_ipc_memory_create;
  amd_ext_api.hsa_amd_ipc_memory_attach_fn = AMD::hsa_amd_ipc_memory_attach;
  amd_ext_api.hsa_amd_ipc_memory_detach_fn = AMD::hsa_amd_ipc_memory_detach;
  amd_ext_api.hsa_amd_signal_create_fn = AMD::hsa_amd_signal_create;
  amd_ext_api.hsa_amd_ipc_signal_create_fn = AMD::hsa_amd_ipc_signal_create;
  amd_ext_api.hsa_amd_ipc_signal_attach_fn = AMD::hsa_amd_ipc_signal_attach;
  amd_ext_api.hsa_amd_register_system_event_handler_fn =
      AMD::hsa_amd_register_system_event_handler;
  amd_ext_api.hsa_amd_queue_intercept_create_fn = AMD::hsa_amd_queue_intercept_create;
  amd_ext_api.hsa_amd_queue_intercept_register_fn = AMD::hsa_amd_queue_intercept_register;
  amd_ext_api.hsa_amd_queue_set_priority_fn = AMD::hsa_amd_queue_set_priority;
  amd_ext_api.hsa_amd_memory_async_copy_rect_fn = AMD::hsa_amd_memory_async_copy_rect;
  amd_ext_api.hsa_amd_runtime_queue_create_register_fn =
      AMD::hsa_amd_runtime_queue_create_register;
  amd_ext_api.hsa_amd_memory_lock_to_pool_fn = AMD::hsa_amd_memory_lock_to_pool;
  amd_ext_api.hsa_amd_register_deallocation_callback_fn =
      AMD::hsa_amd_register_deallocation_callback;
  amd_ext_api.hsa_amd_deregister_deallocation_callback_fn =
      AMD::hsa_amd_deregister_deallocation_callback;
  amd_ext_api.hsa_amd_signal_value_pointer_fn = AMD::hsa_amd_signal_value_pointer;
  amd_ext_api.hsa_amd_svm_attributes_set_fn = AMD::hsa_amd_svm_attributes_set;
  amd_ext_api.hsa_amd_svm_attributes_get_fn = AMD::hsa_amd_svm_attributes_get;
  amd_ext_api.hsa_amd_svm_prefetch_async_fn = AMD::hsa_amd_svm_prefetch_async;
}

void LoadInitialHsaApiTable() {
  hsa_table_interface_init(&hsa_api_table_.hsa_api);
}

}  // namespace core
}  // namespace rocr

ROCR-Runtime-rocm-5.0.0/src/core/runtime/hsa_ext_amd.cpp

////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
//   this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
//   notice, this list of conditions and the following disclaimers in
//   the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
//   nor the names of its contributors may be used to endorse or promote
//   products derived from this Software without specific prior written
//   permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include
#include
#include
#include
#include
#include
#include
#include

#include "core/inc/runtime.h"
#include "core/inc/agent.h"
#include "core/inc/amd_cpu_agent.h"
#include "core/inc/amd_gpu_agent.h"
#include "core/inc/amd_memory_region.h"
#include "core/inc/signal.h"
#include "core/inc/default_signal.h"
#include "core/inc/interrupt_signal.h"
#include "core/inc/ipc_signal.h"
#include "core/inc/intercept_queue.h"
#include "core/inc/exceptions.h"

namespace rocr {

template struct ValidityError;
template <> struct ValidityError {
  enum { value = HSA_STATUS_ERROR_INVALID_SIGNAL };
};

template <> struct ValidityError {
  enum { value = HSA_STATUS_ERROR_INVALID_AGENT };
};

template <> struct ValidityError {
  enum { value = HSA_STATUS_ERROR_INVALID_REGION };
};

template <> struct ValidityError {
  enum { value = HSA_STATUS_ERROR_INVALID_REGION };
};

template <> struct ValidityError {
  enum { value = HSA_STATUS_ERROR_INVALID_QUEUE };
};

template struct ValidityError {
  enum { value = ValidityError::value };
};

#define IS_BAD_PTR(ptr) \
  do { \
    if ((ptr) == NULL) return HSA_STATUS_ERROR_INVALID_ARGUMENT; \
  } while (false)

#define IS_VALID(ptr) \
  do { \
    if ((ptr) == NULL || !(ptr)->IsValid()) \
      return hsa_status_t(ValidityError::value); \
  } while (false)

#define CHECK_ALLOC(ptr) \
  do { \
    if ((ptr) == NULL) return HSA_STATUS_ERROR_OUT_OF_RESOURCES; \
  } while (false)

#define IS_OPEN() \
  do { \
    if (!core::Runtime::runtime_singleton_->IsOpen()) \
      return HSA_STATUS_ERROR_NOT_INITIALIZED; \
  } while (false)

template static __forceinline bool IsValid(T* ptr) {
  return (ptr == NULL) ? false : ptr->IsValid();
}

#define TRY try {
#define CATCH } catch(...) { return AMD::handleException(); }
#define CATCHRET(RETURN_TYPE) } catch(...) { return AMD::handleExceptionT(); }

namespace AMD {

hsa_status_t handleException() {
  try {
    throw;
  } catch (const std::bad_alloc& e) {
    debug_print("HSA exception: BadAlloc\n");
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  } catch (const hsa_exception& e) {
    ifdebug {
      if (!strIsEmpty(e.what())) debug_print("HSA exception: %s\n", e.what());
    }
    return e.error_code();
  } catch (const std::exception& e) {
    debug_print("Unhandled exception: %s\n", e.what());
    assert(false && "Unhandled exception.");
    return HSA_STATUS_ERROR;
  } catch (const std::nested_exception& e) {
    debug_print("Callback threw, forwarding.\n");
    e.rethrow_nested();
    return HSA_STATUS_ERROR;
  } catch (...) {
    assert(false && "Unhandled exception.");
    abort();
    return HSA_STATUS_ERROR;
  }
}

template static __forceinline T handleExceptionT() {
  handleException();
  abort();
  return T();
}

hsa_status_t hsa_amd_coherency_get_type(hsa_agent_t agent_handle,
                                        hsa_amd_coherency_type_t* type) {
  TRY;
  IS_OPEN();

  const core::Agent* agent = core::Agent::Convert(agent_handle);

  IS_VALID(agent);

  IS_BAD_PTR(type);

  if (agent->device_type() != core::Agent::kAmdGpuDevice) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  const AMD::GpuAgentInt* gpu_agent = static_cast(agent);

  *type = gpu_agent->current_coherency_type();

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_coherency_set_type(hsa_agent_t agent_handle,
                                        hsa_amd_coherency_type_t type) {
  TRY;
  IS_OPEN();

  core::Agent* agent = core::Agent::Convert(agent_handle);

  IS_VALID(agent);

  if (type < HSA_AMD_COHERENCY_TYPE_COHERENT || type > HSA_AMD_COHERENCY_TYPE_NONCOHERENT) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  if (agent->device_type() !=
      core::Agent::kAmdGpuDevice) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  AMD::GpuAgent* gpu_agent = static_cast(agent);

  if (!gpu_agent->current_coherency_type(type)) {
    return HSA_STATUS_ERROR;
  }

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_memory_fill(void* ptr, uint32_t value, size_t count) {
  TRY;
  IS_OPEN();

  if ((ptr == nullptr) || (uintptr_t(ptr) % 4 != 0)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  if (count == 0) {
    return HSA_STATUS_SUCCESS;
  }

  return core::Runtime::runtime_singleton_->FillMemory(ptr, value, count);
  CATCH;
}

hsa_status_t hsa_amd_memory_async_copy(void* dst, hsa_agent_t dst_agent_handle, const void* src,
                                       hsa_agent_t src_agent_handle, size_t size,
                                       uint32_t num_dep_signals, const hsa_signal_t* dep_signals,
                                       hsa_signal_t completion_signal) {
  TRY;
  if (dst == NULL || src == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  if ((num_dep_signals == 0 && dep_signals != NULL) ||
      (num_dep_signals > 0 && dep_signals == NULL)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  core::Agent* dst_agent = core::Agent::Convert(dst_agent_handle);
  IS_VALID(dst_agent);

  core::Agent* src_agent = core::Agent::Convert(src_agent_handle);
  IS_VALID(src_agent);

  std::vector dep_signal_list(num_dep_signals);
  if (num_dep_signals > 0) {
    for (size_t i = 0; i < num_dep_signals; ++i) {
      core::Signal* dep_signal_obj = core::Signal::Convert(dep_signals[i]);
      IS_VALID(dep_signal_obj);
      dep_signal_list[i] = dep_signal_obj;
    }
  }

  core::Signal* out_signal_obj = core::Signal::Convert(completion_signal);
  IS_VALID(out_signal_obj);

  bool rev_copy_dir = core::Runtime::runtime_singleton_->flag().rev_copy_dir();
  if (size > 0) {
    return core::Runtime::runtime_singleton_->CopyMemory(
        dst, (rev_copy_dir ? *src_agent : *dst_agent),
        src, (rev_copy_dir ?
              *dst_agent : *src_agent),
        size, dep_signal_list, *out_signal_obj);
  }

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_memory_async_copy_rect(
    const hsa_pitched_ptr_t* dst, const hsa_dim3_t* dst_offset, const hsa_pitched_ptr_t* src,
    const hsa_dim3_t* src_offset, const hsa_dim3_t* range, hsa_agent_t copy_agent,
    hsa_amd_copy_direction_t dir, uint32_t num_dep_signals, const hsa_signal_t* dep_signals,
    hsa_signal_t completion_signal) {
  TRY;
  if (dst == nullptr || src == nullptr || dst_offset == nullptr || src_offset == nullptr ||
      range == nullptr) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  if ((num_dep_signals == 0 && dep_signals != NULL) ||
      (num_dep_signals > 0 && dep_signals == NULL)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  if (dir == hsaHostToHost) return HSA_STATUS_ERROR_INVALID_ARGUMENT;

  core::Agent* base_agent = core::Agent::Convert(copy_agent);
  IS_VALID(base_agent);
  if (base_agent->device_type() != core::Agent::DeviceType::kAmdGpuDevice)
    return HSA_STATUS_ERROR_INVALID_AGENT;
  AMD::GpuAgent* agent = static_cast(base_agent);

  std::vector dep_signal_list(num_dep_signals);
  if (num_dep_signals > 0) {
    for (size_t i = 0; i < num_dep_signals; ++i) {
      core::Signal* dep_signal_obj = core::Signal::Convert(dep_signals[i]);
      IS_VALID(dep_signal_obj);
      dep_signal_list[i] = dep_signal_obj;
    }
  }

  core::Signal* out_signal_obj = core::Signal::Convert(completion_signal);
  IS_VALID(out_signal_obj);

  if ((range->x != 0) && (range->y != 0) && (range->z != 0)) {
    return agent->DmaCopyRect(dst, dst_offset, src, src_offset, range, dir, dep_signal_list,
                              *out_signal_obj);
  }

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_profiling_set_profiler_enabled(hsa_queue_t* queue, int enable) {
  TRY;
  IS_OPEN();

  core::Queue* cmd_queue = core::Queue::Convert(queue);

  IS_VALID(cmd_queue);

  cmd_queue->SetProfiling(enable);

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_profiling_async_copy_enable(bool enable) {
  TRY;
  IS_OPEN();

  hsa_status_t ret = HSA_STATUS_SUCCESS;
  for
      (core::Agent* agent : core::Runtime::runtime_singleton_->gpu_agents()) {
    hsa_status_t err = agent->profiling_enabled(enable);
    if (err != HSA_STATUS_SUCCESS) ret = err;
  }
  return ret;
  CATCH;
}

hsa_status_t hsa_amd_profiling_get_dispatch_time(
    hsa_agent_t agent_handle, hsa_signal_t hsa_signal,
    hsa_amd_profiling_dispatch_time_t* time) {
  TRY;
  IS_OPEN();

  IS_BAD_PTR(time);

  core::Agent* agent = core::Agent::Convert(agent_handle);

  IS_VALID(agent);

  core::Signal* signal = core::Signal::Convert(hsa_signal);

  IS_VALID(signal);

  if (agent->device_type() != core::Agent::kAmdGpuDevice) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  AMD::GpuAgentInt* gpu_agent = static_cast(agent);

  // Translate timestamp from GPU to system domain.
  gpu_agent->TranslateTime(signal, *time);

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_profiling_get_async_copy_time(
    hsa_signal_t hsa_signal, hsa_amd_profiling_async_copy_time_t* time) {
  TRY;
  IS_OPEN();

  IS_BAD_PTR(time);

  core::Signal* signal = core::Signal::Convert(hsa_signal);

  IS_VALID(signal);

  core::Agent* agent = signal->async_copy_agent();

  if (agent == nullptr) {
    return HSA_STATUS_ERROR;
  }

  if (agent->device_type() == core::Agent::DeviceType::kAmdGpuDevice) {
    // Translate timestamp from GPU to system domain.
    static_cast(agent)->TranslateTime(signal, *time);
    return HSA_STATUS_SUCCESS;
  }

  // The timestamp is already in system domain.
  time->start = signal->signal_.start_ts;
  time->end = signal->signal_.end_ts;
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_profiling_convert_tick_to_system_domain(hsa_agent_t agent_handle,
                                                             uint64_t agent_tick,
                                                             uint64_t* system_tick) {
  TRY;
  IS_OPEN();

  IS_BAD_PTR(system_tick);

  core::Agent* agent = core::Agent::Convert(agent_handle);

  IS_VALID(agent);

  if (agent->device_type() != core::Agent::kAmdGpuDevice) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  AMD::GpuAgentInt* gpu_agent = static_cast(agent);

  *system_tick = gpu_agent->TranslateTime(agent_tick);

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_signal_create(hsa_signal_value_t initial_value, uint32_t num_consumers,
                                   const hsa_agent_t* consumers, uint64_t attributes,
                                   hsa_signal_t* hsa_signal) {
  struct AgentHandleCompare {
    bool operator()(const hsa_agent_t& lhs, const hsa_agent_t& rhs) const {
      return lhs.handle < rhs.handle;
    }
  };

  TRY;
  IS_OPEN();
  IS_BAD_PTR(hsa_signal);

  core::Signal* ret;

  bool enable_ipc = attributes & HSA_AMD_SIGNAL_IPC;
  bool use_default =
      enable_ipc || (attributes & HSA_AMD_SIGNAL_AMD_GPU_ONLY) || (!core::g_use_interrupt_wait);

  if ((!use_default) && (num_consumers != 0)) {
    IS_BAD_PTR(consumers);

    // Check for duplicates in consumers.
    std::set consumer_set(consumers, consumers + num_consumers);
    if (consumer_set.size() != num_consumers) {
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    }

    use_default = true;
    for (const core::Agent* cpu_agent : core::Runtime::runtime_singleton_->cpu_agents()) {
      use_default &= (consumer_set.find(cpu_agent->public_handle()) == consumer_set.end());
    }
  }

  if (use_default) {
    ret = new core::DefaultSignal(initial_value, enable_ipc);
  } else {
    ret = new core::InterruptSignal(initial_value);
  }

  *hsa_signal = core::Signal::Convert(ret);

  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_signal_value_pointer(hsa_signal_t hsa_signal,
                                          volatile hsa_signal_value_t** value_ptr) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(value_ptr);

  core::Signal* signal = core::Signal::Convert(hsa_signal);
  IS_VALID(signal);

  if (!core::BusyWaitSignal::IsType(signal)) return HSA_STATUS_ERROR_INVALID_ARGUMENT;

  *value_ptr = (volatile hsa_signal_value_t*)&signal->signal_.value;
  return HSA_STATUS_SUCCESS;
  CATCH;
}

uint32_t hsa_amd_signal_wait_any(uint32_t signal_count, hsa_signal_t* hsa_signals,
                                 hsa_signal_condition_t* conds, hsa_signal_value_t* values,
                                 uint64_t timeout_hint, hsa_wait_state_t wait_hint,
                                 hsa_signal_value_t* satisfying_value) {
  TRY;
  if (!core::Runtime::runtime_singleton_->IsOpen()) {
    assert(false && "hsa_amd_signal_wait_any called while not initialized.");
    return uint32_t(0);
  }
  // Do not check for signal invalidation. Invalidation may occur during async
  // signal handler loop and is not an error.
  for (uint i = 0; i < signal_count; i++)
    assert(hsa_signals[i].handle != 0 && core::SharedSignal::Convert(hsa_signals[i])->IsValid() &&
           "Invalid signal.");

  return core::Signal::WaitAny(signal_count, hsa_signals, conds, values, timeout_hint, wait_hint,
                               satisfying_value);
  CATCHRET(uint32_t);
}

hsa_status_t hsa_amd_signal_async_handler(hsa_signal_t hsa_signal, hsa_signal_condition_t cond,
                                          hsa_signal_value_t value,
                                          hsa_amd_signal_handler handler, void* arg) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(handler);

  core::Signal* signal = core::Signal::Convert(hsa_signal);
  IS_VALID(signal);
  if (core::g_use_interrupt_wait && (!core::InterruptSignal::IsType(signal)))
    return HSA_STATUS_ERROR_INVALID_SIGNAL;
  return core::Runtime::runtime_singleton_->SetAsyncSignalHandler(
      hsa_signal, cond, value, handler, arg);
  CATCH;
}

hsa_status_t hsa_amd_async_function(void (*callback)(void* arg), void* arg) {
  TRY;
  IS_OPEN();

  IS_BAD_PTR(callback);
  static const hsa_signal_t null_signal = {0};
  return core::Runtime::runtime_singleton_->SetAsyncSignalHandler(
      null_signal, HSA_SIGNAL_CONDITION_EQ, 0, (hsa_amd_signal_handler)callback, arg);
  CATCH;
}

hsa_status_t hsa_amd_queue_cu_set_mask(const hsa_queue_t* queue, uint32_t num_cu_mask_count,
                                       const uint32_t* cu_mask) {
  TRY;
  IS_OPEN();

  core::Queue* cmd_queue = core::Queue::Convert(queue);
  IS_VALID(cmd_queue);

  if (num_cu_mask_count != 0) IS_BAD_PTR(cu_mask);

  if (num_cu_mask_count % 32 != 0) return HSA_STATUS_ERROR_INVALID_ARGUMENT;

  return cmd_queue->SetCUMasking(num_cu_mask_count, cu_mask);
  CATCH;
}

hsa_status_t hsa_amd_queue_cu_get_mask(const hsa_queue_t* queue, uint32_t num_cu_mask_count,
                                       uint32_t* cu_mask) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(cu_mask);

  core::Queue* cmd_queue = core::Queue::Convert(queue);
  IS_VALID(cmd_queue);

  if ((num_cu_mask_count == 0) || (num_cu_mask_count % 32 != 0))
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;

  return cmd_queue->GetCUMasking(num_cu_mask_count, cu_mask);
  CATCH;
}

hsa_status_t hsa_amd_memory_lock(void* host_ptr, size_t size, hsa_agent_t*
                                 agents, int num_agent, void** agent_ptr) {
  TRY;
  IS_OPEN();

  if (size == 0 || host_ptr == nullptr || agent_ptr == nullptr) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  *agent_ptr = nullptr;

  if ((agents != nullptr && num_agent == 0) || (agents == nullptr && num_agent != 0)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  // Check for APU
  if (core::Runtime::runtime_singleton_->system_regions_coarse().size() == 0) {
    assert(core::Runtime::runtime_singleton_->system_regions_fine()[0]->full_profile() &&
           "Missing coarse grain host memory on dGPU system.");
    *agent_ptr = host_ptr;
    return HSA_STATUS_SUCCESS;
  }

  const AMD::MemoryRegion* system_region = static_cast<const AMD::MemoryRegion*>(
      core::Runtime::runtime_singleton_->system_regions_coarse()[0]);

  return system_region->Lock(num_agent, agents, host_ptr, size, agent_ptr);
  CATCH;
}

hsa_status_t hsa_amd_memory_lock_to_pool(void* host_ptr, size_t size, hsa_agent_t* agents,
                                         int num_agent, hsa_amd_memory_pool_t pool,
                                         uint32_t flags, void** agent_ptr) {
  TRY;
  IS_OPEN();

  if (size == 0 || host_ptr == nullptr || agent_ptr == nullptr || flags != 0) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  *agent_ptr = nullptr;

  if ((agents != nullptr && num_agent == 0) || (agents == nullptr && num_agent != 0)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  hsa_region_t region = {pool.handle};
  const AMD::MemoryRegion* mem_region = AMD::MemoryRegion::Convert(region);
  if (mem_region == nullptr) {
    return (hsa_status_t)HSA_STATUS_ERROR_INVALID_MEMORY_POOL;
  }

  if (mem_region->owner()->device_type() != core::Agent::kAmdCpuDevice)
    return (hsa_status_t)HSA_STATUS_ERROR_INVALID_MEMORY_POOL;

  return mem_region->Lock(num_agent, agents, host_ptr, size, agent_ptr);
  CATCH;
}

hsa_status_t hsa_amd_memory_unlock(void* host_ptr) {
  TRY;
  IS_OPEN();

  const AMD::MemoryRegion* system_region = reinterpret_cast<const AMD::MemoryRegion*>(
      core::Runtime::runtime_singleton_->system_regions_fine()[0]);

  return system_region->Unlock(host_ptr);
  CATCH;
}

hsa_status_t hsa_amd_memory_pool_get_info(hsa_amd_memory_pool_t memory_pool,
                                          hsa_amd_memory_pool_info_t attribute, void* value) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(value);

  hsa_region_t region = {memory_pool.handle};
  const AMD::MemoryRegion* mem_region = AMD::MemoryRegion::Convert(region);
  if (mem_region == NULL) {
    return (hsa_status_t)HSA_STATUS_ERROR_INVALID_MEMORY_POOL;
  }

  return mem_region->GetPoolInfo(attribute, value);
  CATCH;
}

hsa_status_t hsa_amd_agent_iterate_memory_pools(
    hsa_agent_t agent_handle,
    hsa_status_t (*callback)(hsa_amd_memory_pool_t memory_pool, void* data), void* data) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(callback);
  const core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);

  if (agent->device_type() == core::Agent::kAmdCpuDevice) {
    return reinterpret_cast<const AMD::CpuAgent*>(agent)->VisitRegion(
        false, reinterpret_cast<hsa_status_t (*)(hsa_region_t memory_pool, void* data)>(callback),
        data);
  }

  return reinterpret_cast<const AMD::GpuAgentInt*>(agent)->VisitRegion(
      false, reinterpret_cast<hsa_status_t (*)(hsa_region_t memory_pool, void* data)>(callback),
      data);
  CATCH;
}

hsa_status_t hsa_amd_memory_pool_allocate(hsa_amd_memory_pool_t memory_pool, size_t size,
                                          uint32_t flags, void** ptr) {
  TRY;
  IS_OPEN();

  if (size == 0 || ptr == NULL || flags != 0) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  hsa_region_t region = {memory_pool.handle};
  const core::MemoryRegion* mem_region = core::MemoryRegion::Convert(region);

  if (mem_region == NULL || !mem_region->IsValid()) {
    return (hsa_status_t)HSA_STATUS_ERROR_INVALID_MEMORY_POOL;
  }

  return core::Runtime::runtime_singleton_->AllocateMemory(
      mem_region, size, core::MemoryRegion::AllocateRestrict, ptr);
  CATCH;
}

hsa_status_t hsa_amd_memory_pool_free(void* ptr) {
  return HSA::hsa_memory_free(ptr);
}

hsa_status_t hsa_amd_agents_allow_access(uint32_t num_agents, const hsa_agent_t* agents,
                                         const uint32_t* flags, const void* ptr) {
  TRY;
  IS_OPEN();

  if (num_agents == 0 || agents == NULL || flags != NULL || ptr == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return core::Runtime::runtime_singleton_->AllowAccess(num_agents, agents, ptr);
  CATCH;
}

hsa_status_t hsa_amd_memory_pool_can_migrate(hsa_amd_memory_pool_t src_memory_pool,
                                             hsa_amd_memory_pool_t dst_memory_pool,
                                             bool* result) {
  TRY;
  IS_OPEN();

  if (result == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  hsa_region_t src_region_handle = {src_memory_pool.handle};
  const AMD::MemoryRegion* src_mem_region = AMD::MemoryRegion::Convert(src_region_handle);

  if (src_mem_region == NULL || !src_mem_region->IsValid()) {
    return static_cast<hsa_status_t>(HSA_STATUS_ERROR_INVALID_MEMORY_POOL);
  }

  hsa_region_t dst_region_handle = {dst_memory_pool.handle};
  const AMD::MemoryRegion* dst_mem_region = AMD::MemoryRegion::Convert(dst_region_handle);

  if (dst_mem_region == NULL || !dst_mem_region->IsValid()) {
    return static_cast<hsa_status_t>(HSA_STATUS_ERROR_INVALID_MEMORY_POOL);
  }

  return src_mem_region->CanMigrate(*dst_mem_region, *result);
  CATCH;
}

hsa_status_t hsa_amd_memory_migrate(const void* ptr, hsa_amd_memory_pool_t memory_pool,
                                    uint32_t flags) {
  TRY;
  IS_OPEN();

  if (ptr == NULL || flags != 0) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  hsa_region_t dst_region_handle = {memory_pool.handle};
  const AMD::MemoryRegion* dst_mem_region = AMD::MemoryRegion::Convert(dst_region_handle);

  if (dst_mem_region == NULL || !dst_mem_region->IsValid()) {
    return static_cast<hsa_status_t>(HSA_STATUS_ERROR_INVALID_MEMORY_POOL);
  }

  return dst_mem_region->Migrate(flags, ptr);
  CATCH;
}

hsa_status_t hsa_amd_agent_memory_pool_get_info(hsa_agent_t agent_handle,
                                                hsa_amd_memory_pool_t memory_pool,
                                                hsa_amd_agent_memory_pool_info_t attribute,
                                                void* value) {
  TRY;
  IS_OPEN();

  if (value == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  const core::Agent* agent = core::Agent::Convert(agent_handle);
  IS_VALID(agent);

  hsa_region_t region_handle = {memory_pool.handle};
  const AMD::MemoryRegion* mem_region = AMD::MemoryRegion::Convert(region_handle);

  if (mem_region == NULL || !mem_region->IsValid()) {
    return static_cast<hsa_status_t>(HSA_STATUS_ERROR_INVALID_MEMORY_POOL);
  }

  return mem_region->GetAgentPoolInfo(*agent, attribute, value);
  CATCH;
}

hsa_status_t hsa_amd_interop_map_buffer(uint32_t num_agents, hsa_agent_t* agents, int
                                        interop_handle, uint32_t flags, size_t* size,
                                        void** ptr, size_t* metadata_size,
                                        const void** metadata) {
  static const int tinyArraySize = 8;
  TRY;
  IS_OPEN();
  IS_BAD_PTR(agents);
  IS_BAD_PTR(size);
  IS_BAD_PTR(ptr);
  if (flags != 0) return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  if (num_agents == 0) return HSA_STATUS_ERROR_INVALID_ARGUMENT;

  core::Agent* short_agents[tinyArraySize];
  core::Agent** core_agents = short_agents;
  if (num_agents > tinyArraySize) {
    core_agents = new core::Agent*[num_agents];
    if (core_agents == nullptr) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }
  MAKE_SCOPE_GUARD([&]() {
    if (num_agents > tinyArraySize) delete[] core_agents;
  });

  for (uint32_t i = 0; i < num_agents; i++) {
    core::Agent* device = core::Agent::Convert(agents[i]);
    IS_VALID(device);
    core_agents[i] = device;
  }

  auto ret = core::Runtime::runtime_singleton_->InteropMap(
      num_agents, core_agents, interop_handle, flags, size, ptr, metadata_size, metadata);

  return ret;
  CATCH;
}

hsa_status_t hsa_amd_interop_unmap_buffer(void* ptr) {
  TRY;
  IS_OPEN();
  if (ptr != NULL) core::Runtime::runtime_singleton_->InteropUnmap(ptr);
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_pointer_info(const void* ptr, hsa_amd_pointer_info_t* info,
                                  void* (*alloc)(size_t), uint32_t* num_accessible,
                                  hsa_agent_t** accessible) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(ptr);
  IS_BAD_PTR(info);
  return core::Runtime::runtime_singleton_->PtrInfo(ptr, info, alloc, num_accessible,
                                                    accessible);
  CATCH;
}

hsa_status_t hsa_amd_pointer_info_set_userdata(const void* ptr, void* userdata) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(ptr);
  return core::Runtime::runtime_singleton_->SetPtrInfoData(ptr, userdata);
  CATCH;
}

hsa_status_t hsa_amd_ipc_memory_create(void* ptr, size_t len, hsa_amd_ipc_memory_t* handle) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(ptr);
  IS_BAD_PTR(handle);
  return core::Runtime::runtime_singleton_->IPCCreate(ptr, len, handle);
  CATCH;
}

hsa_status_t hsa_amd_ipc_memory_attach(const hsa_amd_ipc_memory_t* ipc, size_t len,
                                       uint32_t num_agents, const hsa_agent_t*
                                       mapping_agents, void** mapped_ptr) {
  static const int tinyArraySize = 8;
  TRY;
  IS_OPEN();
  IS_BAD_PTR(mapped_ptr);
  if (num_agents != 0) IS_BAD_PTR(mapping_agents);

  core::Agent** core_agents = nullptr;
  if (num_agents > tinyArraySize)
    core_agents = new core::Agent*[num_agents];
  else
    core_agents = (core::Agent**)alloca(sizeof(core::Agent*) * num_agents);
  if (core_agents == NULL) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  MAKE_SCOPE_GUARD([&]() {
    if (num_agents > tinyArraySize) delete[] core_agents;
  });

  for (uint32_t i = 0; i < num_agents; i++) {
    core::Agent* device = core::Agent::Convert(mapping_agents[i]);
    IS_VALID(device);
    core_agents[i] = device;
  }

  return core::Runtime::runtime_singleton_->IPCAttach(ipc, len, num_agents, core_agents,
                                                      mapped_ptr);
  CATCH;
}

hsa_status_t hsa_amd_ipc_memory_detach(void* mapped_ptr) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(mapped_ptr);
  return core::Runtime::runtime_singleton_->IPCDetach(mapped_ptr);
  CATCH;
}

hsa_status_t hsa_amd_ipc_signal_create(hsa_signal_t hsa_signal, hsa_amd_ipc_signal_t* handle) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(handle);
  core::Signal* signal = core::Signal::Convert(hsa_signal);
  IS_VALID(signal);
  core::IPCSignal::CreateHandle(signal, handle);
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_ipc_signal_attach(const hsa_amd_ipc_signal_t* handle,
                                       hsa_signal_t* hsa_signal) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(handle);
  IS_BAD_PTR(hsa_signal);
  core::Signal* signal = core::IPCSignal::Attach(handle);
  *hsa_signal = core::Signal::Convert(signal);
  return HSA_STATUS_SUCCESS;
  CATCH;
}

// For use by tools only - not in library export table.
hsa_status_t hsa_amd_queue_intercept_create(
    hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type,
    void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
    uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(queue);
  hsa_queue_t* lower_queue;
  hsa_status_t err = HSA::hsa_queue_create(agent_handle, size, type, callback, data,
                                           private_segment_size, group_segment_size,
                                           &lower_queue);
  if (err != HSA_STATUS_SUCCESS) return err;
  std::unique_ptr<core::Queue> lowerQueue(core::Queue::Convert(lower_queue));

  std::unique_ptr<core::InterceptQueue> upperQueue(
      new core::InterceptQueue(std::move(lowerQueue)));

  *queue = core::Queue::Convert(upperQueue.release());
  return HSA_STATUS_SUCCESS;
  CATCH;
}

// For use by tools only - not in library export table.
hsa_status_t hsa_amd_queue_intercept_register(hsa_queue_t* queue,
                                              hsa_amd_queue_intercept_handler callback,
                                              void* user_data) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(callback);
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  IS_VALID(cmd_queue);
  if (!core::InterceptQueue::IsType(cmd_queue)) return HSA_STATUS_ERROR_INVALID_QUEUE;
  core::InterceptQueue* iQueue = static_cast<core::InterceptQueue*>(cmd_queue);
  iQueue->AddInterceptor(callback, user_data);
  return HSA_STATUS_SUCCESS;
  CATCH;
}

hsa_status_t hsa_amd_register_system_event_handler(hsa_amd_system_event_callback_t callback,
                                                   void* data) {
  TRY;
  IS_OPEN();
  return core::Runtime::runtime_singleton_->SetCustomSystemEventHandler(callback, data);
  CATCH;
}

hsa_status_t hsa_amd_queue_set_priority(hsa_queue_t* queue,
                                        hsa_amd_queue_priority_t priority) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(queue);
  core::Queue* cmd_queue = core::Queue::Convert(queue);
  IS_VALID(cmd_queue);

  static std::map<hsa_amd_queue_priority_t, HSA_QUEUE_PRIORITY> ext_kmt_priomap = {
      {HSA_AMD_QUEUE_PRIORITY_LOW, HSA_QUEUE_PRIORITY_MINIMUM},
      {HSA_AMD_QUEUE_PRIORITY_NORMAL, HSA_QUEUE_PRIORITY_NORMAL},
      {HSA_AMD_QUEUE_PRIORITY_HIGH, HSA_QUEUE_PRIORITY_MAXIMUM},
  };

  auto priority_it = ext_kmt_priomap.find(priority);

  if (priority_it ==
      ext_kmt_priomap.end()) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return cmd_queue->SetPriority(priority_it->second);
  CATCH;
}

hsa_status_t hsa_amd_register_deallocation_callback(void* ptr,
                                                    hsa_amd_deallocation_callback_t callback,
                                                    void* user_data) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(ptr);
  IS_BAD_PTR(callback);

  return core::Runtime::runtime_singleton_->RegisterReleaseNotifier(ptr, callback, user_data);
  CATCH;
}

hsa_status_t hsa_amd_deregister_deallocation_callback(void* ptr,
                                                      hsa_amd_deallocation_callback_t callback) {
  TRY;
  IS_OPEN();
  IS_BAD_PTR(ptr);
  IS_BAD_PTR(callback);

  return core::Runtime::runtime_singleton_->DeregisterReleaseNotifier(ptr, callback);
  CATCH;
}

// For use by tools only - not in library export table.
hsa_status_t hsa_amd_runtime_queue_create_register(hsa_amd_runtime_queue_notifier callback,
                                                   void* user_data) {
  TRY;
  IS_OPEN();
  return core::Runtime::runtime_singleton_->SetInternalQueueCreateNotifier(callback, user_data);
  CATCH;
}

hsa_status_t hsa_amd_svm_attributes_set(void* ptr, size_t size,
                                        hsa_amd_svm_attribute_pair_t* attribute_list,
                                        size_t attribute_count) {
  TRY;
  IS_OPEN();
  return core::Runtime::runtime_singleton_->SetSvmAttrib(ptr, size, attribute_list,
                                                         attribute_count);
  CATCH;
}

hsa_status_t hsa_amd_svm_attributes_get(void* ptr, size_t size,
                                        hsa_amd_svm_attribute_pair_t* attribute_list,
                                        size_t attribute_count) {
  TRY;
  IS_OPEN();
  return core::Runtime::runtime_singleton_->GetSvmAttrib(ptr, size, attribute_list,
                                                         attribute_count);
  CATCH;
}

hsa_status_t hsa_amd_svm_prefetch_async(void* ptr, size_t size, hsa_agent_t agent,
                                        uint32_t num_dep_signals,
                                        const hsa_signal_t* dep_signals,
                                        hsa_signal_t completion_signal) {
  TRY;
  IS_OPEN();
  // Validate inputs.
  // if (core::g_use_interrupt_wait && (!core::InterruptSignal::IsType(signal)))

  return core::Runtime::runtime_singleton_->SvmPrefetch(ptr, size, agent, num_dep_signals,
                                                        dep_signals, completion_signal);
  CATCH;
}

}  // namespace amd
}  // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/core/runtime/hsa_ext_interface.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "image/inc/hsa_ext_image_impl.h"
#include "core/inc/hsa_ext_interface.h"
#include "core/inc/runtime.h"

#include <string>

namespace rocr {

// Implementations for missing / unsupported extensions
template <class R, class... ARGS> static R hsa_ext_null(ARGS...) {
  return HSA_STATUS_ERROR_NOT_INITIALIZED;
}

namespace core {

ExtensionEntryPoints::ExtensionEntryPoints() {
  InitFinalizerExtTable();
  InitImageExtTable();
  InitAmdExtTable();
}

// Initialize Finalizer function table to be NULLs
void ExtensionEntryPoints::InitFinalizerExtTable() {
  // Initialize Version of Api Table
  finalizer_api.version.major_id = 0x00;
  finalizer_api.version.minor_id = 0x00;
  finalizer_api.version.step_id = 0x00;

  finalizer_api.hsa_ext_program_create_fn = hsa_ext_null;
  finalizer_api.hsa_ext_program_destroy_fn = hsa_ext_null;
  finalizer_api.hsa_ext_program_add_module_fn = hsa_ext_null;
  finalizer_api.hsa_ext_program_iterate_modules_fn = hsa_ext_null;
  finalizer_api.hsa_ext_program_get_info_fn = hsa_ext_null;
  finalizer_api.hsa_ext_program_finalize_fn = hsa_ext_null;
}

// Initialize Image function table to be NULLs
void ExtensionEntryPoints::InitImageExtTable() {
  // Initialize Version of Api Table
  image_api.version.major_id = 0x00;
  image_api.version.minor_id = 0x00;
  image_api.version.step_id = 0x00;

  image_api.hsa_ext_image_get_capability_fn = hsa_ext_null;
  image_api.hsa_ext_image_data_get_info_fn = hsa_ext_null;
  image_api.hsa_ext_image_create_fn = hsa_ext_null;
  image_api.hsa_ext_image_import_fn = hsa_ext_null;
  image_api.hsa_ext_image_export_fn = hsa_ext_null;
  image_api.hsa_ext_image_copy_fn = hsa_ext_null;
  image_api.hsa_ext_image_clear_fn = hsa_ext_null;
  image_api.hsa_ext_image_destroy_fn = hsa_ext_null;
  image_api.hsa_ext_sampler_create_fn = hsa_ext_null;
  image_api.hsa_ext_sampler_destroy_fn = hsa_ext_null;
  image_api.hsa_amd_image_get_info_max_dim_fn = hsa_ext_null;
  image_api.hsa_ext_image_get_capability_with_layout_fn = hsa_ext_null;
  image_api.hsa_ext_image_data_get_info_with_layout_fn = hsa_ext_null;
  image_api.hsa_ext_image_create_with_layout_fn = hsa_ext_null;
}

// Initialize Amd Ext table for Api related to Images
void ExtensionEntryPoints::InitAmdExtTable() {
  hsa_api_table_.amd_ext_api.hsa_amd_image_create_fn = hsa_ext_null;
  hsa_internal_api_table_.amd_ext_api.hsa_amd_image_create_fn = hsa_ext_null;
}

// Update Amd Ext table for Api related to Images.
// @note: Interface should be updated when Amd Ext table
// begins hosting Api's from other extension libraries
void ExtensionEntryPoints::UpdateAmdExtTable(decltype(::hsa_amd_image_create)* func_ptr) {
  assert(hsa_api_table_.amd_ext_api.hsa_amd_image_create_fn ==
             (decltype(hsa_amd_image_create)*)hsa_ext_null &&
         "Duplicate load of extension import.");
  assert(hsa_internal_api_table_.amd_ext_api.hsa_amd_image_create_fn ==
             (decltype(hsa_amd_image_create)*)hsa_ext_null &&
         "Duplicate load of extension import.");
  hsa_api_table_.amd_ext_api.hsa_amd_image_create_fn = func_ptr;
  hsa_internal_api_table_.amd_ext_api.hsa_amd_image_create_fn = func_ptr;
}

void ExtensionEntryPoints::UnloadImage() {
  InitAmdExtTable();
  InitImageExtTable();
  core::hsa_internal_api_table_.Reset();
#ifdef HSA_IMAGE_SUPPORT
  rocr::image::ReleaseImageRsrcs();
#endif
}

void ExtensionEntryPoints::Unload() {
  // Reset Image apis to hsa_ext_null function
  UnloadImage();

  for (auto lib : libs_) {
    void* ptr = os::GetExportAddress(lib, "Unload");
    if (ptr) {
      ((Unload_t)ptr)();
    }
  }
  // Due to valgrind bug, runtime cannot dlclose extensions see:
  // http://valgrind.org/docs/manual/faq.html#faq.unhelpful
  if (!core::Runtime::runtime_singleton_->flag().running_valgrind()) {
    for (auto lib : libs_) {
      os::CloseLib(lib);
    }
  }
  libs_.clear();

  InitFinalizerExtTable();
  InitImageExtTable();
  InitAmdExtTable();
  core::hsa_internal_api_table_.Reset();
}

bool ExtensionEntryPoints::LoadImage() {
#ifdef HSA_IMAGE_SUPPORT
  // Consult user input on linking to Image implementation
  bool disable_image = core::Runtime::runtime_singleton_->flag().disable_image();
  if (disable_image) {
    return true;
  }

  // Bind to Image implementation api's
  decltype(::hsa_amd_image_create)* func;
  rocr::image::LoadImage(&image_api, &func);

  // Initialize Version of Api Table
  image_api.version.major_id = HSA_IMAGE_API_TABLE_MAJOR_VERSION;
  image_api.version.minor_id = sizeof(ImageExtTable);
  image_api.version.step_id = HSA_IMAGE_API_TABLE_STEP_VERSION;

  // Update private copy of Api table with handle for Image extensions
  hsa_internal_api_table_.CloneExts(&image_api, core::HsaApiTable::HSA_EXT_IMAGE_API_TABLE_ID);

  // Update Amd Ext Api table Api that deals with Images
  UpdateAmdExtTable(func);
#endif

  return true;
}

bool ExtensionEntryPoints::LoadFinalizer(std::string library_name) {
  os::LibHandle lib = os::LoadLib(library_name);
  if (lib == NULL) {
    return false;
  }
  libs_.push_back(lib);

  void* ptr;

  ptr = os::GetExportAddress(lib, "hsa_ext_program_create_impl");
  if (ptr != NULL) {
    assert(finalizer_api.hsa_ext_program_create_fn ==
               (decltype(::hsa_ext_program_create)*)hsa_ext_null &&
           "Duplicate load of extension import.");
    finalizer_api.hsa_ext_program_create_fn = (decltype(::hsa_ext_program_create)*)ptr;
  }

  ptr = os::GetExportAddress(lib, "hsa_ext_program_destroy_impl");
  if (ptr != NULL) {
    assert(finalizer_api.hsa_ext_program_destroy_fn ==
               (decltype(::hsa_ext_program_destroy)*)hsa_ext_null &&
           "Duplicate load of extension import.");
    finalizer_api.hsa_ext_program_destroy_fn = (decltype(::hsa_ext_program_destroy)*)ptr;
  }

  ptr = os::GetExportAddress(lib, "hsa_ext_program_add_module_impl");
  if (ptr != NULL) {
    assert(finalizer_api.hsa_ext_program_add_module_fn ==
               (decltype(::hsa_ext_program_add_module)*)hsa_ext_null &&
           "Duplicate load of extension import.");
    finalizer_api.hsa_ext_program_add_module_fn =
        (decltype(::hsa_ext_program_add_module)*)ptr;
  }

  ptr = os::GetExportAddress(lib, "hsa_ext_program_iterate_modules_impl");
  if (ptr != NULL) {
    assert(finalizer_api.hsa_ext_program_iterate_modules_fn ==
               (decltype(::hsa_ext_program_iterate_modules)*)hsa_ext_null &&
           "Duplicate load of extension import.");
    finalizer_api.hsa_ext_program_iterate_modules_fn =
        (decltype(::hsa_ext_program_iterate_modules)*)ptr;
  }

  ptr = os::GetExportAddress(lib, "hsa_ext_program_get_info_impl");
  if (ptr != NULL) {
    assert(finalizer_api.hsa_ext_program_get_info_fn ==
               (decltype(::hsa_ext_program_get_info)*)hsa_ext_null &&
           "Duplicate load of extension import.");
    finalizer_api.hsa_ext_program_get_info_fn = (decltype(::hsa_ext_program_get_info)*)ptr;
  }

  ptr = os::GetExportAddress(lib, "hsa_ext_program_finalize_impl");
  if (ptr != NULL) {
    assert(finalizer_api.hsa_ext_program_finalize_fn ==
               (decltype(::hsa_ext_program_finalize)*)hsa_ext_null &&
           "Duplicate load of extension import.");
    finalizer_api.hsa_ext_program_finalize_fn = (decltype(::hsa_ext_program_finalize)*)ptr;
  }

  // Initialize Version of Api Table
  finalizer_api.version.major_id = HSA_FINALIZER_API_TABLE_MAJOR_VERSION;
  finalizer_api.version.minor_id = sizeof(::FinalizerExtTable);
  finalizer_api.version.step_id = HSA_FINALIZER_API_TABLE_STEP_VERSION;

  // Update handle of table of HSA extensions
  hsa_internal_api_table_.CloneExts(&finalizer_api,
                                    core::HsaApiTable::HSA_EXT_FINALIZER_API_TABLE_ID);

  ptr = os::GetExportAddress(lib, "Load");
  if (ptr != NULL) {
    ((Load_t)ptr)(&core::hsa_internal_api_table_.hsa_api);
  }

  return true;
}

}  // namespace core
}  // namespace rocr

//---------------------------------------------------------------------------//
//  Exported extension stub functions
//---------------------------------------------------------------------------//

hsa_status_t hsa_ext_program_create(hsa_machine_model_t machine_model, hsa_profile_t profile,
                                    hsa_default_float_rounding_mode_t
                                        default_float_rounding_mode,
                                    const char* options, hsa_ext_program_t* program) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.finalizer_api
      .hsa_ext_program_create_fn(machine_model, profile, default_float_rounding_mode, options,
                                 program);
}

hsa_status_t hsa_ext_program_destroy(hsa_ext_program_t program) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.finalizer_api
      .hsa_ext_program_destroy_fn(program);
}

hsa_status_t hsa_ext_program_add_module(hsa_ext_program_t program, hsa_ext_module_t module) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.finalizer_api
      .hsa_ext_program_add_module_fn(program, module);
}

hsa_status_t hsa_ext_program_iterate_modules(
    hsa_ext_program_t program,
    hsa_status_t (*callback)(hsa_ext_program_t program, hsa_ext_module_t module, void* data),
    void* data) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.finalizer_api
      .hsa_ext_program_iterate_modules_fn(program, callback, data);
}

hsa_status_t hsa_ext_program_get_info(hsa_ext_program_t program,
                                      hsa_ext_program_info_t attribute, void* value) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.finalizer_api
      .hsa_ext_program_get_info_fn(program, attribute, value);
}

hsa_status_t hsa_ext_program_finalize(hsa_ext_program_t program, hsa_isa_t isa,
                                      int32_t call_convention,
                                      hsa_ext_control_directives_t control_directives,
                                      const char* options,
                                      hsa_code_object_type_t code_object_type,
                                      hsa_code_object_t* code_object) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.finalizer_api
      .hsa_ext_program_finalize_fn(program, isa, call_convention, control_directives, options,
                                   code_object_type, code_object);
}

hsa_status_t hsa_ext_image_get_capability(hsa_agent_t agent, hsa_ext_image_geometry_t geometry,
                                          const hsa_ext_image_format_t* image_format,
                                          uint32_t* capability_mask) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_get_capability_fn(agent, geometry, image_format, capability_mask);
}
hsa_status_t hsa_ext_image_data_get_info(hsa_agent_t agent,
                                         const hsa_ext_image_descriptor_t* image_descriptor,
                                         hsa_access_permission_t access_permission,
                                         hsa_ext_image_data_info_t* image_data_info) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_data_get_info_fn(agent, image_descriptor, access_permission,
                                      image_data_info);
}

hsa_status_t hsa_ext_image_create(hsa_agent_t agent,
                                  const hsa_ext_image_descriptor_t* image_descriptor,
                                  const void* image_data,
                                  hsa_access_permission_t access_permission,
                                  hsa_ext_image_t* image) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_create_fn(agent, image_descriptor, image_data, access_permission, image);
}

hsa_status_t hsa_ext_image_import(hsa_agent_t agent, const void* src_memory,
                                  size_t src_row_pitch, size_t src_slice_pitch,
                                  hsa_ext_image_t dst_image,
                                  const hsa_ext_image_region_t* image_region) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_import_fn(agent, src_memory, src_row_pitch, src_slice_pitch, dst_image,
                               image_region);
}

hsa_status_t hsa_ext_image_export(hsa_agent_t agent, hsa_ext_image_t src_image,
                                  void* dst_memory, size_t dst_row_pitch,
                                  size_t dst_slice_pitch,
                                  const hsa_ext_image_region_t* image_region) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_export_fn(agent, src_image, dst_memory, dst_row_pitch, dst_slice_pitch,
                               image_region);
}

hsa_status_t hsa_ext_image_copy(hsa_agent_t agent, hsa_ext_image_t src_image,
                                const hsa_dim3_t* src_offset, hsa_ext_image_t dst_image,
                                const hsa_dim3_t* dst_offset, const hsa_dim3_t* range) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_copy_fn(agent, src_image, src_offset, dst_image, dst_offset, range);
}

hsa_status_t hsa_ext_image_clear(hsa_agent_t agent, hsa_ext_image_t image, const void* data,
                                 const hsa_ext_image_region_t* image_region) {
  return
      rocr::core::Runtime::runtime_singleton_->extensions_.image_api.hsa_ext_image_clear_fn(
          agent, image, data, image_region);
}

hsa_status_t hsa_ext_image_destroy(hsa_agent_t agent, hsa_ext_image_t image) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_destroy_fn(agent, image);
}

hsa_status_t hsa_ext_sampler_create(hsa_agent_t agent,
                                    const hsa_ext_sampler_descriptor_t* sampler_descriptor,
                                    hsa_ext_sampler_t* sampler) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_sampler_create_fn(agent, sampler_descriptor, sampler);
}

hsa_status_t hsa_ext_sampler_destroy(hsa_agent_t agent, hsa_ext_sampler_t sampler) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_sampler_destroy_fn(agent, sampler);
}

hsa_status_t hsa_ext_image_get_capability_with_layout(
    hsa_agent_t agent, hsa_ext_image_geometry_t geometry,
    const hsa_ext_image_format_t* image_format, hsa_ext_image_data_layout_t image_data_layout,
    uint32_t* capability_mask) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_get_capability_with_layout_fn(agent, geometry, image_format,
                                                   image_data_layout, capability_mask);
}

hsa_status_t hsa_ext_image_data_get_info_with_layout(
    hsa_agent_t agent, const hsa_ext_image_descriptor_t* image_descriptor,
    hsa_access_permission_t access_permission, hsa_ext_image_data_layout_t image_data_layout,
    size_t image_data_row_pitch, size_t image_data_slice_pitch,
    hsa_ext_image_data_info_t* image_data_info) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_data_get_info_with_layout_fn(agent, image_descriptor, access_permission,
                                                  image_data_layout, image_data_row_pitch,
                                                  image_data_slice_pitch, image_data_info);
}

hsa_status_t hsa_ext_image_create_with_layout(
    hsa_agent_t agent, const hsa_ext_image_descriptor_t* image_descriptor,
    const void* image_data, hsa_access_permission_t access_permission,
    hsa_ext_image_data_layout_t
    image_data_layout, size_t image_data_row_pitch, size_t image_data_slice_pitch,
    hsa_ext_image_t* image) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_ext_image_create_with_layout_fn(agent, image_descriptor, image_data,
                                           access_permission, image_data_layout,
                                           image_data_row_pitch, image_data_slice_pitch, image);
}

//---------------------------------------------------------------------------//
//  Stubs for internal extension functions
//---------------------------------------------------------------------------//

// Use the function pointer from local instance Image Extension
hsa_status_t hsa_amd_image_get_info_max_dim(hsa_agent_t component, hsa_agent_info_t attribute,
                                            void* value) {
  return rocr::core::Runtime::runtime_singleton_->extensions_.image_api
      .hsa_amd_image_get_info_max_dim_fn(component, attribute, value);
}
ROCR-Runtime-rocm-5.0.0/src/core/runtime/hsa_ven_amd_loader.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "core/inc/hsa_ven_amd_loader_impl.h"

#include "core/inc/amd_hsa_loader.hpp"
#include "core/inc/runtime.h"

namespace rocr {

using namespace amd::hsa;
using namespace core;
using loader::CodeObjectReaderImpl;
using loader::Executable;
using loader::LoadedCodeObject;
using loader::Loader;

namespace AMD {
hsa_status_t handleException();
}  // namespace AMD

hsa_status_t hsa_ven_amd_loader_query_host_address(
  const void *device_address,
  const void **host_address) {
  try {
    if (!Runtime::runtime_singleton_->IsOpen()) {
      return HSA_STATUS_ERROR_NOT_INITIALIZED;
    }
    if (nullptr == device_address) {
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    }
    if (nullptr == host_address) {
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    }

    uintptr_t udaddr = reinterpret_cast<uintptr_t>(device_address);
    uintptr_t uhaddr = Runtime::runtime_singleton_->loader()->FindHostAddress(udaddr);
    if (0 == uhaddr) {
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    }

    *host_address = reinterpret_cast<const void*>(uhaddr);
    return HSA_STATUS_SUCCESS;
  }
catch(...) { return AMD::handleException(); } } hsa_status_t hsa_ven_amd_loader_query_segment_descriptors( hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors, size_t *num_segment_descriptors) { try { if (!Runtime::runtime_singleton_->IsOpen()) { return HSA_STATUS_ERROR_NOT_INITIALIZED; } // Arguments are checked by the loader. return Runtime::runtime_singleton_->loader()->QuerySegmentDescriptors(segment_descriptors, num_segment_descriptors); } catch(...) { return AMD::handleException(); } } hsa_status_t hsa_ven_amd_loader_query_executable( const void *device_address, hsa_executable_t *executable) { try { if (!Runtime::runtime_singleton_->IsOpen()) { return HSA_STATUS_ERROR_NOT_INITIALIZED; } if ((nullptr == device_address) || (nullptr == executable)) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } uintptr_t udaddr = reinterpret_cast<uintptr_t>(device_address); hsa_executable_t exec = Runtime::runtime_singleton_->loader()->FindExecutable(udaddr); if (0 == exec.handle) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } *executable = exec; return HSA_STATUS_SUCCESS; } catch(...) { return AMD::handleException(); } } hsa_status_t hsa_ven_amd_loader_executable_iterate_loaded_code_objects( hsa_executable_t executable, hsa_status_t (*callback)( hsa_executable_t executable, hsa_loaded_code_object_t loaded_code_object, void *data), void *data) { try { if (!Runtime::runtime_singleton_->IsOpen()) { return HSA_STATUS_ERROR_NOT_INITIALIZED; } if (nullptr == callback) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } Executable *exec = Executable::Object(executable); if (!exec) { return HSA_STATUS_ERROR_INVALID_EXECUTABLE; } return exec->IterateLoadedCodeObjects(callback, data); } catch(...)
{ return AMD::handleException(); } } hsa_status_t hsa_ven_amd_loader_loaded_code_object_get_info( hsa_loaded_code_object_t loaded_code_object, hsa_ven_amd_loader_loaded_code_object_info_t attribute, void *value) { try { if (!Runtime::runtime_singleton_->IsOpen()) { return HSA_STATUS_ERROR_NOT_INITIALIZED; } if (nullptr == value) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } const LoadedCodeObject *lcobj = LoadedCodeObject::Object(loaded_code_object); if (!lcobj) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } switch (attribute) { case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_EXECUTABLE: { *((hsa_executable_t*)value) = lcobj->getExecutable(); break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_KIND: { *((uint32_t*)value) = lcobj->getAgent().handle == 0 ? HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_KIND_PROGRAM : HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_KIND_AGENT; break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_AGENT: { hsa_agent_t agent = lcobj->getAgent(); if (agent.handle == 0) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } *((hsa_agent_t*)value) = agent; break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE: { // TODO Update loader so it keeps track if code object was loaded from a // file or memory. *((uint32_t*)value) = HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY; break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE: { *((uint64_t*)value) = lcobj->getElfData(); break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE: { *((uint64_t*)value) = lcobj->getElfSize(); break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE: { // TODO Update loader so it keeps track if code object was loaded from a // file or memory. return HSA_STATUS_ERROR_INVALID_ARGUMENT; break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA: { // TODO Check if executable is frozen. 
// This suggests this code should be moved into LoadedCodeObjectImpl::getinfo // as is done for other *_get_info methods. Currently LoadedCodeObject has a // GetInfo method which is likely not used. // Also, should a *NOT_FROZEN status code be added? // if (state_ != HSA_EXECUTABLE_STATE_FROZEN) { // return HSA_STATUS_ERROR_INVALID_ARGUMENT; // } *((int64_t*)value) = lcobj->getDelta(); break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE: { // TODO Check if executable is frozen. *((uint64_t*)value) = lcobj->getLoadBase(); break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE: { // TODO Check if executable is frozen. *((uint64_t*)value) = lcobj->getLoadSize(); break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH: { *(reinterpret_cast(value)) = lcobj->getUri().size(); break; } case HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI: { memcpy(value, lcobj->getUri().c_str(), lcobj->getUri().size()); break; } default: { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } } return HSA_STATUS_SUCCESS; } catch(...) { return AMD::handleException(); } } hsa_status_t hsa_ven_amd_loader_code_object_reader_create_from_file_with_offset_size( hsa_file_t file, size_t offset, size_t size, hsa_code_object_reader_t *code_object_reader) { try { if (!Runtime::runtime_singleton_->IsOpen()) { return HSA_STATUS_ERROR_NOT_INITIALIZED; } if (nullptr == code_object_reader) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } if (size == 0) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } std::unique_ptr<CodeObjectReaderImpl> reader( new (std::nothrow) CodeObjectReaderImpl()); if (!reader) { return HSA_STATUS_ERROR_OUT_OF_RESOURCES; } hsa_status_t status = reader->SetFile(file, offset, size); if (status != HSA_STATUS_SUCCESS) { return status; } *code_object_reader = CodeObjectReaderImpl::Handle(reader.release()); return HSA_STATUS_SUCCESS; } catch(...)
{ return AMD::handleException(); } } namespace { Loader *GetLoader() { return Runtime::runtime_singleton_->loader(); } } // namespace anonymous hsa_status_t hsa_ven_amd_loader_iterate_executables( hsa_status_t (*callback)( hsa_executable_t executable, void *data), void *data) { try { if (!Runtime::runtime_singleton_->IsOpen()) { return HSA_STATUS_ERROR_NOT_INITIALIZED; } if (nullptr == callback) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } return GetLoader()->IterateExecutables(callback, data); } catch(...) { return AMD::handleException(); } } } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/intercept_queue.cpp000066400000000000000000000232141420110115200237340ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. 
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/intercept_queue.h" #include "core/util/utils.h" namespace rocr { namespace core { struct InterceptFrame { InterceptQueue* queue; uint64_t pkt_index; size_t interceptor_index; }; static thread_local InterceptFrame Cursor = {nullptr, 0, 0}; static const uint16_t kInvalidHeader = (HSA_PACKET_TYPE_INVALID << HSA_PACKET_HEADER_TYPE) | (1 << HSA_PACKET_HEADER_BARRIER) | (HSA_FENCE_SCOPE_NONE << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE) | (HSA_FENCE_SCOPE_NONE << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE); static const uint16_t kBarrierHeader = (HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE) | (1 << HSA_PACKET_HEADER_BARRIER) | (HSA_FENCE_SCOPE_NONE << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE) | (HSA_FENCE_SCOPE_NONE << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE); static const hsa_barrier_and_packet_t kBarrierPacket = {kInvalidHeader, 0, 0, {}, 0, {}}; int InterceptQueue::rtti_id_ = 0; InterceptQueue::InterceptQueue(std::unique_ptr<Queue> queue) : QueueProxy(std::move(queue)), LocalSignal(0, false), DoorbellSignal(signal()), next_packet_(0), retry_index_(0), quit_(false), active_(true) { buffer_ = SharedArray(wrapped->amd_queue_.hsa_queue.size); amd_queue_.hsa_queue.base_address =
reinterpret_cast<void*>(&buffer_[0]); // Fill the ring buffer with invalid packet headers. // Leave packet content uninitialized to help trigger application errors. for (uint32_t pkt_id = 0; pkt_id < wrapped->amd_queue_.hsa_queue.size; ++pkt_id) { buffer_[pkt_id].dispatch.header = HSA_PACKET_TYPE_INVALID; } // Match the queue's signal ABI block to async_doorbell_'s. // This allows devices to use the queue's signal ABI block to trigger async_doorbell_, while // host-side use jumps directly to the queue's signal implementation. async_doorbell_ = new InterruptSignal(DOORBELL_MAX); MAKE_NAMED_SCOPE_GUARD(sigGuard, [&]() { async_doorbell_->DestroySignal(); }); this->signal_ = async_doorbell_->signal_; amd_queue_.hsa_queue.doorbell_signal = Signal::Convert(this); // Install an async handler for device side dispatches. auto err = Runtime::runtime_singleton_->SetAsyncSignalHandler( core::Signal::Convert(async_doorbell_), HSA_SIGNAL_CONDITION_NE, async_doorbell_->LoadRelaxed(), HandleAsyncDoorbell, this); if (err != HSA_STATUS_SUCCESS) throw AMD::hsa_exception(err, "Doorbell handler registration failed.\n"); // Install copy submission interceptor. AddInterceptor(Submit, this); sigGuard.Dismiss(); } InterceptQueue::~InterceptQueue() { active_ = false; // Kill the async doorbell handler. // The doorbell may not be used during or after queue destruction; however, an interrupt may be in flight. // Ensure the doorbell value is not 0, mark for exit, wake the handler, and wait for the termination value.
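The destructor comment describes a small shutdown handshake with the async doorbell handler: force a non-zero value, set `quit_`, wake the handler with an exchange, and wait until the handler writes the termination value 0. The following is a minimal sketch of that handshake using `std::atomic` and a `std::thread` in place of the runtime's async-signal machinery. `DoorbellShutdownSketch` is hypothetical, and `kDoorbellMax` is an assumed stand-in for `DOORBELL_MAX`, whose value is not shown in this file. Unlike `HandleAsyncDoorbell`, which re-arms with a plain store, the sketch re-arms with a compare-and-swap so its handler cannot overwrite the wake-up value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Assumed sentinel meaning "armed, nothing pending" (stand-in for DOORBELL_MAX).
constexpr int64_t kDoorbellMax = INT64_MAX;

struct DoorbellShutdownSketch {
  std::atomic<int64_t> doorbell{kDoorbellMax};  // stands in for async_doorbell_
  std::atomic<bool> quit{false};                // stands in for quit_
  std::atomic<int64_t> forwarded{0};            // last value handed on

  // Handler loop: wakes whenever the doorbell leaves kDoorbellMax.
  void HandlerLoop() {
    while (true) {
      int64_t v = doorbell.load();
      if (v == kDoorbellMax) {
        std::this_thread::yield();
        continue;
      }
      if (quit.load()) {
        doorbell.store(0);  // termination value: acknowledge the shutdown
        return;
      }
      // Re-arm only if the doorbell is unchanged, then forward the value.
      if (doorbell.compare_exchange_strong(v, kDoorbellMax)) forwarded.store(v);
    }
  }

  // Mirrors the destructor: non-zero value, mark for exit, wake, wait for 0.
  void Shutdown() {
    doorbell.store(kDoorbellMax);
    quit.store(true);
    int64_t previous = doorbell.exchange(1);
    if (previous != 0) {  // 0 would mean the handler already acknowledged
      while (doorbell.load() != 0) std::this_thread::yield();
    }
  }
};
```

The busy-wait loops keep the sketch dependency-free; the runtime instead blocks in `WaitRelaxed` with `HSA_WAIT_STATE_BLOCKED`.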
async_doorbell_->StoreRelaxed(DOORBELL_MAX); quit_ = true; hsa_signal_value_t val = async_doorbell_->ExchRelaxed(1); if (val != 0) async_doorbell_->WaitRelaxed(HSA_SIGNAL_CONDITION_EQ, 0, -1, HSA_WAIT_STATE_BLOCKED); async_doorbell_->DestroySignal(); } bool InterceptQueue::HandleAsyncDoorbell(hsa_signal_value_t value, void* arg) { InterceptQueue* queue = reinterpret_cast<InterceptQueue*>(arg); if (queue->quit_) { queue->async_doorbell_->StoreRelaxed(0); return false; } queue->async_doorbell_->StoreRelaxed(DOORBELL_MAX); queue->StoreRelease(value); return true; } void InterceptQueue::PacketWriter(const void* pkts, uint64_t pkt_count) { Cursor.interceptor_index--; auto& handler = Cursor.queue->interceptors[Cursor.interceptor_index]; handler.first(pkts, pkt_count, Cursor.pkt_index, handler.second, PacketWriter); } void InterceptQueue::Submit(const void* pkts, uint64_t pkt_count, uint64_t user_pkt_index, void* data, hsa_amd_queue_intercept_packet_writer writer) { InterceptQueue* queue = reinterpret_cast<InterceptQueue*>(data); const AqlPacket* packets = (const AqlPacket*)pkts; // Submit final packet transform to hardware. if (queue->Submit(packets, pkt_count)) return; // Could not submit final packets; stash them for later. assert(queue->overflow_.empty() && "Packet intercept error: overflow buffer not empty.\n"); for (uint64_t i = 0; i < pkt_count; i++) queue->overflow_.push_back(packets[i]); } bool InterceptQueue::Submit(const AqlPacket* packets, uint64_t count) { if (count == 0) return true; AqlPacket* ring = reinterpret_cast<AqlPacket*>(wrapped->amd_queue_.hsa_queue.base_address); uint64_t mask = wrapped->amd_queue_.hsa_queue.size - 1; while (true) { uint64_t write = wrapped->LoadWriteIndexRelaxed(); uint64_t read = wrapped->LoadReadIndexRelaxed(); uint64_t free_slots = wrapped->amd_queue_.hsa_queue.size - (write - read); // If out of space, defer packet insertion. if (free_slots <= count) { // If there is not already a pending retry point, add one. if (retry_index_ <= read) { // Reserve and wait for one slot.
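`PacketWriter` walks the interceptor list downward from the most recently added entry, using the thread-local `Cursor` to remember its position so that a plain function pointer can serve as the writer. Because the constructor installs `Submit` first, it presumably sits at index 0 and is the stage that finally writes to the hardware queue. The following is a minimal sketch of that chaining pattern, with strings standing in for AQL packets; `Chain`, `Dispatch`, `NextWriter`, `RecordToVector`, and `Duplicate` are hypothetical names. One deliberate difference: the sketch saves and restores the cursor index around each forward so an interceptor may call its writer more than once, whereas `PacketWriter` above only decrements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Packet = std::string;  // hypothetical stand-in for an AQL packet
using Writer = void (*)(const Packet* pkts, uint64_t count);
using Interceptor = void (*)(const Packet* pkts, uint64_t count, void* data,
                             Writer next);

struct Chain {
  // interceptors[0] is the terminal stage; higher indices run first.
  std::vector<std::pair<Interceptor, void*>> interceptors;
};

// Thread-local cursor, analogous to InterceptFrame/Cursor above.
struct Frame { Chain* chain; size_t index; };
static thread_local Frame cursor = {nullptr, 0};

// Writer handed to every interceptor: forwards to the next-lower stage.
static void NextWriter(const Packet* pkts, uint64_t count) {
  const size_t saved = cursor.index;
  cursor.index--;
  auto& h = cursor.chain->interceptors[cursor.index];
  h.first(pkts, count, h.second, NextWriter);
  cursor.index = saved;  // lets the caller invoke its writer again
}

// Entry point: start at the most recently added interceptor.
void Dispatch(Chain& chain, const Packet& pkt) {
  assert(!chain.interceptors.empty());
  cursor = {&chain, chain.interceptors.size() - 1};
  auto& h = chain.interceptors[cursor.index];
  h.first(&pkt, 1, h.second, NextWriter);
  cursor.chain = nullptr;
}

// Terminal stage: "submits" by appending to the vector passed as data.
void RecordToVector(const Packet* pkts, uint64_t count, void* data, Writer) {
  auto* out = static_cast<std::vector<Packet>*>(data);
  out->insert(out->end(), pkts, pkts + count);
}

// Example transform: forwards each batch unchanged, then a tagged copy.
void Duplicate(const Packet* pkts, uint64_t count, void*, Writer next) {
  next(pkts, count);
  std::vector<Packet> tagged(pkts, pkts + count);
  for (Packet& p : tagged) p += "-copy";
  next(tagged.data(), tagged.size());
}
```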
write = wrapped->AddWriteIndexRelaxed(1); read = write - wrapped->amd_queue_.hsa_queue.size + 1; while (wrapped->LoadReadIndexRelaxed() < read) os::YieldThread(); // Submit a barrier that will wake async queue processing. ring[write & mask].barrier_and = kBarrierPacket; ring[write & mask].barrier_and.completion_signal = Signal::Convert(async_doorbell_); atomic::Store(&ring[write & mask].barrier_and.header, kBarrierHeader, std::memory_order_release); HSA::hsa_signal_store_screlease(wrapped->amd_queue_.hsa_queue.doorbell_signal, write); // Record the retry point. retry_index_ = write; } return false; } // Attempt to reserve usable queue space. uint64_t new_write = wrapped->CasWriteIndexRelaxed(write, write + count); if (new_write == write) { AqlPacket first = packets[0]; uint16_t header = first.dispatch.header; first.dispatch.header = kInvalidHeader; ring[write & mask] = first; for (uint64_t i = 1; i < count; i++) ring[(write + i) & mask] = packets[i]; atomic::Store(&ring[write & mask].dispatch.header, header, std::memory_order_release); HSA::hsa_signal_store_screlease(wrapped->amd_queue_.hsa_queue.doorbell_signal, write + count - 1); return true; } } } void InterceptQueue::StoreRelaxed(hsa_signal_value_t value) { if (!active_) return; // If called recursively, defer to the async doorbell thread. if (Cursor.queue != nullptr) { debug_print("Likely incorrect queue use observed in an interceptor.\n"); async_doorbell_->StoreRelaxed(value); return; } ScopedAcquire lock(&lock_); // Submit overflow packets. if (!overflow_.empty()) { if (!Submit(&overflow_[0], overflow_.size())) return; overflow_.clear(); } Cursor.queue = this; AqlPacket* ring = reinterpret_cast<AqlPacket*>(amd_queue_.hsa_queue.base_address); uint64_t mask = wrapped->amd_queue_.hsa_queue.size - 1; // Loop over valid packets and process. uint64_t end = LoadWriteIndexAcquire(); uint64_t i; for (i = next_packet_; i < end; i++) { if (!ring[i & mask].IsValid()) break; // Process callbacks.
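`InterceptQueue::Submit(const AqlPacket*, uint64_t)` follows the usual AQL producer protocol: compare free space against the read index, reserve slots with a compare-and-swap on the write index, copy the packets while the first header is still invalid, then publish that header with a release store so a consumer never sees a half-written batch. The following is a minimal sketch of the same protocol on a CPU-side ring, assuming a power-of-two size and a hypothetical `Pkt` type whose header value 0 means invalid; `RingSketch` is not a ROCR type, and the doorbell write and the overflow/retry barrier are omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical packet: header 0 means "invalid / not yet published".
struct Pkt {
  std::atomic<uint16_t> header{0};
  uint64_t payload = 0;
};

class RingSketch {
 public:
  // The size must be a power of two so that (index & mask_) wraps correctly.
  explicit RingSketch(uint64_t size) : slots_(size), mask_(size - 1) {
    assert(size != 0 && (size & (size - 1)) == 0);
  }

  // Producer: publishes `count` packets, or returns false without touching
  // the ring. Mirrors the free_slots <= count test above.
  bool Submit(const uint16_t* headers, const uint64_t* payloads, uint64_t count) {
    if (count == 0) return true;
    while (true) {
      uint64_t write = write_.load(std::memory_order_relaxed);
      uint64_t read = read_.load(std::memory_order_acquire);
      uint64_t free_slots = slots_.size() - (write - read);
      if (free_slots <= count) return false;
      // Reserve [write, write + count); retry if another producer won.
      if (!write_.compare_exchange_weak(write, write + count,
                                        std::memory_order_relaxed))
        continue;
      for (uint64_t i = 0; i < count; i++)
        slots_[(write + i) & mask_].payload = payloads[i];
      for (uint64_t i = 1; i < count; i++)
        slots_[(write + i) & mask_].header.store(headers[i],
                                                 std::memory_order_relaxed);
      // Publishing the first header last, with release ordering, makes the
      // whole batch visible to a consumer that acquires it.
      slots_[write & mask_].header.store(headers[0], std::memory_order_release);
      return true;
    }
  }

  // Consumer: takes the next packet if its header has been published.
  bool Consume(uint64_t* payload) {
    uint64_t read = read_.load(std::memory_order_relaxed);
    Pkt& p = slots_[read & mask_];
    if (p.header.load(std::memory_order_acquire) == 0) return false;
    *payload = p.payload;
    p.header.store(0, std::memory_order_relaxed);  // slot is invalid again
    read_.store(read + 1, std::memory_order_release);
    return true;
  }

 private:
  std::vector<Pkt> slots_;
  uint64_t mask_;
  std::atomic<uint64_t> write_{0};
  std::atomic<uint64_t> read_{0};
};
```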
Cursor.interceptor_index = interceptors.size() - 1; Cursor.pkt_index = i; auto& handler = interceptors[Cursor.interceptor_index]; handler.first(&ring[i & mask], 1, i, handler.second, PacketWriter); // Invalidate consumed packet atomic::Store(&ring[i & mask].dispatch.header, kInvalidHeader, std::memory_order_release); } next_packet_ = i; Cursor.queue = nullptr; atomic::Store(&amd_queue_.read_dispatch_id, next_packet_, std::memory_order_release); } } // namespace core } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/interrupt_signal.cpp000066400000000000000000000311271420110115200241260ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. 
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/inc/interrupt_signal.h" #include "core/inc/runtime.h" #include "core/util/timer.h" #include "core/util/locks.h" namespace rocr { namespace core { HsaEvent* InterruptSignal::EventPool::alloc() { ScopedAcquire lock(&lock_); if (events_.empty()) { if (!allEventsAllocated) { HsaEvent* evt = InterruptSignal::CreateEvent(HSA_EVENTTYPE_SIGNAL, false); if (evt == nullptr) allEventsAllocated = true; return evt; } return nullptr; } HsaEvent* ret = events_.back().release(); events_.pop_back(); return ret; } void InterruptSignal::EventPool::free(HsaEvent* evt) { if (evt == nullptr) return; ScopedAcquire lock(&lock_); events_.push_back(unique_event_ptr(evt)); } int InterruptSignal::rtti_id_ = 0; HsaEvent* InterruptSignal::CreateEvent(HSA_EVENTTYPE type, bool manual_reset) { HsaEventDescriptor event_descriptor; event_descriptor.EventType = type; event_descriptor.SyncVar.SyncVar.UserData = NULL; event_descriptor.SyncVar.SyncVarSize = sizeof(hsa_signal_value_t); event_descriptor.NodeId = 0; HsaEvent* ret = NULL; if (HSAKMT_STATUS_SUCCESS == hsaKmtCreateEvent(&event_descriptor, manual_reset, false, &ret)) { if (type == HSA_EVENTTYPE_MEMORY) { 
memset(&ret->EventData.EventData.MemoryAccessFault.Failure, 0, sizeof(HsaAccessAttributeFailure)); } } return ret; } void InterruptSignal::DestroyEvent(HsaEvent* evt) { hsaKmtDestroyEvent(evt); } InterruptSignal::InterruptSignal(hsa_signal_value_t initial_value, HsaEvent* use_event) : LocalSignal(initial_value, false), Signal(signal()) { if (use_event != nullptr) { event_ = use_event; free_event_ = false; } else { event_ = Runtime::runtime_singleton_->GetEventPool()->alloc(); free_event_ = true; } if (event_ != nullptr) { signal_.event_id = event_->EventId; signal_.event_mailbox_ptr = event_->EventData.HWData2; } else { signal_.event_id = 0; signal_.event_mailbox_ptr = 0; } signal_.kind = AMD_SIGNAL_KIND_USER; } InterruptSignal::~InterruptSignal() { if (free_event_) Runtime::runtime_singleton_->GetEventPool()->free(event_); } hsa_signal_value_t InterruptSignal::LoadRelaxed() { return hsa_signal_value_t( atomic::Load(&signal_.value, std::memory_order_relaxed)); } hsa_signal_value_t InterruptSignal::LoadAcquire() { return hsa_signal_value_t( atomic::Load(&signal_.value, std::memory_order_acquire)); } void InterruptSignal::StoreRelaxed(hsa_signal_value_t value) { atomic::Store(&signal_.value, int64_t(value), std::memory_order_relaxed); SetEvent(); } void InterruptSignal::StoreRelease(hsa_signal_value_t value) { atomic::Store(&signal_.value, int64_t(value), std::memory_order_release); SetEvent(); } hsa_signal_value_t InterruptSignal::WaitRelaxed( hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint) { Retain(); MAKE_SCOPE_GUARD([&]() { Release(); }); uint32_t prior = waiting_++; MAKE_SCOPE_GUARD([&]() { waiting_--; }); // Allow only the first waiter to sleep (temporary, known to be bad). 
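`EventPool::alloc` hands back a previously freed event when one is available; otherwise it creates a new one, and the first time creation fails it sets `allEventsAllocated` and stops asking for more. The following is a minimal sketch of that recycle-then-create pattern, with a hypothetical `Event` type and a caller-supplied factory in place of `InterruptSignal::CreateEvent`; `EventPoolSketch` is not a ROCR type.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Event { int id; };  // hypothetical stand-in for HsaEvent

class EventPoolSketch {
 public:
  explicit EventPoolSketch(std::function<Event*()> create)
      : create_(std::move(create)) {}

  // Reuses a freed event if possible; otherwise tries to create one, unless
  // an earlier creation already failed (cf. allEventsAllocated).
  Event* Alloc() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (free_.empty()) {
      if (exhausted_) return nullptr;
      Event* e = create_();
      if (e == nullptr) exhausted_ = true;
      return e;
    }
    Event* e = free_.back().release();
    free_.pop_back();
    return e;
  }

  // Returns an event to the pool, which then owns (and later deletes) it.
  void Free(Event* e) {
    if (e == nullptr) return;
    std::lock_guard<std::mutex> lock(mutex_);
    free_.emplace_back(e);
  }

 private:
  std::function<Event*()> create_;
  std::mutex mutex_;
  bool exhausted_ = false;
  std::vector<std::unique_ptr<Event>> free_;
};
```

Caching the failure avoids repeated calls into the driver once its event supply is known to be exhausted, while freed events can still be recycled.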
if (prior != 0) wait_hint = HSA_WAIT_STATE_ACTIVE; int64_t value; timer::fast_clock::time_point start_time = timer::fast_clock::now(); // Set a polling timeout value // Should be a few times bigger than null kernel latency const timer::fast_clock::duration kMaxElapsed = std::chrono::microseconds(200); uint64_t hsa_freq; HSA::hsa_system_get_info(HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY, &hsa_freq); const timer::fast_clock::duration fast_timeout = timer::duration_from_seconds( double(timeout) / double(hsa_freq)); bool condition_met = false; while (true) { if (!IsValid()) return 0; value = atomic::Load(&signal_.value, std::memory_order_relaxed); switch (condition) { case HSA_SIGNAL_CONDITION_EQ: { condition_met = (value == compare_value); break; } case HSA_SIGNAL_CONDITION_NE: { condition_met = (value != compare_value); break; } case HSA_SIGNAL_CONDITION_GTE: { condition_met = (value >= compare_value); break; } case HSA_SIGNAL_CONDITION_LT: { condition_met = (value < compare_value); break; } default: return 0; } if (condition_met) return hsa_signal_value_t(value); timer::fast_clock::time_point time = timer::fast_clock::now(); if (time - start_time > fast_timeout) { value = atomic::Load(&signal_.value, std::memory_order_relaxed); return hsa_signal_value_t(value); } if (wait_hint == HSA_WAIT_STATE_ACTIVE) { continue; } if (time - start_time < kMaxElapsed) { // os::uSleep(20); continue; } uint32_t wait_ms; auto time_remaining = fast_timeout - (time - start_time); uint64_t ct=timer::duration_cast<std::chrono::milliseconds>( time_remaining).count(); wait_ms = (ct>0xFFFFFFFEu) ?
0xFFFFFFFEu : ct; hsaKmtWaitOnEvent(event_, wait_ms); } } hsa_signal_value_t InterruptSignal::WaitAcquire( hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout, hsa_wait_state_t wait_hint) { hsa_signal_value_t ret = WaitRelaxed(condition, compare_value, timeout, wait_hint); std::atomic_thread_fence(std::memory_order_acquire); return ret; } void InterruptSignal::AndRelaxed(hsa_signal_value_t value) { atomic::And(&signal_.value, int64_t(value), std::memory_order_relaxed); SetEvent(); } void InterruptSignal::AndAcquire(hsa_signal_value_t value) { atomic::And(&signal_.value, int64_t(value), std::memory_order_acquire); SetEvent(); } void InterruptSignal::AndRelease(hsa_signal_value_t value) { atomic::And(&signal_.value, int64_t(value), std::memory_order_release); SetEvent(); } void InterruptSignal::AndAcqRel(hsa_signal_value_t value) { atomic::And(&signal_.value, int64_t(value), std::memory_order_acq_rel); SetEvent(); } void InterruptSignal::OrRelaxed(hsa_signal_value_t value) { atomic::Or(&signal_.value, int64_t(value), std::memory_order_relaxed); SetEvent(); } void InterruptSignal::OrAcquire(hsa_signal_value_t value) { atomic::Or(&signal_.value, int64_t(value), std::memory_order_acquire); SetEvent(); } void InterruptSignal::OrRelease(hsa_signal_value_t value) { atomic::Or(&signal_.value, int64_t(value), std::memory_order_release); SetEvent(); } void InterruptSignal::OrAcqRel(hsa_signal_value_t value) { atomic::Or(&signal_.value, int64_t(value), std::memory_order_acq_rel); SetEvent(); } void InterruptSignal::XorRelaxed(hsa_signal_value_t value) { atomic::Xor(&signal_.value, int64_t(value), std::memory_order_relaxed); SetEvent(); } void InterruptSignal::XorAcquire(hsa_signal_value_t value) { atomic::Xor(&signal_.value, int64_t(value), std::memory_order_acquire); SetEvent(); } void InterruptSignal::XorRelease(hsa_signal_value_t value) { atomic::Xor(&signal_.value, int64_t(value), std::memory_order_release); SetEvent(); } void 
InterruptSignal::XorAcqRel(hsa_signal_value_t value) { atomic::Xor(&signal_.value, int64_t(value), std::memory_order_acq_rel); SetEvent(); } void InterruptSignal::AddRelaxed(hsa_signal_value_t value) { atomic::Add(&signal_.value, int64_t(value), std::memory_order_relaxed); SetEvent(); } void InterruptSignal::AddAcquire(hsa_signal_value_t value) { atomic::Add(&signal_.value, int64_t(value), std::memory_order_acquire); SetEvent(); } void InterruptSignal::AddRelease(hsa_signal_value_t value) { atomic::Add(&signal_.value, int64_t(value), std::memory_order_release); SetEvent(); } void InterruptSignal::AddAcqRel(hsa_signal_value_t value) { atomic::Add(&signal_.value, int64_t(value), std::memory_order_acq_rel); SetEvent(); } void InterruptSignal::SubRelaxed(hsa_signal_value_t value) { atomic::Sub(&signal_.value, int64_t(value), std::memory_order_relaxed); SetEvent(); } void InterruptSignal::SubAcquire(hsa_signal_value_t value) { atomic::Sub(&signal_.value, int64_t(value), std::memory_order_acquire); SetEvent(); } void InterruptSignal::SubRelease(hsa_signal_value_t value) { atomic::Sub(&signal_.value, int64_t(value), std::memory_order_release); SetEvent(); } void InterruptSignal::SubAcqRel(hsa_signal_value_t value) { atomic::Sub(&signal_.value, int64_t(value), std::memory_order_acq_rel); SetEvent(); } hsa_signal_value_t InterruptSignal::ExchRelaxed(hsa_signal_value_t value) { hsa_signal_value_t ret = hsa_signal_value_t(atomic::Exchange( &signal_.value, int64_t(value), std::memory_order_relaxed)); SetEvent(); return ret; } hsa_signal_value_t InterruptSignal::ExchAcquire(hsa_signal_value_t value) { hsa_signal_value_t ret = hsa_signal_value_t(atomic::Exchange( &signal_.value, int64_t(value), std::memory_order_acquire)); SetEvent(); return ret; } hsa_signal_value_t InterruptSignal::ExchRelease(hsa_signal_value_t value) { hsa_signal_value_t ret = hsa_signal_value_t(atomic::Exchange( &signal_.value, int64_t(value), std::memory_order_release)); SetEvent(); return ret; } 
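`InterruptSignal::WaitRelaxed` converts the caller's timeout from timestamp ticks to a duration using `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY`, busy-polls for up to `kMaxElapsed` (200 microseconds), and only then sleeps in `hsaKmtWaitOnEvent` with a millisecond timeout clamped to `0xFFFFFFFE`. The following is a minimal sketch of that spin-then-block strategy, using a `std::condition_variable` in place of the kernel event. `HybridSignalSketch`, `TicksToDuration`, and `ClampWaitMs` are hypothetical names, only the equality condition is shown, and the saturation of very long timeouts is a choice of the sketch.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Timeout in ticks at freq_hz ticks per second -> nanoseconds. Very long
// timeouts saturate to nanoseconds::max() instead of overflowing.
std::chrono::nanoseconds TicksToDuration(uint64_t ticks, uint64_t freq_hz) {
  const double seconds = double(ticks) / double(freq_hz);
  if (seconds >= 1e9) return std::chrono::nanoseconds::max();
  return std::chrono::nanoseconds(static_cast<int64_t>(seconds * 1e9));
}

// Remaining time -> whole milliseconds, clamped the same way as wait_ms above.
uint32_t ClampWaitMs(std::chrono::nanoseconds remaining) {
  if (remaining.count() <= 0) return 0;
  const uint64_t ms = static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::milliseconds>(remaining).count());
  return static_cast<uint32_t>(std::min<uint64_t>(ms, 0xFFFFFFFEu));
}

struct HybridSignalSketch {
  std::atomic<int64_t> value{0};
  std::mutex mutex;
  std::condition_variable cv;

  // Store, then wake sleepers (the role SetEvent() plays above).
  void Store(int64_t v) {
    {
      std::lock_guard<std::mutex> lock(mutex);
      value.store(v);
    }
    cv.notify_all();
  }

  // Waits for value == expected and returns the last value seen, which
  // differs from `expected` only when the timeout expired.
  int64_t WaitEq(int64_t expected, std::chrono::nanoseconds timeout) {
    using Clock = std::chrono::steady_clock;
    const auto kSpin = std::chrono::microseconds(200);  // like kMaxElapsed
    const auto start = Clock::now();
    while (true) {
      const int64_t v = value.load();
      if (v == expected) return v;
      const auto elapsed = Clock::now() - start;
      if (elapsed > timeout) return value.load();
      if (elapsed < kSpin) continue;  // busy-poll phase
      std::unique_lock<std::mutex> lock(mutex);
      const auto remaining =
          std::chrono::duration_cast<std::chrono::nanoseconds>(timeout - elapsed);
      cv.wait_for(lock, std::chrono::milliseconds(ClampWaitMs(remaining)),
                  [&] { return value.load() == expected; });
    }
  }
};
```

Spinning first keeps latency low for short waits (the source comment ties the 200-microsecond budget to null-kernel latency), while blocking afterwards avoids burning a core on long waits.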
hsa_signal_value_t InterruptSignal::ExchAcqRel(hsa_signal_value_t value) { hsa_signal_value_t ret = hsa_signal_value_t(atomic::Exchange( &signal_.value, int64_t(value), std::memory_order_acq_rel)); SetEvent(); return ret; } hsa_signal_value_t InterruptSignal::CasRelaxed(hsa_signal_value_t expected, hsa_signal_value_t value) { hsa_signal_value_t ret = hsa_signal_value_t( atomic::Cas(&signal_.value, int64_t(value), int64_t(expected), std::memory_order_relaxed)); SetEvent(); return ret; } hsa_signal_value_t InterruptSignal::CasAcquire(hsa_signal_value_t expected, hsa_signal_value_t value) { hsa_signal_value_t ret = hsa_signal_value_t( atomic::Cas(&signal_.value, int64_t(value), int64_t(expected), std::memory_order_acquire)); SetEvent(); return ret; } hsa_signal_value_t InterruptSignal::CasRelease(hsa_signal_value_t expected, hsa_signal_value_t value) { hsa_signal_value_t ret = hsa_signal_value_t( atomic::Cas(&signal_.value, int64_t(value), int64_t(expected), std::memory_order_release)); SetEvent(); return ret; } hsa_signal_value_t InterruptSignal::CasAcqRel(hsa_signal_value_t expected, hsa_signal_value_t value) { hsa_signal_value_t ret = hsa_signal_value_t( atomic::Cas(&signal_.value, int64_t(value), int64_t(expected), std::memory_order_acq_rel)); SetEvent(); return ret; } } // namespace core } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/ipc_signal.cpp000066400000000000000000000073521420110115200226500ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #include "core/inc/ipc_signal.h" #include #include "core/inc/runtime.h" #include "core/inc/exceptions.h" namespace rocr { namespace core { int IPCSignal::rtti_id_ = 0; KernelMutex IPCSignal::lock_; SharedMemory::SharedMemory(const hsa_amd_ipc_memory_t* handle, size_t len) { hsa_status_t err = Runtime::runtime_singleton_->IPCAttach(handle, len, 0, NULL, &ptr_); if (err != HSA_STATUS_SUCCESS) throw AMD::hsa_exception(err, "IPC memory attach failed."); } SharedMemory::SharedMemory(SharedMemory&& rhs) { ptr_ = rhs.ptr_; rhs.ptr_ = nullptr; } SharedMemory::~SharedMemory() { if (ptr_ == nullptr) return; auto err = Runtime::runtime_singleton_->IPCDetach(ptr_); assert(err == HSA_STATUS_SUCCESS && "IPC detach failed."); } void IPCSignal::CreateHandle(Signal* signal, hsa_amd_ipc_signal_t* ipc_handle) { if (!signal->isIPC()) throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT, "Signal must be IPC enabled."); SharedSignal* shared = SharedSignal::Convert(Convert(signal)); hsa_status_t err = Runtime::runtime_singleton_->IPCCreate(shared, 4096, ipc_handle); if (err != HSA_STATUS_SUCCESS) throw AMD::hsa_exception(err, "IPC memory create failed."); } Signal* IPCSignal::Attach(const hsa_amd_ipc_signal_t* ipc_signal_handle) { SharedMemorySignal shared(ipc_signal_handle); if (!(shared.signal()->IsIPC())) throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT, "IPC memory does not contain an IPC signal abi block."); hsa_signal_t handle = SharedSignal::Convert(shared.signal()); ScopedAcquire lock(&lock_); Signal* ret = core::Signal::DuplicateHandle(handle); if (ret == nullptr) ret = new IPCSignal(std::move(shared)); return ret; } } // namespace core } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/runtime/isa.cpp000077500000000000000000000347601420110115200213220ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The 
University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
//
////////////////////////////////////////////////////////////////////////////////

#include "core/inc/isa.h"

#include
#include
#include
#include
#include

namespace rocr {
namespace core {

bool Wavefront::GetInfo(
    const hsa_wavefront_info_t &attribute,
    void *value) const {
  if (!value) {
    return false;
  }

  switch (attribute) {
    case HSA_WAVEFRONT_INFO_SIZE: {
      *((uint32_t*)value) = 64;
      return true;
    }
    default: {
      return false;
    }
  }
}

/* static */
bool Isa::IsCompatible(const Isa &code_object_isa, const Isa &agent_isa) {
  if (code_object_isa.GetVersion() != agent_isa.GetVersion())
    return false;

  assert(code_object_isa.IsSrameccSupported() == agent_isa.IsSrameccSupported() &&
         agent_isa.GetSramecc() != IsaFeature::Any);
  if ((code_object_isa.GetSramecc() == IsaFeature::Enabled ||
       code_object_isa.GetSramecc() == IsaFeature::Disabled) &&
      code_object_isa.GetSramecc() != agent_isa.GetSramecc())
    return false;

  assert(code_object_isa.IsXnackSupported() == agent_isa.IsXnackSupported() &&
         agent_isa.GetXnack() != IsaFeature::Any);
  if ((code_object_isa.GetXnack() == IsaFeature::Enabled ||
       code_object_isa.GetXnack() == IsaFeature::Disabled) &&
      code_object_isa.GetXnack() != agent_isa.GetXnack())
    return false;

  return true;
}

std::string Isa::GetProcessorName() const {
  std::string processor(targetid_);
  return processor.substr(0, processor.find(':'));
}

std::string Isa::GetIsaName() const {
  constexpr char hsa_isa_name_prefix[] = "amdgcn-amd-amdhsa--";
  return std::string(hsa_isa_name_prefix) + targetid_;
}

bool Isa::GetInfo(const hsa_isa_info_t &attribute, void *value) const {
  if (!value) {
    return false;
  }

  switch (attribute) {
    case HSA_ISA_INFO_NAME_LENGTH: {
      std::string isa_name = GetIsaName();
      *((uint32_t*)value) = static_cast<uint32_t>(isa_name.size() + 1);
      return true;
    }
    case HSA_ISA_INFO_NAME: {
      std::string isa_name = GetIsaName();
      memset(value, 0x0, isa_name.size() + 1);
      memcpy(value, isa_name.c_str(), isa_name.size());
      return true;
    }
    // deprecated.
    case HSA_ISA_INFO_CALL_CONVENTION_COUNT: {
      *((uint32_t*)value) = 1;
      return true;
    }
    // deprecated.
    case HSA_ISA_INFO_CALL_CONVENTION_INFO_WAVEFRONT_SIZE: {
      *((uint32_t*)value) = 64;
      return true;
    }
    // deprecated.
    case HSA_ISA_INFO_CALL_CONVENTION_INFO_WAVEFRONTS_PER_COMPUTE_UNIT: {
      *((uint32_t*)value) = 40;
      return true;
    }
    case HSA_ISA_INFO_MACHINE_MODELS: {
      const bool machine_models[2] = {false, true};
      memcpy(value, machine_models, sizeof(machine_models));
      return true;
    }
    case HSA_ISA_INFO_PROFILES: {
      bool profiles[2] = {true, false};
      if (this->GetVersion() == Version(7, 0, 0) ||
          this->GetVersion() == Version(8, 0, 1)) {
        profiles[1] = true;
      }
      memcpy(value, profiles, sizeof(profiles));
      return true;
    }
    case HSA_ISA_INFO_DEFAULT_FLOAT_ROUNDING_MODES: {
      const bool rounding_modes[3] = {false, false, true};
      memcpy(value, rounding_modes, sizeof(rounding_modes));
      return true;
    }
    case HSA_ISA_INFO_BASE_PROFILE_DEFAULT_FLOAT_ROUNDING_MODES: {
      const bool rounding_modes[3] = {false, false, true};
      memcpy(value, rounding_modes, sizeof(rounding_modes));
      return true;
    }
    case HSA_ISA_INFO_FAST_F16_OPERATION: {
      if (this->GetMajorVersion() >= 8) {
        *((bool*)value) = true;
      } else {
        *((bool*)value) = false;
      }
      return true;
    }
    case HSA_ISA_INFO_WORKGROUP_MAX_DIM: {
      const uint16_t workgroup_max_dim[3] = {1024, 1024, 1024};
      memcpy(value, workgroup_max_dim, sizeof(workgroup_max_dim));
      return true;
    }
    case HSA_ISA_INFO_WORKGROUP_MAX_SIZE: {
      *((uint32_t*)value) = 1024;
      return true;
    }
    case HSA_ISA_INFO_GRID_MAX_DIM: {
      const hsa_dim3_t grid_max_dim = {UINT32_MAX, UINT32_MAX, UINT32_MAX};
      memcpy(value, &grid_max_dim, sizeof(grid_max_dim));
      return true;
    }
    case HSA_ISA_INFO_GRID_MAX_SIZE: {
      *((uint64_t*)value) = UINT64_MAX;
      return true;
    }
    case HSA_ISA_INFO_FBARRIER_MAX_SIZE: {
      *((uint32_t*)value) = 32;
      return true;
    }
    default: {
      return false;
    }
  }
}

hsa_round_method_t Isa::GetRoundMethod(
    hsa_fp_type_t fp_type,
    hsa_flush_mode_t flush_mode) const {
  return HSA_ROUND_METHOD_SINGLE;
}

const Isa *IsaRegistry::GetIsa(const std::string &full_name) {
  auto isareg_iter = supported_isas_.find(full_name);
  return isareg_iter == supported_isas_.end() ? nullptr : &isareg_iter->second;
}

const Isa *IsaRegistry::GetIsa(const Isa::Version &version, IsaFeature sramecc, IsaFeature xnack) {
  auto isareg_iter = std::find_if(supported_isas_.begin(), supported_isas_.end(),
    [&](const IsaMap::value_type& isareg) {
      return isareg.second.GetVersion() == version &&
             (isareg.second.GetSramecc() == IsaFeature::Unsupported ||
              isareg.second.GetSramecc() == sramecc) &&
             (isareg.second.GetXnack() == IsaFeature::Unsupported ||
              isareg.second.GetXnack() == xnack);
    });
  return isareg_iter == supported_isas_.end() ? nullptr : &isareg_iter->second;
}

const IsaRegistry::IsaMap IsaRegistry::supported_isas_ =
  IsaRegistry::GetSupportedIsas();

const IsaRegistry::IsaMap IsaRegistry::GetSupportedIsas() {
  // agent, and vendor name length limit excluding terminating nul character.
  constexpr size_t hsa_name_size = 63;

  // FIXME: Use static_assert when C++17 used.
#define ISAREG_ENTRY_GEN(name, maj, min, stp, sramecc, xnack)                                   \
  assert(std::char_traits<char>::length(name) <= hsa_name_size);                                \
  Isa amd_amdgpu_##maj##min##stp##_SRAMECC_##sramecc##_XNACK_##xnack;                           \
  amd_amdgpu_##maj##min##stp##_SRAMECC_##sramecc##_XNACK_##xnack.targetid_ = name;              \
  amd_amdgpu_##maj##min##stp##_SRAMECC_##sramecc##_XNACK_##xnack.version_ =                     \
      Isa::Version(maj, min, stp);                                                              \
  amd_amdgpu_##maj##min##stp##_SRAMECC_##sramecc##_XNACK_##xnack.sramecc_ = sramecc;            \
  amd_amdgpu_##maj##min##stp##_SRAMECC_##sramecc##_XNACK_##xnack.xnack_ = xnack;                \
  supported_isas.insert(std::make_pair(                                                         \
      amd_amdgpu_##maj##min##stp##_SRAMECC_##sramecc##_XNACK_##xnack.GetIsaName(),              \
      amd_amdgpu_##maj##min##stp##_SRAMECC_##sramecc##_XNACK_##xnack));

  IsaMap supported_isas;
  IsaFeature unsupported = IsaFeature::Unsupported;
  IsaFeature any = IsaFeature::Any;
  IsaFeature disabled = IsaFeature::Disabled;
  IsaFeature enabled = IsaFeature::Enabled;

  //               Target ID                 Version    SRAMECC      XNACK
  ISAREG_ENTRY_GEN("gfx700",                 7, 0, 0,   unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx701",                 7, 0, 1,   unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx702",                 7, 0, 2,   unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx801",                 8, 0, 1,   unsupported, any)
  ISAREG_ENTRY_GEN("gfx801:xnack-",          8, 0, 1,   unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx801:xnack+",          8, 0, 1,   unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx802",                 8, 0, 2,   unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx803",                 8, 0, 3,   unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx805",                 8, 0, 5,   unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx810",                 8, 1, 0,   unsupported, any)
  ISAREG_ENTRY_GEN("gfx810:xnack-",          8, 1, 0,   unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx810:xnack+",          8, 1, 0,   unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx900",                 9, 0, 0,   unsupported, any)
  ISAREG_ENTRY_GEN("gfx900:xnack-",          9, 0, 0,   unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx900:xnack+",          9, 0, 0,   unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx902",                 9, 0, 2,   unsupported, any)
  ISAREG_ENTRY_GEN("gfx902:xnack-",          9, 0, 2,   unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx902:xnack+",          9, 0, 2,   unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx904",                 9, 0, 4,   unsupported, any)
  ISAREG_ENTRY_GEN("gfx904:xnack-",          9, 0, 4,   unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx904:xnack+",          9, 0, 4,   unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx906",                 9, 0, 6,   any,         any)
  ISAREG_ENTRY_GEN("gfx906:xnack-",          9, 0, 6,   any,         disabled)
  ISAREG_ENTRY_GEN("gfx906:xnack+",          9, 0, 6,   any,         enabled)
  ISAREG_ENTRY_GEN("gfx906:sramecc-",        9, 0, 6,   disabled,    any)
  ISAREG_ENTRY_GEN("gfx906:sramecc+",        9, 0, 6,   enabled,     any)
  ISAREG_ENTRY_GEN("gfx906:sramecc-:xnack-", 9, 0, 6,   disabled,    disabled)
  ISAREG_ENTRY_GEN("gfx906:sramecc-:xnack+", 9, 0, 6,   disabled,    enabled)
  ISAREG_ENTRY_GEN("gfx906:sramecc+:xnack-", 9, 0, 6,   enabled,     disabled)
  ISAREG_ENTRY_GEN("gfx906:sramecc+:xnack+", 9, 0, 6,   enabled,     enabled)
  ISAREG_ENTRY_GEN("gfx908",                 9, 0, 8,   any,         any)
  ISAREG_ENTRY_GEN("gfx908:xnack-",          9, 0, 8,   any,         disabled)
  ISAREG_ENTRY_GEN("gfx908:xnack+",          9, 0, 8,   any,         enabled)
  ISAREG_ENTRY_GEN("gfx908:sramecc-",        9, 0, 8,   disabled,    any)
  ISAREG_ENTRY_GEN("gfx908:sramecc+",        9, 0, 8,   enabled,     any)
  ISAREG_ENTRY_GEN("gfx908:sramecc-:xnack-", 9, 0, 8,   disabled,    disabled)
  ISAREG_ENTRY_GEN("gfx908:sramecc-:xnack+", 9, 0, 8,   disabled,    enabled)
  ISAREG_ENTRY_GEN("gfx908:sramecc+:xnack-", 9, 0, 8,   enabled,     disabled)
  ISAREG_ENTRY_GEN("gfx908:sramecc+:xnack+", 9, 0, 8,   enabled,     enabled)
  ISAREG_ENTRY_GEN("gfx909",                 9, 0, 9,   unsupported, any)
  ISAREG_ENTRY_GEN("gfx909:xnack-",          9, 0, 9,   unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx909:xnack+",          9, 0, 9,   unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx90a",                 9, 0, 10,  any,         any)
  ISAREG_ENTRY_GEN("gfx90a:xnack-",          9, 0, 10,  any,         disabled)
  ISAREG_ENTRY_GEN("gfx90a:xnack+",          9, 0, 10,  any,         enabled)
  ISAREG_ENTRY_GEN("gfx90a:sramecc-",        9, 0, 10,  disabled,    any)
  ISAREG_ENTRY_GEN("gfx90a:sramecc+",        9, 0, 10,  enabled,     any)
  ISAREG_ENTRY_GEN("gfx90a:sramecc-:xnack-", 9, 0, 10,  disabled,    disabled)
  ISAREG_ENTRY_GEN("gfx90a:sramecc-:xnack+", 9, 0, 10,  disabled,    enabled)
  ISAREG_ENTRY_GEN("gfx90a:sramecc+:xnack-", 9, 0, 10,  enabled,     disabled)
  ISAREG_ENTRY_GEN("gfx90a:sramecc+:xnack+", 9, 0, 10,  enabled,     enabled)
  ISAREG_ENTRY_GEN("gfx90c",                 9, 0, 12,  unsupported, any)
  ISAREG_ENTRY_GEN("gfx90c:xnack-",          9, 0, 12,  unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx90c:xnack+",          9, 0, 12,  unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx1010",                10, 1, 0,  unsupported, any)
  ISAREG_ENTRY_GEN("gfx1010:xnack-",         10, 1, 0,  unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx1010:xnack+",         10, 1, 0,  unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx1011",                10, 1, 1,  unsupported, any)
  ISAREG_ENTRY_GEN("gfx1011:xnack-",         10, 1, 1,  unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx1011:xnack+",         10, 1, 1,  unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx1012",                10, 1, 2,  unsupported, any)
  ISAREG_ENTRY_GEN("gfx1012:xnack-",         10, 1, 2,  unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx1012:xnack+",         10, 1, 2,  unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx1013",                10, 1, 3,  unsupported, any)
  ISAREG_ENTRY_GEN("gfx1013:xnack-",         10, 1, 3,  unsupported, disabled)
  ISAREG_ENTRY_GEN("gfx1013:xnack+",         10, 1, 3,  unsupported, enabled)
  ISAREG_ENTRY_GEN("gfx1030",                10, 3, 0,  unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx1031",                10, 3, 1,  unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx1032",                10, 3, 2,  unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx1033",                10, 3, 3,  unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx1034",                10, 3, 4,  unsupported, unsupported)
  ISAREG_ENTRY_GEN("gfx1035",                10, 3, 5,  unsupported, unsupported)
#undef ISAREG_ENTRY_GEN
  return supported_isas;
}

}  // namespace core
}  // namespace rocr

ROCR-Runtime-rocm-5.0.0/src/core/runtime/queue.cpp

////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "core/inc/queue.h"
#include "core/inc/runtime.h"

namespace rocr {
namespace core {
// HSA Queue ID - used to bind a unique ID
std::atomic<uint64_t> Queue::hsa_queue_counter_(0);

void Queue::DefaultErrorHandler(hsa_status_t status, hsa_queue_t* source, void* data) {
  if (core::Runtime::runtime_singleton_->flag().enable_queue_fault_message()) {
    const char* msg = "UNKNOWN ERROR";
    HSA::hsa_status_string(status, &msg);
    fprintf(stderr, "Queue at %p inactivated due to async error:\n\t%s\n", source, msg);
    abort();
  }
}

}  // namespace core
}  // namespace rocr

ROCR-Runtime-rocm-5.0.0/src/core/runtime/runtime.cpp

////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "core/inc/runtime.h"

#include
#include
#include
#include
#include
#include

#include "core/common/shared.h"
#include "core/inc/hsa_ext_interface.h"
#include "core/inc/amd_cpu_agent.h"
#include "core/inc/amd_gpu_agent.h"
#include "core/inc/amd_memory_region.h"
#include "core/inc/amd_topology.h"
#include "core/inc/signal.h"
#include "core/inc/interrupt_signal.h"
#include "core/inc/hsa_ext_amd_impl.h"
#include "core/inc/hsa_api_trace_int.h"
#include "core/util/os.h"
#include "core/inc/exceptions.h"
#include "inc/hsa_ven_amd_aqlprofile.h"

#define HSA_VERSION_MAJOR 1
#define HSA_VERSION_MINOR 1

const char rocrbuildid[] __attribute__((used)) = "ROCR BUILD ID: " STRING(ROCR_BUILD_ID);

namespace rocr {
namespace core {

bool g_use_interrupt_wait = true;

Runtime* Runtime::runtime_singleton_ = NULL;

KernelMutex Runtime::bootstrap_lock_;

static bool loaded = true;

class RuntimeCleanup {
 public:
  ~RuntimeCleanup() {
    if (!Runtime::IsOpen()) {
      delete Runtime::runtime_singleton_;
    }

    loaded = false;
  }
};

static RuntimeCleanup cleanup_at_unload_;

hsa_status_t Runtime::Acquire() {
  // Check to see if HSA has been cleaned up (process exit)
  if (!loaded) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;

  ScopedAcquire<KernelMutex> boot(&bootstrap_lock_);

  if (runtime_singleton_ == NULL) {
    runtime_singleton_ = new Runtime();
  }

  if (runtime_singleton_->ref_count_ == INT32_MAX) {
    return HSA_STATUS_ERROR_REFCOUNT_OVERFLOW;
  }

  runtime_singleton_->ref_count_++;
  MAKE_NAMED_SCOPE_GUARD(refGuard, [&]() { runtime_singleton_->ref_count_--; });

  if (runtime_singleton_->ref_count_ == 1) {
    hsa_status_t status = runtime_singleton_->Load();

    if (status != HSA_STATUS_SUCCESS) {
      return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
    }
  }

  refGuard.Dismiss();
  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::Release() {
  // Check to see if HSA has been cleaned up (process exit)
  if (!loaded) return HSA_STATUS_SUCCESS;

  ScopedAcquire<KernelMutex> boot(&bootstrap_lock_);

  if (runtime_singleton_ == nullptr) return HSA_STATUS_ERROR_NOT_INITIALIZED;

  if (runtime_singleton_->ref_count_ == 1) {
    // Release all registered memory, then unload backends
    runtime_singleton_->Unload();
  }

  runtime_singleton_->ref_count_--;

  if (runtime_singleton_->ref_count_ == 0) {
    delete runtime_singleton_;
    runtime_singleton_ = nullptr;
  }

  return HSA_STATUS_SUCCESS;
}

bool Runtime::IsOpen() {
  return (Runtime::runtime_singleton_ != NULL) &&
         (Runtime::runtime_singleton_->ref_count_ != 0);
}

// Register agent information only. Must not call anything that may use the registered information
// since those tables are incomplete.
void Runtime::RegisterAgent(Agent* agent) {
  // Record the agent in the node-to-agent reverse lookup table.
  agents_by_node_[agent->node_id()].push_back(agent);

  // Process agent as a cpu or gpu device.
  if (agent->device_type() == Agent::DeviceType::kAmdCpuDevice) {
    cpu_agents_.push_back(agent);

    // Add cpu regions to the system region list.
    for (const core::MemoryRegion* region : agent->regions()) {
      if (region->fine_grain()) {
        system_regions_fine_.push_back(region);
      } else {
        system_regions_coarse_.push_back(region);
      }
    }

    assert(system_regions_fine_.size() > 0);

    // Init default fine grain system region allocator using fine grain
    // system region of the first discovered CPU agent.
    if (cpu_agents_.size() == 1) {
      // Might need memory pooling to cover allocations that
      // require less than 4096 bytes.

      // Default system pool must support kernarg
      for (auto pool : system_regions_fine_) {
        if (pool->kernarg()) {
          system_allocator_ = [pool](size_t size, size_t alignment,
                                     MemoryRegion::AllocateFlags alloc_flags) -> void* {
            assert(alignment <= 4096);
            void* ptr = NULL;
            return (HSA_STATUS_SUCCESS ==
                    core::Runtime::runtime_singleton_->AllocateMemory(pool, size, alloc_flags,
                                                                      &ptr))
                ? ptr
                : NULL;
          };

          system_deallocator_ = [](void* ptr) {
            core::Runtime::runtime_singleton_->FreeMemory(ptr);
          };

          BaseShared::SetAllocateAndFree(system_allocator_, system_deallocator_);
          break;
        }
      }
    }
  } else if (agent->device_type() == Agent::DeviceType::kAmdGpuDevice) {
    gpu_agents_.push_back(agent);

    gpu_ids_.push_back(agent->node_id());

    // Assign the first discovered gpu agent as region gpu.
    if (region_gpu_ == NULL) region_gpu_ = agent;
  }
}

void Runtime::DestroyAgents() {
  agents_by_node_.clear();

  std::for_each(gpu_agents_.begin(), gpu_agents_.end(), DeleteObject());
  gpu_agents_.clear();

  gpu_ids_.clear();

  std::for_each(cpu_agents_.begin(), cpu_agents_.end(), DeleteObject());
  cpu_agents_.clear();

  region_gpu_ = NULL;

  system_regions_fine_.clear();
  system_regions_coarse_.clear();
}

void Runtime::SetLinkCount(size_t num_nodes) {
  num_nodes_ = num_nodes;
  link_matrix_.resize(num_nodes * num_nodes);
}

void Runtime::RegisterLinkInfo(uint32_t node_id_from, uint32_t node_id_to,
                               uint32_t num_hop,
                               hsa_amd_memory_pool_link_info_t& link_info) {
  const uint32_t idx = GetIndexLinkInfo(node_id_from, node_id_to);
  link_matrix_[idx].num_hop = num_hop;
  link_matrix_[idx].info = link_info;

  // Limit the number of hops to 1 since the runtime does not have enough
  // information to share to the user about each hop.
  link_matrix_[idx].num_hop = std::min(link_matrix_[idx].num_hop, 1U);
}

const Runtime::LinkInfo Runtime::GetLinkInfo(uint32_t node_id_from,
                                             uint32_t node_id_to) {
  return (node_id_from != node_id_to)
             ? link_matrix_[GetIndexLinkInfo(node_id_from, node_id_to)]
             : LinkInfo();  // No link.
}

uint32_t Runtime::GetIndexLinkInfo(uint32_t node_id_from, uint32_t node_id_to) {
  return ((node_id_from * num_nodes_) + node_id_to);
}

hsa_status_t Runtime::IterateAgent(hsa_status_t (*callback)(hsa_agent_t agent, void* data),
                                   void* data) {
  AMD::callback_t<decltype(callback)> call(callback);

  std::vector<Agent*>* agent_lists[2] = {&cpu_agents_, &gpu_agents_};
  for (std::vector<Agent*>* agent_list : agent_lists) {
    for (size_t i = 0; i < agent_list->size(); ++i) {
      hsa_agent_t agent = Agent::Convert(agent_list->at(i));
      hsa_status_t status = call(agent, data);

      if (status != HSA_STATUS_SUCCESS) {
        return status;
      }
    }
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::AllocateMemory(const MemoryRegion* region, size_t size,
                                     MemoryRegion::AllocateFlags alloc_flags,
                                     void** address) {
  hsa_status_t status = region->Allocate(size, alloc_flags, address);

  // Track the allocation result so that it can be freed properly.
  if (status == HSA_STATUS_SUCCESS) {
    ScopedAcquire lock(&memory_lock_);
    allocation_map_[*address] = AllocationRegion(region, size);
  }

  return status;
}

hsa_status_t Runtime::FreeMemory(void* ptr) {
  if (ptr == nullptr) {
    return HSA_STATUS_SUCCESS;
  }

  const MemoryRegion* region = nullptr;
  size_t size = 0;
  std::unique_ptr<std::vector<AllocationRegion::notifier_t>> notifiers;

  {
    ScopedAcquire lock(&memory_lock_);

    auto it = allocation_map_.find(ptr);

    if (it == allocation_map_.end()) {
      debug_warning(false && "Can't find address in allocation map");
      return HSA_STATUS_ERROR_INVALID_ALLOCATION;
    }
    region = it->second.region;
    size = it->second.size;

    // Imported fragments can't be released with FreeMemory.
    if (region == nullptr) {
      assert(false && "Can't release imported memory with free.");
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    }

    notifiers = std::move(it->second.notifiers);

    allocation_map_.erase(it);
  }

  // Notifiers can't run while holding the lock or the callback won't be able to manage memory.
  // The memory triggering the notification has already been removed from the memory map so can't
  // be double released during the callback.
  if (notifiers) {
    for (auto& notifier : *notifiers) {
      notifier.callback(notifier.ptr, notifier.user_data);
    }
  }

  return region->Free(ptr, size);
}

hsa_status_t Runtime::RegisterReleaseNotifier(void* ptr, hsa_amd_deallocation_callback_t callback,
                                              void* user_data) {
  ScopedAcquire lock(&memory_lock_);
  auto mem = allocation_map_.upper_bound(ptr);
  if (mem != allocation_map_.begin()) {
    mem--;

    // No support for imported fragments yet.
    if (mem->second.region == nullptr) return HSA_STATUS_ERROR_INVALID_ALLOCATION;

    if ((mem->first <= ptr) &&
        (ptr < reinterpret_cast<const uint8_t*>(mem->first) + mem->second.size)) {
      auto& notifiers = mem->second.notifiers;
      if (!notifiers) notifiers.reset(new std::vector<AllocationRegion::notifier_t>);
      AllocationRegion::notifier_t notifier = {
          ptr, AMD::callback_t<decltype(callback)>(callback), user_data};
      notifiers->push_back(notifier);
      return HSA_STATUS_SUCCESS;
    }
  }
  return HSA_STATUS_ERROR_INVALID_ALLOCATION;
}

hsa_status_t Runtime::DeregisterReleaseNotifier(void* ptr,
                                                hsa_amd_deallocation_callback_t callback) {
  hsa_status_t ret = HSA_STATUS_ERROR_INVALID_ARGUMENT;
  ScopedAcquire lock(&memory_lock_);
  auto mem = allocation_map_.upper_bound(ptr);
  if (mem != allocation_map_.begin()) {
    mem--;
    if ((mem->first <= ptr) &&
        (ptr < reinterpret_cast<const uint8_t*>(mem->first) + mem->second.size)) {
      auto& notifiers = mem->second.notifiers;
      if (!notifiers) return HSA_STATUS_ERROR_INVALID_ARGUMENT;
      for (size_t i = 0; i < notifiers->size(); i++) {
        if (((*notifiers)[i].ptr == ptr) && ((*notifiers)[i].callback) == callback) {
          // Swap-remove: move the last entry into slot i, then re-examine slot i.
          // For i == 0, i-- wraps to SIZE_MAX and the loop's i++ brings it back to 0.
          (*notifiers)[i] = std::move((*notifiers)[notifiers->size() - 1]);
          notifiers->pop_back();
          i--;
          ret = HSA_STATUS_SUCCESS;
        }
      }
    }
  }
  return ret;
}

hsa_status_t Runtime::CopyMemory(void* dst, const void* src, size_t size) {
  void* source = const_cast<void*>(src);

  // Choose agents from pointer info
  bool is_src_system = false;
  bool is_dst_system = false;
  core::Agent* src_agent;
  core::Agent* dst_agent;

  // Fetch ownership
  const auto& is_system_mem = [&](void* ptr, core::Agent*& agent, bool& need_lock) {
    hsa_amd_pointer_info_t info;
    uint32_t count;
    hsa_agent_t* accessible = nullptr;
    MAKE_SCOPE_GUARD([&]() { free(accessible); });
    info.size = sizeof(info);

    hsa_status_t err = PtrInfo(ptr, &info, malloc, &count, &accessible);
    if (err != HSA_STATUS_SUCCESS)
      throw AMD::hsa_exception(err, "PtrInfo failed in hsa_memory_copy.");

    ptrdiff_t endPtr = (ptrdiff_t)ptr + size;
    if (info.agentBaseAddress <= ptr &&
        endPtr <= (ptrdiff_t)info.agentBaseAddress + info.sizeInBytes) {
      if (info.agentOwner.handle == 0) info.agentOwner = accessible[0];
      agent = core::Agent::Convert(info.agentOwner);
      need_lock = false;
      return agent->device_type() != core::Agent::DeviceType::kAmdGpuDevice;
    } else {
      need_lock = true;
      agent = cpu_agents_[0];
      return true;
    }
  };

  bool src_lock, dst_lock;
  is_src_system = is_system_mem(source, src_agent, src_lock);
  is_dst_system = is_system_mem(dst, dst_agent, dst_lock);

  // CPU-CPU
  if (is_src_system && is_dst_system) {
    memcpy(dst, source, size);
    return HSA_STATUS_SUCCESS;
  }

  // Same GPU
  if (src_agent->node_id() == dst_agent->node_id()) return dst_agent->DmaCopy(dst, source, size);

  // GPU-CPU
  // Must ensure that system memory is visible to the GPU during the copy.
  const AMD::MemoryRegion* system_region =
      static_cast<const AMD::MemoryRegion*>(system_regions_fine_[0]);

  void* gpuPtr = nullptr;
  const auto& locked_copy = [&](void*& ptr, core::Agent* locking_agent) {
    void* tmp;
    hsa_agent_t agent = locking_agent->public_handle();
    hsa_status_t err = system_region->Lock(1, &agent, ptr, size, &tmp);
    if (err != HSA_STATUS_SUCCESS) throw AMD::hsa_exception(err, "Lock failed in hsa_memory_copy.");
    gpuPtr = ptr;
    ptr = tmp;
  };

  MAKE_SCOPE_GUARD([&]() { if (gpuPtr != nullptr) system_region->Unlock(gpuPtr); });

  if (src_lock) locked_copy(source, dst_agent);
  if (dst_lock) locked_copy(dst, src_agent);
  if (is_src_system) return dst_agent->DmaCopy(dst, source, size);
  if (is_dst_system) return src_agent->DmaCopy(dst, source, size);

  /* GPU-GPU - functional support, not a performance path.

     This goes through system memory because we have to support copying between non-peer GPUs
     and we can't use P2P pointers even if the GPUs are peers. Because hsa_amd_agents_allow_access
     requires the caller to specify all allowed agents we can't assume that a peer mapped pointer
     would remain mapped for the duration of the copy.
  */
  void* temp = system_allocator_(size, 0, core::MemoryRegion::AllocateNoFlags);
  // The bounce buffer must exist before either DMA is issued.
  if (temp == nullptr) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  MAKE_SCOPE_GUARD([&]() { system_deallocator_(temp); });
  hsa_status_t err = src_agent->DmaCopy(temp, source, size);
  if (err == HSA_STATUS_SUCCESS) err = dst_agent->DmaCopy(dst, temp, size);
  return err;
}

hsa_status_t Runtime::CopyMemory(void* dst, core::Agent& dst_agent, const void* src,
                                 core::Agent& src_agent, size_t size,
                                 std::vector<core::Signal*>& dep_signals,
                                 core::Signal& completion_signal) {
  const bool dst_gpu =
      (dst_agent.device_type() == core::Agent::DeviceType::kAmdGpuDevice);
  const bool src_gpu =
      (src_agent.device_type() == core::Agent::DeviceType::kAmdGpuDevice);
  if (dst_gpu || src_gpu) {
    core::Agent* copy_agent = (src_gpu) ? &src_agent : &dst_agent;
    return copy_agent->DmaCopy(dst, dst_agent, src, src_agent, size, dep_signals,
                               completion_signal);
  }

  // For cpu to cpu, fire and forget a copy thread.
  const bool profiling_enabled =
      (dst_agent.profiling_enabled() || src_agent.profiling_enabled());
  if (profiling_enabled) completion_signal.async_copy_agent(&dst_agent);

  std::thread(
      [](void* dst, const void* src, size_t size, std::vector<core::Signal*> dep_signals,
         core::Signal* completion_signal, bool profiling_enabled) {
        for (core::Signal* dep : dep_signals) {
          dep->WaitRelaxed(HSA_SIGNAL_CONDITION_EQ, 0, UINT64_MAX, HSA_WAIT_STATE_BLOCKED);
        }

        if (profiling_enabled) {
          core::Runtime::runtime_singleton_->GetSystemInfo(
              HSA_SYSTEM_INFO_TIMESTAMP, &completion_signal->signal_.start_ts);
        }

        memcpy(dst, src, size);

        if (profiling_enabled) {
          core::Runtime::runtime_singleton_->GetSystemInfo(
              HSA_SYSTEM_INFO_TIMESTAMP, &completion_signal->signal_.end_ts);
        }

        completion_signal->SubRelease(1);
      },
      dst, src, size, dep_signals, &completion_signal, profiling_enabled)
      .detach();

  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::FillMemory(void* ptr, uint32_t value, size_t count) {
  // Choose blit agent from pointer info
  hsa_amd_pointer_info_t info;
  uint32_t agent_count;
  hsa_agent_t* accessible = nullptr;
  info.size = sizeof(info);
  MAKE_SCOPE_GUARD([&]() { free(accessible); });
  hsa_status_t err = PtrInfo(ptr, &info, malloc, &agent_count, &accessible);
  if (err != HSA_STATUS_SUCCESS) return err;

  ptrdiff_t endPtr = (ptrdiff_t)ptr + count * sizeof(uint32_t);

  // Check for GPU fill
  // Selects GPU fill for SVM and Locked allocations if a GPU address is given and is mapped.
  if (info.agentBaseAddress <= ptr &&
      endPtr <= (ptrdiff_t)info.agentBaseAddress + info.sizeInBytes) {
    core::Agent* blit_agent = core::Agent::Convert(info.agentOwner);
    if (blit_agent->device_type() != core::Agent::DeviceType::kAmdGpuDevice) {
      blit_agent = nullptr;
      for (uint32_t i = 0; i < agent_count; i++) {
        if (core::Agent::Convert(accessible[i])->device_type() ==
            core::Agent::DeviceType::kAmdGpuDevice) {
          blit_agent = core::Agent::Convert(accessible[i]);
          break;
        }
      }
    }
    if (blit_agent) return blit_agent->DmaFill(ptr, value, count);
  }

  // Host and unmapped SVM addresses are filled via the host.
  if (info.hostBaseAddress <= ptr &&
      endPtr <= (ptrdiff_t)info.hostBaseAddress + info.sizeInBytes) {
    // Write count 32-bit words; memset() would replicate only the low byte of value.
    uint32_t* words = static_cast<uint32_t*>(ptr);
    for (size_t i = 0; i < count; i++) words[i] = value;
    return HSA_STATUS_SUCCESS;
  }

  return HSA_STATUS_ERROR_INVALID_ALLOCATION;
}

hsa_status_t Runtime::AllowAccess(uint32_t num_agents,
                                  const hsa_agent_t* agents, const void* ptr) {
  const AMD::MemoryRegion* amd_region = NULL;
  size_t alloc_size = 0;

  {
    ScopedAcquire lock(&memory_lock_);

    auto it = allocation_map_.find(ptr);

    if (it == allocation_map_.end()) {
      return HSA_STATUS_ERROR;
    }

    amd_region = reinterpret_cast<const AMD::MemoryRegion*>(it->second.region);
    alloc_size = it->second.size;
  }

  return amd_region->AllowAccess(num_agents, agents, ptr, alloc_size);
}

hsa_status_t Runtime::GetSystemInfo(hsa_system_info_t attribute, void* value) {
  switch (attribute) {
    case HSA_SYSTEM_INFO_VERSION_MAJOR:
      *((uint16_t*)value) = HSA_VERSION_MAJOR;
      break;
    case HSA_SYSTEM_INFO_VERSION_MINOR:
      *((uint16_t*)value) = HSA_VERSION_MINOR;
      break;
    case HSA_SYSTEM_INFO_TIMESTAMP: {
      HsaClockCounters clocks;
      hsaKmtGetClockCounters(0, &clocks);
      *((uint64_t*)value) = clocks.SystemClockCounter;
      break;
    }
    case HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY: {
      assert(sys_clock_freq_ != 0 &&
             "Use of HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY before HSA "
             "initialization completes.");
      *(uint64_t*)value = sys_clock_freq_;
      break;
    }
    case HSA_SYSTEM_INFO_SIGNAL_MAX_WAIT:
      *((uint64_t*)value) = 0xFFFFFFFFFFFFFFFF;
      break;
    case HSA_SYSTEM_INFO_ENDIANNESS:
#if defined(HSA_LITTLE_ENDIAN)
      *((hsa_endianness_t*)value) = HSA_ENDIANNESS_LITTLE;
#else
      *((hsa_endianness_t*)value) = HSA_ENDIANNESS_BIG;
#endif
      break;
    case HSA_SYSTEM_INFO_MACHINE_MODEL:
#if defined(HSA_LARGE_MODEL)
      *((hsa_machine_model_t*)value) = HSA_MACHINE_MODEL_LARGE;
#else
      *((hsa_machine_model_t*)value) = HSA_MACHINE_MODEL_SMALL;
#endif
      break;
    case HSA_SYSTEM_INFO_EXTENSIONS: {
      memset(value, 0, sizeof(uint8_t) * 128);

      auto setFlag = [&](uint32_t bit) {
        assert(bit < 128 * 8 && "Extension value exceeds extension bitmask");
        uint index = bit / 8;
        uint subBit = bit % 8;
        ((uint8_t*)value)[index] |= 1 << subBit;
      };

      if (hsa_internal_api_table_.finalizer_api.hsa_ext_program_finalize_fn != NULL) {
        setFlag(HSA_EXTENSION_FINALIZER);
      }

      if (hsa_internal_api_table_.image_api.hsa_ext_image_create_fn != NULL) {
        setFlag(HSA_EXTENSION_IMAGES);
      }

      if (os::LibHandle lib = os::LoadLib(kAqlProfileLib)) {
        os::CloseLib(lib);
        setFlag(HSA_EXTENSION_AMD_AQLPROFILE);
      }

      setFlag(HSA_EXTENSION_AMD_PROFILER);

      break;
    }
    case HSA_AMD_SYSTEM_INFO_BUILD_VERSION: {
      *(const char**)value = STRING(ROCR_BUILD_ID);
      break;
    }
    case HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED: {
      bool ret = true;
      for (auto agent : gpu_agents_) {
        AMD::GpuAgent* gpu = (AMD::GpuAgent*)agent;
        ret &= (gpu->properties().Capability.ui32.SVMAPISupported == 1);
      }
      *(bool*)value = ret;
      break;
    }
    case HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT: {
      bool ret = true;
      for (auto agent : gpu_agents_)
        ret &= (agent->isa()->GetXnack() == IsaFeature::Enabled);
      *(bool*)value = ret;
      break;
    }
    default:
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::SetAsyncSignalHandler(hsa_signal_t signal,
                                            hsa_signal_condition_t cond,
                                            hsa_signal_value_t value,
                                            hsa_amd_signal_handler handler,
                                            void* arg) {
  // Indicate that this signal is in use.
  if (signal.handle != 0) hsa_signal_handle(signal)->Retain();

  ScopedAcquire scope_lock(&async_events_control_.lock);

  // Lazy initializer
  if (async_events_control_.async_events_thread_ == NULL) {
    // Create monitoring thread control signal
    auto err = HSA::hsa_signal_create(0, 0, NULL, &async_events_control_.wake);
    if (err != HSA_STATUS_SUCCESS) {
      assert(false && "Asynchronous events control signal creation error.");
      return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
    }
    async_events_.PushBack(async_events_control_.wake, HSA_SIGNAL_CONDITION_NE, 0, NULL, NULL);

    // Start event monitoring thread
    async_events_control_.exit = false;
    async_events_control_.async_events_thread_ = os::CreateThread(AsyncEventsLoop, NULL);
    if (async_events_control_.async_events_thread_ == NULL) {
      assert(false && "Asynchronous events thread creation error.");
      return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
    }
  }

  new_async_events_.PushBack(signal, cond, value, handler, arg);

  hsa_signal_handle(async_events_control_.wake)->StoreRelease(1);

  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::InteropMap(uint32_t num_agents, Agent** agents, int interop_handle,
                                 uint32_t flags, size_t* size, void** ptr,
                                 size_t* metadata_size, const void** metadata) {
  static const int tinyArraySize = 8;
  HsaGraphicsResourceInfo info;

  HSAuint32 short_nodes[tinyArraySize];
  HSAuint32* nodes = short_nodes;
  if (num_agents > tinyArraySize) {
    nodes = new HSAuint32[num_agents];
    if (nodes == NULL) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }
  MAKE_SCOPE_GUARD([&]() {
    if (num_agents > tinyArraySize) delete[] nodes;
  });

  for (uint32_t i = 0; i < num_agents; i++)
    agents[i]->GetInfo((hsa_agent_info_t)HSA_AMD_AGENT_INFO_DRIVER_NODE_ID, &nodes[i]);

  if (hsaKmtRegisterGraphicsHandleToNodes(interop_handle, &info, num_agents, nodes) !=
      HSAKMT_STATUS_SUCCESS)
    return HSA_STATUS_ERROR;

  HSAuint64 altAddress;
  HsaMemMapFlags map_flags;
  map_flags.Value = 0;
  map_flags.ui32.PageSize = HSA_PAGE_SIZE_64KB;
  if (hsaKmtMapMemoryToGPUNodes(info.MemoryAddress, info.SizeInBytes,
                                &altAddress, map_flags, num_agents, nodes) !=
      HSAKMT_STATUS_SUCCESS) {
    map_flags.ui32.PageSize = HSA_PAGE_SIZE_4KB;
    if (hsaKmtMapMemoryToGPUNodes(info.MemoryAddress, info.SizeInBytes, &altAddress, map_flags,
                                  num_agents, nodes) != HSAKMT_STATUS_SUCCESS) {
      hsaKmtDeregisterMemory(info.MemoryAddress);
      return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
    }
  }

  if (metadata_size != NULL) *metadata_size = info.MetadataSizeInBytes;
  if (metadata != NULL) *metadata = info.Metadata;

  *size = info.SizeInBytes;
  *ptr = info.MemoryAddress;

  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::InteropUnmap(void* ptr) {
  if (hsaKmtUnmapMemoryToGPU(ptr) != HSAKMT_STATUS_SUCCESS)
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  if (hsaKmtDeregisterMemory(ptr) != HSAKMT_STATUS_SUCCESS)
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::PtrInfo(const void* ptr, hsa_amd_pointer_info_t* info, void* (*alloc)(size_t),
                              uint32_t* num_agents_accessible, hsa_agent_t** accessible,
                              PtrInfoBlockData* block_info) {
  static_assert(static_cast<int>(HSA_POINTER_UNKNOWN) ==
                    static_cast<int>(HSA_EXT_POINTER_TYPE_UNKNOWN),
                "Thunk pointer info mismatch");
  static_assert(static_cast<int>(HSA_POINTER_ALLOCATED) ==
                    static_cast<int>(HSA_EXT_POINTER_TYPE_HSA),
                "Thunk pointer info mismatch");
  static_assert(static_cast<int>(HSA_POINTER_REGISTERED_USER) ==
                    static_cast<int>(HSA_EXT_POINTER_TYPE_LOCKED),
                "Thunk pointer info mismatch");
  static_assert(static_cast<int>(HSA_POINTER_REGISTERED_GRAPHICS) ==
                    static_cast<int>(HSA_EXT_POINTER_TYPE_GRAPHICS),
                "Thunk pointer info mismatch");

  HsaPointerInfo thunkInfo;
  uint32_t* mappedNodes;

  hsa_amd_pointer_info_t retInfo = {0};

  // check output struct has an initialized size.
  if (info->size == 0) return HSA_STATUS_ERROR_INVALID_ARGUMENT;

  bool returnListData =
      ((alloc != nullptr) && (num_agents_accessible != nullptr) && (accessible != nullptr));

  {  // memory_lock protects access to the NMappedNodes array and fragment user data since these may
     // change with calls to memory APIs.
    ScopedAcquire lock(&memory_lock_);

    // We don't care if this returns an error code.
    // The type will be HSA_EXT_POINTER_TYPE_UNKNOWN if so.
    auto err = hsaKmtQueryPointerInfo(ptr, &thunkInfo);
    assert(((err == HSAKMT_STATUS_SUCCESS) || (thunkInfo.Type == HSA_POINTER_UNKNOWN)) &&
           "Thunk ptr info error and not type HSA_POINTER_UNKNOWN.");

    if (returnListData) {
      assert(thunkInfo.NMappedNodes <= agents_by_node_.size() &&
             "PointerInfo: Thunk returned more than all agents in NMappedNodes.");
      mappedNodes = (uint32_t*)alloca(thunkInfo.NMappedNodes * sizeof(uint32_t));
      memcpy(mappedNodes, thunkInfo.MappedNodes, thunkInfo.NMappedNodes * sizeof(uint32_t));
    }
    retInfo.type = (hsa_amd_pointer_type_t)thunkInfo.Type;
    retInfo.agentBaseAddress = reinterpret_cast<void*>(thunkInfo.GPUAddress);
    retInfo.hostBaseAddress = thunkInfo.CPUAddress;
    retInfo.sizeInBytes = thunkInfo.SizeInBytes;
    retInfo.userData = thunkInfo.UserData;
    retInfo.global_flags = thunkInfo.MemFlags.ui32.CoarseGrain
        ? HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_COARSE_GRAINED
        : HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_FINE_GRAINED;
    retInfo.global_flags |=
        thunkInfo.MemFlags.ui32.Uncached ? HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_KERNARG_INIT : 0;
    if (block_info != nullptr) {
      // Block_info reports the thunk allocation from which we may have suballocated.
      // For locked memory we want to return the host address since hostBaseAddress is used to
      // manipulate locked memory and it is possible that hostBaseAddress is different from
      // agentBaseAddress.
      // For device memory, hostBaseAddress is either equal to agentBaseAddress or is NULL when the
      // CPU does not have access.
      assert((retInfo.hostBaseAddress || retInfo.agentBaseAddress) &&
             "Thunk pointer info returned no base address.");
      block_info->base = (retInfo.hostBaseAddress ?
                              retInfo.hostBaseAddress : retInfo.agentBaseAddress);
      block_info->length = retInfo.sizeInBytes;
    }

    auto fragment = allocation_map_.upper_bound(ptr);
    if (fragment != allocation_map_.begin()) {
      fragment--;
      if ((fragment->first <= ptr) &&
          (ptr < reinterpret_cast<const uint8_t*>(fragment->first) + fragment->second.size)) {
        // agent and host address must match here.  Only lock memory is allowed to have differing
        // addresses but lock memory has type HSA_EXT_POINTER_TYPE_LOCKED and cannot be
        // suballocated.
        retInfo.agentBaseAddress = const_cast<void*>(fragment->first);
        retInfo.hostBaseAddress = retInfo.agentBaseAddress;
        retInfo.sizeInBytes = fragment->second.size;
        retInfo.userData = fragment->second.user_ptr;
      }
    }
  }  // end lock scope

  retInfo.size = Min(size_t(info->size), sizeof(hsa_amd_pointer_info_t));

  // IPC and Graphics memory may come from a node that does not have an agent in this process,
  // e.g. a node hidden by ROCR_VISIBLE_DEVICES or a peer GPU that ROCm does not support.
  auto nodeAgents = agents_by_node_.find(thunkInfo.Node);
  if (nodeAgents != agents_by_node_.end())
    retInfo.agentOwner = nodeAgents->second[0]->public_handle();
  else
    retInfo.agentOwner.handle = 0;

  // Correct agentOwner for locked memory.  Thunk reports the GPU that owns the
  // alias but users are expecting to see a CPU when the memory is system.
  if (retInfo.type == HSA_EXT_POINTER_TYPE_LOCKED) {
    if ((nodeAgents == agents_by_node_.end()) ||
        (nodeAgents->second[0]->device_type() != core::Agent::kAmdCpuDevice)) {
      retInfo.agentOwner = cpu_agents_[0]->public_handle();
    }
  }

  memcpy(info, &retInfo, retInfo.size);

  if (returnListData) {
    uint32_t count = 0;
    for (HSAuint32 i = 0; i < thunkInfo.NMappedNodes; i++) {
      assert(mappedNodes[i] <= max_node_id() &&
             "PointerInfo: Invalid node ID returned from thunk.");
      count += agents_by_node_[mappedNodes[i]].size();
    }

    AMD::callback_t<decltype(alloc)> Alloc(alloc);
    *accessible = (hsa_agent_t*)Alloc(sizeof(hsa_agent_t) * count);
    if ((*accessible) == nullptr) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
    *num_agents_accessible = count;

    uint32_t index = 0;
    for (HSAuint32 i = 0; i < thunkInfo.NMappedNodes; i++) {
      auto& list = agents_by_node_[mappedNodes[i]];
      for (auto agent : list) {
        (*accessible)[index] = agent->public_handle();
        index++;
      }
    }
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::SetPtrInfoData(const void* ptr, void* userptr) {
  {  // Use allocation map if possible to handle fragments.
    ScopedAcquire lock(&memory_lock_);
    const auto& it = allocation_map_.find(ptr);
    if (it != allocation_map_.end()) {
      it->second.user_ptr = userptr;
      return HSA_STATUS_SUCCESS;
    }
  }
  // Cover entries not in the allocation map (graphics, lock,...)
  if (hsaKmtSetMemoryUserData(ptr, userptr) == HSAKMT_STATUS_SUCCESS)
    return HSA_STATUS_SUCCESS;
  return HSA_STATUS_ERROR_INVALID_ARGUMENT;
}

hsa_status_t Runtime::IPCCreate(void* ptr, size_t len, hsa_amd_ipc_memory_t* handle) {
  static_assert(sizeof(hsa_amd_ipc_memory_t) == sizeof(HsaSharedMemoryHandle),
                "Thunk IPC mismatch.");
  // Reject sharing allocations larger than ~8TB due to thunk limitations.
  if (len > 0x7FFFFFFF000ull) return HSA_STATUS_ERROR_INVALID_ARGUMENT;

  // Check for fragment sharing.
  PtrInfoBlockData block;
  hsa_amd_pointer_info_t info;
  info.size = sizeof(info);
  if (PtrInfo(ptr, &info, nullptr, nullptr, nullptr, &block) != HSA_STATUS_SUCCESS)
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  if ((info.agentBaseAddress != ptr) || (info.sizeInBytes != len))
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  if ((block.base != ptr) || (block.length != len)) {
    if (!IsMultipleOf(block.base, 2 * 1024 * 1024)) {
      assert(false && "Fragment's block not aligned to 2MB!");
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    }
    if (hsaKmtShareMemory(block.base, block.length,
                          reinterpret_cast<HsaSharedMemoryHandle*>(handle)) !=
        HSAKMT_STATUS_SUCCESS)
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    uint32_t offset =
        (reinterpret_cast<uint8_t*>(ptr) - reinterpret_cast<uint8_t*>(block.base)) / 4096;
    // Holds size in (4K?) pages in thunk handle: Mark as a fragment and denote offset.
    handle->handle[6] |= 0x80000000 | offset;
    // Mark block for IPC.  Prevents reallocation of exported memory.
    ScopedAcquire lock(memory_lock_.shared());
    hsa_status_t err = allocation_map_[ptr].region->IPCFragmentExport(ptr);
    assert(err == HSA_STATUS_SUCCESS && "Region inconsistent with address map.");
    return err;
  } else {
    if (hsaKmtShareMemory(ptr, len, reinterpret_cast<HsaSharedMemoryHandle*>(handle)) !=
        HSAKMT_STATUS_SUCCESS)
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::IPCAttach(const hsa_amd_ipc_memory_t* handle, size_t len, uint32_t num_agents,
                                Agent** agents, void** mapped_ptr) {
  static const int tinyArraySize = 8;
  void* importAddress;
  HSAuint64 importSize;
  HSAuint64 altAddress;

  hsa_amd_ipc_memory_t importHandle;
  importHandle = *handle;

  // Extract fragment info
  bool isFragment = false;
  uint32_t fragOffset = 0;

  auto fixFragment = [&]() {
    if (!isFragment) return;
    importAddress = reinterpret_cast<uint8_t*>(importAddress) + fragOffset;
    len = Min(len, importSize - fragOffset);
    ScopedAcquire lock(&memory_lock_);
    allocation_map_[importAddress] = AllocationRegion(nullptr, len);
  };

  if ((importHandle.handle[6] & 0x80000000) != 0) {
    isFragment = true;
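    // Compile-time check of the fragment encoding, assuming an exported fragment starts less
    // than 2MB past the base of its 2MB-aligned block (the alignment IPCCreate asserts): its 4KB
    // page offset is then at most 2MB / 4KB - 1 = 511, exactly what the 9-bit mask 0x1FF decodes.
    static_assert((2 * 1024 * 1024) / 4096 - 1 == 0x1FF,
                  "IPC fragment page offset must fit the 9-bit field decoded from handle[6].");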
    fragOffset = (importHandle.handle[6] & 0x1FF) * 4096;
    importHandle.handle[6] &= ~(0x80000000 | 0x1FF);
  }

  if (num_agents == 0) {
    if (hsaKmtRegisterSharedHandle(reinterpret_cast<const HsaSharedMemoryHandle*>(&importHandle),
                                   &importAddress, &importSize) != HSAKMT_STATUS_SUCCESS)
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    if (hsaKmtMapMemoryToGPU(importAddress, importSize, &altAddress) != HSAKMT_STATUS_SUCCESS) {
      hsaKmtDeregisterMemory(importAddress);
      return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
    }
    fixFragment();
    *mapped_ptr = importAddress;
    return HSA_STATUS_SUCCESS;
  }

  HSAuint32* nodes = nullptr;
  if (num_agents > tinyArraySize)
    nodes = new HSAuint32[num_agents];
  else
    nodes = (HSAuint32*)alloca(sizeof(HSAuint32) * num_agents);
  if (nodes == NULL) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;

  MAKE_SCOPE_GUARD([&]() {
    if (num_agents > tinyArraySize) delete[] nodes;
  });

  for (uint32_t i = 0; i < num_agents; i++)
    agents[i]->GetInfo((hsa_agent_info_t)HSA_AMD_AGENT_INFO_DRIVER_NODE_ID, &nodes[i]);

  if (hsaKmtRegisterSharedHandleToNodes(
          reinterpret_cast<const HsaSharedMemoryHandle*>(&importHandle), &importAddress,
          &importSize, num_agents, nodes) != HSAKMT_STATUS_SUCCESS)
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;

  HsaMemMapFlags map_flags;
  map_flags.Value = 0;
  map_flags.ui32.PageSize = HSA_PAGE_SIZE_64KB;
  if (hsaKmtMapMemoryToGPUNodes(importAddress, importSize, &altAddress, map_flags, num_agents,
                                nodes) != HSAKMT_STATUS_SUCCESS) {
    map_flags.ui32.PageSize = HSA_PAGE_SIZE_4KB;
    if (hsaKmtMapMemoryToGPUNodes(importAddress, importSize, &altAddress, map_flags, num_agents,
                                  nodes) != HSAKMT_STATUS_SUCCESS) {
      hsaKmtDeregisterMemory(importAddress);
      return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
    }
  }

  fixFragment();
  *mapped_ptr = importAddress;
  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::IPCDetach(void* ptr) {
  {  // Handle imported fragments.
    ScopedAcquire lock(&memory_lock_);
    const auto& it = allocation_map_.find(ptr);
    if (it != allocation_map_.end()) {
      if (it->second.region != nullptr) return HSA_STATUS_ERROR_INVALID_ARGUMENT;
      allocation_map_.erase(it);
      lock.Release();  // Can't hold memory lock when using pointer info.

      PtrInfoBlockData block;
      hsa_amd_pointer_info_t info;
      info.size = sizeof(info);
      if (PtrInfo(ptr, &info, nullptr, nullptr, nullptr, &block) != HSA_STATUS_SUCCESS)
        return HSA_STATUS_ERROR_INVALID_ARGUMENT;
      ptr = block.base;
    }
  }
  if (hsaKmtUnmapMemoryToGPU(ptr) != HSAKMT_STATUS_SUCCESS)
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  if (hsaKmtDeregisterMemory(ptr) != HSAKMT_STATUS_SUCCESS)
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  return HSA_STATUS_SUCCESS;
}

void Runtime::AsyncEventsLoop(void*) {
  auto& async_events_control_ = runtime_singleton_->async_events_control_;
  auto& async_events_ = runtime_singleton_->async_events_;
  auto& new_async_events_ = runtime_singleton_->new_async_events_;

  while (!async_events_control_.exit) {
    // Wait for a signal
    hsa_signal_value_t value;
    uint32_t index = AMD::hsa_amd_signal_wait_any(
        uint32_t(async_events_.Size()), &async_events_.signal_[0], &async_events_.cond_[0],
        &async_events_.value_[0], uint64_t(-1), HSA_WAIT_STATE_BLOCKED, &value);

    // Reset the control signal
    if (index == 0) {
      hsa_signal_handle(async_events_control_.wake)->StoreRelaxed(0);
    } else if (index != -1) {
      // No error or timeout occurred, process the handlers
      // Call handler for the known satisfied signal.
      assert(async_events_.handler_[index] != NULL);
      bool keep = async_events_.handler_[index](value, async_events_.arg_[index]);
      if (!keep) {
        hsa_signal_handle(async_events_.signal_[index])->Release();
        async_events_.CopyIndex(index, async_events_.Size() - 1);
        async_events_.PopBack();
      }
      // Check remaining signals before sleeping.
      for (size_t i = index; i < async_events_.Size(); i++) {
        hsa_signal_handle sig(async_events_.signal_[i]);
        value = atomic::Load(&sig->signal_.value, std::memory_order_relaxed);
        bool condition_met = false;
        switch (async_events_.cond_[i]) {
          case HSA_SIGNAL_CONDITION_EQ: {
            condition_met = (value == async_events_.value_[i]);
            break;
          }
          case HSA_SIGNAL_CONDITION_NE: {
            condition_met = (value != async_events_.value_[i]);
            break;
          }
          case HSA_SIGNAL_CONDITION_GTE: {
            condition_met = (value >= async_events_.value_[i]);
            break;
          }
          case HSA_SIGNAL_CONDITION_LT: {
            condition_met = (value < async_events_.value_[i]);
            break;
          }
        }
        if (condition_met) {
          assert(async_events_.handler_[i] != NULL);
          bool keep = async_events_.handler_[i](value, async_events_.arg_[i]);
          if (!keep) {
            hsa_signal_handle(async_events_.signal_[i])->Release();
            async_events_.CopyIndex(i, async_events_.Size() - 1);
            async_events_.PopBack();
            i--;
          }
        }
      }
    }

    // Check for dead signals
    index = 0;
    while (index != async_events_.Size()) {
      if (!hsa_signal_handle(async_events_.signal_[index])->IsValid()) {
        hsa_signal_handle(async_events_.signal_[index])->Release();
        async_events_.CopyIndex(index, async_events_.Size() - 1);
        async_events_.PopBack();
        continue;
      }
      index++;
    }

    // Insert new signals and find plain functions
    typedef std::pair<void (*)(void*), void*> func_arg_t;
    std::vector<func_arg_t> functions;
    {
      ScopedAcquire scope_lock(&async_events_control_.lock);
      for (size_t i = 0; i < new_async_events_.Size(); i++) {
        if (new_async_events_.signal_[i].handle == 0) {
          functions.push_back(
              func_arg_t((void (*)(void*))new_async_events_.handler_[i], new_async_events_.arg_[i]));
          continue;
        }
        async_events_.PushBack(new_async_events_.signal_[i], new_async_events_.cond_[i],
                               new_async_events_.value_[i], new_async_events_.handler_[i],
                               new_async_events_.arg_[i]);
      }
      new_async_events_.Clear();
    }

    // Call plain functions
    for (size_t i = 0; i < functions.size(); i++) functions[i].first(functions[i].second);
    functions.clear();
  }

  // Release wait count of all pending signals
  for (size_t i = 1; i <
       async_events_.Size(); i++)
    hsa_signal_handle(async_events_.signal_[i])->Release();
  async_events_.Clear();

  for (size_t i = 0; i < new_async_events_.Size(); i++)
    hsa_signal_handle(new_async_events_.signal_[i])->Release();
  new_async_events_.Clear();
}

void Runtime::BindVmFaultHandler() {
  if (core::g_use_interrupt_wait && !gpu_agents_.empty()) {
    // Create memory event with manual reset to avoid a race condition
    // with the driver in case of multiple concurrent VM faults.
    vm_fault_event_ = core::InterruptSignal::CreateEvent(HSA_EVENTTYPE_MEMORY, true);

    // Create an interrupt signal object to contain the memory event.
    // This signal object will be registered with the async handler global
    // thread.
    vm_fault_signal_ = new core::InterruptSignal(0, vm_fault_event_);

    if (!vm_fault_signal_->IsValid() || vm_fault_signal_->EopEvent() == NULL) {
      assert(false && "Failed on creating VM fault signal");
      return;
    }

    SetAsyncSignalHandler(core::Signal::Convert(vm_fault_signal_), HSA_SIGNAL_CONDITION_NE, 0,
                          VMFaultHandler, reinterpret_cast<void*>(vm_fault_signal_));
  }
}

bool Runtime::VMFaultHandler(hsa_signal_value_t val, void* arg) {
  core::InterruptSignal* vm_fault_signal = reinterpret_cast<core::InterruptSignal*>(arg);

  assert(vm_fault_signal != NULL);

  if (vm_fault_signal == NULL) {
    return false;
  }

  HsaEvent* vm_fault_event = vm_fault_signal->EopEvent();

  HsaMemoryAccessFault& fault = vm_fault_event->EventData.EventData.MemoryAccessFault;

  hsa_status_t custom_handler_status = HSA_STATUS_ERROR;
  auto system_event_handlers = runtime_singleton_->GetSystemEventHandlers();
  // If custom handler is registered, pack the fault info and call the handler
  if (!system_event_handlers.empty()) {
    hsa_amd_event_t memory_fault_event;
    memory_fault_event.event_type = HSA_AMD_GPU_MEMORY_FAULT_EVENT;
    hsa_amd_gpu_memory_fault_info_t& fault_info = memory_fault_event.memory_fault;

    // Find the faulty agent
    auto it = runtime_singleton_->agents_by_node_.find(fault.NodeId);
    assert(it != runtime_singleton_->agents_by_node_.end() && "Can't find faulty agent.");
    Agent* faulty_agent = it->second.front();
    fault_info.agent = Agent::Convert(faulty_agent);

    fault_info.virtual_address = fault.VirtualAddress;
    fault_info.fault_reason_mask = 0;
    if (fault.Failure.NotPresent == 1) {
      fault_info.fault_reason_mask |= HSA_AMD_MEMORY_FAULT_PAGE_NOT_PRESENT;
    }
    if (fault.Failure.ReadOnly == 1) {
      fault_info.fault_reason_mask |= HSA_AMD_MEMORY_FAULT_READ_ONLY;
    }
    if (fault.Failure.NoExecute == 1) {
      fault_info.fault_reason_mask |= HSA_AMD_MEMORY_FAULT_NX;
    }
    if (fault.Failure.GpuAccess == 1) {
      fault_info.fault_reason_mask |= HSA_AMD_MEMORY_FAULT_HOST_ONLY;
    }
    if (fault.Failure.Imprecise == 1) {
      fault_info.fault_reason_mask |= HSA_AMD_MEMORY_FAULT_IMPRECISE;
    }
    if (fault.Failure.ECC == 1 && fault.Failure.ErrorType == 0) {
      fault_info.fault_reason_mask |= HSA_AMD_MEMORY_FAULT_DRAMECC;
    }
    if (fault.Failure.ErrorType == 1) {
      fault_info.fault_reason_mask |= HSA_AMD_MEMORY_FAULT_SRAMECC;
    }
    if (fault.Failure.ErrorType == 2) {
      fault_info.fault_reason_mask |= HSA_AMD_MEMORY_FAULT_DRAMECC;
    }
    if (fault.Failure.ErrorType == 3) {
      fault_info.fault_reason_mask |= HSA_AMD_MEMORY_FAULT_HANG;
    }

    for (auto& callback : system_event_handlers) {
      hsa_status_t err = callback.first(&memory_fault_event, callback.second);
      if (err == HSA_STATUS_SUCCESS) custom_handler_status = HSA_STATUS_SUCCESS;
    }
  }

  // No custom VM fault handler registered or it failed.
  if (custom_handler_status != HSA_STATUS_SUCCESS) {
    if (runtime_singleton_->flag().enable_vm_fault_message()) {
      std::string reason = "";
      if (fault.Failure.NotPresent == 1) {
        reason += "Page not present or supervisor privilege";
      } else if (fault.Failure.ReadOnly == 1) {
        reason += "Write access to a read-only page";
      } else if (fault.Failure.NoExecute == 1) {
        reason += "Execute access to a page marked NX";
      } else if (fault.Failure.GpuAccess == 1) {
        reason += "Host access only";
      } else if ((fault.Failure.ECC == 1 && fault.Failure.ErrorType == 0) ||
                 fault.Failure.ErrorType == 2) {
        reason += "DRAM ECC failure";
      } else if (fault.Failure.ErrorType == 1) {
        reason += "SRAM ECC failure";
      } else if (fault.Failure.ErrorType == 3) {
        reason += "Generic hang recovery";
      } else {
        reason += "Unknown";
      }

      core::Agent* faultingAgent = runtime_singleton_->agents_by_node_[fault.NodeId][0];

      fprintf(
          stderr,
          "Memory access fault by GPU node-%u (Agent handle: %p) on address %p%s. Reason: %s.\n",
          fault.NodeId, reinterpret_cast<void*>(faultingAgent->public_handle().handle),
          reinterpret_cast<void*>(fault.VirtualAddress),
          (fault.Failure.Imprecise == 1) ? " (may not be exact address)" : "", reason.c_str());

#ifndef NDEBUG
      PrintMemoryMapNear(reinterpret_cast<void*>(fault.VirtualAddress));
#endif
    }
    assert(false && "GPU memory access fault.");
    std::abort();
  }
  // No need to keep the signal because we are done.
  return false;
}

void Runtime::PrintMemoryMapNear(void* ptr) {
  runtime_singleton_->memory_lock_.Acquire();
  auto it = runtime_singleton_->allocation_map_.upper_bound(ptr);
  for (int i = 0; i < 2; i++) {
    if (it != runtime_singleton_->allocation_map_.begin()) it--;
  }
  fprintf(stderr, "Nearby memory map:\n");
  auto start = it;
  for (int i = 0; i < 3; i++) {
    if (it == runtime_singleton_->allocation_map_.end()) break;
    std::string kind = "Non-HSA";
    if (it->second.region != nullptr) {
      const AMD::MemoryRegion* region = static_cast<const AMD::MemoryRegion*>(it->second.region);
      if (region->IsSystem())
        kind = "System";
      else if (region->IsLocalMemory())
        kind = "VRAM";
      else if (region->IsScratch())
        kind = "Scratch";
      else if (region->IsLDS())
        kind = "LDS";
    }
    fprintf(stderr, "%p, 0x%lx, %s\n", it->first, it->second.size, kind.c_str());
    it++;
  }
  fprintf(stderr, "\n");

  it = start;
  runtime_singleton_->memory_lock_.Release();
  hsa_amd_pointer_info_t info;
  PtrInfoBlockData block;
  uint32_t count;
  hsa_agent_t* canAccess;
  info.size = sizeof(info);
  for (int i = 0; i < 3; i++) {
    if (it == runtime_singleton_->allocation_map_.end()) break;
    runtime_singleton_->PtrInfo(const_cast<void*>(it->first), &info, malloc, &count, &canAccess,
                                &block);
    fprintf(stderr,
            "PtrInfo:\n\tAddress: %p-%p/%p-%p\n\tSize: 0x%lx\n\tType: %u\n\tOwner: %p\n",
            info.agentBaseAddress, (char*)info.agentBaseAddress + info.sizeInBytes,
            info.hostBaseAddress, (char*)info.hostBaseAddress + info.sizeInBytes,
            info.sizeInBytes, info.type, reinterpret_cast<void*>(info.agentOwner.handle));
    fprintf(stderr, "\tCanAccess: %u\n", count);
    for (int t = 0; t < count; t++)
      fprintf(stderr, "\t\t%p\n", reinterpret_cast<void*>(canAccess[t].handle));
    fprintf(stderr, "\tIn block: %p, 0x%lx\n", block.base, block.length);
    free(canAccess);
    it++;
  }
}

Runtime::Runtime()
    : region_gpu_(nullptr),
      sys_clock_freq_(0),
      vm_fault_event_(nullptr),
      vm_fault_signal_(nullptr),
      ref_count_(0),
      kfd_version{0} {}

hsa_status_t Runtime::Load() {
  flag_.Refresh();

  g_use_interrupt_wait = flag_.enable_interrupt();

  if (!AMD::Load()) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }

  // Setup system clock frequency for the first time.
  if (sys_clock_freq_ == 0) {
    // Cache system clock frequency
    HsaClockCounters clocks;
    hsaKmtGetClockCounters(0, &clocks);
    sys_clock_freq_ = clocks.SystemClockFrequencyHz;
  }

  BindVmFaultHandler();

  loader_ = amd::hsa::loader::Loader::Create(&loader_context_);

  // Load extensions
  LoadExtensions();

  // Initialize per GPU scratch, blits, and trap handler
  for (core::Agent* agent : gpu_agents_) {
    hsa_status_t status = reinterpret_cast<AMD::GpuAgent*>(agent)->PostToolsInit();

    if (status != HSA_STATUS_SUCCESS) {
      return status;
    }
  }

  // Load tools libraries
  LoadTools();

  return HSA_STATUS_SUCCESS;
}

void Runtime::Unload() {
  UnloadTools();
  UnloadExtensions();

  amd::hsa::loader::Loader::Destroy(loader_);
  loader_ = nullptr;

  std::for_each(gpu_agents_.begin(), gpu_agents_.end(), DeleteObject());
  gpu_agents_.clear();

  async_events_control_.Shutdown();

  if (vm_fault_signal_ != nullptr) {
    vm_fault_signal_->DestroySignal();
    vm_fault_signal_ = nullptr;
  }
  core::InterruptSignal::DestroyEvent(vm_fault_event_);
  vm_fault_event_ = nullptr;

  SharedSignalPool.clear();

  EventPool.clear();

  DestroyAgents();

  CloseTools();

  AMD::Unload();
}

void Runtime::LoadExtensions() {
  // Load finalizer and extension library
#ifdef HSA_LARGE_MODEL
  static const std::string kFinalizerLib[] = {"hsa-ext-finalize64.dll",
                                              "libhsa-ext-finalize64.so.1"};
#else
  static const std::string kFinalizerLib[] = {"hsa-ext-finalize.dll",
                                              "libhsa-ext-finalize.so.1"};
#endif

  // Update Hsa Api Table with handle of Finalizer extension Apis
  // Skipping finalizer loading since finalizer is no longer distributed.
  // LinkExts will expose the finalizer-not-present implementation.
  // extensions_.LoadFinalizer(kFinalizerLib[os_index(os::current_os)]);
  hsa_api_table_.LinkExts(&extensions_.finalizer_api,
                          core::HsaApiTable::HSA_EXT_FINALIZER_API_TABLE_ID);

  // Update Hsa Api Table with handle of Image extension Apis
  extensions_.LoadImage();
  hsa_api_table_.LinkExts(&extensions_.image_api,
                          core::HsaApiTable::HSA_EXT_IMAGE_API_TABLE_ID);
}

void Runtime::UnloadExtensions() { extensions_.Unload(); }

static std::vector<std::string> parse_tool_names(std::string tool_names) {
  std::vector<std::string> names;
  std::string name = "";
  bool quoted = false;
  while (tool_names.size() != 0) {
    auto index = tool_names.find_first_of(" \"\\");
    if (index == std::string::npos) {
      name += tool_names;
      break;
    }
    switch (tool_names[index]) {
      case ' ': {
        if (!quoted) {
          name += tool_names.substr(0, index);
          tool_names.erase(0, index + 1);
          names.push_back(name);
          name = "";
        } else {
          name += tool_names.substr(0, index + 1);
          tool_names.erase(0, index + 1);
        }
        break;
      }
      case '\"': {
        if (quoted) {
          quoted = false;
          name += tool_names.substr(0, index);
          tool_names.erase(0, index + 1);
          names.push_back(name);
          name = "";
        } else {
          quoted = true;
          tool_names.erase(0, index + 1);
        }
        break;
      }
      case '\\': {
        if (tool_names.size() > index + 1) {
          name += tool_names.substr(0, index) + tool_names[index + 1];
          tool_names.erase(0, index + 2);
        } else {
          // A trailing backslash escapes nothing; drop it so the loop terminates.
          name += tool_names.substr(0, index);
          tool_names.clear();
        }
        break;
      }
    }  // end switch
  }    // end while

  if (name != "") names.push_back(name);
  return names;
}

void Runtime::LoadTools() {
  typedef bool (*tool_init_t)(::HsaApiTable*, uint64_t, uint64_t, const char* const*);
  typedef Agent* (*tool_wrap_t)(Agent*);
  typedef void (*tool_add_t)(Runtime*);

  // Load tool libs
  std::string tool_names = flag_.tools_lib_names();
  if (tool_names != "") {
    std::vector<std::string> names = parse_tool_names(tool_names);
    std::vector<const char*> failed;
    for (auto& name : names) {
      os::LibHandle tool = os::LoadLib(name);

      if (tool != NULL) {
        tool_libs_.push_back(tool);

        rocr::AMD::callback_t<tool_init_t> ld =
            (tool_init_t)os::GetExportAddress(tool, "OnLoad");
        if (ld) {
          if (!ld(&hsa_api_table_.hsa_api,
                  hsa_api_table_.hsa_api.version.major_id, failed.size(), failed.data())) {
            failed.push_back(name.c_str());
            // Drop the handle recorded above so UnloadTools/CloseTools never touch a closed lib.
            tool_libs_.pop_back();
            os::CloseLib(tool);
            continue;
          }
        }

        rocr::AMD::callback_t<tool_wrap_t> wrap =
            (tool_wrap_t)os::GetExportAddress(tool, "WrapAgent");
        if (wrap) {
          std::vector<core::Agent*>* agent_lists[2] = {&cpu_agents_, &gpu_agents_};
          for (std::vector<core::Agent*>* agent_list : agent_lists) {
            for (size_t agent_idx = 0; agent_idx < agent_list->size(); ++agent_idx) {
              Agent* agent = wrap(agent_list->at(agent_idx));
              if (agent != NULL) {
                assert(agent->IsValid() && "Agent returned from WrapAgent is not valid");
                agent_list->at(agent_idx) = agent;
              }
            }
          }
        }

        rocr::AMD::callback_t<tool_add_t> add = (tool_add_t)os::GetExportAddress(tool, "AddAgent");
        if (add) add(this);
      } else {
        if (flag().report_tool_load_failures())
          fprintf(stderr, "Tool lib \"%s\" failed to load.\n", name.c_str());
      }
    }
  }
}

void Runtime::UnloadTools() {
  typedef void (*tool_unload_t)();
  for (size_t i = tool_libs_.size(); i != 0; i--) {
    tool_unload_t unld;
    unld = (tool_unload_t)os::GetExportAddress(tool_libs_[i - 1], "OnUnload");
    if (unld) unld();
  }

  // Reset API table in case some tool doesn't clean up properly
  hsa_api_table_.Reset();
}

void Runtime::CloseTools() {
  // Due to valgrind bug, runtime cannot dlclose extensions see:
  // http://valgrind.org/docs/manual/faq.html#faq.unhelpful
  if (!flag_.running_valgrind()) {
    for (auto& lib : tool_libs_) os::CloseLib(lib);
  }
  tool_libs_.clear();
}

void Runtime::AsyncEventsControl::Shutdown() {
  if (async_events_thread_ != NULL) {
    exit = true;
    hsa_signal_handle(wake)->StoreRelaxed(1);
    os::WaitForThread(async_events_thread_);
    os::CloseThread(async_events_thread_);
    async_events_thread_ = NULL;
    HSA::hsa_signal_destroy(wake);
  }
}

void Runtime::AsyncEvents::PushBack(hsa_signal_t signal, hsa_signal_condition_t cond,
                                    hsa_signal_value_t value, hsa_amd_signal_handler handler,
                                    void* arg) {
  signal_.push_back(signal);
  cond_.push_back(cond);
  value_.push_back(value);
  handler_.push_back(handler);
  arg_.push_back(arg);
}

void Runtime::AsyncEvents::CopyIndex(size_t dst, size_t src) {
  signal_[dst] = signal_[src];
  cond_[dst] = cond_[src];
  value_[dst] = value_[src];
  handler_[dst] = handler_[src];
  arg_[dst] = arg_[src];
}

size_t Runtime::AsyncEvents::Size() { return signal_.size(); }

void Runtime::AsyncEvents::PopBack() {
  signal_.pop_back();
  cond_.pop_back();
  value_.pop_back();
  handler_.pop_back();
  arg_.pop_back();
}

void Runtime::AsyncEvents::Clear() {
  signal_.clear();
  cond_.clear();
  value_.clear();
  handler_.clear();
  arg_.clear();
}

hsa_status_t Runtime::SetCustomSystemEventHandler(hsa_amd_system_event_callback_t callback,
                                                  void* data) {
  ScopedAcquire lock(&system_event_lock_);
  system_event_handlers_.push_back(
      std::make_pair(AMD::callback_t<hsa_amd_system_event_callback_t>(callback), data));
  return HSA_STATUS_SUCCESS;
}

std::vector<std::pair<AMD::callback_t<hsa_amd_system_event_callback_t>, void*>>
Runtime::GetSystemEventHandlers() {
  ScopedAcquire lock(&system_event_lock_);
  return system_event_handlers_;
}

hsa_status_t Runtime::SetInternalQueueCreateNotifier(hsa_amd_runtime_queue_notifier callback,
                                                     void* user_data) {
  if (internal_queue_create_notifier_) {
    return HSA_STATUS_ERROR;
  } else {
    internal_queue_create_notifier_ = callback;
    internal_queue_create_notifier_user_data_ = user_data;
    return HSA_STATUS_SUCCESS;
  }
}

void Runtime::InternalQueueCreateNotify(const hsa_queue_t* queue, hsa_agent_t agent) {
  if (internal_queue_create_notifier_)
    internal_queue_create_notifier_(queue, agent, internal_queue_create_notifier_user_data_);
}

hsa_status_t Runtime::SetSvmAttrib(void* ptr, size_t size,
                                   hsa_amd_svm_attribute_pair_t* attribute_list,
                                   size_t attribute_count) {
  uint32_t set_attribs = 0;
  std::vector<bool> agent_seen(max_node_id() + 1, false);

  std::vector<HSA_SVM_ATTRIBUTE> attribs;
  attribs.reserve(attribute_count);
  uint32_t set_flags = 0;
  uint32_t clear_flags = 0;

  auto Convert = [&](uint64_t value) -> Agent* {
    hsa_agent_t handle = {value};
    Agent* agent = Agent::Convert(handle);
    if ((agent == nullptr) || !agent->IsValid())
      throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_AGENT,
                               "Invalid agent handle in Runtime::SetSvmAttrib.");
    return agent;
  };

  auto ConvertAllowNull = [&](uint64_t value) -> Agent* {
    hsa_agent_t handle = {value};
    Agent* agent = Agent::Convert(handle);
    if ((agent != nullptr) && (!agent->IsValid()))
      throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_AGENT,
                               "Invalid agent handle in Runtime::SetSvmAttrib.");
    return agent;
  };

  auto ConfirmNew = [&](Agent* agent) {
    if (agent_seen[agent->node_id()])
      throw AMD::hsa_exception(
          HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS,
          "Multiple attributes given for the same agent in Runtime::SetSvmAttrib.");
    agent_seen[agent->node_id()] = true;
  };

  auto Check = [&](uint64_t attrib) {
    if (set_attribs & (1 << attrib))
      throw AMD::hsa_exception(HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS,
                               "Attribute given multiple times in Runtime::SetSvmAttrib.");
    set_attribs |= (1 << attrib);
  };

  auto kmtPair = [](uint32_t attrib, uint32_t value) {
    HSA_SVM_ATTRIBUTE pair = {attrib, value};
    return pair;
  };

  for (uint32_t i = 0; i < attribute_count; i++) {
    auto attrib = attribute_list[i].attribute;
    auto value = attribute_list[i].value;

    switch (attrib) {
      case HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG: {
        Check(attrib);
        switch (value) {
          case HSA_AMD_SVM_GLOBAL_FLAG_FINE_GRAINED:
            set_flags |= HSA_SVM_FLAG_COHERENT;
            break;
          case HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED:
            clear_flags |= HSA_SVM_FLAG_COHERENT;
            break;
          default:
            throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                                     "Invalid HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG value.");
        }
        break;
      }
      case HSA_AMD_SVM_ATTRIB_READ_ONLY: {
        Check(attrib);
        if (value)
          set_flags |= HSA_SVM_FLAG_GPU_RO;
        else
          clear_flags |= HSA_SVM_FLAG_GPU_RO;
        break;
      }
      case HSA_AMD_SVM_ATTRIB_HIVE_LOCAL: {
        Check(attrib);
        if (value)
          set_flags |= HSA_SVM_FLAG_HIVE_LOCAL;
        else
          clear_flags |= HSA_SVM_FLAG_HIVE_LOCAL;
        break;
      }
      case HSA_AMD_SVM_ATTRIB_MIGRATION_GRANULARITY: {
        Check(attrib);
        // Max migration size is 1GB.
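        // Compile-time check of the clamp below, assuming the granularity value is log2 of a
        // number of 4KB pages: 2^18 pages * 4KB = 2^30 bytes = 1GB.
        static_assert((uint64_t(1) << 18) * 4096 == (uint64_t(1) << 30),
                      "A migration granularity of 18 should correspond to 1GB.");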
        if (value > 18) value = 18;
        attribs.push_back(kmtPair(HSA_SVM_ATTR_GRANULARITY, value));
        break;
      }
      case HSA_AMD_SVM_ATTRIB_PREFERRED_LOCATION: {
        Check(attrib);
        Agent* agent = ConvertAllowNull(value);
        if (agent == nullptr)
          attribs.push_back(kmtPair(HSA_SVM_ATTR_PREFERRED_LOC, INVALID_NODEID));
        else
          attribs.push_back(kmtPair(HSA_SVM_ATTR_PREFERRED_LOC, agent->node_id()));
        break;
      }
      case HSA_AMD_SVM_ATTRIB_READ_MOSTLY: {
        Check(attrib);
        if (value)
          set_flags |= HSA_SVM_FLAG_GPU_READ_MOSTLY;
        else
          clear_flags |= HSA_SVM_FLAG_GPU_READ_MOSTLY;
        break;
      }
      case HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE: {
        Agent* agent = Convert(value);
        ConfirmNew(agent);
        if (agent->device_type() == Agent::kAmdCpuDevice) {
          set_flags |= HSA_SVM_FLAG_HOST_ACCESS;
        } else {
          attribs.push_back(kmtPair(HSA_SVM_ATTR_ACCESS, agent->node_id()));
        }
        break;
      }
      case HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE: {
        Agent* agent = Convert(value);
        ConfirmNew(agent);
        if (agent->device_type() == Agent::kAmdCpuDevice) {
          set_flags |= HSA_SVM_FLAG_HOST_ACCESS;
        } else {
          attribs.push_back(kmtPair(HSA_SVM_ATTR_ACCESS_IN_PLACE, agent->node_id()));
        }
        break;
      }
      case HSA_AMD_SVM_ATTRIB_AGENT_NO_ACCESS: {
        Agent* agent = Convert(value);
        ConfirmNew(agent);
        if (agent->device_type() == Agent::kAmdCpuDevice) {
          clear_flags |= HSA_SVM_FLAG_HOST_ACCESS;
        } else {
          attribs.push_back(kmtPair(HSA_SVM_ATTR_NO_ACCESS, agent->node_id()));
        }
        break;
      }
      default:
        throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                                 "Illegal or invalid attribute in Runtime::SetSvmAttrib");
    }
  }

  // Merge CPU access properties - grant access if any CPU needs access.
  // Probably wrong.
  if (set_flags & HSA_SVM_FLAG_HOST_ACCESS) clear_flags &= ~HSA_SVM_FLAG_HOST_ACCESS;

  // Add flag updates
  if (clear_flags) attribs.push_back(kmtPair(HSA_SVM_ATTR_CLR_FLAGS, clear_flags));
  if (set_flags) attribs.push_back(kmtPair(HSA_SVM_ATTR_SET_FLAGS, set_flags));

  uint8_t* base = AlignDown((uint8_t*)ptr, 4096);
  uint8_t* end = AlignUp((uint8_t*)ptr + size, 4096);
  size_t len = end - base;
  // data() avoids indexing an empty vector when no KFD attributes were generated.
  HSAKMT_STATUS error = hsaKmtSVMSetAttr(base, len, attribs.size(), attribs.data());
  if (error != HSAKMT_STATUS_SUCCESS)
    throw AMD::hsa_exception(HSA_STATUS_ERROR, "hsaKmtSVMSetAttr failed.");

  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::GetSvmAttrib(void* ptr, size_t size,
                                   hsa_amd_svm_attribute_pair_t* attribute_list,
                                   size_t attribute_count) {
  std::vector<HSA_SVM_ATTRIBUTE> attribs;
  attribs.reserve(attribute_count);
  std::vector<size_t> kmtIndices(attribute_count);
  bool getFlags = false;

  auto Convert = [&](uint64_t value) -> Agent* {
    hsa_agent_t handle = {value};
    Agent* agent = Agent::Convert(handle);
    if ((agent == nullptr) || !agent->IsValid())
      throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_AGENT,
                               "Invalid agent handle in Runtime::GetSvmAttrib.");
    return agent;
  };

  auto kmtPair = [](uint32_t attrib, uint32_t value) {
    HSA_SVM_ATTRIBUTE pair = {attrib, value};
    return pair;
  };

  for (uint32_t i = 0; i < attribute_count; i++) {
    auto& attrib = attribute_list[i].attribute;
    auto& value = attribute_list[i].value;

    switch (attrib) {
      case HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG:
      case HSA_AMD_SVM_ATTRIB_READ_ONLY:
      case HSA_AMD_SVM_ATTRIB_HIVE_LOCAL:
      case HSA_AMD_SVM_ATTRIB_READ_MOSTLY: {
        getFlags = true;
        kmtIndices[i] = -1;
        break;
      }
      case HSA_AMD_SVM_ATTRIB_MIGRATION_GRANULARITY: {
        kmtIndices[i] = attribs.size();
        attribs.push_back(kmtPair(HSA_SVM_ATTR_GRANULARITY, 0));
        break;
      }
      case HSA_AMD_SVM_ATTRIB_PREFERRED_LOCATION: {
        kmtIndices[i] = attribs.size();
        attribs.push_back(kmtPair(HSA_SVM_ATTR_PREFERRED_LOC, 0));
        break;
      }
      case HSA_AMD_SVM_ATTRIB_PREFETCH_LOCATION: {
        value = Agent::Convert(GetSVMPrefetchAgent(ptr, size)).handle;
        kmtIndices[i] = -1;
        break;
      }
      case HSA_AMD_SVM_ATTRIB_ACCESS_QUERY: {
        Agent* agent = Convert(value);
        if (agent->device_type() == Agent::kAmdCpuDevice) {
          getFlags = true;
          kmtIndices[i] = -1;
        } else {
          kmtIndices[i] = attribs.size();
          attribs.push_back(kmtPair(HSA_SVM_ATTR_ACCESS, agent->node_id()));
        }
        break;
      }
      default:
        throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                                 "Illegal or invalid attribute in Runtime::GetSvmAttrib");
    }
  }

  if (getFlags) {
    // Order is important: later code reads SET_FLAGS from the last entry and
    // CLR_FLAGS from the second-to-last entry.
    attribs.push_back(kmtPair(HSA_SVM_ATTR_CLR_FLAGS, 0));
    attribs.push_back(kmtPair(HSA_SVM_ATTR_SET_FLAGS, 0));
  }

  uint8_t* base = AlignDown((uint8_t*)ptr, 4096);
  uint8_t* end = AlignUp((uint8_t*)ptr + size, 4096);
  size_t len = end - base;
  if (attribs.size() != 0) {
    HSAKMT_STATUS error = hsaKmtSVMGetAttr(base, len, attribs.size(), &attribs[0]);
    if (error != HSAKMT_STATUS_SUCCESS)
      throw AMD::hsa_exception(HSA_STATUS_ERROR, "hsaKmtSVMGetAttr failed.");
  }

  for (uint32_t i = 0; i < attribute_count; i++) {
    auto& attrib = attribute_list[i].attribute;
    auto& value = attribute_list[i].value;

    switch (attrib) {
      case HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG: {
        if (attribs[attribs.size() - 1].value & HSA_SVM_FLAG_COHERENT) {
          value = HSA_AMD_SVM_GLOBAL_FLAG_FINE_GRAINED;
          break;
        }
        if (attribs[attribs.size() - 2].value & HSA_SVM_FLAG_COHERENT)
          value = HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED;
        else
          value = HSA_AMD_SVM_GLOBAL_FLAG_INDETERMINATE;
        break;
      }
      case HSA_AMD_SVM_ATTRIB_READ_ONLY: {
        value = (attribs[attribs.size() - 1].value & HSA_SVM_FLAG_GPU_RO);
        break;
      }
      case HSA_AMD_SVM_ATTRIB_HIVE_LOCAL: {
        value = (attribs[attribs.size() - 1].value & HSA_SVM_FLAG_HIVE_LOCAL);
        break;
      }
      case HSA_AMD_SVM_ATTRIB_MIGRATION_GRANULARITY: {
        value = attribs[kmtIndices[i]].value;
        break;
      }
      case HSA_AMD_SVM_ATTRIB_PREFERRED_LOCATION: {
        uint64_t node = attribs[kmtIndices[i]].value;
        Agent* agent = nullptr;
        if (node != INVALID_NODEID) agent = agents_by_node_[node][0];
        value = Agent::Convert(agent).handle;
        break;
      }
      case HSA_AMD_SVM_ATTRIB_PREFETCH_LOCATION: {
        break;
      }
      case HSA_AMD_SVM_ATTRIB_READ_MOSTLY: {
        value = (attribs[attribs.size() - 1].value & HSA_SVM_FLAG_GPU_READ_MOSTLY);
        break;
      }
      case HSA_AMD_SVM_ATTRIB_ACCESS_QUERY: {
        if (kmtIndices[i] == -1) {
          if (attribs[attribs.size() - 1].value & HSA_SVM_FLAG_HOST_ACCESS)
            attrib = HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE;
        } else {
          switch (attribs[kmtIndices[i]].type) {
            case HSA_SVM_ATTR_ACCESS:
              attrib = HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE;
              break;
            case HSA_SVM_ATTR_ACCESS_IN_PLACE:
              attrib = HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE;
              break;
            case HSA_SVM_ATTR_NO_ACCESS:
              attrib = HSA_AMD_SVM_ATTRIB_AGENT_NO_ACCESS;
              break;
            default:
              assert(false && "Bad agent accessibility from KFD.");
          }
        }
        break;
      }
      default:
        throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_ARGUMENT,
                                 "Illegal or invalid attribute in Runtime::GetSvmAttrib");
    }
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t Runtime::SvmPrefetch(void* ptr, size_t size, hsa_agent_t agent,
                                  uint32_t num_dep_signals, const hsa_signal_t* dep_signals,
                                  hsa_signal_t completion_signal) {
  uintptr_t base = reinterpret_cast<uintptr_t>(AlignDown(ptr, 4096));
  uintptr_t end = AlignUp(reinterpret_cast<uintptr_t>(ptr) + size, 4096);
  size_t len = end - base;

  PrefetchOp* op = new PrefetchOp();
  MAKE_NAMED_SCOPE_GUARD(OpGuard, [&]() { delete op; });

  Agent* dest = Agent::Convert(agent);
  if (dest->device_type() == Agent::kAmdCpuDevice)
    op->node_id = 0;
  else
    op->node_id = dest->node_id();

  op->base = reinterpret_cast<void*>(base);
  op->size = len;
  op->completion = completion_signal;
  if (num_dep_signals > 1) {
    op->remaining_deps = num_dep_signals - 1;
    for (uint32_t i = 0; i < num_dep_signals - 1; i++) op->dep_signals.push_back(dep_signals[i]);
  } else {
    op->remaining_deps = 0;
  }

  {
    ScopedAcquire lock(&prefetch_lock_);
    // Remove all fully overlapped and trim partially overlapped ranges.
    // Get iteration bounds
    auto start = prefetch_map_.upper_bound(base);
    if (start != prefetch_map_.begin()) start--;
    auto stop = prefetch_map_.lower_bound(end);

    auto isEndNode = [&](decltype(start) node) {
      return node->second.next == prefetch_map_.end();
    };
    auto isFirstNode = [&](decltype(start) node) {
      return node->second.prev == prefetch_map_.end();
    };

    // Trim and remove old ranges.
    while (start != stop) {
      uintptr_t startBase = start->first;
      uintptr_t startEnd = startBase + start->second.bytes;

      auto ibase = Max(startBase, base);
      auto iend = Min(startEnd, end);
      // Check for overlap
      if (ibase < iend) {
        // Second range check
        if (iend < startEnd) {
          auto ret = prefetch_map_.insert(
              std::make_pair(iend, PrefetchRange(startEnd - iend, start->second.op)));
          assert(ret.second && "Prefetch map insert failed during range split.");

          auto it = ret.first;
          it->second.prev = start;
          it->second.next = start->second.next;
          start->second.next = it;
          if (!isEndNode(it)) it->second.next->second.prev = it;
        }

        // Is the first interval of the old range valid
        if (startBase < ibase) {
          start->second.bytes = ibase - startBase;
        } else {
          if (isFirstNode(start)) {
            start->second.op->prefetch_map_entry = start->second.next;
            if (!isEndNode(start)) start->second.next->second.prev = prefetch_map_.end();
          } else {
            start->second.prev->second.next = start->second.next;
            if (!isEndNode(start)) start->second.next->second.prev = start->second.prev;
          }
          // erase() invalidates start, so resume from the node that follows it.
          start = prefetch_map_.erase(start);
          continue;
        }
      }
      start++;
    }

    // Insert new range.
    auto ret = prefetch_map_.insert(std::make_pair(base, PrefetchRange(len, op)));
    assert(ret.second && "Prefetch map insert failed.");
    auto it = ret.first;
    op->prefetch_map_entry = it;
    it->second.next = it->second.prev = prefetch_map_.end();
  }

  // Remove the prefetch's ranges from the map.
  static auto removePrefetchRanges = [](PrefetchOp* op) {
    ScopedAcquire lock(&Runtime::runtime_singleton_->prefetch_lock_);
    auto it = op->prefetch_map_entry;
    while (it != Runtime::runtime_singleton_->prefetch_map_.end()) {
      auto next = it->second.next;
      Runtime::runtime_singleton_->prefetch_map_.erase(it);
      it = next;
    }
  };

  // Prefetch Signal handler for synchronization.
  static hsa_amd_signal_handler signal_handler = [](hsa_signal_value_t value, void* arg) {
    PrefetchOp* op = reinterpret_cast<PrefetchOp*>(arg);

    if (op->remaining_deps > 0) {
      op->remaining_deps--;
      Runtime::runtime_singleton_->SetAsyncSignalHandler(
          op->dep_signals[op->remaining_deps], HSA_SIGNAL_CONDITION_EQ, 0, signal_handler, arg);
      return false;
    }

    HSA_SVM_ATTRIBUTE attrib;
    attrib.type = HSA_SVM_ATTR_PREFETCH_LOC;
    attrib.value = op->node_id;
    HSAKMT_STATUS error = hsaKmtSVMSetAttr(op->base, op->size, 1, &attrib);
    assert(error == HSAKMT_STATUS_SUCCESS && "KFD Prefetch failed.");

    removePrefetchRanges(op);
    if (op->completion.handle != 0) Signal::Convert(op->completion)->SubRelaxed(1);
    delete op;
    return false;
  };

  auto no_dependencies = [](void* arg) { signal_handler(0, arg); };

  MAKE_NAMED_SCOPE_GUARD(RangeGuard, [&]() { removePrefetchRanges(op); });

  hsa_status_t err;
  if (num_dep_signals == 0)
    err = AMD::hsa_amd_async_function(no_dependencies, op);
  else
    err = SetAsyncSignalHandler(dep_signals[num_dep_signals - 1], HSA_SIGNAL_CONDITION_EQ, 0,
                                signal_handler, op);
  if (err != HSA_STATUS_SUCCESS) throw AMD::hsa_exception(err, "Signal handler unable to be set.");

  RangeGuard.Dismiss();
  OpGuard.Dismiss();
  return HSA_STATUS_SUCCESS;
}

Agent* Runtime::GetSVMPrefetchAgent(void* ptr, size_t size) {
  uintptr_t base = reinterpret_cast<uintptr_t>(AlignDown(ptr, 4096));
  uintptr_t end = AlignUp(reinterpret_cast<uintptr_t>(ptr) + size, 4096);
  size_t len = end - base;

  std::vector<std::pair<uintptr_t, size_t>> holes;

  ScopedAcquire lock(&Runtime::runtime_singleton_->prefetch_lock_);
  auto start = prefetch_map_.upper_bound(base);
  if (start != prefetch_map_.begin()) start--;
  auto stop = prefetch_map_.lower_bound(end);

  // KFD returns -1 for no or mixed destinations; -2 means no location has been seen yet.
  uint32_t prefetch_node = -2;
  if (start != stop) {
    prefetch_node = start->second.op->node_id;
  }

  while (start != stop) {
    uintptr_t startBase = start->first;
    uintptr_t startEnd = startBase + start->second.bytes;

    auto ibase = Max(base, startBase);
    auto iend = Min(end, startEnd);
    // Check for intersection with the query
    if (ibase < iend) {
      // If prefetch locations are different then we report null agent.
      if (prefetch_node != start->second.op->node_id) return nullptr;

      // Push leading gap to an array for checking KFD.
      if (base < ibase) holes.push_back(std::make_pair(base, ibase - base));

      // Trim query range.
      base = iend;
    }
    start++;
  }
  if (base < end) holes.push_back(std::make_pair(base, end - base));

  HSA_SVM_ATTRIBUTE attrib;
  attrib.type = HSA_SVM_ATTR_PREFETCH_LOC;
  for (auto& range : holes) {
    HSAKMT_STATUS error =
        hsaKmtSVMGetAttr(reinterpret_cast<void*>(range.first), range.second, 1, &attrib);
    assert(error == HSAKMT_STATUS_SUCCESS && "KFD prefetch query failed.");

    if (attrib.value == -1) return nullptr;
    if (prefetch_node == -2) prefetch_node = attrib.value;
    if (prefetch_node != attrib.value) return nullptr;
  }

  assert(prefetch_node != -2 && "prefetch_node was not updated.");
  assert(prefetch_node != -1 && "Should have already returned.");
  return agents_by_node_[prefetch_node][0];
}

}  // namespace core
}  // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/core/runtime/signal.cpp000066400000000000000000000244371420110115200220200ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTME_CORE_SIGNAL_CPP_
#define HSA_RUNTME_CORE_SIGNAL_CPP_

#include "core/inc/signal.h"

#include <algorithm>

#include "core/util/timer.h"
#include "core/inc/runtime.h"

namespace rocr {
namespace core {

KernelMutex Signal::ipcLock_;
std::map<decltype(hsa_signal_t::handle), Signal*> Signal::ipcMap_;

void SharedSignalPool_t::clear() {
  ifdebug {
    size_t capacity = 0;
    for (auto& block : block_list_) capacity += block.second;
    if (capacity != free_list_.size())
      debug_print("Warning: Resource leak detected by SharedSignalPool, %zu Signals leaked.\n",
                  capacity - free_list_.size());
  }

  for (auto& block : block_list_) free_(block.first);
  block_list_.clear();
  free_list_.clear();
}

SharedSignal* SharedSignalPool_t::alloc() {
  ScopedAcquire lock(&lock_);
  if (free_list_.empty()) {
    SharedSignal* block = reinterpret_cast<SharedSignal*>(
        allocate_(block_size_ * sizeof(SharedSignal), __alignof(SharedSignal), 0));
    if (block == nullptr) {
      block_size_ = minblock_;
      block = reinterpret_cast<SharedSignal*>(
          allocate_(block_size_ * sizeof(SharedSignal), __alignof(SharedSignal), 0));
      if (block == nullptr) throw std::bad_alloc();
    }

    MAKE_NAMED_SCOPE_GUARD(throwGuard, [&]() { free_(block); });
    block_list_.push_back(std::make_pair(block, block_size_));
    throwGuard.Dismiss();

    for (int i = 0; i < block_size_; i++) {
      free_list_.push_back(&block[i]);
    }

    block_size_ *= 2;
  }

  SharedSignal* ret = free_list_.back();
  new (ret) SharedSignal();
  free_list_.pop_back();
  return ret;
}

void SharedSignalPool_t::free(SharedSignal* ptr) {
  if (ptr == nullptr) return;

  ptr->~SharedSignal();
  ScopedAcquire lock(&lock_);

  ifdebug {
    bool valid = false;
    for (auto& block : block_list_) {
      if ((block.first <= ptr) &&
          (uintptr_t(ptr) < uintptr_t(block.first) + block.second * sizeof(SharedSignal))) {
        valid = true;
        break;
      }
    }
    assert(valid && "Object does not belong to pool.");
  }

  free_list_.push_back(ptr);
}

LocalSignal::LocalSignal(hsa_signal_value_t initial_value, bool exportable)
    : local_signal_(exportable ? nullptr : core::Runtime::runtime_singleton_->GetSharedSignalPool(),
                    exportable ? core::MemoryRegion::AllocateIPC : 0) {
  local_signal_.shared_object()->amd_signal.value = initial_value;
}

void Signal::registerIpc() {
  ScopedAcquire lock(&ipcLock_);
  auto handle = Convert(this);
  assert(ipcMap_.find(handle.handle) == ipcMap_.end() &&
         "Can't register the same IPC signal twice.");
  ipcMap_[handle.handle] = this;
}

bool Signal::deregisterIpc() {
  ScopedAcquire lock(&ipcLock_);
  if (refcount_ != 0) return false;
  auto handle = Convert(this);
  const auto& it = ipcMap_.find(handle.handle);
  assert(it != ipcMap_.end() && "Deregister on non-IPC signal.");
  ipcMap_.erase(it);
  return true;
}

Signal* Signal::lookupIpc(hsa_signal_t signal) {
  ScopedAcquire lock(&ipcLock_);
  const auto& it = ipcMap_.find(signal.handle);
  if (it == ipcMap_.end()) return nullptr;
  return it->second;
}

Signal* Signal::duplicateIpc(hsa_signal_t signal) {
  ScopedAcquire lock(&ipcLock_);
  const auto& it = ipcMap_.find(signal.handle);
  if (it == ipcMap_.end()) return nullptr;
  it->second->refcount_++;
  it->second->Retain();
  return it->second;
}

void Signal::Release() {
  if (--retained_ != 0) return;
  if (!isIPC())
    doDestroySignal();
  else if (deregisterIpc())
    doDestroySignal();
}

Signal::~Signal() {
  signal_.kind = AMD_SIGNAL_KIND_INVALID;
  if (refcount_ == 1 && isIPC()) {
    refcount_ = 0;
    deregisterIpc();
  }
}

uint32_t Signal::WaitAny(uint32_t signal_count, const hsa_signal_t* hsa_signals,
                         const hsa_signal_condition_t* conds, const hsa_signal_value_t* values,
                         uint64_t timeout, hsa_wait_state_t wait_hint,
                         hsa_signal_value_t* satisfying_value) {
  hsa_signal_handle* signals =
      reinterpret_cast<hsa_signal_handle*>(const_cast<hsa_signal_t*>(hsa_signals));

  for (uint32_t i = 0; i < signal_count; i++) signals[i]->Retain();

  MAKE_SCOPE_GUARD([&]() {
    for (uint32_t i = 0; i < signal_count; i++) signals[i]->Release();
  });

  uint32_t prior = 0;
  for (uint32_t i = 0; i < signal_count; i++) prior = Max(prior, signals[i]->waiting_++);

  MAKE_SCOPE_GUARD([&]() {
    for (uint32_t i = 0; i < signal_count; i++) signals[i]->waiting_--;
  });

  // Allow only the first waiter to sleep (temporary, known to be bad).
  if (prior != 0) wait_hint = HSA_WAIT_STATE_ACTIVE;

  // Ensure that all signals in the list can be slept on.
  if (wait_hint != HSA_WAIT_STATE_ACTIVE) {
    for (uint32_t i = 0; i < signal_count; i++) {
      if (signals[i]->EopEvent() == NULL) {
        wait_hint = HSA_WAIT_STATE_ACTIVE;
        break;
      }
    }
  }

  const uint32_t small_size = 10;
  HsaEvent* short_evts[small_size];
  HsaEvent** evts = NULL;
  uint32_t unique_evts = 0;
  if (wait_hint != HSA_WAIT_STATE_ACTIVE) {
    if (signal_count > small_size)
      evts = new HsaEvent*[signal_count];
    else
      evts = short_evts;
    for (uint32_t i = 0; i < signal_count; i++) evts[i] = signals[i]->EopEvent();
    std::sort(evts, evts + signal_count);
    HsaEvent** end = std::unique(evts, evts + signal_count);
    unique_evts = uint32_t(end - evts);
  }
  MAKE_SCOPE_GUARD([&]() {
    if (signal_count > small_size) delete[] evts;
  });

  int64_t value;

  timer::fast_clock::time_point start_time = timer::fast_clock::now();

  // Set a polling timeout value
  const timer::fast_clock::duration kMaxElapsed = std::chrono::microseconds(200);

  // Convert timeout value into the fast_clock domain
  uint64_t hsa_freq;
  HSA::hsa_system_get_info(HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY, &hsa_freq);
  const timer::fast_clock::duration fast_timeout =
      timer::duration_from_seconds<timer::fast_clock::duration>(
          double(timeout) / double(hsa_freq));

  bool condition_met = false;
  while (true) {
    for (uint32_t i = 0; i < signal_count; i++) {
      if (!signals[i]->IsValid()) return uint32_t(-1);

      // Handling special event.
      if (signals[i]->EopEvent() != NULL) {
        const HSA_EVENTTYPE event_type = signals[i]->EopEvent()->EventData.EventType;
        if (event_type == HSA_EVENTTYPE_MEMORY) {
          const HsaMemoryAccessFault& fault =
              signals[i]->EopEvent()->EventData.EventData.MemoryAccessFault;
          if (fault.Flags == HSA_EVENTID_MEMORY_FATAL_PROCESS) {
            return i;
          }
        }
      }

      value = atomic::Load(&signals[i]->signal_.value, std::memory_order_relaxed);

      switch (conds[i]) {
        case HSA_SIGNAL_CONDITION_EQ: {
          condition_met = (value == values[i]);
          break;
        }
        case HSA_SIGNAL_CONDITION_NE: {
          condition_met = (value != values[i]);
          break;
        }
        case HSA_SIGNAL_CONDITION_GTE: {
          condition_met = (value >= values[i]);
          break;
        }
        case HSA_SIGNAL_CONDITION_LT: {
          condition_met = (value < values[i]);
          break;
        }
        default:
          return uint32_t(-1);
      }
      if (condition_met) {
        if (satisfying_value != NULL) *satisfying_value = value;
        return i;
      }
    }

    timer::fast_clock::time_point time = timer::fast_clock::now();
    if (time - start_time > fast_timeout) {
      return uint32_t(-1);
    }

    if (wait_hint == HSA_WAIT_STATE_ACTIVE) {
      continue;
    }

    if (time - start_time < kMaxElapsed) {
      //  os::uSleep(20);
      continue;
    }

    uint32_t wait_ms;
    auto time_remaining = fast_timeout - (time - start_time);
    uint64_t ct = timer::duration_cast<std::chrono::milliseconds>(time_remaining).count();
    wait_ms = (ct > 0xFFFFFFFEu) ? 0xFFFFFFFEu : ct;
    hsaKmtWaitOnMultipleEvents(evts, unique_evts, false, wait_ms);
  }
}

SignalGroup::SignalGroup(uint32_t num_signals, const hsa_signal_t* hsa_signals)
    : count(num_signals) {
  if (count != 0) {
    signals = new hsa_signal_t[count];
  } else {
    signals = NULL;
  }
  if (signals == NULL) return;
  for (uint32_t i = 0; i < count; i++) signals[i] = hsa_signals[i];
}

}  // namespace core
}  // namespace rocr

#endif  // header guard
ROCR-Runtime-rocm-5.0.0/src/core/util/000077500000000000000000000000001420110115200173175ustar00rootroot0000000000000000000000ROCR-Runtime-rocm-5.0.0/src/core/util/atomic_helpers.h000066400000000000000000000436211420110115200224740ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

/*
Helpers to use native types with C++11 atomic operations.

Fixes GCC builtin functionality for x86 with respect to WC and non-temporal stores.
*/

#ifndef HSA_RUNTIME_CORE_UTIL_ATOMIC_HELPERS_H_
#define HSA_RUNTIME_CORE_UTIL_ATOMIC_HELPERS_H_

#include <atomic>
#include "utils.h"

// ALWAYS_CONSERVATIVE will very likely overfence your code.
// For use as a debugging aid only.
#define ALWAYS_CONSERVATIVE 0

#if !ALWAYS_CONSERVATIVE
#ifdef __x86_64
#define X64_ORDER_WC 1
#endif
#if X64_ORDER_WC
// Declares _mm_sfence, _mm_lfence and _mm_mfence.
#include <emmintrin.h>
#endif
#endif

namespace rocr {
namespace atomic {

static constexpr int c11ToBuiltInFlags(std::memory_order order) {
#if ALWAYS_CONSERVATIVE
  return __ATOMIC_RELAXED;
#elif X64_ORDER_WC
  return __ATOMIC_RELAXED;
#else
  return (order == std::memory_order_relaxed) ? __ATOMIC_RELAXED :
         (order == std::memory_order_acquire) ? __ATOMIC_ACQUIRE :
         (order == std::memory_order_release) ? __ATOMIC_RELEASE :
         (order == std::memory_order_seq_cst) ? __ATOMIC_SEQ_CST :
         (order == std::memory_order_consume) ? __ATOMIC_CONSUME :
         (order == std::memory_order_acq_rel) ? __ATOMIC_ACQ_REL :
                                                __ATOMIC_SEQ_CST;
#endif
}

static __forceinline void PreFence(std::memory_order order) {
#if ALWAYS_CONSERVATIVE
  switch (order) {
    case std::memory_order_release:
    case std::memory_order_seq_cst:
    case std::memory_order_acq_rel:
      __atomic_thread_fence(__ATOMIC_SEQ_CST);
    default:;
  }
#elif X64_ORDER_WC
  switch (order) {
    case std::memory_order_release:
    case std::memory_order_seq_cst:
    case std::memory_order_acq_rel:
      _mm_sfence();
    default:;
  }
#endif
}

static __forceinline void PostFence(std::memory_order order) {
#if ALWAYS_CONSERVATIVE
  switch (order) {
    case std::memory_order_seq_cst:
    case std::memory_order_acq_rel:
    case std::memory_order_acquire:
      __atomic_thread_fence(__ATOMIC_SEQ_CST);
    default:;
  }
#elif X64_ORDER_WC
  switch (order) {
    case std::memory_order_seq_cst:
      return _mm_mfence();
    case std::memory_order_acq_rel:
    case std::memory_order_acquire:
      return _mm_lfence();
    default:;
  }
#endif
}

static __forceinline void Fence(std::memory_order order = std::memory_order_seq_cst) {
#if ALWAYS_CONSERVATIVE
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
#elif X64_ORDER_WC
  switch (order) {
    case std::memory_order_seq_cst:
    case std::memory_order_acq_rel:
      return _mm_mfence();
    case std::memory_order_acquire:
      return _mm_lfence();
    case std::memory_order_release:
      return _mm_sfence();
    default:;
  }
#else
  std::atomic_thread_fence(order);
#endif
}

template <typename T>
static __forceinline void BasicCheck(const T* ptr) {
  constexpr bool value = __atomic_always_lock_free(sizeof(T), 0);
  static_assert(value, "Atomic type may not be compatible with peripheral atomics.");
};

template <typename T>
static __forceinline void BasicCheck(const volatile T* ptr) {
  constexpr bool value = __atomic_always_lock_free(sizeof(T), 0);
  static_assert(value, "Atomic type may not be compatible with peripheral atomics.");
};

/// @brief: Load value of type T atomically with specified memory order.
/// @param: ptr(Input), a pointer to type T.
/// @param: order(Input), memory order with atomic load, relaxed by default.
/// @return: T, loaded value.
template <typename T>
static __forceinline T Load(const T* ptr, std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  T ret;
  PreFence(order);
  __atomic_load(ptr, &ret, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Function overloading, for more info, see previous one.
/// @param: ptr(Input), a pointer to volatile type T.
/// @param: order(Input), memory order with atomic load, relaxed by default.
/// @return: T, loaded value.
template <typename T>
static __forceinline T Load(const volatile T* ptr,
                            std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  T ret;
  PreFence(order);
  __atomic_load(ptr, &ret, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Store value of type T with specified memory order.
/// @param: ptr(Input), a pointer to the location that val is stored into.
/// @param: val(Input), value to be stored.
/// @param: order(Input), memory order with atomic store, relaxed by default.
/// @return: void.
template <typename T>
static __forceinline void Store(T* ptr, T val,
                                std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  __atomic_store(ptr, &val, c11ToBuiltInFlags(order));
  PostFence(order);
}

/// @brief: Function overloading, for more info, see previous one.
/// @param: ptr(Input), a pointer to the volatile location that val is stored into.
/// @param: val(Input), value to be stored.
/// @param: order(Input), memory order with atomic store, relaxed by default.
/// @return: void.
template <typename T>
static __forceinline void Store(volatile T* ptr, T val,
                                std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  __atomic_store(ptr, &val, c11ToBuiltInFlags(order));
  PostFence(order);
}

/// @brief: Compare and swap value atomically with specified memory order.
/// @param: ptr(Input), a pointer to variable which is operated on.
/// @param: val(Input), value to be stored if condition is satisfied.
/// @param: expected(Input), value which is expected.
/// @param: order(Input), memory order with atomic operation, relaxed by default.
/// @return: T, the value observed at ptr before the operation (equal to expected on success).
template <typename T>
static __forceinline T Cas(T* ptr, T val, T expected,
                           std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  __atomic_compare_exchange(ptr, &expected, &val, false, c11ToBuiltInFlags(order),
                            __ATOMIC_RELAXED);
  PostFence(order);
  return expected;
}

/// @brief: Function overloading, for more info, see previous one.
/// @param: ptr(Input), a pointer to volatile variable which is operated on.
/// @param: val(Input), value to be stored if condition is satisfied.
/// @param: expected(Input), value which is expected.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, the value observed at ptr before the operation (equal to expected on success).
template <typename T>
static __forceinline T Cas(volatile T* ptr, T val, T expected,
                           std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  __atomic_compare_exchange(ptr, &expected, &val, false, c11ToBuiltInFlags(order),
                            __ATOMIC_RELAXED);
  PostFence(order);
  return expected;
}

/// @brief: Exchange the value atomically with specified memory order.
/// @param: ptr(Input), a pointer to variable which is operated on.
/// @param: val(Input), value to be stored.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, the value prior to the exchange.
template <typename T>
static __forceinline T Exchange(T* ptr, T val,
                                std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  T ret;
  PreFence(order);
  __atomic_exchange(ptr, &val, &ret, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Function overloading, for more info, see previous one.
/// @param: ptr(Input), a pointer to volatile variable which is operated on.
/// @param: val(Input), value to be stored.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, the value prior to the exchange.
template <typename T>
static __forceinline T Exchange(volatile T* ptr, T val,
                                std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  T ret;
  PreFence(order);
  __atomic_exchange(ptr, &val, &ret, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Add value to variable atomically with specified memory order.
/// @param: ptr(Input), a pointer to variable which is operated on.
/// @param: val(Input), value to be added.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, the value of the variable prior to the addition.
template <typename T>
static __forceinline T Add(T* ptr, T val, std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_add(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Subtract value from the variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to variable which is operated on.
/// @param: val(Input), value to be subtracted.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of the variable prior to the subtraction.
template <typename T>
static __forceinline T Sub(T* ptr, T val, std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_sub(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Bit And operation on variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to variable which is operated on.
/// @param: val(Input), value which is ANDed with variable.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
template <class T>
static __forceinline T And(T* ptr, T val, std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_and(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Bit Or operation on variable atomically with specified memory order.
/// @param: ptr(Input), a pointer to variable which is operated on.
/// @param: val(Input), value which is ORed with variable.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
template <class T>
static __forceinline T Or(T* ptr, T val, std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_or(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Bit Xor operation on variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to variable which is operated on.
/// @param: val(Input), value which is XORed with variable.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
template <class T>
static __forceinline T Xor(T* ptr, T val, std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_xor(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Increase the value of variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to variable which is operated on.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
template <class T>
static __forceinline T Increment(T* ptr, std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_add(ptr, 1, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Decrease the value of the variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to variable which is operated on.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
template <class T>
static __forceinline T Decrement(T* ptr, std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_sub(ptr, 1, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Add value to variable atomically with specified memory order.
/// @param: ptr(Input), a pointer to volatile variable which is operated on.
/// @param: val(Input), value to be added.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, the value of the variable prior to the addition.
template <class T>
static __forceinline T Add(volatile T* ptr, T val,
                           std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_add(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Subtract value from the variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to volatile variable which is operated on.
/// @param: val(Input), value to be subtracted.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of the variable prior to the subtraction.
template <class T>
static __forceinline T Sub(volatile T* ptr, T val,
                           std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_sub(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Bit And operation on variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to volatile variable which is operated on.
/// @param: val(Input), value which is ANDed with variable.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
template <class T>
static __forceinline T And(volatile T* ptr, T val,
                           std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_and(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Bit Or operation on variable atomically with specified memory order.
/// @param: ptr(Input), a pointer to volatile variable which is operated on.
/// @param: val(Input), value which is ORed with variable.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
template <class T>
static __forceinline T Or(volatile T* ptr, T val,
                          std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_or(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Bit Xor operation on variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to volatile variable which is operated on.
/// @param: val(Input), value which is XORed with variable.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
template <class T>
static __forceinline T Xor(volatile T* ptr, T val,
                           std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_xor(ptr, val, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Increase the value of variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to volatile variable which is operated on.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
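/*
Illustrative sketch (not part of ROCR). Increment returns the value held
before the operation, so concurrent callers each receive a distinct result.
This is the classic ticket-dispenser pattern. The hypothetical TakeTicket
below demonstrates that property with the GCC/Clang __atomic_fetch_add
builtin, which these helpers also use. Relaxed ordering is enough here as
long as the ticket is not used to publish other data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the counter value before the increment, so
// successive callers receive 0, 1, 2, ...
uint32_t TakeTicket(uint32_t* counter) {
  return __atomic_fetch_add(counter, 1u, __ATOMIC_RELAXED);
}
```
*/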
template <class T>
static __forceinline T Increment(volatile T* ptr,
                                 std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_add(ptr, 1, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

/// @brief: Decrease the value of the variable atomically with specified memory
/// order.
/// @param: ptr(Input), a pointer to volatile variable which is operated on.
/// @param: order(Input), memory order which is relaxed by default.
/// @return: T, value of variable prior to the operation.
template <class T>
static __forceinline T Decrement(volatile T* ptr,
                                 std::memory_order order = std::memory_order_relaxed) {
  BasicCheck(ptr);
  PreFence(order);
  T ret = __atomic_fetch_sub(ptr, 1, c11ToBuiltInFlags(order));
  PostFence(order);
  return ret;
}

} // namespace atomic
} // namespace rocr

#ifdef X64_ORDER_WC
#undef X64_ORDER_WC
#endif

#ifdef ALWAYS_CONSERVATIVE
#undef ALWAYS_CONSERVATIVE
#endif

#endif // HSA_RUNTIME_CORE_UTIL_ATOMIC_HELPERS_H_
ROCR-Runtime-rocm-5.0.0/src/core/util/flag.cpp000066400000000000000000000155761420110115200207520ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2021-2021, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIESd OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
//
////////////////////////////////////////////////////////////////////////////////

#include "core/util/flag.h"
#include "core/util/utils.h"
#include #include #include #include #include

namespace rocr {

// Split str at each occurrence of sep. str is modified in place.
static std::vector<std::string> split(std::string& str, char sep) {
  std::vector<std::string> ret;
  while (!str.empty()) {
    size_t pos = str.find(sep);
    if (pos == std::string::npos) {
      ret.push_back(str);
      return ret;
    }
    ret.push_back(str.substr(0, pos));
    str.erase(0, pos + 1);
  }
  return ret;
}

// Parse id,id-id,... strings into id lists
static std::vector<uint32_t> get_elements(std::string& str, uint32_t maxElement) {
  std::vector<uint32_t> ret;
  MAKE_NAMED_SCOPE_GUARD(error, [&]() { ret.clear(); });

  std::vector<std::string> ranges = split(str, ',');
  for (auto& range_str : ranges) {
    auto range = split(range_str, '-');

    // failure, empty element (e.g. "1,,2") or too many -'s.
    if (range.empty() || range.size() > 2) return ret;

    char* end;
    uint32_t index = strtoul(range[0].c_str(), &end, 10);
    // Invalid syntax - ids must be base 10 digits only.
    if (*end != '\0') return ret;
    if (index <= maxElement) ret.push_back(index);

    if (range.size() == 2) {
      uint32_t secondindex = strtoul(range[1].c_str(), &end, 10);
      if (*end != '\0') return ret;         // bad syntax
      if (secondindex < index) return ret;  // inverted range
      secondindex = Min(secondindex, maxElement);
      for (uint32_t i = index + 1; i < secondindex + 1; i++) ret.push_back(i);
    }
  }

  // Confirm no duplicate ids.
  std::sort(ret.begin(), ret.end());
  if (std::adjacent_find(ret.begin(), ret.end()) != ret.end()) return ret;

  // Good parse, keep result.
  error.Dismiss();
  return ret;
}

/*
Parse env var per the following syntax, all whitespace is ignored:

ID = [0-9][0-9]*                         ex. base 10 numbers
ID_list = (ID | ID-ID)[, (ID | ID-ID)]*  ex. 0,2-4,7
GPU_list = ID_list                       ex. 0,2-4,7
CU_list = 0x[0-9A-F]+ | ID_list          ex. 0x337F OR 0,2-4,7
CU_Set = GPU_list : CU_list              ex. 0,2-4,7:0-15,32-47 OR 0,2-4,7:0x337F
HSA_CU_MASK = CU_Set [; CU_Set]*         ex. 0,2-4,7:0-15,32-47; 3-9:0x337F

GPU indexes are taken post ROCR_VISIBLE_DEVICES reordering.
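
Worked example (an illustrative sketch, not ROCR code). An ID list such as
0-15,32-47 sets bit (id % 32) of dword (id / 32), which gives the dwords
{0x0000FFFF, 0x0000FFFF}, least significant first. A hex mask such as 0x337F
is read in 8-digit chunks starting from the right, also least significant
dword first, and high-order zero dwords are dropped. IdsToMask and HexToMask
are hypothetical helpers that mirror those two branches of Flag::parse_masks.
They omit its error handling and maxCU trimming.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper for the ID-list branch: CU id N sets bit (N % 32) of
// dword (N / 32), least significant dword first.
std::vector<uint32_t> IdsToMask(const std::vector<uint32_t>& ids) {
  std::vector<uint32_t> mask;
  for (uint32_t id : ids) {
    if (mask.size() <= id / 32) mask.resize(id / 32 + 1, 0);
    mask[id / 32] |= 1u << (id % 32);
  }
  return mask;
}

// Hypothetical helper for the hex branch (the "0x" prefix already removed):
// convert up to 8 hex digits at a time starting from the right, so the least
// significant dword comes first, then drop high-order zero dwords.
std::vector<uint32_t> HexToMask(const std::string& hex) {
  std::vector<uint32_t> mask;
  size_t len = hex.length();
  while (len != 0) {
    size_t trim = len < 8 ? len : 8;
    len -= trim;
    mask.push_back(static_cast<uint32_t>(std::stoul(hex.substr(len, trim), nullptr, 16)));
  }
  while (!mask.empty() && mask.back() == 0) mask.pop_back();
  return mask;
}
```

For example, 0,2-4,7:0x337F enables CUs 0-6, 8, 9, 12 and 13 on GPUs 0, 2,
3, 4 and 7.
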
Listed or bit set CUs will be enabled at queue creation on the associated GPU.
All other CUs on the associated GPUs will be disabled.
CU masks of unlisted GPUs are not restricted.

Repeating a GPU or CU ID is a syntax error.
Parsing stops at the first CU_Set that has a syntax error; that set and all
following sets are ignored.
Specifying a mask with no usable CUs (CU_list is 0x0) is a syntax error.
Users should use ROCR_VISIBLE_DEVICES if they want to exclude use of a
particular GPU.
*/
void Flag::parse_masks(std::string& var, uint32_t maxGpu, uint32_t maxCU) {
  if (var.empty()) return;

  // Remove whitespace
  auto end = std::remove_if(var.begin(), var.end(),
                            [](char c) { return std::isspace(c, std::locale::classic()); });
  var.erase(end, var.end());

  // Switch to uppercase
  for (auto& c : var) c = toupper(c);

  // Iterate over cu sets
  auto sets = split(var, ';');
  for (auto& set : sets) {
    auto parts = split(set, ':');
    if (parts.size() != 2) return;

    // temp storage for cu_set parsing.
    std::vector<uint32_t> gpu_index;
    std::vector<uint32_t> mask;

    // parse cu list first, check for bitmask format.
    // The size check keeps parts[1][1] in range when the CU list is empty.
    if (parts[1].size() > 1 && parts[1][1] == 'X') {
      // Confirm hex format and strip prefix
      auto& cu = parts[1];
      if (cu[0] != '0') return;
      cu.erase(0, 2);

      // Ensure all valid hex characters
      for (auto& c : cu) {
        if (!isxdigit(c)) return;
      }

      // Convert to uint32_t, lsb first.
      size_t len = cu.length();
      while (len != 0) {
        size_t trim = Min(len, size_t(8));
        len -= trim;
        auto tmp = cu.substr(len, trim);
        auto chunk = stoul(tmp, nullptr, 16);
        mask.push_back(chunk);
      }

      // Trim dwords beyond maxCUs
      uint32_t maxDwords = maxCU / 32 + 1;
      if (maxDwords < mask.size()) mask.resize(maxDwords);

      // Trim high-order zero dwords (leading zeros of the hex string)
      while (!mask.empty() && mask.back() == 0) mask.pop_back();

      // Mask 0x0 is an error.
if (mask.empty()) return; } else { // parse cu lists auto cu_indices = get_elements(parts[1], maxCU); if (cu_indices.empty()) return; uint32_t maxdword = cu_indices.back() / 32 + 1; mask.resize(maxdword, 0); for (auto id : cu_indices) { uint32_t index, offset; index = id / 32; offset = id % 32; mask[index] |= 1ul << offset; } } // parse device list gpu_index = get_elements(parts[0], maxGpu); if (gpu_index.empty()) return; // Ensure that no GPU was repeated across cu_sets for (auto id : gpu_index) { if (cu_mask_.find(id) != cu_mask_.end()) return; } // Insert into map for (auto id : gpu_index) { cu_mask_[id] = mask; } } } } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/util/flag.h000066400000000000000000000217421420110115200204070ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2021, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. 
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIESd OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_INC_FLAG_H_ #define HSA_RUNTIME_CORE_INC_FLAG_H_ #include #include #include #include #include "core/util/os.h" #include "core/util/utils.h" namespace rocr { class Flag { public: enum SDMA_OVERRIDE { SDMA_DISABLE, SDMA_ENABLE, SDMA_DEFAULT }; // The values are meaningful and chosen to satisfy the thunk API. enum XNACK_REQUEST { XNACK_DISABLE = 0, XNACK_ENABLE = 1, XNACK_UNCHANGED = 2 }; static_assert(XNACK_DISABLE == 0, "XNACK_REQUEST enum values improperly changed."); static_assert(XNACK_ENABLE == 1, "XNACK_REQUEST enum values improperly changed."); explicit Flag() { Refresh(); } virtual ~Flag() {} void Refresh() { std::string var = os::GetEnvVar("HSA_CHECK_FLAT_SCRATCH"); check_flat_scratch_ = (var == "1") ? true : false; var = os::GetEnvVar("HSA_ENABLE_VM_FAULT_MESSAGE"); enable_vm_fault_message_ = (var == "0") ? false : true; var = os::GetEnvVar("HSA_ENABLE_QUEUE_FAULT_MESSAGE"); enable_queue_fault_message_ = (var == "0") ? false : true; var = os::GetEnvVar("HSA_ENABLE_INTERRUPT"); enable_interrupt_ = (var == "0") ? false : true; var = os::GetEnvVar("HSA_ENABLE_SDMA"); enable_sdma_ = (var == "0") ? SDMA_DISABLE : ((var == "1") ? 
        SDMA_ENABLE : SDMA_DEFAULT);

    visible_gpus_ = os::GetEnvVar("ROCR_VISIBLE_DEVICES");
    filter_visible_gpus_ = os::IsEnvVarSet("ROCR_VISIBLE_DEVICES");

    var = os::GetEnvVar("HSA_RUNNING_UNDER_VALGRIND");
    running_valgrind_ = (var == "1") ? true : false;

    var = os::GetEnvVar("HSA_SDMA_WAIT_IDLE");
    sdma_wait_idle_ = (var == "1") ? true : false;

    var = os::GetEnvVar("HSA_MAX_QUEUES");
    max_queues_ = static_cast<uint32_t>(atoi(var.c_str()));

    var = os::GetEnvVar("HSA_SCRATCH_MEM");
    scratch_mem_size_ = atoi(var.c_str());

    tools_lib_names_ = os::GetEnvVar("HSA_TOOLS_LIB");

    var = os::GetEnvVar("HSA_TOOLS_REPORT_LOAD_FAILURE");
    ifdebug {
      report_tool_load_failures_ = (var == "1") ? true : false;
    }
    else {
      report_tool_load_failures_ = (var == "0") ? false : true;
    }

    var = os::GetEnvVar("HSA_DISABLE_FRAGMENT_ALLOCATOR");
    disable_fragment_alloc_ = (var == "1") ? true : false;

    var = os::GetEnvVar("HSA_ENABLE_SDMA_HDP_FLUSH");
    enable_sdma_hdp_flush_ = (var == "0") ? false : true;

    var = os::GetEnvVar("HSA_REV_COPY_DIR");
    rev_copy_dir_ = (var == "1") ? true : false;

    var = os::GetEnvVar("HSA_FORCE_FINE_GRAIN_PCIE");
    fine_grain_pcie_ = (var == "1") ? true : false;

    var = os::GetEnvVar("HSA_NO_SCRATCH_RECLAIM");
    no_scratch_reclaim_ = (var == "1") ? true : false;

    var = os::GetEnvVar("HSA_NO_SCRATCH_THREAD_LIMITER");
    no_scratch_thread_limit_ = (var == "1") ? true : false;

    var = os::GetEnvVar("HSA_DISABLE_IMAGE");
    disable_image_ = (var == "1") ? true : false;

    var = os::GetEnvVar("HSA_LOADER_ENABLE_MMAP_URI");
    loader_enable_mmap_uri_ = (var == "1") ? true : false;

    var = os::GetEnvVar("HSA_FORCE_SDMA_SIZE");
    force_sdma_size_ = var.empty() ? 1024 * 1024 : atoi(var.c_str());

    var = os::GetEnvVar("HSA_IGNORE_SRAMECC_MISREPORT");
    check_sramecc_validity_ = (var == "1") ? false : true;

    // Legal values are zero "0" or one "1". Any other value will
    // be interpreted as not defining the env variable.
    var = os::GetEnvVar("HSA_XNACK");
    xnack_ = (var == "0") ? XNACK_DISABLE : ((var == "1") ?
XNACK_ENABLE : XNACK_UNCHANGED); var = os::GetEnvVar("HSA_ENABLE_DEBUG"); debug_ = (var == "1") ? true : false; var = os::GetEnvVar("HSA_CU_MASK_SKIP_INIT"); cu_mask_skip_init_ = (var == "1") ? true : false; // Temporary opt-in for corrected HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT behavior. // Will become opt-out and possibly removed in future releases. var = os::GetEnvVar("HSA_COOP_CU_COUNT"); coop_cu_count_ = (var == "1") ? true : false; } void parse_masks(uint32_t maxGpu, uint32_t maxCU) { std::string var = os::GetEnvVar("HSA_CU_MASK"); parse_masks(var, maxGpu, maxCU); } bool check_flat_scratch() const { return check_flat_scratch_; } bool enable_vm_fault_message() const { return enable_vm_fault_message_; } bool enable_queue_fault_message() const { return enable_queue_fault_message_; } bool enable_interrupt() const { return enable_interrupt_; } bool enable_sdma_hdp_flush() const { return enable_sdma_hdp_flush_; } bool running_valgrind() const { return running_valgrind_; } bool sdma_wait_idle() const { return sdma_wait_idle_; } bool report_tool_load_failures() const { return report_tool_load_failures_; } bool disable_fragment_alloc() const { return disable_fragment_alloc_; } bool rev_copy_dir() const { return rev_copy_dir_; } bool fine_grain_pcie() const { return fine_grain_pcie_; } bool no_scratch_reclaim() const { return no_scratch_reclaim_; } bool no_scratch_thread_limiter() const { return no_scratch_thread_limit_; } SDMA_OVERRIDE enable_sdma() const { return enable_sdma_; } std::string visible_gpus() const { return visible_gpus_; } bool filter_visible_gpus() const { return filter_visible_gpus_; } uint32_t max_queues() const { return max_queues_; } size_t scratch_mem_size() const { return scratch_mem_size_; } std::string tools_lib_names() const { return tools_lib_names_; } bool disable_image() const { return disable_image_; } bool loader_enable_mmap_uri() const { return loader_enable_mmap_uri_; } size_t force_sdma_size() const { return 
force_sdma_size_; }

  bool check_sramecc_validity() const { return check_sramecc_validity_; }

  XNACK_REQUEST xnack() const { return xnack_; }

  bool debug() const { return debug_; }

  const std::vector<uint32_t>& cu_mask(uint32_t gpu_index) const {
    static const std::vector<uint32_t> empty;
    auto it = cu_mask_.find(gpu_index);
    if (it == cu_mask_.end()) return empty;
    return it->second;
  }

  bool cu_mask_skip_init() const { return cu_mask_skip_init_; }

  bool coop_cu_count() const { return coop_cu_count_; }

 private:
  bool check_flat_scratch_;
  bool enable_vm_fault_message_;
  bool enable_interrupt_;
  bool enable_sdma_hdp_flush_;
  bool running_valgrind_;
  bool sdma_wait_idle_;
  bool enable_queue_fault_message_;
  bool report_tool_load_failures_;
  bool disable_fragment_alloc_;
  bool rev_copy_dir_;
  bool fine_grain_pcie_;
  bool no_scratch_reclaim_;
  bool no_scratch_thread_limit_;
  bool disable_image_;
  bool loader_enable_mmap_uri_;
  bool check_sramecc_validity_;
  bool debug_;
  bool cu_mask_skip_init_;
  bool coop_cu_count_;

  SDMA_OVERRIDE enable_sdma_;

  bool filter_visible_gpus_;
  std::string visible_gpus_;

  uint32_t max_queues_;

  size_t scratch_mem_size_;

  std::string tools_lib_names_;

  size_t force_sdma_size_;

  // Indicates user preference for Xnack state.
  XNACK_REQUEST xnack_;

  // Map GPU index (after ROCR_VISIBLE_DEVICES reordering) to its default CU mask.
  std::map<uint32_t, std::vector<uint32_t>> cu_mask_;

  void parse_masks(std::string& args, uint32_t maxGpu, uint32_t maxCU);

  DISALLOW_COPY_AND_ASSIGN(Flag);
};

} // namespace rocr

#endif // header guard
ROCR-Runtime-rocm-5.0.0/src/core/util/lazy_ptr.h000066400000000000000000000107501420110115200213370ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIESd OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_UTIL_LAZY_PTR_H_ #define HSA_RUNTIME_CORE_UTIL_LAZY_PTR_H_ #include #include #include #include "core/util/locks.h" #include "core/util/utils.h" namespace rocr { /* * Wrapper for a std::unique_ptr that initializes its object at first use. 
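
Illustrative sketch (not part of ROCR). The same construct-on-first-use idea
can be written with a std::mutex taken on every access. This trades away the
fence-based fast path that lazy_ptr uses below in exchange for simplicity.
LazySketch is a hypothetical name. Unlike lazy_ptr, it has no touch(),
reset() or non-blocking construction path.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical lazy holder: the constructor callback runs once, on the first
// get(), even when several threads call get() concurrently.
template <typename T>
class LazySketch {
 public:
  explicit LazySketch(std::function<T*()> ctor) : ctor_(std::move(ctor)) {}

  T* get() {
    std::lock_guard<std::mutex> guard(lock_);
    if (ctor_) {
      obj_.reset(ctor_());
      ctor_ = nullptr;  // mark as constructed, mirroring how lazy_ptr clears func
    }
    return obj_.get();
  }

 private:
  std::function<T*()> ctor_;
  std::unique_ptr<T> obj_;
  std::mutex lock_;
};
```
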
 */
template <typename T> class lazy_ptr {
 public:
  lazy_ptr() {}

  explicit lazy_ptr(std::function<T*()> Constructor) { reset(Constructor); }

  lazy_ptr(lazy_ptr&& rhs) {
    obj = std::move(rhs.obj);
    func = std::move(rhs.func);
  }

  lazy_ptr& operator=(lazy_ptr&& rhs) {
    obj = std::move(rhs.obj);
    func = std::move(rhs.func);
    return *this;
  }

  lazy_ptr(lazy_ptr&) = delete;
  lazy_ptr& operator=(lazy_ptr&) = delete;

  void reset(std::function<T*()> Constructor = nullptr) {
    obj.reset();
    func = Constructor;
  }

  void reset(T* ptr) {
    obj.reset(ptr);
    func = nullptr;
  }

  bool operator==(T* rhs) const { return obj.get() == rhs; }
  bool operator!=(T* rhs) const { return obj.get() != rhs; }

  const std::unique_ptr<T>& operator->() const {
    make(true);
    assert(obj != nullptr && "Null dereference through lazy_ptr.");
    return obj;
  }

  std::unique_ptr<T>& operator*() {
    make(true);
    return obj;
  }

  const std::unique_ptr<T>& operator*() const {
    make(true);
    return obj;
  }

  /*
   * Ensures that the object is created or is being created.
   * This is useful when early construction of the object is required.
   */
  void touch() const { make(false); }

  // Tells if the lazy object has been constructed or not.
  // Construction may fail silently (return nullptr).
  bool created() const {
    std::atomic_thread_fence(std::memory_order_acquire);
    return func == nullptr;
  }

  // Tells if the lazy object exists or not.
  bool empty() const {
    std::atomic_thread_fence(std::memory_order_acquire);
    return obj == nullptr;
  }

 private:
  mutable std::unique_ptr<T> obj;
  mutable std::function<T*()> func;
  mutable KernelMutex lock;

  // Separated from make to improve inlining.
void make_body(bool block) const { if (block) { lock.Acquire(); } else if (!lock.Try()) { return; } MAKE_SCOPE_GUARD([&]() { lock.Release(); }); if (func == nullptr) return; T* ptr = func(); obj.reset(ptr); std::atomic_thread_fence(std::memory_order_release); func = nullptr; } __forceinline void make(bool block) const { if (!created()) { make_body(block); } } }; } // namespace rocr #endif // HSA_RUNTIME_CORE_UTIL_LAZY_PTR_H_ ROCR-Runtime-rocm-5.0.0/src/core/util/lnx/000077500000000000000000000000001420110115200201205ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/core/util/lnx/os_linux.cpp000066400000000000000000000357411420110115200224760ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. 
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifdef __linux__ #include "core/util/os.h" #include "core/util/utils.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include namespace rocr { namespace os { struct ThreadArgs { void* entry_args; ThreadEntry entry_function; }; void* __stdcall ThreadTrampoline(void* arg) { ThreadArgs* ar = (ThreadArgs*)arg; ThreadEntry CallMe = ar->entry_function; void* Data = ar->entry_args; delete ar; CallMe(Data); return nullptr; } // Thread container allows multiple waits and separate close (destroy). 
class os_thread { public: explicit os_thread(ThreadEntry function, void* threadArgument, uint stackSize) : thread(0), lock(nullptr), state(RUNNING) { std::unique_ptr args(new ThreadArgs); lock = CreateMutex(); if (lock == nullptr) return; args->entry_args = threadArgument; args->entry_function = function; pthread_attr_t attrib; pthread_attr_init(&attrib); if (stackSize != 0) { stackSize = Max(uint(PTHREAD_STACK_MIN), stackSize); stackSize = AlignUp(stackSize, 4096); int err = pthread_attr_setstacksize(&attrib, stackSize); assert(err == 0 && "pthread_attr_setstacksize failed."); } int cores = get_nprocs_conf(); cpu_set_t* cpuset = CPU_ALLOC(cores); for(int i=0; i state; enum { FINISHED = 0, RUNNING = 1 }; }; static_assert(sizeof(LibHandle) == sizeof(void*), "OS abstraction size mismatch"); static_assert(sizeof(Mutex) == sizeof(pthread_mutex_t*), "OS abstraction size mismatch"); static_assert(sizeof(SharedMutex) == sizeof(pthread_rwlock_t*), "OS abstraction size mismatch"); static_assert(sizeof(Thread) == sizeof(os_thread*), "OS abstraction size mismatch"); LibHandle LoadLib(std::string filename) { void* ret = dlopen(filename.c_str(), RTLD_LAZY); if (ret == nullptr) debug_print("LoadLib(%s) failed: %s\n", filename.c_str(), dlerror()); return *(LibHandle*)&ret; } void* GetExportAddress(LibHandle lib, std::string export_name) { void* ret = dlsym(*(void**)&lib, export_name.c_str()); // dlsym searches the given library and all the library's load dependencies. // Remaining code limits symbol lookup to only the library handle given. // This lookup pattern matches Windows. 
  if (ret == NULL) return ret;

  link_map* map;
  int err = dlinfo(*(void**)&lib, RTLD_DI_LINKMAP, &map);
  assert(err != -1 && "dlinfo failed.");

  Dl_info info;
  err = dladdr(ret, &info);
  assert(err != 0 && "dladdr failed.");

  if (strcmp(info.dli_fname, map->l_name) == 0) return ret;

  return NULL;
}

void CloseLib(LibHandle lib) { dlclose(*(void**)&lib); }

Mutex CreateMutex() {
  pthread_mutex_t* mutex = new pthread_mutex_t;
  pthread_mutex_init(mutex, NULL);
  return *(Mutex*)&mutex;
}

bool TryAcquireMutex(Mutex lock) {
  return pthread_mutex_trylock(*(pthread_mutex_t**)&lock) == 0;
}

bool AcquireMutex(Mutex lock) {
  return pthread_mutex_lock(*(pthread_mutex_t**)&lock) == 0;
}

void ReleaseMutex(Mutex lock) { pthread_mutex_unlock(*(pthread_mutex_t**)&lock); }

void DestroyMutex(Mutex lock) {
  pthread_mutex_destroy(*(pthread_mutex_t**)&lock);
  delete *(pthread_mutex_t**)&lock;
}

void Sleep(int delay_in_millisec) { usleep(delay_in_millisec * 1000); }

void uSleep(int delayInUs) { usleep(delayInUs); }

void YieldThread() { sched_yield(); }

Thread CreateThread(ThreadEntry function, void* threadArgument, uint stackSize) {
  os_thread* result = new os_thread(function, threadArgument, stackSize);
  if (!result->Valid()) {
    delete result;
    return nullptr;
  }

  return reinterpret_cast<Thread>(result);
}

void CloseThread(Thread thread) { delete reinterpret_cast<os_thread*>(thread); }

bool WaitForThread(Thread thread) { return reinterpret_cast<os_thread*>(thread)->Wait(); }

bool WaitForAllThreads(Thread* threads, uint threadCount) {
  for (uint i = 0; i < threadCount; i++) WaitForThread(threads[i]);
  return true;
}

bool IsEnvVarSet(std::string env_var_name) {
  char* buff = NULL;
  buff = getenv(env_var_name.c_str());
  return (buff != NULL);
}

void SetEnvVar(std::string env_var_name, std::string env_var_value) {
  setenv(env_var_name.c_str(), env_var_value.c_str(), 1);
}

std::string GetEnvVar(std::string env_var_name) {
  char* buff;
  buff = getenv(env_var_name.c_str());
  std::string ret;
  if (buff) {
    ret = buff;
  }
  return ret;
}

size_t
GetUserModeVirtualMemorySize() {
#ifdef _LP64
  // https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt :
  // user space is 0000000000000000 - 00007fffffffffff (=47 bits)
  return (size_t)(0x800000000000);
#else
  return (size_t)(0xffffffff);  // ~4GB
#endif
}

size_t GetUsablePhysicalHostMemorySize() {
  struct sysinfo info = {0};
  if (sysinfo(&info) != 0) {
    return 0;
  }

  const size_t physical_size = static_cast<size_t>(info.totalram * info.mem_unit);
  return std::min(GetUserModeVirtualMemorySize(), physical_size);
}

uintptr_t GetUserModeVirtualMemoryBase() { return (uintptr_t)0; }

// OS event implementation
typedef struct EventDescriptor_ {
  pthread_cond_t event;
  pthread_mutex_t mutex;
  bool state;
  bool auto_reset;
} EventDescriptor;

EventHandle CreateOsEvent(bool auto_reset, bool init_state) {
  EventDescriptor* eventDescrp;
  eventDescrp = (EventDescriptor*)malloc(sizeof(EventDescriptor));
  if (eventDescrp == NULL) {
    return NULL;
  }

  pthread_mutex_init(&eventDescrp->mutex, NULL);
  pthread_cond_init(&eventDescrp->event, NULL);
  eventDescrp->auto_reset = auto_reset;
  eventDescrp->state = init_state;

  EventHandle handle = reinterpret_cast<EventHandle>(eventDescrp);

  return handle;
}

int DestroyOsEvent(EventHandle event) {
  if (event == NULL) {
    return -1;
  }

  EventDescriptor* eventDescrp = reinterpret_cast<EventDescriptor*>(event);
  int ret_code = pthread_cond_destroy(&eventDescrp->event);
  ret_code |= pthread_mutex_destroy(&eventDescrp->mutex);
  free(eventDescrp);
  return ret_code;
}

int WaitForOsEvent(EventHandle event, unsigned int milli_seconds) {
  if (event == NULL) {
    return -1;
  }

  EventDescriptor* eventDescrp = reinterpret_cast<EventDescriptor*>(event);

  if (milli_seconds == 0) {
    // Zero timeout: poll without blocking. If the mutex is busy, report a timeout.
    if (pthread_mutex_trylock(&eventDescrp->mutex) != 0) {
      return 1;
    }
    // trylock succeeded, so the mutex is already held here; locking it again
    // would deadlock on a default (non-recursive) mutex.
  } else {
    pthread_mutex_lock(&eventDescrp->mutex);
  }

  int ret_code = 0;
  if (!eventDescrp->state) {
    if (milli_seconds == 0) {
      ret_code = 1;
    } else {
      struct timespec ts;
      struct timeval tp;

      ret_code = gettimeofday(&tp, NULL);
      ts.tv_sec = tp.tv_sec;
      ts.tv_nsec = tp.tv_usec * 1000;

      unsigned int sec = milli_seconds / 1000;
      unsigned int mSec = milli_seconds % 1000;

      ts.tv_sec += sec;
      ts.tv_nsec += mSec * 1000000;

      // Carry a whole second out of tv_nsec; it must stay below 1000000000
      // for the absolute timeout to be valid.
      if (ts.tv_nsec >= 1000000000) {
        ts.tv_sec += 1;
        ts.tv_nsec = ts.tv_nsec - 1000000000;
      }

      ret_code = pthread_cond_timedwait(&eventDescrp->event, &eventDescrp->mutex, &ts);
      // Timed out: return a nonzero code so callers see a timeout.
      if (ret_code == ETIMEDOUT) {
        ret_code = 0x14003;
      }

      if (ret_code == 0 && eventDescrp->auto_reset) {
        eventDescrp->state = false;
      }
    }
  } else if (eventDescrp->auto_reset) {
    eventDescrp->state = false;
  }
  pthread_mutex_unlock(&eventDescrp->mutex);

  return ret_code;
}

int SetOsEvent(EventHandle event) {
  if (event == NULL) {
    return -1;
  }

  EventDescriptor* eventDescrp = reinterpret_cast<EventDescriptor*>(event);
  int ret_code = 0;
  ret_code = pthread_mutex_lock(&eventDescrp->mutex);
  eventDescrp->state = true;
  ret_code |= pthread_mutex_unlock(&eventDescrp->mutex);
  ret_code |= pthread_cond_signal(&eventDescrp->event);

  return ret_code;
}

int ResetOsEvent(EventHandle event) {
  if (event == NULL) {
    return -1;
  }

  EventDescriptor* eventDescrp = reinterpret_cast<EventDescriptor*>(event);
  int ret_code = 0;
  ret_code = pthread_mutex_lock(&eventDescrp->mutex);
  eventDescrp->state = false;
  ret_code |= pthread_mutex_unlock(&eventDescrp->mutex);

  return ret_code;
}

static double invPeriod = 0.0;

uint64_t ReadAccurateClock() {
  if (invPeriod == 0.0) AccurateClockFrequency();
  timespec time;
  int err = clock_gettime(CLOCK_MONOTONIC_RAW, &time);
  assert(err == 0 && "clock_gettime(CLOCK_MONOTONIC_RAW,...) failed");
  return (uint64_t(time.tv_sec) * 1000000000ull + uint64_t(time.tv_nsec)) * invPeriod;
}

uint64_t AccurateClockFrequency() {
  static clockid_t clock = CLOCK_MONOTONIC;
  static std::atomic<bool> first(true);

  // Check kernel version - not a concurrency concern.
  // use non-RAW for getres due to bug in older 2.6.x kernels
  if (first.load(std::memory_order_acquire)) {
    utsname kernelInfo;
    if (uname(&kernelInfo) == 0) {
      try {
        std::string ver = kernelInfo.release;
        size_t idx;
        int major = std::stoi(ver, &idx);
        int minor = std::stoi(ver.substr(idx + 1));
        // Use the RAW clock on kernel 4.4 or newer. Compare the major version
        // first so that, e.g., 5.0 counts as newer than 4.4.
        if ((major > 4) || ((major == 4) && (minor >= 4))) {
          clock = CLOCK_MONOTONIC_RAW;
        }
      } catch (...) {
        // Kernel version string doesn't conform to the standard pattern.
        // Keep using the "safe" (non-RAW) clock.
      }
    }
    first.store(false, std::memory_order_release);
  }

  timespec time;
  int err = clock_getres(clock, &time);
  assert(err == 0 && "clock_getres(CLOCK_MONOTONIC(_RAW),...) failed");
  assert(time.tv_sec == 0 &&
         "clock_getres(CLOCK_MONOTONIC(_RAW),...) returned very low frequency "
         "(<1Hz).");
  assert(time.tv_nsec < 0xFFFFFFFF &&
         "clock_getres(CLOCK_MONOTONIC(_RAW),...) returned very low frequency "
         "(<1Hz).");
  if (invPeriod == 0.0) invPeriod = 1.0 / double(time.tv_nsec);
  return 1000000000ull / uint64_t(time.tv_nsec);
}

SharedMutex CreateSharedMutex() {
  pthread_rwlockattr_t attrib;
  int err = pthread_rwlockattr_init(&attrib);
  if (err != 0) {
    assert(false && "rw lock attribute init failed.");
    return nullptr;
  }
  err = pthread_rwlockattr_setkind_np(&attrib, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
  assert(err == 0 && "Set rw lock attribute failure.");

  pthread_rwlock_t* lock = new pthread_rwlock_t;
  err = pthread_rwlock_init(lock, &attrib);
  // The attribute object is no longer needed once init has run, on success or failure.
  pthread_rwlockattr_destroy(&attrib);
  if (err != 0) {
    assert(false && "rw lock init failed.");
    delete lock;
    return nullptr;
  }

  return lock;
}

bool TryAcquireSharedMutex(SharedMutex lock) {
  int err = pthread_rwlock_trywrlock(*(pthread_rwlock_t**)&lock);
  return err == 0;
}

bool AcquireSharedMutex(SharedMutex lock) {
  int err = pthread_rwlock_wrlock(*(pthread_rwlock_t**)&lock);
  return err == 0;
}

void ReleaseSharedMutex(SharedMutex lock) {
  int err = pthread_rwlock_unlock(*(pthread_rwlock_t**)&lock);
  assert(err == 0 && "SharedMutex unlock failed.");
}

bool
TrySharedAcquireSharedMutex(SharedMutex lock) { int err = pthread_rwlock_tryrdlock(*(pthread_rwlock_t**)&lock); return err == 0; } bool SharedAcquireSharedMutex(SharedMutex lock) { int err = pthread_rwlock_rdlock(*(pthread_rwlock_t**)&lock); return err == 0; } void SharedReleaseSharedMutex(SharedMutex lock) { int err = pthread_rwlock_unlock(*(pthread_rwlock_t**)&lock); assert(err == 0 && "SharedMutex unlock failed."); } void DestroySharedMutex(SharedMutex lock) { pthread_rwlock_destroy(*(pthread_rwlock_t**)&lock); delete *(pthread_rwlock_t**)&lock; } } // namespace os } // namespace rocr #endif ROCR-Runtime-rocm-5.0.0/src/core/util/locks.h000066400000000000000000000163741420110115200206160ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. 
// - Neither the names of Advanced Micro Devices, Inc,
// nor the names of its contributors may be used to endorse or promote
// products derived from this Software without specific prior written
// permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

// Library of synchronization primitives, to be extended as needed.

#ifndef HSA_RUNTIME_CORE_UTIL_LOCKS_H_
#define HSA_RUNTIME_CORE_UTIL_LOCKS_H_

#include "utils.h"
#include "os.h"

namespace rocr {

/// @brief: A kernel mutex.
/// Uses the kernel's scheduler to keep the waiting thread from being scheduled
/// until the lock is released (best for long waits, though anything using
/// a kernel object is a long wait).
class KernelMutex {
 public:
  KernelMutex() { lock_ = os::CreateMutex(); }
  ~KernelMutex() { os::DestroyMutex(lock_); }

  bool Try() { return os::TryAcquireMutex(lock_); }
  bool Acquire() { return os::AcquireMutex(lock_); }
  void Release() { os::ReleaseMutex(lock_); }

 private:
  os::Mutex lock_;

  /// @brief: Disable copy construction and assignment.
  DISALLOW_COPY_AND_ASSIGN(KernelMutex);
};

/// @brief: A spin lock.
/// For very short hold durations on the order of the thread scheduling
/// quanta or less.
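The spin lock described in the comment above can be sketched as a standalone class: a single `std::atomic<int>` lock word, a `compare_exchange_strong` from 0 to 1 to take it, and a yield between failed attempts, which is the same try-then-yield loop `SpinMutex` uses. This is an illustration under stated assumptions, not the runtime's code: the class name `ToySpinLock` and the helper `CountWithSpinLock` are hypothetical, `std::this_thread::yield()` stands in for `os::YieldThread()`, and explicit acquire/release orderings are used where `SpinMutex` relies on the sequentially consistent defaults.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical standalone spin lock illustrating the CAS-and-yield pattern.
class ToySpinLock {
 public:
  // Single attempt: succeeds only if the lock word is 0 and we swap in 1.
  bool Try() {
    int expected = 0;
    return lock_.compare_exchange_strong(expected, 1, std::memory_order_acquire,
                                         std::memory_order_relaxed);
  }

  // Spin until the CAS succeeds, yielding the time slice between attempts.
  void Acquire() {
    int expected = 0;
    while (!lock_.compare_exchange_strong(expected, 1, std::memory_order_acquire,
                                          std::memory_order_relaxed)) {
      expected = 0;  // a failed CAS overwrote expected with the observed value
      std::this_thread::yield();
    }
  }

  // Publish writes made inside the critical section, then unlock.
  void Release() { lock_.store(0, std::memory_order_release); }

 private:
  std::atomic<int> lock_{0};
};

// Hypothetical helper: increments a shared counter from several threads
// under the spin lock and returns the final count.
long CountWithSpinLock(int threads, int iterations) {
  ToySpinLock lock;
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        lock.Acquire();
        ++counter;
        lock.Release();
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter;
}
```

Yielding instead of busy-waiting keeps a waiter from burning its whole quantum while the holder is descheduled, which is why this pattern only pays off for the very short hold times the comment describes.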
class SpinMutex {
 public:
  SpinMutex() { lock_ = 0; }

  bool Try() {
    int old = 0;
    return lock_.compare_exchange_strong(old, 1);
  }
  bool Acquire() {
    int old = 0;
    while (!lock_.compare_exchange_strong(old, 1)) {
      // A failed compare-exchange stores the observed value in old; reset it.
      old = 0;
      os::YieldThread();
    }
    return true;
  }
  void Release() { lock_ = 0; }

 private:
  std::atomic<int> lock_;

  /// @brief: Disable copy construction and assignment.
  DISALLOW_COPY_AND_ASSIGN(SpinMutex);
};

class KernelEvent {
 public:
  KernelEvent() { evt_ = os::CreateOsEvent(true, true); }
  ~KernelEvent() { os::DestroyOsEvent(evt_); }

  bool IsSet() { return os::WaitForOsEvent(evt_, 0) == 0; }
  bool WaitForSet() { return os::WaitForOsEvent(evt_, 0xFFFFFFFF) == 0; }
  void Set() { os::SetOsEvent(evt_); }
  void Reset() { os::ResetOsEvent(evt_); }

 private:
  os::EventHandle evt_;

  /// @brief: Disable copy construction and assignment.
  DISALLOW_COPY_AND_ASSIGN(KernelEvent);
};

/// @brief: A yielding shared mutex, also known as a read/write mutex.
class KernelSharedMutex {
 public:
  /// @brief: Adapts the shared (read) operations to the ScopedAcquire interface.
  class Shared {
   public:
    explicit Shared(KernelSharedMutex* lock) : lock_(lock) {}
    bool Try() { return lock_->TryShared(); }
    bool Acquire() { return lock_->AcquireShared(); }
    void Release() { lock_->ReleaseShared(); }

   private:
    KernelSharedMutex* lock_;
  };

  KernelSharedMutex() { lock_ = os::CreateSharedMutex(); }
  ~KernelSharedMutex() { os::DestroySharedMutex(lock_); }

  // Exclusive mode operations
  bool Try() { return os::TryAcquireSharedMutex(lock_); }
  bool Acquire() { return os::AcquireSharedMutex(lock_); }
  void Release() { os::ReleaseSharedMutex(lock_); }

  // Shared mode operations
  bool TryShared() { return os::TrySharedAcquireSharedMutex(lock_); }
  bool AcquireShared() { return os::SharedAcquireSharedMutex(lock_); }
  void ReleaseShared() { os::SharedReleaseSharedMutex(lock_); }

  // Return shared operations interface
  Shared shared() { return Shared(this); }

 private:
  os::SharedMutex lock_;

  /// @brief: Disable copy construction and assignment.
  DISALLOW_COPY_AND_ASSIGN(KernelSharedMutex);
};

/// @brief: Type trait to identify mutex types
template class isMutex {
 public:
  enum { value = false };
};
template <> class isMutex {
 public:
  enum { value = true };
};
template <> class isMutex {
 public:
  enum { value = true };
};
template <> class isMutex {
 public:
  enum { value = true };
};

/// @brief: A scoped lock. To enter a critical section, create an object of
/// this class; when control leaves the scope, the lock is released
/// automatically.
template class ScopedAcquire {
 public:
  /// @brief: When constructing, acquire the lock.
  /// @param: lock(Input), pointer to an existing lock.
  explicit ScopedAcquire(LockType* lock) : lock_(lock), doRelease(true) {
    static_assert(isMutex::value, "ScopedAcquire requires a mutex type.");
    lock_.Acquire();
  }
  explicit ScopedAcquire(LockType lock) : lock_(lock), doRelease(true) {
    static_assert(!isMutex::value, "Mutex types are not copyable.");
    lock_.Acquire();
  }

  /// @brief: When destructing, release the lock.
  ~ScopedAcquire() {
    if (doRelease) lock_.Release();
  }

  /// @brief: Release the lock early. Avoid using when possible.
  void Release() {
    lock_.Release();
    doRelease = false;
  }

 private:
  /// @brief: Adapts between pointers to mutex types and mutex pointer types.
  template class container {
   public:
    container(T* lock) : lock_(lock) {}
    __forceinline bool Acquire() { return lock_->Acquire(); }
    __forceinline void Release() { return lock_->Release(); }

   private:
    T* lock_;
  };

  /// @brief: Specialization for mutex pointer types.
  template class container {
   public:
    container(T lock) : lock_(lock) {}
    __forceinline bool Acquire() { return lock_.Acquire(); }
    __forceinline void Release() { return lock_.Release(); }

   private:
    T lock_;
  };

  container::value> lock_;
  bool doRelease;

  /// @brief: Disable copy construction and assignment.
  DISALLOW_COPY_AND_ASSIGN(ScopedAcquire);
};

} // namespace rocr

#endif // HSA_RUNTIME_CORE_UTIL_LOCKS_H_
ROCR-Runtime-rocm-5.0.0/src/core/util/os.h000066400000000000000000000236761420110115200201270ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in
// the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
// nor the names of its contributors may be used to endorse or promote
// products derived from this Software without specific prior written
// permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

// Minimal operating system abstraction interfaces.

#ifndef HSA_RUNTIME_CORE_UTIL_OS_H_
#define HSA_RUNTIME_CORE_UTIL_OS_H_

#include
#include "utils.h"

namespace rocr {
namespace os {

typedef void* LibHandle;
typedef void* Mutex;
typedef void* SharedMutex;
typedef void* Thread;
typedef void* EventHandle;

enum class os_t { OS_WIN = 0, OS_LINUX, COUNT };
static __forceinline std::underlying_type<os_t>::type os_index(os_t val) {
  return std::underlying_type<os_t>::type(val);
}

#ifdef _WIN32
static const os_t current_os = os_t::OS_WIN;
#elif __linux__
static const os_t current_os = os_t::OS_LINUX;
#else
static_assert(false, "Operating System not detected!");
#endif

/// @brief: Loads a dynamic library by file name. Returns NULL on failure.
/// @param: filename(Input), file name of the library.
/// @return: LibHandle.
LibHandle LoadLib(std::string filename);

/// @brief: Gets the address of an exported symbol. Returns NULL on failure.
/// @param: lib(Input), handle of the library that exports the symbol.
/// @param: export_name(Input), the name of the exported symbol.
/// @return: void*.
void* GetExportAddress(LibHandle lib, std::string export_name);

/// @brief: Unloads the dynamic library.
/// @param: lib(Input), handle of the library to unload.
void CloseLib(LibHandle lib);

/// @brief: Creates a mutex. Returns NULL on failure.
/// @param: void.
/// @return: Mutex.
Mutex CreateMutex();

/// @brief: Tries once to acquire the mutex; returns true on success.
/// @param: lock(Input), handle to the mutex.
/// @return: bool.
bool TryAcquireMutex(Mutex lock);

/// @brief: Acquires the mutex. If the mutex is locked, waits until it is
/// released. Returns true if the mutex is acquired successfully.
/// @param: lock(Input), handle to the mutex.
/// @return: bool.
bool AcquireMutex(Mutex lock);

/// @brief: Releases the mutex.
/// @param: lock(Input), handle to the mutex.
/// @return: void.
void ReleaseMutex(Mutex lock);

/// @brief: Destroys the mutex.
/// @param: lock(Input), handle to the mutex.
/// @return: void.
void DestroyMutex(Mutex lock);

/// @brief: Creates a shared mutex. Returns NULL on failure.
/// @param: void.
/// @return: SharedMutex.
SharedMutex CreateSharedMutex();

/// @brief: Tries once to acquire the mutex in exclusive mode; returns true on success.
/// @param: lock(Input), handle to the shared mutex.
/// @return: bool.
bool TryAcquireSharedMutex(SharedMutex lock);

/// @brief: Acquires the mutex in exclusive mode. If the mutex is locked, waits until
/// it is released. Returns true if the mutex is acquired successfully.
/// @param: lock(Input), handle to the mutex.
/// @return: bool.
bool AcquireSharedMutex(SharedMutex lock);

/// @brief: Releases the mutex from exclusive mode.
/// @param: lock(Input), handle to the mutex.
/// @return: void.
void ReleaseSharedMutex(SharedMutex lock);

/// @brief: Tries once to acquire the mutex in shared mode; returns true on success.
/// @param: lock(Input), handle to the mutex.
/// @return: bool.
bool TrySharedAcquireSharedMutex(SharedMutex lock);

/// @brief: Acquires the mutex in shared mode. If the mutex is held in exclusive
/// mode, waits until it is released. Returns true if the mutex is acquired
/// successfully.
/// @param: lock(Input), handle to the mutex.
/// @return: bool.
bool SharedAcquireSharedMutex(SharedMutex lock);

/// @brief: Releases the mutex from shared mode.
/// @param: lock(Input), handle to the mutex.
/// @return: void.
void SharedReleaseSharedMutex(SharedMutex lock);

/// @brief: Destroys the mutex.
/// @param: lock(Input), handle to the mutex.
/// @return: void.
void DestroySharedMutex(SharedMutex lock);

/// @brief: Puts the current thread to sleep.
/// @param: delayInMs(Input), sleep time in milliseconds.
/// @return: void.
void Sleep(int delayInMs);

/// @brief: Puts the current thread to sleep.
/// @param: delayInUs(Input), sleep time in microseconds.
/// @return: void.
void uSleep(int delayInUs);

/// @brief: Yields the current thread.
/// @param: void.
/// @return: void.
void YieldThread();

typedef void (*ThreadEntry)(void*);

/// @brief: Creates a thread. Returns NULL on failure.
/// @param: entry_function(Input), a pointer to the function the thread
/// starts from.
/// @param: entry_argument(Input), a pointer to the argument of the thread
/// function.
/// @param: stack_size(Input), size of the thread's stack; 0 (the default)
/// keeps the system default size.
/// @return: Thread, a handle to the created thread.
Thread CreateThread(ThreadEntry entry_function, void* entry_argument,
                    uint stack_size = 0);

/// @brief: Destroys the thread.
/// @param: thread(Input), handle of the thread to destroy.
/// @return: void.
void CloseThread(Thread thread);

/// @brief: Waits for a specific thread to finish; returns true on success.
/// @param: thread(Input), handle of the thread to wait on.
/// @return: bool.
bool WaitForThread(Thread thread);

/// @brief: Waits for multiple threads to finish; returns true on success.
/// @param: threads(Input), a pointer to an array of thread handles.
/// @param: thread_count(Input), number of threads to wait on.
/// @return: bool.
bool WaitForAllThreads(Thread* threads, uint thread_count);

/// @brief: Determines whether an environment variable is set.
/// @param: env_var_name(Input), name of the environment variable.
/// @return: bool, true if the variable is bound to any value,
/// including an empty string.
/// False otherwise.
bool IsEnvVarSet(std::string env_var_name);

/// @brief: Sets an environment variable, overwriting any existing value.
/// @param: env_var_name(Input), name of the environment variable.
/// @param: env_var_value(Input), value to assign.
/// @return: void.
void SetEnvVar(std::string env_var_name, std::string env_var_value);

/// @brief: Gets the value of an environment variable.
/// @param: env_var_name(Input), name of the environment variable.
/// @return: std::string, the variable's value, or an empty string if it is not set.
std::string GetEnvVar(std::string env_var_name);

/// @brief: Gets the maximum virtual memory size accessible to the application.
/// @param: void.
/// @return: size_t, size of the memory accessible to the application.
size_t GetUserModeVirtualMemorySize();

/// @brief: Gets the usable physical host system memory size.
/// @param: void.
/// @return: size_t, size of the physical host system memory.
size_t GetUsablePhysicalHostMemorySize();

/// @brief: Gets the virtual memory base address. It is hardcoded to 0.
/// @param: void.
/// @return: uintptr_t, always 0.
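The memory-size queries documented above reduce to simple arithmetic in the POSIX implementation: the x86-64 user range 0 through 0x00007fffffffffff spans 2^47 bytes (128 TiB), and the usable host memory is total RAM (`totalram * mem_unit` from `sysinfo`) capped at that size. The sketch below reproduces only that computation; the names `kUserVaSize` and `UsableHostMemory` are hypothetical, and plain integers stand in for `struct sysinfo`.

```cpp
#include <algorithm>
#include <cstdint>

// 47-bit user-mode address space on x86-64: 0 .. 0x00007fffffffffff inclusive.
constexpr std::uint64_t kUserVaSize = std::uint64_t{1} << 47;
static_assert(kUserVaSize == 0x800000000000ull, "2^47 bytes");
static_assert(kUserVaSize == 128ull * 1024 * 1024 * 1024 * 1024, "128 TiB");

// Hypothetical helper: total RAM, reported as a count of mem_unit-sized units,
// capped at the user-mode virtual address space size.
std::uint64_t UsableHostMemory(std::uint64_t totalram, std::uint32_t mem_unit) {
  const std::uint64_t physical = totalram * mem_unit;
  return std::min(kUserVaSize, physical);
}
```

For example, a host reporting 16 GiB gets 16 GiB back, while a reported 2^50 bytes would be clamped to the 2^47-byte address space.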
uintptr_t GetUserModeVirtualMemoryBase();

/// @brief: OS event API: creates an event.
/// @param: auto_reset, whether the event resets its state automatically after
/// a successful wait.
/// @param: init_state, initial state of the event.
/// @return: event handle.
EventHandle CreateOsEvent(bool auto_reset, bool init_state);

/// @brief: OS event API: destroys an event.
/// @param: event, event handle.
/// @return: 0 on success, nonzero on failure.
int DestroyOsEvent(EventHandle event);

/// @brief: OS event API: waits on an event.
/// @param: event, event handle.
/// @param: milli_seconds, wait time in milliseconds; 0 polls without blocking.
/// @return: 0 if the event was signaled, nonzero on timeout or failure.
int WaitForOsEvent(EventHandle event, unsigned int milli_seconds);

/// @brief: OS event API: sets the event state (signals the event).
/// @param: event, event handle.
/// @return: 0 on success, nonzero on failure.
int SetOsEvent(EventHandle event);

/// @brief: OS event API: resets the event state.
/// @param: event, event handle.
/// @return: 0 on success, nonzero on failure.
int ResetOsEvent(EventHandle event);

/// @brief: Reads a clock that is deemed accurate for elapsed-time
/// measurements, though not necessarily fast to query.
/// @return: clock counter value.
uint64_t ReadAccurateClock();

/// @brief: Retrieves the frequency, in Hz, of the unit used by ReadAccurateClock.
/// It does not necessarily reflect the resolution of the clock, but it is the
/// value needed to convert a difference in the clock's counter value to elapsed
/// seconds. This frequency does not change at runtime.
/// @return: the frequency.
uint64_t AccurateClockFrequency();

} // namespace os
} // namespace rocr

#endif // HSA_RUNTIME_CORE_UTIL_OS_H_
ROCR-Runtime-rocm-5.0.0/src/core/util/simple_heap.h000066400000000000000000000251331420110115200217620ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
// // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // A simple best fit memory allocator with eager compaction. Manages block sub-allocation. // For use when memory efficiency is more important than allocation speed. // O(log n) time. 
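The best-fit step this header comment describes can be sketched with a `std::multimap` keyed by fragment size: `lower_bound(bytes)` returns the smallest free fragment at least `bytes` long in O(log n), and any unused tail is reinserted as a smaller free fragment. This toy version is an assumption-laden illustration, not `SimpleHeap` itself: the class name `BestFitFreeList` is hypothetical, and it omits block management, the block cache, and merging of neighbors on free.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical free list: fragment size -> base address of a free fragment.
class BestFitFreeList {
 public:
  void AddFree(std::uintptr_t base, std::size_t size) {
    if (size != 0) free_.emplace(size, base);
  }

  // Best fit: the smallest fragment with size >= bytes, found in O(log n).
  std::optional<std::uintptr_t> Alloc(std::size_t bytes) {
    auto it = free_.lower_bound(bytes);
    if (it == free_.end()) return std::nullopt;
    const std::size_t size = it->first;
    const std::uintptr_t base = it->second;
    free_.erase(it);
    // Return the unused tail of the fragment to the free list.
    if (size > bytes) free_.emplace(size - bytes, base + bytes);
    return base;
  }

  std::size_t FreeFragments() const { return free_.size(); }

 private:
  std::multimap<std::size_t, std::uintptr_t> free_;
};
```

With free fragments of 64, 16, and 32 bytes, a 20-byte request is served from the 32-byte fragment rather than the 64-byte one, leaving a 12-byte tail; that preference for the tightest fit is what trades speed for memory efficiency.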
#ifndef HSA_RUNTME_CORE_UTIL_SIMPLE_HEAP_H_ #define HSA_RUNTME_CORE_UTIL_SIMPLE_HEAP_H_ #include #include #include #include "core/util/utils.h" namespace rocr { template class SimpleHeap { private: struct Fragment_T { typedef std::multimap::iterator ptr_t; ptr_t free_list_entry_; struct { size_t size : 62; bool discard : 1; bool free : 1; }; Fragment_T(ptr_t Iterator, size_t Len, bool Free) : free_list_entry_(Iterator), size(Len), discard(false), free(Free) {} Fragment_T() = default; }; struct Block { uintptr_t base_ptr_; size_t length_; Block(uintptr_t base, size_t length) : base_ptr_(base), length_(length) {} Block() = default; }; Allocator block_allocator_; std::multimap free_list_; std::map> block_list_; std::deque block_cache_; // Size of blocks that are at least partially in use. size_t in_use_size_; // Total size of block cache size_t cache_size_; __forceinline bool isFree(const Fragment_T& node) { return node.free; } __forceinline void setUsed(Fragment_T& node) { node.free = false; node.free_list_entry_ = free_list_.end(); } __forceinline void setFree(Fragment_T& node, typename Fragment_T::ptr_t Iterator) { node.free_list_entry_ = Iterator; node.free = true; } __forceinline Fragment_T makeFragment(size_t Len) { return Fragment_T(free_list_.end(), Len, false); } __forceinline Fragment_T makeFragment(typename Fragment_T::ptr_t Iterator, size_t Len) { return Fragment_T(Iterator, Len, true); } __forceinline void removeFreeListEntry(Fragment_T& node) { if (node.free_list_entry_ != free_list_.end()) { free_list_.erase(node.free_list_entry_); node.free_list_entry_ = free_list_.end(); } } __forceinline void discard(Fragment_T& node) { removeFreeListEntry(node); node.discard = true; } public: explicit SimpleHeap(const Allocator& BlockAllocator = Allocator()) : block_allocator_(BlockAllocator), in_use_size_(0), cache_size_(0) {} ~SimpleHeap() { trim(); // Leak here may be due to the user. Check is for debugging only. 
    // assert(in_use_size_ == 0 && "Leak in SimpleHeap.");
  }

  SimpleHeap(const SimpleHeap& rhs) = delete;
  SimpleHeap(SimpleHeap&& rhs) = delete;
  SimpleHeap& operator=(const SimpleHeap& rhs) = delete;
  SimpleHeap& operator=(SimpleHeap&& rhs) = delete;

  void* alloc(size_t bytes) {
    // Find best fit.
    auto free_fragment = free_list_.lower_bound(bytes);
    uintptr_t base;
    size_t size;

    if (free_fragment != free_list_.end()) {
      base = free_fragment->second;
      size = free_fragment->first;
      free_list_.erase(free_fragment);
      assert(size >= bytes && "SimpleHeap: map lower_bound failure.");

      // Find the containing block and fragment
      auto it = block_list_.upper_bound(base);
      it--;
      auto& frag_map = it->second;
      const auto& fragment = frag_map.find(base);
      assert(fragment != frag_map.end() && "Inconsistency in SimpleHeap.");
      assert(size == fragment->second.size && "Inconsistency in SimpleHeap.");

      // Sub-allocate from fragment.
      fragment->second.size = bytes;
      setUsed(fragment->second);

      // Record remaining free space.
      if (size > bytes) {
        free_fragment = free_list_.insert(std::make_pair(size - bytes, base + bytes));
        frag_map[base + bytes] = makeFragment(free_fragment, size - bytes);
      }
      return reinterpret_cast<void*>(base);
    }

    // No usable fragment, check block cache
    if (bytes < default_block_size() && !block_cache_.empty()) {
      const auto& block = block_cache_.back();
      base = block.base_ptr_;
      size = block.length_;
      block_cache_.pop_back();
      cache_size_ -= size;
    } else {
      // Alloc new block - new block may be larger than default.
      void* ptr = block_allocator_.alloc(bytes, size);
      base = reinterpret_cast<uintptr_t>(ptr);
      assert(ptr != nullptr && "Block allocation failed, Allocator is expected to throw.");
    }

    in_use_size_ += size;
    assert(size >= bytes && "Alloc exceeds block size.");

    // Sub alloc and insert free region.
    if (size > bytes) {
      free_fragment = free_list_.insert(std::make_pair(size - bytes, base + bytes));
      block_list_[base][base + bytes] = makeFragment(free_fragment, size - bytes);
    }

    // Track used region
    block_list_[base][base] = makeFragment(bytes);

    // Disallow multiple suballocation from large blocks.
    // Prevents a small allocation from retaining a large block.
    if (bytes > default_block_size()) {
      bool err = discardBlock(reinterpret_cast<void*>(base));
      assert(err && "Large block discard failed.");
    }

    return reinterpret_cast<void*>(base);
  }

  bool free(void* ptr) {
    if (ptr == nullptr) return true;

    uintptr_t base = reinterpret_cast<uintptr_t>(ptr);

    // Find fragment and validate.
    auto frag_map_it = block_list_.upper_bound(base);
    if (frag_map_it == block_list_.begin()) return false;
    frag_map_it--;
    auto& frag_map = frag_map_it->second;
    auto fragment = frag_map.find(base);
    if (fragment == frag_map.end() || isFree(fragment->second)) return false;

    bool discard = fragment->second.discard;

    // Merge lower
    if (fragment != frag_map.begin()) {
      auto lower = fragment;
      lower--;
      if (isFree(lower->second)) {
        removeFreeListEntry(lower->second);
        lower->second.size += fragment->second.size;
        frag_map.erase(fragment);
        fragment = lower;
      }
    }

    // Merge upper
    {
      auto upper = fragment;
      upper++;
      if ((upper != frag_map.end()) && isFree(upper->second)) {
        removeFreeListEntry(upper->second);
        fragment->second.size += upper->second.size;
        frag_map.erase(upper);
      }
    }

    // Release whole free blocks.
    if (frag_map.size() == 1) {
      Block block(fragment->first, fragment->second.size);
      block_list_.erase(frag_map_it);

      // Discard or add to the block cache.
      if (discard) {
        block_allocator_.free(reinterpret_cast<void*>(block.base_ptr_), block.length_);
      } else {
        block_cache_.push_back(block);
        cache_size_ += block.length_;
        in_use_size_ -= block.length_;
      }
      balance();

      // Don't publish free space since block was moved to the cache.
      return true;
    }

    // Don't report free memory if discarding the fragment.
    if (discard) return true;

    // Report free fragment
    const auto& freeEntry =
        free_list_.insert(std::make_pair(size_t(fragment->second.size), fragment->first));
    setFree(fragment->second, freeEntry);

    return true;
  }

  void balance() {
    // Release old blocks when over cache limit.
    while ((block_cache_.size() > 1) && (cache_size_ > in_use_size_ * 2)) {
      const auto& block = block_cache_.front();
      block_allocator_.free(reinterpret_cast<void*>(block.base_ptr_), block.length_);
      cache_size_ -= block.length_;
      block_cache_.pop_front();
    }
  }

  void trim() {
    for (const auto& block : block_cache_)
      block_allocator_.free(reinterpret_cast<void*>(block.base_ptr_), block.length_);
    block_cache_.clear();
    cache_size_ = 0;
  }

  size_t default_block_size() const { return block_allocator_.block_size(); }

  // Prevent reuse of the block containing ptr. No further fragments will be allocated from the
  // block and the block will not be added to the block cache when it is free.
  bool discardBlock(void* ptr) {
    if (ptr == nullptr) return true;

    uintptr_t base = reinterpret_cast<uintptr_t>(ptr);

    // Find the block and validate that ptr lies inside it.
    auto frag_map_it = block_list_.upper_bound(base);
    if (frag_map_it == block_list_.begin()) return false;
    frag_map_it--;
    auto& frag_map = frag_map_it->second;
    if ((base < frag_map.begin()->first) ||
        (frag_map.rbegin()->first + frag_map.rbegin()->second.size <= base))
      return false;

    // Mark all fragments for discard and compute block size. Removes freelist records for all
    // fragments in the block.
    size_t size = 0;
    for (auto& frag : frag_map) {
      discard(frag.second);
      size += frag.second.size;
    }

    // Remove discarded block from in-use tracking and rebalance the block cache.
in_use_size_ -= size; balance(); return true; } }; } // namespace rocr #endif // HSA_RUNTME_CORE_UTIL_SIMPLE_HEAP_H_ ROCR-Runtime-rocm-5.0.0/src/core/util/small_heap.cpp000066400000000000000000000141161420110115200221330ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "small_heap.h" namespace rocr { // Inserts node into freelist after place. // Assumes node will not be an end of the list (list has guard nodes). void SmallHeap::insertafter(SmallHeap::iterator_t place, SmallHeap::iterator_t node) { assert(place->first < node->first && "Order violation"); assert(isfree(place->second) && "Freelist operation error."); iterator_t next = place->second.next; node->second.next = next; node->second.prior = place; place->second.next = node; next->second.prior = node; } // Removes node from freelist. // Assumes node will not be an end of the list (list has guard nodes). void SmallHeap::remove(SmallHeap::iterator_t node) { assert(isfree(node->second) && "Freelist operation error."); node->second.prior->second.next = node->second.next; node->second.next->second.prior = node->second.prior; setused(node->second); } // Returns high if merge failed or the merged node. 
SmallHeap::memory_t::iterator SmallHeap::merge(SmallHeap::memory_t::iterator low,
                                               SmallHeap::memory_t::iterator high) {
  assert(isfree(low->second) && "Merge with allocated block");
  assert(isfree(high->second) && "Merge with allocated block");

  if ((char*)low->first + low->second.len != (char*)high->first) return high;

  assert(!islastfree(high->second) && "Illegal merge.");

  low->second.len += high->second.len;
  low->second.next = high->second.next;
  high->second.next->second.prior = low;

  memory.erase(high);
  return low;
}

void SmallHeap::free(void* ptr) {
  if (ptr == nullptr) return;

  auto iterator = memory.find(ptr);

  // Check for illegal free
  if (iterator == memory.end()) {
    assert(false && "Illegal free.");
    return;
  }

  // Return memory to total and link node into free list
  total_free += iterator->second.len;

  // Could also traverse the free list which might be faster in some cases.
  auto before = iterator;
  before--;
  while (!isfree(before->second)) before--;
  assert(before->second.next->first > iterator->first && "Inconsistency in small heap.");
  insertafter(before, iterator);

  // Attempt compaction
  iterator = merge(before, iterator);
  merge(iterator, iterator->second.next);

  // Update the low/high boundary (no-op if ptr was not a high allocation).
  high.erase(ptr);
}

void* SmallHeap::alloc(size_t bytes) {
  // Is enough memory available?
if ((bytes > total_free) || (bytes == 0)) return nullptr; iterator_t current; // Walk the free list and allocate at first fitting location current = firstfree(); while (!islastfree(current->second)) { if (bytes <= current->second.len) { // Decrement from total total_free -= bytes; // Split node if (bytes != current->second.len) { void* remaining = (char*)current->first + bytes; Node& node = memory[remaining]; node.len = current->second.len - bytes; current->second.len = bytes; insertafter(current, memory.find(remaining)); } remove(current); return current->first; } current = current->second.next; } assert(current->second.len == 0 && "Freelist corruption."); // Can't service the request due to fragmentation return nullptr; } void* SmallHeap::alloc_high(size_t bytes) { // Is enough memory available? if ((bytes > total_free) || (bytes == 0)) return nullptr; iterator_t current; // Walk the free list and allocate at first fitting location current = lastfree(); while (!isfirstfree(current->second)) { if (bytes <= current->second.len) { // Decrement from total total_free -= bytes; void* alloc; // Split node if (bytes != current->second.len) { alloc = (char*)current->first + current->second.len - bytes; current->second.len -= bytes; Node& node = memory[alloc]; node.len = bytes; setused(node); } else { alloc = current->first; remove(current); } high.insert(alloc); return alloc; } current = current->second.prior; } assert(current->second.len == 0 && "Freelist corruption."); // Can't service the request due to fragmentation return nullptr; } } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/util/small_heap.h000066400000000000000000000110661420110115200216010ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. 
// // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // A simple first fit memory allocator with eager compaction. For use with few // items (where list iteration is faster than trees). // Not thread safe! 
#ifndef HSA_RUNTME_CORE_UTIL_SMALL_HEAP_H_
#define HSA_RUNTME_CORE_UTIL_SMALL_HEAP_H_

#include #include
#include "utils.h"

namespace rocr {

class SmallHeap {
 private:
  struct Node;
  typedef std::map<void*, Node> memory_t;
  typedef memory_t::iterator iterator_t;

  struct Node {
    size_t len;
    iterator_t next;
    iterator_t prior;
  };

  SmallHeap(const SmallHeap& rhs) = delete;
  SmallHeap& operator=(const SmallHeap& rhs) = delete;

  void* const pool;
  const size_t length;

  size_t total_free;
  memory_t memory;
  std::set<void*> high;

  __forceinline bool isfree(const Node& node) const { return node.next != memory.begin(); }
  __forceinline bool islastfree(const Node& node) const { return node.next == memory.end(); }
  __forceinline bool isfirstfree(const Node& node) const { return node.prior == memory.end(); }
  __forceinline void setlastfree(Node& node) { node.next = memory.end(); }
  __forceinline void setfirstfree(Node& node) { node.prior = memory.end(); }
  __forceinline void setused(Node& node) { node.next = memory.begin(); }

  __forceinline iterator_t firstfree() { return memory.begin()->second.next; }
  __forceinline iterator_t lastfree() { return memory.rbegin()->second.prior; }
  void insertafter(iterator_t place, iterator_t node);
  void remove(iterator_t node);
  iterator_t merge(iterator_t low, iterator_t high);

 public:
  SmallHeap() : pool(nullptr), length(0), total_free(0) {}
  SmallHeap(void* base, size_t length)
      : pool(base), length(length), total_free(length) {
    assert(pool != nullptr && "Invalid base address.");
    assert(pool != (void*)0xFFFFFFFFFFFFFFFFull && "Invalid base address.");
    assert((char*)pool + length != (char*)0xFFFFFFFFFFFFFFFFull && "Invalid pool bounds.");

    Node& start = memory[0];
    Node& node = memory[pool];
    Node& end = memory[(void*)0xFFFFFFFFFFFFFFFFull];

    start.len = 0;
    start.next = memory.find(pool);
    setfirstfree(start);

    node.len = length;
    node.prior = memory.begin();
    node.next = --memory.end();

    end.len = 0;
    end.prior = start.next;
    setlastfree(end);

    high.insert((void*)0xFFFFFFFFFFFFFFFFull);
  }

  void*
alloc(size_t bytes); void* alloc_high(size_t bytes); void free(void* ptr); void* base() const { return pool; } size_t size() const { return length; } size_t remaining() const { return total_free; } void* high_split() const { return *high.begin(); } }; } // namespace rocr #endif ROCR-Runtime-rocm-5.0.0/src/core/util/timer.cpp000066400000000000000000000100651420110115200211450ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "core/util/timer.h" namespace rocr { namespace timer { accurate_clock::init::init() { freq = os::AccurateClockFrequency(); accurate_clock::period_ns = 1e9 / double(freq); } // Calibrates the fast clock using the accurate clock. fast_clock::init::init() { typedef accurate_clock clock; clock::duration delay(std::chrono::milliseconds(1)); // calibrate clock fast_clock::raw_rep min = 0; clock::duration elapsed = clock::duration::max(); do { for (int t = 0; t < 10; t++) { fast_clock::raw_rep r1, r2; clock::time_point t0, t1, t2, t3; t0 = clock::now(); std::atomic_signal_fence(std::memory_order_acq_rel); r1 = fast_clock::raw_now(); std::atomic_signal_fence(std::memory_order_acq_rel); t1 = clock::now(); std::atomic_signal_fence(std::memory_order_acq_rel); do { t2 = clock::now(); } while (t2 - t1 < delay); std::atomic_signal_fence(std::memory_order_acq_rel); r2 = fast_clock::raw_now(); std::atomic_signal_fence(std::memory_order_acq_rel); t3 = clock::now(); // If elapsed time is shorter than last recorded time and both the start // and end times are confirmed correlated then record the clock readings. 
// This protects against inaccuracy due to thread switching if ((t3 - t1 < elapsed) && ((t1 - t0) * 10 < (t2 - t1)) && ((t3 - t2) * 10 < (t2 - t1))) { elapsed = t3 - t1; min = r2 - r1; } } delay += delay; } while (min < 1000); fast_clock::freq = double(min) / duration_in_seconds(elapsed); fast_clock::period_ps = 1e12 / fast_clock::freq; // printf("Timer setup took %f ms\n", duration_in_seconds(elapsed)*1000.0f); // printf("Fast clock frequency: %f MHz\n", double(fast_clock::freq)/1e6); } double accurate_clock::period_ns; accurate_clock::raw_frequency accurate_clock::freq; accurate_clock::init accurate_clock::accurate_clock_init; double fast_clock::period_ps; fast_clock::raw_frequency fast_clock::freq; fast_clock::init fast_clock::fast_clock_init; } // namespace timer } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/core/util/timer.h000066400000000000000000000127631420110115200206210ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. 
// - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_CORE_UTIL_TIMER_H_ #define HSA_RUNTIME_CORE_UTIL_TIMER_H_ #include "core/util/utils.h" #include "core/util/os.h" #include #include #include namespace rocr { namespace timer { // Needed to patch around a mixed arithmetic bug in MSVC's duration_cast as of // VS 2013. 
template <bool integer, bool sign> struct wide_type { typedef double type; };
template <> struct wide_type<true, false> { typedef uintmax_t type; };
template <> struct wide_type<true, true> { typedef intmax_t type; };

template <class To, class Rep, class Period>
static __forceinline To duration_cast(const std::chrono::duration<Rep, Period>& d) {
  typedef typename wide_type<std::is_integral<Rep>::value, std::is_signed<Rep>::value>::type
      wide;
  typedef std::chrono::duration<wide, typename To::period> unit_convert_t;
  unit_convert_t temp = std::chrono::duration_cast<unit_convert_t>(d);
  return To(static_cast<typename To::rep>(temp.count()));
}
// End patch

template <class Rep, class Period>
static __forceinline double duration_in_seconds(std::chrono::duration<Rep, Period> delta) {
  typedef std::chrono::duration<double, std::ratio<1, 1>> seconds;
  return seconds(delta).count();
}

template <class rep>
static __forceinline rep duration_from_seconds(double delta) {
  typedef std::chrono::duration<double, std::ratio<1, 1>> seconds;
  return std::chrono::duration_cast<rep>(seconds(delta));
}

// Provides a C++11 standard clock interface to the os::AccurateClock functions
class accurate_clock {
 public:
  typedef double rep;
  typedef std::nano period;
  typedef std::chrono::duration<rep, period> duration;
  typedef std::chrono::time_point<accurate_clock> time_point;

  static const bool is_steady = true;

  static __forceinline time_point now() {
    return time_point(duration(raw_now() * period_ns));
  }

  // These two extra APIs and types let us use clocks without conversion to the
  // arbitrary period unit
  typedef uint64_t raw_rep;
  typedef uint64_t raw_frequency;
  static __forceinline raw_rep raw_now() { return os::ReadAccurateClock(); }
  static __forceinline raw_frequency raw_freq() { return freq; }

 private:
  static double period_ns;
  static raw_frequency freq;

  class init {
   public:
    init();
  };
  static init accurate_clock_init;
};

// Provides a C++11 standard clock interface to the lowest-latency approximate
// clock
class fast_clock {
 public:
  typedef double rep;
  typedef std::pico period;
  typedef std::chrono::duration<rep, period> duration;
  typedef std::chrono::time_point<fast_clock> time_point;

  static const bool is_steady = true;

  static __forceinline time_point now() {
    return time_point(duration(raw_now() * period_ps));
  }

  // These two extra APIs and types let us use clocks without conversion to the
  // arbitrary period unit
  typedef uint64_t raw_rep;
  typedef double raw_frequency;
#ifdef __x86_64__
  static __forceinline raw_rep raw_now() { return __rdtsc(); }
  static __forceinline raw_frequency raw_freq() { return freq; }
#else
  static __forceinline raw_rep raw_now() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return (raw_rep(ts.tv_sec) * 1000000000 + raw_rep(ts.tv_nsec));
  }
  // raw_now() counts nanoseconds, so the tick rate is 1e9 ticks per second.
  static __forceinline raw_frequency raw_freq() { return 1.e9; }
#endif

 private:
  static double period_ps;
  static raw_frequency freq;

  class init {
   public:
    init();
  };
  static init fast_clock_init;
};

}  // namespace timer
}  // namespace rocr

#endif
ROCR-Runtime-rocm-5.0.0/src/core/util/utils.h000066400000000000000000000322011420110115200206260ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // ////////////////////////////////////////////////////////////////////////////////

// Generally useful utility functions

#ifndef HSA_RUNTIME_CORE_UTIL_UTILS_H_
#define HSA_RUNTIME_CORE_UTIL_UTILS_H_

#include "stdint.h"
#include "stddef.h"
#include "stdlib.h"
#include #include #include #include

namespace rocr {

typedef unsigned int uint;
typedef uint64_t uint64;

#if defined(__GNUC__)
#if defined(__i386__) || defined(__x86_64__)
#include
#endif

#define __forceinline __inline__ __attribute__((always_inline))
#define __declspec(x) __attribute__((x))
#undef __stdcall
#define __stdcall  // __attribute__((__stdcall__))
#define __ALIGNED__(x) __attribute__((aligned(x)))

static __forceinline void* _aligned_malloc(size_t size, size_t alignment) {
#ifdef _ISOC11_SOURCE
  return aligned_alloc(alignment, size);
#else
  void* mem = NULL;
  // posix_memalign() returns 0 on success and an error number (not a pointer) on failure.
  if (posix_memalign(&mem, alignment, size) != 0) return NULL;
  return mem;
#endif
}

static __forceinline void _aligned_free(void* ptr) { return free(ptr); }

#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))

#include "intrin.h"

#define __ALIGNED__(x) __declspec(align(x))

#if (_MSC_VER < 1800)  // < VS 2013
static __forceinline unsigned long long int strtoull(const char* str, char** endptr, int base) {
  return static_cast<unsigned long long int>(_strtoui64(str, endptr, base));
}
#endif

#if (_MSC_VER < 1900)  // < VS 2015
#define thread_local __declspec(thread)
#endif

#else
#error "Compiler and/or processor not identified."
#endif

#define STRING2(x) #x
#define STRING(x) STRING2(x)

#define PASTE2(x, y) x##y
#define PASTE(x, y) PASTE2(x, y)

#ifdef NDEBUG
#define debug_warning_n(exp, limit) \
  do { \
  } while (false)
#else
#define debug_warning_n(exp, limit) \
  do { \
    static std::atomic<int> count(0); \
    if (!(exp) && (limit == 0 || count < limit)) { \
      fprintf(stderr, "Warning: " STRING(exp) " in %s, " __FILE__ ":" STRING(__LINE__) "\n", \
              __PRETTY_FUNCTION__); \
      count++; \
    } \
  } while (false)
#endif

#define debug_warning(exp) debug_warning_n((exp), 0)

#ifdef NDEBUG
#define debug_print(fmt, ...) \
  do { \
  } while (false)
#else
#define debug_print(fmt, ...) \
  do { \
    fprintf(stderr, fmt, ##__VA_ARGS__); \
  } while (false)
#endif

#ifdef NDEBUG
#define ifdebug if (false)
#else
#define ifdebug if (true)
#endif

// A macro to disallow the copy and move constructors and assignment operators
#define DISALLOW_COPY_AND_ASSIGN(TypeName) \
  TypeName(const TypeName&) = delete; \
  TypeName(TypeName&&) = delete; \
  void operator=(const TypeName&) = delete; \
  void operator=(TypeName&&) = delete;

template <class lambda> class ScopeGuard {
 public:
  explicit __forceinline ScopeGuard(const lambda& release)
      : release_(release), dismiss_(false) {}

  ScopeGuard(ScopeGuard& rhs) { *this = rhs; }

  __forceinline ~ScopeGuard() {
    if (!dismiss_) release_();
  }
  __forceinline ScopeGuard& operator=(ScopeGuard& rhs) {
    dismiss_ = rhs.dismiss_;
    release_ = rhs.release_;
    rhs.dismiss_ = true;
    return *this;
  }
  __forceinline void Dismiss() { dismiss_ = true; }

 private:
  lambda release_;
  bool dismiss_;
};

template <class lambda>
static __forceinline ScopeGuard<lambda> MakeScopeGuard(lambda rel) {
  return ScopeGuard<lambda>(rel);
}

#define MAKE_SCOPE_GUARD_HELPER(lname, sname, ...) \
  auto lname = __VA_ARGS__; \
  ScopeGuard<decltype(lname)> sname(lname);

#define MAKE_SCOPE_GUARD(...) \
  MAKE_SCOPE_GUARD_HELPER(PASTE(scopeGuardLambda, __COUNTER__), \
                          PASTE(scopeGuard, __COUNTER__), __VA_ARGS__)

#define MAKE_NAMED_SCOPE_GUARD(name, ...) \
  MAKE_SCOPE_GUARD_HELPER(PASTE(scopeGuardLambda, __COUNTER__), name, \
                          __VA_ARGS__)

/// @brief: Returns the smaller of two inputs; T must support the ">" operator.
/// @param: a(Input), a reference to type T.
/// @param: b(Input), a reference to type T.
/// @return: T.
template <typename T>
static __forceinline T Min(const T& a, const T& b) {
  return (a > b) ? b : a;
}

template <typename T, typename... Arg>
static __forceinline T Min(const T& a, const T& b, Arg... args) {
  return Min(a, Min(b, args...));
}

/// @brief: Returns the larger of two inputs; T must support the ">" operator.
/// @param: a(Input), a reference to type T.
/// @param: b(Input), a reference to type T.
/// @return: T.
template <typename T>
static __forceinline T Max(const T& a, const T& b) {
  return (b > a) ? b : a;
}

template <typename T, typename... Arg>
static __forceinline T Max(const T& a, const T& b, Arg... args) {
  return Max(a, Max(b, args...));
}

/// @brief: Deletes an object that was previously allocated with new.
/// @param: ptr(Input), a pointer to the object. Can't be NULL.
/// @return: void.
struct DeleteObject {
  template <typename T> void operator()(const T* ptr) const { delete ptr; }
};

/// @brief: Returns true if val is a power of two. Note that this also returns
/// true for 0.
/// @param: val(Input), the data to be checked.
/// @return: bool.
template <typename T>
static __forceinline bool IsPowerOfTwo(T val) {
  return (val & (val - 1)) == 0;
}

/// @brief: Rounds value down to the nearest multiple of alignment, which must
/// be a power of two. If value is already aligned, it is unchanged.
/// @param: value(Input), value to be calculated.
/// @param: alignment(Input), alignment value.
/// @return: T.
template <typename T>
static __forceinline T AlignDown(T value, size_t alignment) {
  assert(IsPowerOfTwo(alignment));
  return (T)(value & ~(alignment - 1));
}

/// @brief: Same as the previous function, but the first parameter is a
/// pointer. For more info, see the previous description.
/// @param: value(Input), pointer to type T.
/// @param: alignment(Input), alignment value.
/// @return: T*, pointer to type T.
template <typename T>
static __forceinline T* AlignDown(T* value, size_t alignment) {
  return (T*)AlignDown((intptr_t)value, alignment);
}

/// @brief: Rounds value up to the nearest multiple of alignment, which must be
/// a power of two. If value is already aligned, it is unchanged.
/// @param: value(Input), value to be calculated.
/// @param: alignment(Input), alignment value.
/// @return: T.
template <typename T>
static __forceinline T AlignUp(T value, size_t alignment) {
  return AlignDown((T)(value + alignment - 1), alignment);
}

/// @brief: Same as the previous function, but the first parameter is a
/// pointer. For more info, see the previous description.
/// @param: value(Input), pointer to type T.
/// @param: alignment(Input), alignment value.
/// @return: T*, pointer to type T.
template <typename T>
static __forceinline T* AlignUp(T* value, size_t alignment) {
  return (T*)AlignDown((intptr_t)((uint8_t*)value + alignment - 1), alignment);
}

/// @brief: Returns true if value is a multiple of alignment.
/// @param: value(Input), value to be checked.
/// @param: alignment(Input), alignment value.
/// @return: bool.
template <typename T>
static __forceinline bool IsMultipleOf(T value, size_t alignment) {
  return (AlignUp(value, alignment) == value);
}

/// @brief: Same as the previous function, but the first parameter is a
/// pointer. For more info, see the previous description.
/// @param: value(Input), pointer to type T.
/// @param: alignment(Input), alignment value.
/// @return: bool.
template <typename T>
static __forceinline bool IsMultipleOf(T* value, size_t alignment) {
  return (AlignUp(value, alignment) == value);
}

static __forceinline uint32_t NextPow2(uint32_t value) {
  if (value == 0) return 1;
  uint32_t v = value - 1;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;
}

static __forceinline uint64_t NextPow2(uint64_t value) {
  if (value == 0) return 1;
  uint64_t v = value - 1;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v |= v >> 32;
  return v + 1;
}

static __forceinline bool strIsEmpty(const char* str) noexcept { return str[0] == '\0'; }

static __forceinline std::string& ltrim(std::string& s) {
  auto it = std::find_if(s.begin(), s.end(),
                         [](char c) { return !std::isspace(c, std::locale::classic()); });
  s.erase(s.begin(), it);
  return s;
}

static __forceinline std::string& rtrim(std::string& s) {
  auto it = std::find_if(s.rbegin(), s.rend(),
                         [](char c) { return !std::isspace(c, std::locale::classic()); });
  s.erase(it.base(), s.end());
  return s;
}

static __forceinline std::string& trim(std::string& s) { return ltrim(rtrim(s)); }

}  // namespace rocr

template <uint32_t lowBit, uint32_t highBit, typename T>
static __forceinline uint32_t BitSelect(T p) {
  static_assert(sizeof(T) <= sizeof(uintptr_t), "Type out of range.");
  static_assert(highBit < sizeof(uintptr_t) * 8, "Bit index out of range.");

  uintptr_t ptr = p;
  if (highBit != (sizeof(uintptr_t) * 8 - 1))
    return (uint32_t)((ptr & ((1ull << (highBit + 1)) - 1)) >> lowBit);
  else
    return (uint32_t)(ptr >> lowBit);
}

inline uint32_t PtrLow16Shift8(const void* p) {
  uintptr_t ptr = reinterpret_cast<uintptr_t>(p);
  return (uint32_t)((ptr & 0xFFFFULL) >> 8);
}

inline uint32_t PtrHigh64Shift16(const void* p) {
  uintptr_t ptr = reinterpret_cast<uintptr_t>(p);
  return (uint32_t)((ptr & 0xFFFFFFFFFFFF0000ULL) >> 16);
}

inline uint32_t PtrLow40Shift8(const void* p) {
  uintptr_t ptr = reinterpret_cast<uintptr_t>(p);
  return (uint32_t)((ptr & 0xFFFFFFFFFFULL) >> 8);
}

inline uint32_t PtrHigh64Shift40(const void* p) {
  uintptr_t ptr = reinterpret_cast<uintptr_t>(p);
  return (uint32_t)((ptr & 0xFFFFFF0000000000ULL) >> 40);
}

inline uint32_t PtrLow32(const void* p) {
  return static_cast<uint32_t>(reinterpret_cast<uintptr_t>(p));
}

inline uint32_t PtrHigh32(const void* p) {
  uint32_t ptr = 0;
#ifdef HSA_LARGE_MODEL
  ptr = static_cast<uint32_t>(reinterpret_cast<uintptr_t>(p) >> 32);
#endif
  return ptr;
}

#include "atomic_helpers.h"

#endif  // HSA_RUNTIME_CORE_UTIL_UTILS_H_
ROCR-Runtime-rocm-5.0.0/src/core/util/win/000077500000000000000000000000001420110115200201145ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/core/util/win/os_win.cpp000066400000000000000000000175031420110115200221240ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission.
// // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifdef _WIN32 // Are we compiling for windows? #define NOMINMAX #include "core/util/os.h" #include #include #include #include #include #include #include #undef Yield #undef CreateMutex namespace rocr { namespace os { static_assert(sizeof(LibHandle) == sizeof(HMODULE), "OS abstraction size mismatch"); static_assert(sizeof(LibHandle) == sizeof(::HANDLE), "OS abstraction size mismatch"); static_assert(sizeof(Mutex) == sizeof(::HANDLE), "OS abstraction size mismatch"); static_assert(sizeof(Thread) == sizeof(::HANDLE), "OS abstraction size mismatch"); static_assert(sizeof(EventHandle) == sizeof(::HANDLE), "OS abstraction size mismatch"); LibHandle LoadLib(std::string filename) { HMODULE ret = LoadLibrary(filename.c_str()); return *(LibHandle*)&ret; } void* GetExportAddress(LibHandle lib, std::string export_name) { return GetProcAddress(*(HMODULE*)&lib, export_name.c_str()); } void CloseLib(LibHandle lib) { FreeLibrary(*(::HMODULE*)&lib); } Mutex CreateMutex() { return CreateEvent(NULL, false, true, NULL); } bool TryAcquireMutex(Mutex lock) { return WaitForSingleObject(*(::HANDLE*)&lock, 0) == WAIT_OBJECT_0; } bool AcquireMutex(Mutex lock) { return WaitForSingleObject(*(::HANDLE*)&lock, INFINITE) == WAIT_OBJECT_0; } void ReleaseMutex(Mutex lock) { SetEvent(*(::HANDLE*)&lock); } void DestroyMutex(Mutex lock) { CloseHandle(*(::HANDLE*)&lock); } void Sleep(int delay_in_millisecond) { 
::Sleep(delay_in_millisecond); } void uSleep(int delayInUs) { ::Sleep(delayInUs / 1000); } void YieldThread() { ::Sleep(0); } struct ThreadArgs { void* entry_args; ThreadEntry entry_function; }; unsigned __stdcall ThreadTrampoline(void* arg) { ThreadArgs* thread_args = (ThreadArgs*)arg; ThreadEntry entry = thread_args->entry_function; void* data = thread_args->entry_args; delete thread_args; entry(data); _endthreadex(0); return 0; } Thread CreateThread(ThreadEntry entry_function, void* entry_argument, uint stack_size) { ThreadArgs* thread_args = new ThreadArgs(); thread_args->entry_args = entry_argument; thread_args->entry_function = entry_function; uintptr_t ret = _beginthreadex(NULL, stack_size, ThreadTrampoline, thread_args, 0, NULL); return *(Thread*)&ret; } void CloseThread(Thread thread) { CloseHandle(*(::HANDLE*)&thread); } bool WaitForThread(Thread thread) { return WaitForSingleObject(*(::HANDLE*)&thread, INFINITE) == WAIT_OBJECT_0; } bool WaitForAllThreads(Thread* threads, uint thread_count) { return WaitForMultipleObjects(thread_count, threads, TRUE, INFINITE) == WAIT_OBJECT_0; } void SetEnvVar(std::string env_var_name, std::string env_var_value) { SetEnvironmentVariable(env_var_name.c_str(), env_var_value.c_str()); } std::string GetEnvVar(std::string env_var_name) { char* buff; DWORD char_count = GetEnvironmentVariable(env_var_name.c_str(), NULL, 0); if (char_count == 0) return ""; buff = (char*)alloca(sizeof(char) * char_count); GetEnvironmentVariable(env_var_name.c_str(), buff, char_count); buff[char_count - 1] = '\0'; std::string ret = buff; return ret; } size_t GetUserModeVirtualMemorySize() { SYSTEM_INFO system_info = {0}; GetSystemInfo(&system_info); return ((size_t)system_info.lpMaximumApplicationAddress + 1); } size_t GetUsablePhysicalHostMemorySize() { MEMORYSTATUSEX memory_status = {0}; memory_status.dwLength = sizeof(memory_status); if (GlobalMemoryStatusEx(&memory_status) == 0) { return 0; } const size_t physical_size = 
static_cast<size_t>(memory_status.ullTotalPhys); return std::min(GetUserModeVirtualMemorySize(), physical_size); } uintptr_t GetUserModeVirtualMemoryBase() { return (uintptr_t)0; } // Os event wrappers EventHandle CreateOsEvent(bool auto_reset, bool init_state) { EventHandle evt = reinterpret_cast<EventHandle>( CreateEvent(NULL, (BOOL)(!auto_reset), (BOOL)init_state, NULL)); return evt; } int DestroyOsEvent(EventHandle event) { if (event == NULL) { return -1; } return CloseHandle(reinterpret_cast<::HANDLE>(event)); } int WaitForOsEvent(EventHandle event, unsigned int milli_seconds) { if (event == NULL) { return -1; } int ret_code = WaitForSingleObject(reinterpret_cast<::HANDLE>(event), milli_seconds); if (ret_code == WAIT_TIMEOUT) { ret_code = 0x14003; // 0x14003 indicates timeout } return ret_code; } int SetOsEvent(EventHandle event) { if (event == NULL) { return -1; } return SetEvent(reinterpret_cast<::HANDLE>(event)); } int ResetOsEvent(EventHandle event) { if (event == NULL) { return -1; } return ResetEvent(reinterpret_cast<::HANDLE>(event)); } uint64_t ReadAccurateClock() { uint64_t ret; QueryPerformanceCounter((LARGE_INTEGER*)&ret); return ret; } uint64_t AccurateClockFrequency() { uint64_t ret; QueryPerformanceFrequency((LARGE_INTEGER*)&ret); return ret; } SharedMutex CreateSharedMutex() { assert(false && "Not implemented."); abort(); return nullptr; } bool TryAcquireSharedMutex(SharedMutex lock) { assert(false && "Not implemented."); abort(); return false; } bool AcquireSharedMutex(SharedMutex lock) { assert(false && "Not implemented."); abort(); return false; } void ReleaseSharedMutex(SharedMutex lock) { assert(false && "Not implemented."); abort(); } bool TrySharedAcquireSharedMutex(SharedMutex lock) { assert(false && "Not implemented."); abort(); return false; } bool SharedAcquireSharedMutex(SharedMutex lock) { assert(false && "Not implemented."); abort(); return false; } void SharedReleaseSharedMutex(SharedMutex lock) { assert(false && "Not implemented."); abort(); } void 
DestroySharedMutex(SharedMutex lock) { assert(false && "Not implemented."); abort(); } } // namespace os } // namespace rocr #endif ROCR-Runtime-rocm-5.0.0/src/hsa-runtime64-config.cmake.in000066400000000000000000000046761420110115200227070ustar00rootroot00000000000000################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2020-2021, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc. ## ## www.amd.com ## ## Permission is hereby granted, free of charge, to any person obtaining a copy ## of this software and associated documentation files (the "Software"), to ## deal with the Software without restriction, including without limitation ## the rights to use, copy, modify, merge, publish, distribute, sublicense, ## and/or sell copies of the Software, and to permit persons to whom the ## Software is furnished to do so, subject to the following conditions: ## ## - Redistributions of source code must retain the above copyright notice, ## this list of conditions and the following disclaimers. ## - Redistributions in binary form must reproduce the above copyright ## notice, this list of conditions and the following disclaimers in ## the documentation and/or other materials provided with the distribution. ## - Neither the names of Advanced Micro Devices, Inc, ## nor the names of its contributors may be used to endorse or promote ## products derived from this Software without specific prior written ## permission. ## ## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL ## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR ## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER ## DEALINGS WITH THE SOFTWARE. ## ################################################################################ @PACKAGE_INIT@ include( CMakeFindDependencyMacro ) # Client apps only need our private dependencies if rocr is a static lib. set( _is_hsa_runtime_dynamic @BUILD_SHARED_LIBS@ ) if( NOT _is_hsa_runtime_dynamic ) set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_LIST_DIR}") find_dependency(hsakmt 1.0) find_dependency(LibElf) endif() include( "${CMAKE_CURRENT_LIST_DIR}/@CORE_RUNTIME_NAME@Targets.cmake" ) check_required_components(@CORE_RUNTIME_NAME@) ROCR-Runtime-rocm-5.0.0/src/hsacore.so.def000066400000000000000000000151071420110115200201420ustar00rootroot00000000000000ROCR_1 { global: hsa_init; hsa_shut_down; hsa_system_get_info; hsa_extension_get_name; hsa_system_extension_supported; hsa_system_major_extension_supported; hsa_system_get_extension_table; hsa_system_get_major_extension_table; hsa_iterate_agents; hsa_agent_get_info; hsa_agent_get_exception_policies; hsa_cache_get_info; hsa_agent_iterate_caches; hsa_agent_extension_supported; hsa_agent_major_extension_supported; hsa_queue_create; hsa_soft_queue_create; hsa_queue_destroy; hsa_queue_inactivate; hsa_queue_load_read_index_scacquire; hsa_queue_load_read_index_acquire; hsa_queue_load_read_index_relaxed; hsa_queue_load_write_index_scacquire; hsa_queue_load_write_index_acquire; hsa_queue_load_write_index_relaxed; hsa_queue_store_write_index_relaxed; hsa_queue_store_write_index_screlease; hsa_queue_store_write_index_release; hsa_queue_cas_write_index_scacq_screl; hsa_queue_cas_write_index_acq_rel; hsa_queue_cas_write_index_scacquire; hsa_queue_cas_write_index_acquire; hsa_queue_cas_write_index_relaxed; hsa_queue_cas_write_index_screlease; 
hsa_queue_cas_write_index_release; hsa_queue_add_write_index_scacq_screl; hsa_queue_add_write_index_acq_rel; hsa_queue_add_write_index_scacquire; hsa_queue_add_write_index_acquire; hsa_queue_add_write_index_relaxed; hsa_queue_add_write_index_screlease; hsa_queue_add_write_index_release; hsa_queue_store_read_index_relaxed; hsa_queue_store_read_index_screlease; hsa_queue_store_read_index_release; hsa_agent_iterate_regions; hsa_region_get_info; hsa_memory_register; hsa_memory_deregister; hsa_memory_allocate; hsa_memory_free; hsa_memory_copy; hsa_memory_assign_agent; hsa_signal_create; hsa_signal_destroy; hsa_signal_load_relaxed; hsa_signal_load_scacquire; hsa_signal_load_acquire; hsa_signal_store_relaxed; hsa_signal_store_screlease; hsa_signal_store_release; hsa_signal_silent_store_relaxed; hsa_signal_silent_store_screlease; hsa_signal_wait_relaxed; hsa_signal_wait_scacquire; hsa_signal_wait_acquire; hsa_signal_group_create; hsa_signal_group_destroy; hsa_signal_group_wait_any_scacquire; hsa_signal_group_wait_any_relaxed; hsa_signal_and_relaxed; hsa_signal_and_scacquire; hsa_signal_and_acquire; hsa_signal_and_screlease; hsa_signal_and_release; hsa_signal_and_scacq_screl; hsa_signal_and_acq_rel; hsa_signal_or_relaxed; hsa_signal_or_scacquire; hsa_signal_or_acquire; hsa_signal_or_screlease; hsa_signal_or_release; hsa_signal_or_scacq_screl; hsa_signal_or_acq_rel; hsa_signal_xor_relaxed; hsa_signal_xor_scacquire; hsa_signal_xor_acquire; hsa_signal_xor_screlease; hsa_signal_xor_release; hsa_signal_xor_scacq_screl; hsa_signal_xor_acq_rel; hsa_signal_exchange_relaxed; hsa_signal_exchange_scacquire; hsa_signal_exchange_acquire; hsa_signal_exchange_screlease; hsa_signal_exchange_release; hsa_signal_exchange_scacq_screl; hsa_signal_exchange_acq_rel; hsa_signal_add_relaxed; hsa_signal_add_scacquire; hsa_signal_add_acquire; hsa_signal_add_screlease; hsa_signal_add_release; hsa_signal_add_scacq_screl; hsa_signal_add_acq_rel; hsa_signal_subtract_relaxed; 
hsa_signal_subtract_scacquire; hsa_signal_subtract_acquire; hsa_signal_subtract_screlease; hsa_signal_subtract_release; hsa_signal_subtract_scacq_screl; hsa_signal_subtract_acq_rel; hsa_signal_cas_relaxed; hsa_signal_cas_scacquire; hsa_signal_cas_acquire; hsa_signal_cas_screlease; hsa_signal_cas_release; hsa_signal_cas_scacq_screl; hsa_signal_cas_acq_rel; hsa_isa_from_name; hsa_agent_iterate_isas; hsa_isa_get_info; hsa_isa_get_info_alt; hsa_isa_get_exception_policies; hsa_isa_get_round_method; hsa_wavefront_get_info; hsa_isa_iterate_wavefronts; hsa_isa_compatible; hsa_code_object_serialize; hsa_code_object_deserialize; hsa_code_object_destroy; hsa_code_object_get_info; hsa_code_object_get_symbol; hsa_code_object_get_symbol_from_name; hsa_code_symbol_get_info; hsa_code_object_iterate_symbols; hsa_code_object_reader_create_from_file; hsa_code_object_reader_create_from_memory; hsa_code_object_reader_destroy; hsa_executable_create; hsa_executable_create_alt; hsa_executable_destroy; hsa_executable_load_code_object; hsa_executable_load_program_code_object; hsa_executable_load_agent_code_object; hsa_executable_freeze; hsa_executable_get_info; hsa_executable_global_variable_define; hsa_executable_agent_global_variable_define; hsa_executable_readonly_variable_define; hsa_executable_validate; hsa_executable_validate_alt; hsa_executable_get_symbol; hsa_executable_get_symbol_by_name; hsa_executable_symbol_get_info; hsa_executable_iterate_symbols; hsa_executable_iterate_agent_symbols; hsa_executable_iterate_program_symbols; hsa_status_string; hsa_ext_program_create; hsa_ext_program_destroy; hsa_ext_program_add_module; hsa_ext_program_iterate_modules; hsa_ext_program_get_info; hsa_ext_program_finalize; hsa_amd_coherency_get_type; hsa_amd_coherency_set_type; hsa_amd_profiling_set_profiler_enabled; hsa_amd_profiling_get_dispatch_time; hsa_amd_profiling_async_copy_enable; hsa_amd_profiling_get_async_copy_time; hsa_amd_profiling_convert_tick_to_system_domain; hsa_amd_signal_create; 
hsa_amd_signal_wait_any; hsa_amd_signal_async_handler; hsa_amd_async_function; hsa_amd_image_get_info_max_dim; hsa_amd_queue_cu_set_mask; hsa_amd_queue_cu_get_mask; hsa_amd_memory_fill; hsa_amd_memory_async_copy; hsa_amd_memory_async_copy_rect; hsa_amd_memory_lock; hsa_amd_memory_lock_to_pool; hsa_amd_memory_unlock; hsa_amd_agent_iterate_memory_pools; hsa_amd_agent_memory_pool_get_info; hsa_amd_agents_allow_access; hsa_amd_memory_pool_get_info; hsa_amd_memory_pool_allocate; hsa_amd_memory_pool_free; hsa_amd_memory_pool_can_migrate; hsa_amd_memory_migrate; hsa_amd_interop_map_buffer; hsa_amd_interop_unmap_buffer; hsa_amd_image_create; hsa_ext_image_get_capability; hsa_ext_image_data_get_info; hsa_ext_image_create; hsa_ext_image_import; hsa_ext_image_export; hsa_ext_image_copy; hsa_ext_image_clear; hsa_ext_image_destroy; hsa_ext_sampler_create; hsa_ext_sampler_destroy; hsa_ext_image_get_capability_with_layout; hsa_ext_image_data_get_info_with_layout; hsa_ext_image_create_with_layout; hsa_amd_pointer_info; hsa_amd_pointer_info_set_userdata; hsa_amd_ipc_memory_create; hsa_amd_ipc_memory_attach; hsa_amd_ipc_memory_detach; hsa_amd_ipc_signal_create; hsa_amd_ipc_signal_attach; hsa_amd_register_system_event_handler; hsa_amd_queue_set_priority; hsa_amd_register_deallocation_callback; hsa_amd_deregister_deallocation_callback; hsa_amd_signal_value_pointer; _amdgpu_r_debug; hsa_amd_svm_attributes_set; hsa_amd_svm_attributes_get; hsa_amd_svm_prefetch_async; local: *; }; ROCR-Runtime-rocm-5.0.0/src/hsacore.so.link000066400000000000000000000037721420110115200203460ustar00rootroot00000000000000hsa_queue_load_read_index_acquire = hsa_queue_load_read_index_scacquire; hsa_queue_load_write_index_acquire = hsa_queue_load_write_index_scacquire; hsa_queue_store_write_index_release = hsa_queue_store_write_index_screlease; hsa_queue_cas_write_index_acq_rel = hsa_queue_cas_write_index_scacq_screl; hsa_queue_cas_write_index_acquire = hsa_queue_cas_write_index_scacquire; 
hsa_queue_cas_write_index_release = hsa_queue_cas_write_index_screlease; hsa_queue_add_write_index_acq_rel = hsa_queue_add_write_index_scacq_screl; hsa_queue_add_write_index_acquire = hsa_queue_add_write_index_scacquire; hsa_queue_add_write_index_release = hsa_queue_add_write_index_screlease; hsa_queue_store_read_index_release = hsa_queue_store_read_index_screlease; hsa_signal_load_acquire = hsa_signal_load_scacquire; hsa_signal_store_release = hsa_signal_store_screlease; hsa_signal_wait_acquire = hsa_signal_wait_scacquire; hsa_signal_and_acquire = hsa_signal_and_scacquire; hsa_signal_and_release = hsa_signal_and_screlease; hsa_signal_and_acq_rel = hsa_signal_and_scacq_screl; hsa_signal_or_acquire = hsa_signal_or_scacquire; hsa_signal_or_release = hsa_signal_or_screlease; hsa_signal_or_acq_rel = hsa_signal_or_scacq_screl; hsa_signal_xor_acquire = hsa_signal_xor_scacquire; hsa_signal_xor_release = hsa_signal_xor_screlease; hsa_signal_xor_acq_rel = hsa_signal_xor_scacq_screl; hsa_signal_exchange_acquire = hsa_signal_exchange_scacquire; hsa_signal_exchange_release = hsa_signal_exchange_screlease; hsa_signal_exchange_acq_rel = hsa_signal_exchange_scacq_screl; hsa_signal_add_acquire = hsa_signal_add_scacquire; hsa_signal_add_release = hsa_signal_add_screlease; hsa_signal_add_acq_rel = hsa_signal_add_scacq_screl; hsa_signal_subtract_acquire = hsa_signal_subtract_scacquire; hsa_signal_subtract_release = hsa_signal_subtract_screlease; hsa_signal_subtract_acq_rel = hsa_signal_subtract_scacq_screl; hsa_signal_cas_acquire = hsa_signal_cas_scacquire; hsa_signal_cas_release = hsa_signal_cas_screlease; hsa_signal_cas_acq_rel = hsa_signal_cas_scacq_screl; 
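The bit-manipulation helpers in `utils.h` near the top of this section (`NextPow2` and `BitSelect`) are easier to check in isolation. The following is a standalone C++14 sketch that mirrors their logic. The `sketch` namespace, the `constexpr` qualifiers, taking a `uint64_t` instead of a generic `T`, the extra width check, and building the mask with a right shift are our own assumptions for illustration, not part of ROCr.

```cpp
#include <cstdint>

namespace sketch {  // hypothetical namespace; mirrors helpers from ROCr's utils.h

// Round up to the next power of two: "smear" the highest set bit of
// (value - 1) into every lower bit position, then add one.
// NextPow2(0) is special-cased to 1; inputs above 2^31 wrap to 0.
constexpr uint32_t NextPow2(uint32_t value) {
  if (value == 0) return 1;
  uint32_t v = value - 1;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;
}

// The 64-bit variant needs one more smear step (>> 32).
constexpr uint64_t NextPow2(uint64_t value) {
  if (value == 0) return 1;
  uint64_t v = value - 1;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v |= v >> 32;
  return v + 1;
}

// Extract bits [lowBit, highBit] (inclusive) of a 64-bit value.
// The mask is built with a right shift, so highBit == 63 needs no special
// case (the original special-cases it because 1ull << 64 is undefined).
template <unsigned lowBit, unsigned highBit>
constexpr uint32_t BitSelect(uint64_t value) {
  static_assert(lowBit <= highBit, "Empty bit range.");
  static_assert(highBit < 64, "Bit index out of range.");
  static_assert(highBit - lowBit < 32, "Field wider than 32 bits.");
  return static_cast<uint32_t>((value & (~0ull >> (63 - highBit))) >> lowBit);
}

}  // namespace sketch
```

With this sketch, `BitSelect<8, 39>` selects the same field as `PtrLow40Shift8` and `BitSelect<40, 63>` the same field as `PtrHigh64Shift40`. As with the original overload pair, calling `NextPow2` with a plain `int` argument is ambiguous, so pass an explicitly typed `uint32_t` or `uint64_t` value.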
ROCR-Runtime-rocm-5.0.0/src/image/000077500000000000000000000000001420110115200164745ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/000077500000000000000000000000001420110115200200755ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/inc/000077500000000000000000000000001420110115200206465ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/inc/addrinterface.h000066400000000000000000005110661420110115200236230ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. 
*/ /** **************************************************************************************************** * @file addrinterface.h * @brief Contains the addrlib interfaces declaration and parameter defines **************************************************************************************************** */ #ifndef __ADDR_INTERFACE_H__ #define __ADDR_INTERFACE_H__ #include "addrtypes.h" namespace rocr { #define ADDRLIB_VERSION_MAJOR 6 #define ADDRLIB_VERSION_MINOR 2 #define ADDRLIB_VERSION ((ADDRLIB_VERSION_MAJOR << 16) | ADDRLIB_VERSION_MINOR) /// Virtually all interface functions need ADDR_HANDLE as first parameter typedef VOID* ADDR_HANDLE; /// Client handle used in callbacks typedef VOID* ADDR_CLIENT_HANDLE; /** * ///////////////////////////////////////////////////////////////////////////////////////////////// * // Callback functions * ///////////////////////////////////////////////////////////////////////////////////////////////// * typedef VOID* (ADDR_API* ADDR_ALLOCSYSMEM)( * const ADDR_ALLOCSYSMEM_INPUT* pInput); * typedef ADDR_E_RETURNCODE (ADDR_API* ADDR_FREESYSMEM)( * VOID* pVirtAddr); * typedef ADDR_E_RETURNCODE (ADDR_API* ADDR_DEBUGPRINT)( * const ADDR_DEBUGPRINT_INPUT* pInput); * * ///////////////////////////////////////////////////////////////////////////////////////////////// * // Create/Destroy/Config functions * ///////////////////////////////////////////////////////////////////////////////////////////////// * AddrCreate() * AddrDestroy() * * ///////////////////////////////////////////////////////////////////////////////////////////////// * // Surface functions * ///////////////////////////////////////////////////////////////////////////////////////////////// * AddrComputeSurfaceInfo() * AddrComputeSurfaceAddrFromCoord() * AddrComputeSurfaceCoordFromAddr() * * ///////////////////////////////////////////////////////////////////////////////////////////////// * // HTile functions * 
///////////////////////////////////////////////////////////////////////////////////////////////// * AddrComputeHtileInfo() * AddrComputeHtileAddrFromCoord() * AddrComputeHtileCoordFromAddr() * * ///////////////////////////////////////////////////////////////////////////////////////////////// * // C-mask functions * ///////////////////////////////////////////////////////////////////////////////////////////////// * AddrComputeCmaskInfo() * AddrComputeCmaskAddrFromCoord() * AddrComputeCmaskCoordFromAddr() * * ///////////////////////////////////////////////////////////////////////////////////////////////// * // F-mask functions * ///////////////////////////////////////////////////////////////////////////////////////////////// * AddrComputeFmaskInfo() * AddrComputeFmaskAddrFromCoord() * AddrComputeFmaskCoordFromAddr() * * ///////////////////////////////////////////////////////////////////////////////////////////////// * // Element/Utility functions * ///////////////////////////////////////////////////////////////////////////////////////////////// * ElemFlt32ToDepthPixel() * ElemFlt32ToColorPixel() * AddrExtractBankPipeSwizzle() * AddrCombineBankPipeSwizzle() * AddrComputeSliceSwizzle() * AddrConvertTileInfoToHW() * AddrConvertTileIndex() * AddrConvertTileIndex1() * AddrGetTileIndex() * AddrComputeBaseSwizzle() * AddrUseTileIndex() * AddrUseCombinedSwizzle() * **/ //////////////////////////////////////////////////////////////////////////////////////////////////// // Callback functions //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * @brief channel setting structure **************************************************************************************************** */ typedef union _ADDR_CHANNEL_SETTING { struct { UINT_8 valid : 1; ///< Indicates whether this channel setting is valid UINT_8 channel : 2; ///< 0 for x 
channel, 1 for y channel, 2 for z channel UINT_8 index : 5; ///< Channel index }; UINT_8 value; ///< Value } ADDR_CHANNEL_SETTING; /** **************************************************************************************************** * @brief address equation key structure **************************************************************************************************** */ typedef union _ADDR_EQUATION_KEY { struct { UINT_32 log2ElementBytes : 3; ///< Log2 of Bytes per pixel UINT_32 tileMode : 5; ///< Tile mode UINT_32 microTileType : 3; ///< Micro tile type UINT_32 pipeConfig : 5; ///< pipe config UINT_32 numBanksLog2 : 3; ///< Number of banks log2 UINT_32 bankWidth : 4; ///< Bank width UINT_32 bankHeight : 4; ///< Bank height UINT_32 macroAspectRatio : 3; ///< Macro tile aspect ratio UINT_32 prt : 1; ///< SI only, indicate whether this equation is for prt UINT_32 reserved : 1; ///< Reserved bit } fields; UINT_32 value; } ADDR_EQUATION_KEY; /** **************************************************************************************************** * @brief address equation structure **************************************************************************************************** */ #define ADDR_MAX_EQUATION_BIT 20u // Invalid equation index #define ADDR_INVALID_EQUATION_INDEX 0xFFFFFFFF typedef struct _ADDR_EQUATION { ADDR_CHANNEL_SETTING addr[ADDR_MAX_EQUATION_BIT]; ///< addr setting ///< each bit is result of addr ^ xor ^ xor2 ADDR_CHANNEL_SETTING xor1[ADDR_MAX_EQUATION_BIT]; ///< xor setting ADDR_CHANNEL_SETTING xor2[ADDR_MAX_EQUATION_BIT]; ///< xor2 setting UINT_32 numBits; ///< The number of bits in equation BOOL_32 stackedDepthSlices; ///< TRUE if depth slices are treated as being ///< stacked vertically prior to swizzling } ADDR_EQUATION; /** **************************************************************************************************** * @brief Alloc system memory flags. 
* @note These flags are reserved for future use; keeping the field in place minimizes the impact * on the client if flags are added. **************************************************************************************************** */ typedef union _ADDR_ALLOCSYSMEM_FLAGS { struct { UINT_32 reserved : 32; ///< Reserved for future use. } fields; UINT_32 value; } ADDR_ALLOCSYSMEM_FLAGS; /** **************************************************************************************************** * @brief Alloc system memory input structure **************************************************************************************************** */ typedef struct _ADDR_ALLOCSYSMEM_INPUT { UINT_32 size; ///< Size of this structure in bytes ADDR_ALLOCSYSMEM_FLAGS flags; ///< System memory flags. UINT_32 sizeInBytes; ///< System memory allocation size in bytes. ADDR_CLIENT_HANDLE hClient; ///< Client handle } ADDR_ALLOCSYSMEM_INPUT; /** **************************************************************************************************** * ADDR_ALLOCSYSMEM * @brief * Allocate system memory callback function. Returns valid pointer on success. **************************************************************************************************** */ typedef VOID* (ADDR_API* ADDR_ALLOCSYSMEM)( const ADDR_ALLOCSYSMEM_INPUT* pInput); /** **************************************************************************************************** * @brief Free system memory input structure **************************************************************************************************** */ typedef struct _ADDR_FREESYSMEM_INPUT { UINT_32 size; ///< Size of this structure in bytes VOID* pVirtAddr; ///< Virtual address ADDR_CLIENT_HANDLE hClient; ///< Client handle } ADDR_FREESYSMEM_INPUT; /** **************************************************************************************************** * ADDR_FREESYSMEM * @brief * Free system memory callback function. * Returns ADDR_OK on success. 
**************************************************************************************************** */ typedef ADDR_E_RETURNCODE (ADDR_API* ADDR_FREESYSMEM)( const ADDR_FREESYSMEM_INPUT* pInput); /** **************************************************************************************************** * @brief Print debug message input structure **************************************************************************************************** */ typedef struct _ADDR_DEBUGPRINT_INPUT { UINT_32 size; ///< Size of this structure in bytes CHAR* pDebugString; ///< Debug print string va_list ap; ///< Variable argument list ADDR_CLIENT_HANDLE hClient; ///< Client handle } ADDR_DEBUGPRINT_INPUT; /** **************************************************************************************************** * ADDR_DEBUGPRINT * @brief * Print debug message callback function. * Returns ADDR_OK on success. **************************************************************************************************** */ typedef ADDR_E_RETURNCODE (ADDR_API* ADDR_DEBUGPRINT)( const ADDR_DEBUGPRINT_INPUT* pInput); /** **************************************************************************************************** * ADDR_CALLBACKS * * @brief * Address Library needs client to provide system memory alloc/free routines. 
**************************************************************************************************** */ typedef struct _ADDR_CALLBACKS { ADDR_ALLOCSYSMEM allocSysMem; ///< Routine to allocate system memory ADDR_FREESYSMEM freeSysMem; ///< Routine to free system memory ADDR_DEBUGPRINT debugPrint; ///< Routine to print debug message } ADDR_CALLBACKS; //////////////////////////////////////////////////////////////////////////////////////////////////// // Create/Destroy functions //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * ADDR_CREATE_FLAGS * * @brief * This structure is used to pass some setup in creation of AddrLib * @note **************************************************************************************************** */ typedef union _ADDR_CREATE_FLAGS { struct { UINT_32 noCubeMipSlicesPad : 1; ///< Turn cubemap faces padding off UINT_32 fillSizeFields : 1; ///< If clients fill size fields in all input and /// output structure UINT_32 useTileIndex : 1; ///< Make tileIndex field in input valid UINT_32 useCombinedSwizzle : 1; ///< Use combined tile swizzle UINT_32 checkLast2DLevel : 1; ///< Check the last 2D mip sub level UINT_32 useHtileSliceAlign : 1; ///< Do htile single slice alignment UINT_32 allowLargeThickTile : 1; ///< Allow 64*thickness*bytesPerPixel > rowSize UINT_32 forceDccAndTcCompat : 1; ///< Force enable DCC and TC compatibility UINT_32 nonPower2MemConfig : 1; ///< Physical video memory size is not power of 2 UINT_32 reserved : 23; ///< Reserved bits for future use }; UINT_32 value; } ADDR_CREATE_FLAGS; /** **************************************************************************************************** * ADDR_REGISTER_VALUE * * @brief * Data from registers to setup AddrLib global data, used in AddrCreate 
**************************************************************************************************** */ typedef struct _ADDR_REGISTER_VALUE { UINT_32 gbAddrConfig; ///< For R8xx, use GB_ADDR_CONFIG register value. /// For R6xx/R7xx, use GB_TILING_CONFIG. /// But they can be treated as the same. /// if this value is 0, use chip to set default value UINT_32 backendDisables; ///< 1 bit per backend, starting with LSB. 1=disabled,0=enabled. /// Register value of CC_RB_BACKEND_DISABLE.BACKEND_DISABLE /// R800 registers----------------------------------------------- UINT_32 noOfBanks; ///< Number of h/w ram banks - For r800: MC_ARB_RAMCFG.NOOFBANK /// No enums for this value in h/w header files /// 0: 4 /// 1: 8 /// 2: 16 UINT_32 noOfRanks; /// MC_ARB_RAMCFG.NOOFRANK /// 0: 1 /// 1: 2 /// SI (R1000) registers----------------------------------------- const UINT_32* pTileConfig; ///< Global tile setting tables UINT_32 noOfEntries; ///< Number of entries in pTileConfig ///< CI registers------------------------------------------------- const UINT_32* pMacroTileConfig; ///< Global macro tile mode table UINT_32 noOfMacroEntries; ///< Number of entries in pMacroTileConfig } ADDR_REGISTER_VALUE; /** **************************************************************************************************** * ADDR_CREATE_INPUT * * @brief * Parameters used to create an AddrLib Object. Caller must provide all fields. 
* **************************************************************************************************** */ typedef struct _ADDR_CREATE_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 chipEngine; ///< Chip Engine UINT_32 chipFamily; ///< Chip Family UINT_32 chipRevision; ///< Chip Revision ADDR_CALLBACKS callbacks; ///< Callbacks for sysmem alloc/free/print ADDR_CREATE_FLAGS createFlags; ///< Flags to set up AddrLib ADDR_REGISTER_VALUE regValue; ///< Data from registers to set up AddrLib global data ADDR_CLIENT_HANDLE hClient; ///< Client handle UINT_32 minPitchAlignPixels; ///< Minimum pitch alignment in pixels } ADDR_CREATE_INPUT; /** **************************************************************************************************** * ADDR_CREATE_OUTPUT * * @brief * Return AddrLib handle to client driver * **************************************************************************************************** */ typedef struct _ADDR_CREATE_OUTPUT { UINT_32 size; ///< Size of this structure in bytes ADDR_HANDLE hLib; ///< Address lib handle UINT_32 numEquations; ///< Number of equations in the table const ADDR_EQUATION* pEquationTable; ///< Pointer to the equation table } ADDR_CREATE_OUTPUT; /** **************************************************************************************************** * AddrCreate * * @brief * Create AddrLib object; must be called before any other interface call * * @return * ADDR_OK if successful **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrCreate( const ADDR_CREATE_INPUT* pAddrCreateIn, ADDR_CREATE_OUTPUT* pAddrCreateOut); /** **************************************************************************************************** * AddrDestroy * * @brief * Destroy AddrLib object; must be called to free internally allocated resources. 
*
*   @return
*       ADDR_OK if successful
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrDestroy(
    ADDR_HANDLE hLib);

////////////////////////////////////////////////////////////////////////////////////////////////////
//                                    Surface functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   @brief
*       Bank/tiling parameters. On function input, these can be set as desired or
*       left 0 for AddrLib to calculate/default. On function output, these are the actual
*       parameters used.
*   @note
*       Valid bankWidth/bankHeight values:
*       1, 2, 4, 8. They are factors instead of pixels or bytes.
*
*       The bank number remains constant across each row of the
*       macro tile as each pipe is selected, so the number of
*       tiles in the x direction with the same bank number will
*       be bank_width * num_pipes.
****************************************************************************************************
*/
typedef struct _ADDR_TILEINFO
{
    ///  Any of these parameters can be set to 0 to use the HW default.
    UINT_32     banks;              ///< Number of banks, numerical value
    UINT_32     bankWidth;          ///< Number of tiles in the X direction in the same bank
    UINT_32     bankHeight;         ///< Number of tiles in the Y direction in the same bank
    UINT_32     macroAspectRatio;   ///< Macro tile aspect ratio. 1-1:1, 2-4:1, 4-16:1, 8-64:1
    UINT_32     tileSplitBytes;     ///< Tile split size, in bytes
    AddrPipeCfg pipeConfig;         ///< Pipe Config = HW enum + 1
} ADDR_TILEINFO;

// Create a define to avoid client change. The removal of R800 is because we plan to implement SI
// within 800 HWL - An AddrPipeCfg is added in above data structure
typedef ADDR_TILEINFO ADDR_R800_TILEINFO;

/**
****************************************************************************************************
*   @brief
*       Information needed by quad buffer stereo support
****************************************************************************************************
*/
typedef struct _ADDR_QBSTEREOINFO
{
    UINT_32         eyeHeight;          ///< Height (in pixel rows) to right eye
    UINT_32         rightOffset;        ///< Offset (in bytes) to right eye
    UINT_32         rightSwizzle;       ///< TileSwizzle for right eye
} ADDR_QBSTEREOINFO;

/**
****************************************************************************************************
*   ADDR_SURFACE_FLAGS
*
*   @brief
*       Surface flags
****************************************************************************************************
*/
typedef union _ADDR_SURFACE_FLAGS
{
    struct
    {
        UINT_32 color                : 1; ///< Flag indicates this is a color buffer
        UINT_32 depth                : 1; ///< Flag indicates this is a depth/stencil buffer
        UINT_32 stencil              : 1; ///< Flag indicates this is a stencil buffer
        UINT_32 texture              : 1; ///< Flag indicates this is a texture
        UINT_32 cube                 : 1; ///< Flag indicates this is a cubemap
        UINT_32 volume               : 1; ///< Flag indicates this is a volume texture
        UINT_32 fmask                : 1; ///< Flag indicates this is an fmask
        UINT_32 cubeAsArray          : 1; ///< Flag indicates whether to treat cubemap as array
        UINT_32 compressZ            : 1; ///< Flag indicates z buffer is compressed
        UINT_32 overlay              : 1; ///< Flag indicates this is an overlay surface
        UINT_32 noStencil            : 1; ///< Flag indicates this depth has no separate stencil
        UINT_32 display              : 1; ///< Flag indicates this should match display controller req.
        UINT_32 opt4Space            : 1; ///< Flag indicates this surface should be optimized for space
                                          ///  i.e. save some memory but may lose performance
        UINT_32 prt                  : 1; ///< Flag for partially resident texture
        UINT_32 qbStereo             : 1; ///< Quad buffer stereo surface
        UINT_32 pow2Pad              : 1; ///< SI: Pad to pow2, must be set for mipmap (including level 0)
        UINT_32 interleaved          : 1; ///< Special flag for interleaved YUV surface padding
        UINT_32 tcCompatible         : 1; ///< Flag indicates surface needs to be shader readable
        UINT_32 dispTileType         : 1; ///< NI: force display tiling for 128-bit shared resource
        UINT_32 dccCompatible        : 1; ///< VI: whether to make MSAA surface support DCC fast clear
        UINT_32 dccPipeWorkaround    : 1; ///< VI: whether to work around the HW limit that
                                          ///  DCC can't be enabled if pipe config of tile mode
                                          ///  is different from that of ASIC; this flag
                                          ///  is an address lib internal flag, client should ignore it
        UINT_32 czDispCompatible     : 1; ///< SI+: CZ family has a HW bug that needs special alignment.
                                          ///  This flag indicates we need to follow the
                                          ///  alignment with CZ families or other ASICs under
                                          ///  PX configuration + CZ.
        UINT_32 nonSplit             : 1; ///< CI: depth texture should not be split
        UINT_32 disableLinearOpt     : 1; ///< Disable tile mode optimization to linear
        UINT_32 needEquation         : 1; ///< Make the surface tile setting equation compatible.
                                          ///  This flag indicates we need to override tile
                                          ///  mode to PRT_* tile mode to disable slice rotation,
                                          ///  which is needed by swizzle pattern equation.
        UINT_32 skipIndicesOutput    : 1; ///< Skipping indices in output.
        UINT_32 rotateDisplay        : 1; ///< Rotate micro tile type
        UINT_32 minimizeAlignment    : 1; ///< Minimize alignment
        UINT_32 preferEquation       : 1; ///< Return equation index without adjusting tile mode
        UINT_32 matchStencilTileCfg  : 1; ///< Select tile index of stencil as well as depth surface
                                          ///  to make sure they share same tile config parameters
        UINT_32 disallowLargeThickDegrade : 1; ///< Disallow large thick tile degrade
        UINT_32 reserved             : 1; ///< Reserved bits
    };

    UINT_32 value;
} ADDR_SURFACE_FLAGS;

/**
****************************************************************************************************
*   ADDR_COMPUTE_SURFACE_INFO_INPUT
*
*   @brief
*       Input structure for AddrComputeSurfaceInfo
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_SURFACE_INFO_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes

    AddrTileMode        tileMode;           ///< Tile mode
    AddrFormat          format;             ///< If format is set to a valid one, bpp/width/height
                                            ///  might be overwritten
    UINT_32             bpp;                ///< Bits per pixel
    UINT_32             numSamples;         ///< Number of samples
    UINT_32             width;              ///< Width, in pixels
    UINT_32             height;             ///< Height, in pixels
    UINT_32             numSlices;          ///< Number of surface slices or depth
    UINT_32             slice;              ///< Slice index
    UINT_32             mipLevel;           ///< Current mipmap level
    UINT_32             numMipLevels;       ///< Number of mips in mip chain
    ADDR_SURFACE_FLAGS  flags;              ///< Surface type flags
    UINT_32             numFrags;           ///< Number of fragments, leave it zero or the same as
                                            ///  number of samples for normal AA; Set it to the
                                            ///  number of fragments for EQAA
    /// r800 and later HWL parameters
    // Needed by 2D tiling; for linear and 1D tiling, just keep them 0
    ADDR_TILEINFO*      pTileInfo;          ///< 2D tile parameters. Set to 0 to default/calculate
    AddrTileType        tileType;           ///< Micro tiling type, not needed when tileIndex != -1
    INT_32              tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
    UINT_32             basePitch;          ///< Base level pitch in pixels, 0 means ignored; required
                                            ///  for mip levels from SI+.
                                            ///  Don't use pitch in blocks for compressed formats!
    UINT_32             maxBaseAlign;       ///< Max base alignment request from client
    UINT_32             pitchAlign;         ///< Pitch alignment request from client
    UINT_32             heightAlign;        ///< Height alignment request from client
} ADDR_COMPUTE_SURFACE_INFO_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_SURFACE_INFO_OUTPUT
*
*   @brief
*       Output structure for AddrComputeSurfaceInfo
*   @note
*       Element: AddrLib unit for computing. e.g. BCn: 4x4 blocks; R32G32B32: 32bit with 3x pitch
*       Pixel: Original pixel
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_SURFACE_INFO_OUTPUT
{
    UINT_32         size;           ///< Size of this structure in bytes

    UINT_32         pitch;          ///< Pitch in elements (in blocks for compressed formats)
    UINT_32         height;         ///< Height in elements (in blocks for compressed formats)
    UINT_32         depth;          ///< Number of slice/depth
    UINT_64         surfSize;       ///< Surface size in bytes
    AddrTileMode    tileMode;       ///< Actual tile mode. May differ from that in input
    UINT_32         baseAlign;      ///< Base address alignment
    UINT_32         pitchAlign;     ///< Pitch alignment, in elements
    UINT_32         heightAlign;    ///< Height alignment, in elements
    UINT_32         depthAlign;     ///< Depth alignment, aligned to thickness, for 3d texture
    UINT_32         bpp;            ///< Bits per element (e.g. blocks for BCn, 1/3 for 96bit)
    UINT_32         pixelPitch;     ///< Pitch in original pixels
    UINT_32         pixelHeight;    ///< Height in original pixels
    UINT_32         pixelBits;      ///< Original bits per pixel, passed from input
    UINT_64         sliceSize;      ///< Size of slice specified by input's slice
                                    ///  The result is controlled by surface flags & createFlags
                                    ///  By default this value equals surfSize for volume
    UINT_32         pitchTileMax;   ///< PITCH_TILE_MAX value for h/w register
    UINT_32         heightTileMax;  ///< HEIGHT_TILE_MAX value for h/w register
    UINT_32         sliceTileMax;   ///< SLICE_TILE_MAX value for h/w register

    UINT_32         numSamples;     ///< Pass the effective numSamples processed in this call

    /// r800 and later HWL parameters
    ADDR_TILEINFO*  pTileInfo;      ///< Tile parameters used. Filled in if 0 on input
    AddrTileType    tileType;       ///< Micro tiling type, only valid when tileIndex != -1
    INT_32          tileIndex;      ///< Tile index, MAY be "downgraded"

    INT_32          macroModeIndex; ///< Index in macro tile mode table if there is one (CI)
    /// Output flags
    struct
    {
        /// Special information to work around SI mipmap swizzle bug UBTS #317508
        UINT_32     last2DLevel  : 1;  ///< TRUE if this is the last 2D (3D) tiled level
                                       ///< Only meaningful when create flag checkLast2DLevel is set
        UINT_32     tcCompatible : 1;  ///< If the surface can be shader compatible
        UINT_32     dccUnsupport : 1;  ///< Set if the surface cannot support DCC compressed rendering
        UINT_32     prtTileIndex : 1;  ///< SI only, indicates the returned tile index is for PRT
                                       ///< If address lib returns TRUE for mip 0, client should set
                                       ///< the prt flag for child mips in subsequent compute surface
                                       ///< info calls
        UINT_32     reserved     :28;  ///< Reserved bits
    };

    UINT_32         equationIndex;     ///< Equation index in the equation table

    UINT_32         blockWidth;        ///< Width in elements inside one block (1D->Micro, 2D->Macro)
    UINT_32         blockHeight;       ///< Height in elements inside one block (1D->Micro, 2D->Macro)
    UINT_32         blockSlices;       ///< Slice number inside one block (1D->Micro, 2D->Macro)

    /// Stereo info
    ADDR_QBSTEREOINFO*  pStereoInfo;   ///< Stereo information, needed when .qbStereo flag is TRUE

    INT_32          stencilTileIdx;    ///< Stencil tile index output when matchStencilTileCfg was set
} ADDR_COMPUTE_SURFACE_INFO_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeSurfaceInfo
*
*   @brief
*       Compute surface width/height/depth/alignments and suitable tiling mode
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeSurfaceInfo(
    ADDR_HANDLE                             hLib,
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*       pOut);

/**
****************************************************************************************************
*   ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT
*
*   @brief
*       Input structure for AddrComputeSurfaceAddrFromCoord
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT
{
    UINT_32         size;               ///< Size of this structure in bytes

    UINT_32         x;                  ///< X coordinate
    UINT_32         y;                  ///< Y coordinate
    UINT_32         slice;              ///< Slice index
    UINT_32         sample;             ///< Sample index, use fragment index for EQAA

    UINT_32         bpp;                ///< Bits per pixel
    UINT_32         pitch;              ///< Surface pitch, in pixels
    UINT_32         height;             ///< Surface height, in pixels
    UINT_32         numSlices;          ///< Surface depth
    UINT_32         numSamples;         ///< Number of samples

    AddrTileMode    tileMode;           ///< Tile mode
    BOOL_32         isDepth;            ///< TRUE if the surface uses depth sample ordering within
                                        ///  micro tile. Textures can also choose depth sample order
    UINT_32         tileBase;           ///< Base offset (in bits) inside micro tile which handles
                                        ///  the case that components are stored separately
    UINT_32         compBits;           ///< The component bits actually needed (for planar surface)

    UINT_32         numFrags;           ///< Number of fragments, leave it zero or the same as
                                        ///  number of samples for normal AA; Set it to the
                                        ///  number of fragments for EQAA
    /// r800 and later HWL parameters
    // Used for 1D tiling above
    AddrTileType    tileType;           ///< See definition of AddrTileType
    struct
    {
        UINT_32     ignoreSE : 1;       ///< TRUE if shader engines are ignored. This is a
                                        ///  texture-only flag. Only non-RT textures can set this to TRUE
        UINT_32     reserved :31;       ///< Reserved for future use.
    };
    // 2D tiling needs the following structure
    ADDR_TILEINFO*  pTileInfo;          ///< 2D tile parameters. Client must provide all data
    INT_32          tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                        ///  while the global useTileIndex is set to 1
    union
    {
        struct
        {
            UINT_32  bankSwizzle;       ///< Bank swizzle
            UINT_32  pipeSwizzle;       ///< Pipe swizzle
        };
        UINT_32     tileSwizzle;        ///< Combined swizzle, if useCombinedSwizzle is TRUE
    };
} ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT
*
*   @brief
*       Output structure for AddrComputeSurfaceAddrFromCoord
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT
{
    UINT_32 size;           ///< Size of this structure in bytes

    UINT_64 addr;           ///< Byte address
    UINT_32 bitPosition;    ///< Bit position within addr, 0-7.
                            ///  For surface bpp < 8, e.g. FMT_1.
    UINT_32 prtBlockIndex;  ///< Index of a PRT tile (64K block)
} ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeSurfaceAddrFromCoord
*
*   @brief
*       Compute surface address from a given coordinate.
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeSurfaceAddrFromCoord(
    ADDR_HANDLE                                     hLib,
    const ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn,
    ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT*      pOut);

/**
****************************************************************************************************
*   ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT
*
*   @brief
*       Input structure for AddrComputeSurfaceCoordFromAddr
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT
{
    UINT_32         size;               ///< Size of this structure in bytes

    UINT_64         addr;               ///< Address in bytes
    UINT_32         bitPosition;        ///< Bit position in addr, 0-7, for surface bpp < 8,
                                        ///  e.g. FMT_1
    UINT_32         bpp;                ///< Bits per pixel
    UINT_32         pitch;              ///< Pitch, in pixels
    UINT_32         height;             ///< Height in pixels
    UINT_32         numSlices;          ///< Surface depth
    UINT_32         numSamples;         ///< Number of samples

    AddrTileMode    tileMode;           ///< Tile mode
    BOOL_32         isDepth;            ///< Surface uses depth sample ordering within micro tile.
                                        ///  Note: Textures can choose depth sample order as well.
    UINT_32         tileBase;           ///< Base offset (in bits) inside micro tile which handles
                                        ///  the case that components are stored separately
    UINT_32         compBits;           ///< The component bits actually needed (for planar surface)

    UINT_32         numFrags;           ///< Number of fragments, leave it zero or the same as
                                        ///  number of samples for normal AA; Set it to the
                                        ///  number of fragments for EQAA
    /// r800 and later HWL parameters
    // Used for 1D tiling above
    AddrTileType    tileType;           ///< See definition of AddrTileType
    struct
    {
        UINT_32     ignoreSE : 1;       ///< TRUE if shader engines are ignored. This is a
                                        ///  texture-only flag. Only non-RT textures can set this to TRUE
        UINT_32     reserved :31;       ///< Reserved for future use.
    };
    // 2D tiling needs the following structure
    ADDR_TILEINFO*  pTileInfo;          ///< 2D tile parameters. Client must provide all data
    INT_32          tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                        ///  while the global useTileIndex is set to 1
    union
    {
        struct
        {
            UINT_32  bankSwizzle;       ///< Bank swizzle
            UINT_32  pipeSwizzle;       ///< Pipe swizzle
        };
        UINT_32     tileSwizzle;        ///< Combined swizzle, if useCombinedSwizzle is TRUE
    };
} ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT
*
*   @brief
*       Output structure for AddrComputeSurfaceCoordFromAddr
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT
{
    UINT_32 size;   ///< Size of this structure in bytes

    UINT_32 x;      ///< X coordinate
    UINT_32 y;      ///< Y coordinate
    UINT_32 slice;  ///< Slice index
    UINT_32 sample; ///< Sample index; fragment index for EQAA
} ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeSurfaceCoordFromAddr
*
*   @brief
*       Compute coordinate from a given surface address
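For a purely linear, single-sample surface, the mapping between (x, y, slice) and (addr, bitPosition) reduces to plain arithmetic. That can help sanity-check results from AddrComputeSurfaceAddrFromCoord and AddrComputeSurfaceCoordFromAddr on linear surfaces. This is a sketch under stated assumptions:

- pitch and height are in pixels;
- slices are packed with no padding between them;
- tiling, samples, tileBase and swizzles are not modeled.

The function names are hypothetical.

```c
#include <stdint.h>

// Hypothetical linear-only helpers; they ignore tiling, samples and swizzle.

// Coordinate -> byte address plus bit position (0-7, relevant when bpp < 8).
void linear_addr_from_coord(uint32_t x, uint32_t y, uint32_t slice,
                            uint32_t pitch, uint32_t height, uint32_t bpp,
                            uint64_t* pAddr, uint32_t* pBitPosition)
{
    uint64_t pixelIndex = ((uint64_t)slice * height + y) * pitch + x;
    uint64_t bitAddr    = pixelIndex * bpp;
    *pAddr        = bitAddr / 8u;
    *pBitPosition = (uint32_t)(bitAddr % 8u);
}

// Byte address plus bit position -> coordinate (inverse of the function above).
void linear_coord_from_addr(uint64_t addr, uint32_t bitPosition,
                            uint32_t pitch, uint32_t height, uint32_t bpp,
                            uint32_t* pX, uint32_t* pY, uint32_t* pSlice)
{
    uint64_t pixelIndex = (addr * 8u + bitPosition) / bpp;
    uint64_t sliceElems = (uint64_t)pitch * height;
    *pSlice = (uint32_t)(pixelIndex / sliceElems);
    *pY     = (uint32_t)((pixelIndex % sliceElems) / pitch);
    *pX     = (uint32_t)(pixelIndex % pitch);
}
```

For example, with a 1 bpp surface of pitch 16 and height 4, pixel (3, 2) in slice 1 is pixel 99, which is byte 12, bit 3.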
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeSurfaceCoordFromAddr(
    ADDR_HANDLE                                     hLib,
    const ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn,
    ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT*      pOut);

////////////////////////////////////////////////////////////////////////////////////////////////////
//                                       HTile functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   ADDR_HTILE_FLAGS
*
*   @brief
*       HTILE flags
****************************************************************************************************
*/
typedef union _ADDR_HTILE_FLAGS
{
    struct
    {
        UINT_32 tcCompatible          : 1;  ///< Flag indicates surface needs to be shader readable
        UINT_32 skipTcCompatSizeAlign : 1;  ///< Flag indicates that addrLib will not align htile
                                            ///  size to 256xBankxPipe when computing tc-compatible
                                            ///  htile info.
        UINT_32 reserved              : 30; ///< Reserved bits
    };

    UINT_32 value;
} ADDR_HTILE_FLAGS;

/**
****************************************************************************************************
*   ADDR_COMPUTE_HTILE_INFO_INPUT
*
*   @brief
*       Input structure of AddrComputeHtileInfo
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_HTILE_INFO_INPUT
{
    UINT_32            size;            ///< Size of this structure in bytes

    ADDR_HTILE_FLAGS   flags;           ///< HTILE flags
    UINT_32            pitch;           ///< Surface pitch, in pixels
    UINT_32            height;          ///< Surface height, in pixels
    UINT_32            numSlices;       ///< Number of slices
    BOOL_32            isLinear;        ///< Linear or tiled HTILE layout
    AddrHtileBlockSize blockWidth;      ///< 4 or 8. EG and above only support 8
    AddrHtileBlockSize blockHeight;     ///< 4 or 8. EG and above only support 8
    ADDR_TILEINFO*     pTileInfo;       ///< Tile info

    INT_32             tileIndex;       ///< Tile index, MUST be -1 if you don't want to use it
                                        ///  while the global useTileIndex is set to 1
    INT_32             macroModeIndex;  ///< Index in macro tile mode table if there is one (CI)
                                        ///< README: When tileIndex is not -1, this must be valid
} ADDR_COMPUTE_HTILE_INFO_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_HTILE_INFO_OUTPUT
*
*   @brief
*       Output structure of AddrComputeHtileInfo
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_HTILE_INFO_OUTPUT
{
    UINT_32 size;               ///< Size of this structure in bytes

    UINT_32 pitch;              ///< Pitch in pixels of depth buffer represented in this
                                ///  HTile buffer. This might be larger than original depth
                                ///  buffer pitch when called with an unaligned pitch.
    UINT_32 height;             ///< Height in pixels, as above
    UINT_64 htileBytes;         ///< Size of HTILE buffer, in bytes
    UINT_32 baseAlign;          ///< Base alignment
    UINT_32 bpp;                ///< Bits per pixel; for HTILE this is the number of bits per 8x8 block
    UINT_32 macroWidth;         ///< Macro width in pixels, actually squared cache shape
    UINT_32 macroHeight;        ///< Macro height in pixels
    UINT_64 sliceSize;          ///< Slice size, in bytes.
    BOOL_32 sliceInterleaved;   ///< Flag to indicate whether different slices' HTILE data is interleaved
                                ///  Compute engine clear can't be used if HTILE is interleaved
    BOOL_32 nextMipLevelCompressible;   ///< Flag to indicate whether HTILE can be enabled in
                                        ///  the next mip level; it also indicates whether memory-set
                                        ///  based fast clear can be used for the current mip level.
} ADDR_COMPUTE_HTILE_INFO_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeHtileInfo
*
*   @brief
*       Compute Htile pitch, height, base alignment and size in bytes
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeHtileInfo(
    ADDR_HANDLE                             hLib,
    const ADDR_COMPUTE_HTILE_INFO_INPUT*    pIn,
    ADDR_COMPUTE_HTILE_INFO_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT
*
*   @brief
*       Input structure for AddrComputeHtileAddrFromCoord
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT
{
    UINT_32            size;            ///< Size of this structure in bytes

    UINT_32            pitch;           ///< Pitch, in pixels
    UINT_32            height;          ///< Height in pixels
    UINT_32            x;               ///< X coordinate
    UINT_32            y;               ///< Y coordinate
    UINT_32            slice;           ///< Index of slice
    UINT_32            numSlices;       ///< Number of slices
    BOOL_32            isLinear;        ///< Linear or tiled HTILE layout
    ADDR_HTILE_FLAGS   flags;           ///< HTILE flags
    AddrHtileBlockSize blockWidth;      ///< 4 or 8. 1 means 8, 0 means 4. EG and above only support 8
    AddrHtileBlockSize blockHeight;     ///< 4 or 8. 1 means 8, 0 means 4. EG and above only support 8
    ADDR_TILEINFO*     pTileInfo;       ///< Tile info

    INT_32             tileIndex;       ///< Tile index, MUST be -1 if you don't want to use it
                                        ///  while the global useTileIndex is set to 1
    INT_32             macroModeIndex;  ///< Index in macro tile mode table if there is one (CI)
                                        ///< README: When tileIndex is not -1, this must be valid
    UINT_32            bpp;             ///< Depth/stencil buffer bits per pixel
    UINT_32            zStencilAddr;    ///< tcCompatible Z/Stencil surface address
} ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT
*
*   @brief
*       Output structure for AddrComputeHtileAddrFromCoord
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT
{
    UINT_32 size;           ///< Size of this structure in bytes

    UINT_64 addr;           ///< Address in bytes
    UINT_32 bitPosition;    ///< Bit position, 0 or 4. CMASK and HTILE share some lib methods.
                            ///  So we keep bitPosition for HTILE as well
} ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeHtileAddrFromCoord
*
*   @brief
*       Compute Htile address according to coordinates (of depth buffer)
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeHtileAddrFromCoord(
    ADDR_HANDLE                                     hLib,
    const ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT*   pIn,
    ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT*        pOut);

/**
****************************************************************************************************
*   ADDR_COMPUTE_HTILE_COORDFROMADDR_INPUT
*
*   @brief
*       Input structure for AddrComputeHtileCoordFromAddr
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_HTILE_COORDFROMADDR_INPUT
{
    UINT_32            size;            ///< Size of this structure in bytes

    UINT_64            addr;            ///< Address
    UINT_32            bitPosition;     ///< Bit position 0 or 4. CMASK and HTILE share some methods
                                        ///  so we keep bitPosition for HTILE as well
    UINT_32            pitch;           ///< Pitch, in pixels
    UINT_32            height;          ///< Height, in pixels
    UINT_32            numSlices;       ///< Number of slices
    BOOL_32            isLinear;        ///< Linear or tiled HTILE layout
    AddrHtileBlockSize blockWidth;      ///< 4 or 8. 1 means 8, 0 means 4. R8xx/R9xx only support 8
    AddrHtileBlockSize blockHeight;     ///< 4 or 8. 1 means 8, 0 means 4. R8xx/R9xx only support 8
    ADDR_TILEINFO*     pTileInfo;       ///< Tile info

    INT_32             tileIndex;       ///< Tile index, MUST be -1 if you don't want to use it
                                        ///  while the global useTileIndex is set to 1
    INT_32             macroModeIndex;  ///< Index in macro tile mode table if there is one (CI)
                                        ///< README: When tileIndex is not -1, this must be valid
} ADDR_COMPUTE_HTILE_COORDFROMADDR_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_HTILE_COORDFROMADDR_OUTPUT
*
*   @brief
*       Output structure for AddrComputeHtileCoordFromAddr
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_HTILE_COORDFROMADDR_OUTPUT
{
    UINT_32 size;   ///< Size of this structure in bytes

    UINT_32 x;      ///< X coordinate
    UINT_32 y;      ///< Y coordinate
    UINT_32 slice;  ///< Slice index
} ADDR_COMPUTE_HTILE_COORDFROMADDR_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeHtileCoordFromAddr
*
*   @brief
*       Compute coordinates within depth buffer (1st pixel of a micro tile) according to
*       Htile address
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeHtileCoordFromAddr(
    ADDR_HANDLE                                     hLib,
    const ADDR_COMPUTE_HTILE_COORDFROMADDR_INPUT*   pIn,
    ADDR_COMPUTE_HTILE_COORDFROMADDR_OUTPUT*        pOut);

////////////////////////////////////////////////////////////////////////////////////////////////////
//                                       C-mask functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   ADDR_CMASK_FLAGS
*
*   @brief
*       CMASK flags
****************************************************************************************************
*/
typedef union _ADDR_CMASK_FLAGS
{
    struct
    {
        UINT_32 tcCompatible  : 1; ///< Flag indicates surface needs to be shader readable
        UINT_32 reserved      :31; ///< Reserved bits
    };

    UINT_32 value;
} ADDR_CMASK_FLAGS;

/**
****************************************************************************************************
*   ADDR_COMPUTE_CMASK_INFO_INPUT
*
*   @brief
*       Input structure of AddrComputeCmaskInfo
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_CMASKINFO_INPUT
{
    UINT_32             size;            ///< Size of this structure in bytes

    ADDR_CMASK_FLAGS    flags;           ///< CMASK flags
    UINT_32             pitch;           ///< Pitch, in pixels, of color buffer
    UINT_32             height;          ///< Height, in pixels, of color buffer
    UINT_32             numSlices;       ///< Number of slices, of color buffer
    BOOL_32             isLinear;        ///< Linear or tiled layout, Only SI can be linear
    ADDR_TILEINFO*      pTileInfo;       ///< Tile info

    INT_32              tileIndex;       ///< Tile index, MUST be -1 if you don't want to use it
                                         ///  while the global useTileIndex is set to 1
    INT_32              macroModeIndex;  ///< Index in macro tile mode table if there is one (CI)
                                         ///< README: When tileIndex is not -1, this must be valid
} ADDR_COMPUTE_CMASK_INFO_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_CMASK_INFO_OUTPUT
*
*   @brief
*       Output structure of AddrComputeCmaskInfo
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_CMASK_INFO_OUTPUT
{
    UINT_32 size;           ///< Size of this structure in bytes

    UINT_32 pitch;          ///< Pitch in pixels of color buffer which
                            ///  this Cmask matches. The size might be larger than
                            ///  original color buffer pitch when called with
                            ///  an unaligned pitch.
    UINT_32 height;         ///< Height in pixels, as above
    UINT_64 cmaskBytes;     ///< Size in bytes of CMask buffer
    UINT_32 baseAlign;      ///< Base alignment
    UINT_32 blockMax;       ///< Cmask block size. Need this to set CB_COLORn_MASK register
    UINT_32 macroWidth;     ///< Macro width in pixels, actually squared cache shape
    UINT_32 macroHeight;    ///< Macro height in pixels
    UINT_64 sliceSize;      ///< Slice size, in bytes.
} ADDR_COMPUTE_CMASK_INFO_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeCmaskInfo
*
*   @brief
*       Compute Cmask pitch, height, base alignment and size in bytes from color buffer
*       info
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeCmaskInfo(
    ADDR_HANDLE                             hLib,
    const ADDR_COMPUTE_CMASK_INFO_INPUT*    pIn,
    ADDR_COMPUTE_CMASK_INFO_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT
*
*   @brief
*       Input structure for AddrComputeCmaskAddrFromCoord
*
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT
{
    UINT_32          size;           ///< Size of this structure in bytes
    UINT_32          x;              ///< X coordinate
    UINT_32          y;              ///< Y coordinate
    UINT_64          fmaskAddr;      ///< Fmask addr for tc compatible Cmask
    UINT_32          slice;          ///< Slice index
    UINT_32          pitch;          ///< Pitch in pixels, of color buffer
    UINT_32          height;         ///< Height in pixels, of color buffer
    UINT_32          numSlices;      ///< Number of slices
    UINT_32          bpp;
    BOOL_32          isLinear;       ///< Linear or tiled layout, Only SI can be linear
    ADDR_CMASK_FLAGS flags;          ///< CMASK flags
    ADDR_TILEINFO*   pTileInfo;      ///< Tile info

    INT_32           tileIndex;      ///< Tile index, MUST be -1 if you don't want to use it
                                     ///< while the global useTileIndex is set to 1
    INT_32           macroModeIndex; ///< Index in macro tile mode table if there is one (CI)
                                     ///< README: When tileIndex is not -1, this must be valid
} ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT;

/**
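The CMASK sizing done by AddrComputeCmaskInfo can be roughly approximated for budgeting. The sketch below makes these assumptions:

- There is one 4-bit CMASK entry per 8x8-pixel tile. The 4-bit entry is consistent with CMASK being described as 4 bpp, but the 8x8 granularity is an assumption here.
- Pitch and height are padded to caller-supplied macro tile dimensions, such as the macroWidth/macroHeight outputs, which are assumed to be multiples of 8.

AddrLib may apply further base and size alignment. Treat the result as an estimate, not a substitute for cmaskBytes. The helper names are hypothetical.

```c
#include <stdint.h>

// Round value up to a multiple of alignment (alignment must be > 0).
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return ((value + alignment - 1u) / alignment) * alignment;
}

// Hypothetical CMASK size estimate, assuming one 4-bit entry per 8x8 tile
// after padding pitch/height to the macro tile size.
// AddrLib may apply further size/base alignment.
uint64_t estimate_cmask_bytes(uint32_t pitch, uint32_t height, uint32_t numSlices,
                              uint32_t macroWidth, uint32_t macroHeight)
{
    uint64_t tilesX       = align_up(pitch, macroWidth) / 8u;
    uint64_t tilesY       = align_up(height, macroHeight) / 8u;
    uint64_t bitsPerSlice = tilesX * tilesY * 4u;
    uint64_t bytesPerSlice = (bitsPerSlice + 7u) / 8u;
    return bytesPerSlice * numSlices;
}
```

For instance, a 100x60 buffer padded to 64x64 macro tiles becomes 128x64 pixels. That gives 16x8 tiles, or 64 bytes per slice under these assumptions.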
**************************************************************************************************** * ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT * * @brief * Output structure for AddrComputeCmaskAddrFromCoord **************************************************************************************************** */ typedef struct _ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< CMASK address in bytes UINT_32 bitPosition; ///< Bit position within addr, 0-7. CMASK is 4 bpp, /// so the address may be located in bit 0 (0) or 4 (4) } ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT; /** **************************************************************************************************** * AddrComputeCmaskAddrFromCoord * * @brief * Compute Cmask address according to coordinates (of MSAA color buffer) **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrComputeCmaskAddrFromCoord( ADDR_HANDLE hLib, const ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT* pOut); /** **************************************************************************************************** * ADDR_COMPUTE_CMASK_COORDFROMADDR_INPUT * * @brief * Input structure for AddrComputeCmaskCoordFromAddr **************************************************************************************************** */ typedef struct _ADDR_COMPUTE_CMASK_COORDFROMADDR_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< CMASK address in bytes UINT_32 bitPosition; ///< Bit position within addr, 0-7. 
                                            ///  CMASK is 4 bpp, so the value is located at bit 0 or bit 4
    UINT_32             pitch;              ///< Pitch, in pixels
    UINT_32             height;             ///< Height in pixels
    UINT_32             numSlices;          ///< Number of slices
    BOOL_32             isLinear;           ///< Linear or tiled layout, only SI can be linear
    ADDR_TILEINFO*      pTileInfo;          ///< Tile info
    INT_32              tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
                                            ///< README: When tileIndex is not -1, this must be valid
} ADDR_COMPUTE_CMASK_COORDFROMADDR_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_CMASK_COORDFROMADDR_OUTPUT
*
*   @brief
*       Output structure for AddrComputeCmaskCoordFromAddr
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_CMASK_COORDFROMADDR_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             x;                  ///< X coordinate
    UINT_32             y;                  ///< Y coordinate
    UINT_32             slice;              ///< Slice index
} ADDR_COMPUTE_CMASK_COORDFROMADDR_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeCmaskCoordFromAddr
*
*   @brief
*       Compute coordinates within color buffer (1st pixel of a micro tile) according to
*       Cmask address
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeCmaskCoordFromAddr(
    ADDR_HANDLE                                      hLib,
    const ADDR_COMPUTE_CMASK_COORDFROMADDR_INPUT*    pIn,
    ADDR_COMPUTE_CMASK_COORDFROMADDR_OUTPUT*         pOut);



////////////////////////////////////////////////////////////////////////////////////////////////////
// F-mask functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   ADDR_COMPUTE_FMASK_INFO_INPUT
*
*   @brief
*       Input structure for AddrComputeFmaskInfo
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_FMASK_INFO_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    AddrTileMode        tileMode;           ///< Tile mode
    UINT_32             pitch;              ///< Surface pitch, in pixels
    UINT_32             height;             ///< Surface height, in pixels
    UINT_32             numSlices;          ///< Number of slices/depth
    UINT_32             numSamples;         ///< Number of samples
    UINT_32             numFrags;           ///< Number of fragments; leave it zero or the same as the
                                            ///  number of samples for normal AA; set it to the
                                            ///  number of fragments for EQAA
    /// r800 and later HWL parameters
    struct
    {
        UINT_32 resolved    : 1;            ///< TRUE if the surface is for resolved fmask, only used
                                            ///  by H/W clients. S/W should always set it to FALSE.
        UINT_32 reserved    : 31;           ///< Reserved for future use.
    };
    ADDR_TILEINFO*      pTileInfo;          ///< 2D tiling parameters.
                                            ///  Clients must give valid data
    INT_32              tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
} ADDR_COMPUTE_FMASK_INFO_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_FMASK_INFO_OUTPUT
*
*   @brief
*       Output structure for AddrComputeFmaskInfo
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_FMASK_INFO_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             pitch;              ///< Pitch of fmask in pixels
    UINT_32             height;             ///< Height of fmask in pixels
    UINT_32             numSlices;          ///< Slices of fmask
    UINT_64             fmaskBytes;         ///< Size of fmask in bytes
    UINT_32             baseAlign;          ///< Base address alignment
    UINT_32             pitchAlign;         ///< Pitch alignment
    UINT_32             heightAlign;        ///< Height alignment
    UINT_32             bpp;                ///< Bits per pixel of FMASK, i.e. the number of bit planes
    UINT_32             numSamples;         ///< Number of samples, used for dump; exported because the
                                            ///  input may be changed in 9xx and above
    /// r800 and later HWL parameters
    ADDR_TILEINFO*      pTileInfo;          ///< Tile parameters used.
                                            ///  Fmask can have different
                                            ///  bank_height from color buffer
    INT_32              tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
    UINT_64             sliceSize;          ///< Size of slice in bytes
} ADDR_COMPUTE_FMASK_INFO_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeFmaskInfo
*
*   @brief
*       Compute Fmask pitch/height/depth/alignments and size in bytes
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeFmaskInfo(
    ADDR_HANDLE                             hLib,
    const ADDR_COMPUTE_FMASK_INFO_INPUT*    pIn,
    ADDR_COMPUTE_FMASK_INFO_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT
*
*   @brief
*       Input structure for AddrComputeFmaskAddrFromCoord
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             x;                  ///< X coordinate
    UINT_32             y;                  ///< Y coordinate
    UINT_32             slice;              ///< Slice index
    UINT_32             plane;              ///< Plane number
    UINT_32             sample;             ///< Sample index (fragment index for EQAA)
    UINT_32             pitch;              ///< Surface pitch, in pixels
    UINT_32             height;             ///< Surface height, in pixels
    UINT_32             numSamples;         ///< Number of samples
    UINT_32             numFrags;           ///< Number of fragments; leave it zero or the same as the
                                            ///  number of samples for normal AA; set it to the
                                            ///  number of fragments for EQAA
    AddrTileMode        tileMode;           ///< Tile mode
    union
    {
        struct
        {
            UINT_32     bankSwizzle;        ///< Bank swizzle
            UINT_32     pipeSwizzle;        ///< Pipe swizzle
        };
        UINT_32         tileSwizzle;        ///< Combined swizzle, if useCombinedSwizzle is TRUE
    };
    /// r800 and later HWL parameters
    struct
    {
        UINT_32 resolved    : 1;            ///< TRUE if this is a resolved fmask, used by H/W clients
        UINT_32 ignoreSE    : 1;            ///< TRUE if shader engines are ignored.
        UINT_32 reserved    : 30;           ///< Reserved for future use.
    };
    ADDR_TILEINFO*      pTileInfo;          ///< 2D tiling parameters. Client must provide all data
} ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT
*
*   @brief
*       Output structure for AddrComputeFmaskAddrFromCoord
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_64             addr;               ///< Fmask address
    UINT_32             bitPosition;        ///< Bit position within addr, 0-7.
} ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeFmaskAddrFromCoord
*
*   @brief
*       Compute Fmask address according to coordinates (x,y,slice,sample,plane)
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeFmaskAddrFromCoord(
    ADDR_HANDLE                                      hLib,
    const ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT*    pIn,
    ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT
*
*   @brief
*       Input structure for AddrComputeFmaskCoordFromAddr
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_64             addr;               ///< Address
    UINT_32             bitPosition;        ///< Bit position within addr, 0-7.
    UINT_32             pitch;              ///< Pitch, in pixels
    UINT_32             height;             ///< Height in pixels
    UINT_32             numSamples;         ///< Number of samples
    UINT_32             numFrags;           ///< Number of fragments
    AddrTileMode        tileMode;           ///< Tile mode
    union
    {
        struct
        {
            UINT_32     bankSwizzle;        ///< Bank swizzle
            UINT_32     pipeSwizzle;        ///< Pipe swizzle
        };
        UINT_32         tileSwizzle;        ///< Combined swizzle, if useCombinedSwizzle is TRUE
    };
    /// r800 and later HWL parameters
    struct
    {
        UINT_32 resolved    : 1;            ///< TRUE if this is a resolved fmask, used by HW components
        UINT_32 ignoreSE    : 1;            ///< TRUE if shader engines are ignored.
        UINT_32 reserved    : 30;           ///< Reserved for future use.
    };
    ADDR_TILEINFO*      pTileInfo;          ///< 2D tile parameters. Client must provide all data
} ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT
*
*   @brief
*       Output structure for AddrComputeFmaskCoordFromAddr
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             x;                  ///< X coordinate
    UINT_32             y;                  ///< Y coordinate
    UINT_32             slice;              ///< Slice index
    UINT_32             plane;              ///< Plane number
    UINT_32             sample;             ///< Sample index (fragment index for EQAA)
} ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeFmaskCoordFromAddr
*
*   @brief
*       Compute FMASK coordinates from a given address
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeFmaskCoordFromAddr(
    ADDR_HANDLE                                      hLib,
    const ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT*    pIn,
    ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT*         pOut);



////////////////////////////////////////////////////////////////////////////////////////////////////
// Element/utility functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   AddrGetVersion
*
*   @brief
*       Get AddrLib version number
****************************************************************************************************
*/
UINT_32 ADDR_API AddrGetVersion(ADDR_HANDLE hLib);

/**
****************************************************************************************************
*   AddrUseTileIndex
*
*   @brief
*       Return TRUE if tileIndex is enabled in this address library
****************************************************************************************************
*/
BOOL_32 ADDR_API AddrUseTileIndex(ADDR_HANDLE hLib);

/**
****************************************************************************************************
*   AddrUseCombinedSwizzle
*
*   @brief
*       Return TRUE if combined swizzle is enabled in this address library
****************************************************************************************************
*/
BOOL_32 ADDR_API AddrUseCombinedSwizzle(ADDR_HANDLE hLib);

/**
****************************************************************************************************
*   ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT
*
*   @brief
*       Input structure of AddrExtractBankPipeSwizzle
****************************************************************************************************
*/
typedef struct _ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             base256b;           ///< Base256b value
    /// r800 and later HWL parameters
    ADDR_TILEINFO*      pTileInfo;          ///< 2D tile parameters.
                                            ///  Client must provide all data
    INT_32              tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
                                            ///< README: When tileIndex is not -1, this must be valid
} ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT;

/**
****************************************************************************************************
*   ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT
*
*   @brief
*       Output structure of AddrExtractBankPipeSwizzle
****************************************************************************************************
*/
typedef struct _ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             bankSwizzle;        ///< Bank swizzle
    UINT_32             pipeSwizzle;        ///< Pipe swizzle
} ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT;

/**
****************************************************************************************************
*   AddrExtractBankPipeSwizzle
*
*   @brief
*       Extract Bank and Pipe swizzle from base256b
*   @return
*       ADDR_OK if no error
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrExtractBankPipeSwizzle(
    ADDR_HANDLE                                   hLib,
    const ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT*    pIn,
    ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_COMBINE_BANKPIPE_SWIZZLE_INPUT
*
*   @brief
*       Input structure of AddrCombineBankPipeSwizzle
****************************************************************************************************
*/
typedef struct _ADDR_COMBINE_BANKPIPE_SWIZZLE_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             bankSwizzle;        ///< Bank swizzle
    UINT_32             pipeSwizzle;        ///< Pipe swizzle
    UINT_64             baseAddr;           ///< Base address (leave it zero for driver clients)
    /// r800 and later HWL parameters
    ADDR_TILEINFO*      pTileInfo;          ///< 2D tile parameters.
                                            ///  Client must provide all data
    INT_32              tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
                                            ///< README: When tileIndex is not -1, this must be valid
} ADDR_COMBINE_BANKPIPE_SWIZZLE_INPUT;

/**
****************************************************************************************************
*   ADDR_COMBINE_BANKPIPE_SWIZZLE_OUTPUT
*
*   @brief
*       Output structure of AddrCombineBankPipeSwizzle
****************************************************************************************************
*/
typedef struct _ADDR_COMBINE_BANKPIPE_SWIZZLE_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             tileSwizzle;        ///< Combined swizzle
} ADDR_COMBINE_BANKPIPE_SWIZZLE_OUTPUT;

/**
****************************************************************************************************
*   AddrCombineBankPipeSwizzle
*
*   @brief
*       Combine Bank and Pipe swizzle
*   @return
*       ADDR_OK if no error
*   @note
*       baseAddr here is the full MC address rather than base256b
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrCombineBankPipeSwizzle(
    ADDR_HANDLE                                   hLib,
    const ADDR_COMBINE_BANKPIPE_SWIZZLE_INPUT*    pIn,
    ADDR_COMBINE_BANKPIPE_SWIZZLE_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_COMPUTE_SLICESWIZZLE_INPUT
*
*   @brief
*       Input structure of AddrComputeSliceSwizzle
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_SLICESWIZZLE_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    AddrTileMode        tileMode;           ///< Tile Mode
    UINT_32             baseSwizzle;        ///< Base tile swizzle
    UINT_32             slice;              ///< Slice index
    UINT_64             baseAddr;           ///< Base address, driver should leave it 0 in most cases
    /// r800 and later HWL parameters
    ADDR_TILEINFO*      pTileInfo;          ///< 2D tile parameters. Only banks are actually needed here!
    INT_32              tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
                                            ///< README: When tileIndex is not -1, this must be valid
} ADDR_COMPUTE_SLICESWIZZLE_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_SLICESWIZZLE_OUTPUT
*
*   @brief
*       Output structure of AddrComputeSliceSwizzle
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_SLICESWIZZLE_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             tileSwizzle;        ///< Recalculated tileSwizzle value
} ADDR_COMPUTE_SLICESWIZZLE_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeSliceSwizzle
*
*   @brief
*       Recalculate the tile swizzle for a given slice from the base tile swizzle
*   @return
*       ADDR_OK if no error
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeSliceSwizzle(
    ADDR_HANDLE                               hLib,
    const ADDR_COMPUTE_SLICESWIZZLE_INPUT*    pIn,
    ADDR_COMPUTE_SLICESWIZZLE_OUTPUT*         pOut);

/**
****************************************************************************************************
*   AddrSwizzleGenOption
*
*   @brief
*       Swizzle generation options: legacy (default) or linear
****************************************************************************************************
*/
typedef enum _AddrSwizzleGenOption
{
    ADDR_SWIZZLE_GEN_DEFAULT    = 0,        ///< As in the client driver's swizzle implementation
    ADDR_SWIZZLE_GEN_LINEAR     = 1,        ///< Use a linear increment of swizzle
} AddrSwizzleGenOption;

/**
****************************************************************************************************
*   ADDR_SWIZZLE_OPTION
*
*   @brief
*       Controls how swizzle is generated
****************************************************************************************************
*/
typedef union _ADDR_SWIZZLE_OPTION
{
    struct
    {
        UINT_32 genOption       : 1;        ///< The way swizzle is generated, see AddrSwizzleGenOption
        UINT_32 reduceBankBit   : 1;        ///< TRUE if we need to reduce swizzle bits
        UINT_32 reserved        : 30;       ///< Reserved bits
    };

    UINT_32 value;
} ADDR_SWIZZLE_OPTION;

/**
****************************************************************************************************
*   ADDR_COMPUTE_BASE_SWIZZLE_INPUT
*
*   @brief
*       Input structure of AddrComputeBaseSwizzle
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_BASE_SWIZZLE_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    ADDR_SWIZZLE_OPTION option;             ///< Swizzle option
    UINT_32             surfIndex;          ///< Index of this surface type
    AddrTileMode        tileMode;           ///< Tile Mode
    /// r800 and later HWL parameters
    ADDR_TILEINFO*      pTileInfo;          ///< 2D tile parameters. Only banks are actually needed here!
    INT_32              tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
                                            ///< README: When tileIndex is not -1, this must be valid
} ADDR_COMPUTE_BASE_SWIZZLE_INPUT;

/**
****************************************************************************************************
*   ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT
*
*   @brief
*       Output structure of AddrComputeBaseSwizzle
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             tileSwizzle;        ///< Combined swizzle
} ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT;

/**
****************************************************************************************************
*   AddrComputeBaseSwizzle
*
*   @brief
*       Return a combined bank and pipe swizzle based on surface type/index
*   @return
*       ADDR_OK if no error
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeBaseSwizzle(
    ADDR_HANDLE                               hLib,
    const ADDR_COMPUTE_BASE_SWIZZLE_INPUT*    pIn,
    ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ELEM_GETEXPORTNORM_INPUT
*
*   @brief
*       Input structure for ElemGetExportNorm
*
****************************************************************************************************
*/
typedef struct _ELEM_GETEXPORTNORM_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    AddrColorFormat     format;             ///< Color buffer format; Client should use ColorFormat
    AddrSurfaceNumber   num;                ///< Surface number type; Client should use NumberType
    AddrSurfaceSwap     swap;               ///< Surface swap byte swap; Client should use SurfaceSwap
    UINT_32             numSamples;         ///< Number of samples
} ELEM_GETEXPORTNORM_INPUT;

/**
****************************************************************************************************
*   ElemGetExportNorm
*
*   @brief
*       Helper function to check whether a format can be EXPORT_NORM, which is a register
*       CB_COLOR_INFO.SURFACE_FORMAT. FP16 can be reported as EXPORT_NORM for rv770 in the r600
*       family
*   @note
*       The implementation is only for r600.
*       00 - EXPORT_FULL: PS exports are 4 pixels with 4 components with 32-bits-per-component
*            (two clocks per export).
*       01 - EXPORT_NORM: PS exports are 4 pixels with 4 components with 16-bits-per-component
*            (one clock per export).
*
****************************************************************************************************
*/
BOOL_32 ADDR_API ElemGetExportNorm(
    ADDR_HANDLE                        hLib,
    const ELEM_GETEXPORTNORM_INPUT*    pIn);

/**
****************************************************************************************************
*   ELEM_FLT32TODEPTHPIXEL_INPUT
*
*   @brief
*       Input structure for ElemFlt32ToDepthPixel
*
****************************************************************************************************
*/
typedef struct _ELEM_FLT32TODEPTHPIXEL_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    AddrDepthFormat     format;             ///< Depth buffer format
    ADDR_FLT_32         comps[2];           ///< Component values (Z/stencil)
} ELEM_FLT32TODEPTHPIXEL_INPUT;

/**
****************************************************************************************************
*   ELEM_FLT32TODEPTHPIXEL_OUTPUT
*
*   @brief
*       Output structure for ElemFlt32ToDepthPixel
*
****************************************************************************************************
*/
typedef struct _ELEM_FLT32TODEPTHPIXEL_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_8*             pPixel;             ///< Real depth value. Same data type as depth buffer.
                                            ///  Client must provide enough storage for this type.
    UINT_32             depthBase;          ///< Tile base in bits for depth bits
    UINT_32             stencilBase;        ///< Tile base in bits for stencil bits
    UINT_32             depthBits;          ///< Bits for depth
    UINT_32             stencilBits;        ///< Bits for stencil
} ELEM_FLT32TODEPTHPIXEL_OUTPUT;

/**
****************************************************************************************************
*   ElemFlt32ToDepthPixel
*
*   @brief
*       Convert a FLT_32 value to a depth/stencil pixel value
*
*   @return
*       Return code
*
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API ElemFlt32ToDepthPixel(
    ADDR_HANDLE                            hLib,
    const ELEM_FLT32TODEPTHPIXEL_INPUT*    pIn,
    ELEM_FLT32TODEPTHPIXEL_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ELEM_FLT32TOCOLORPIXEL_INPUT
*
*   @brief
*       Input structure for ElemFlt32ToColorPixel
*
****************************************************************************************************
*/
typedef struct _ELEM_FLT32TOCOLORPIXEL_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    AddrColorFormat     format;             ///< Color buffer format
    AddrSurfaceNumber   surfNum;            ///< Surface number
    AddrSurfaceSwap     surfSwap;           ///< Surface swap
    ADDR_FLT_32         comps[4];           ///< Component values (r/g/b/a)
} ELEM_FLT32TOCOLORPIXEL_INPUT;

/**
****************************************************************************************************
*   ELEM_FLT32TOCOLORPIXEL_OUTPUT
*
*   @brief
*       Output structure for ElemFlt32ToColorPixel
*
****************************************************************************************************
*/
typedef struct _ELEM_FLT32TOCOLORPIXEL_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_8*             pPixel;             ///< Real color value. Same data type as color buffer.
                                            ///  Client must provide enough storage for this type.
} ELEM_FLT32TOCOLORPIXEL_OUTPUT;

/**
****************************************************************************************************
*   ElemFlt32ToColorPixel
*
*   @brief
*       Convert a FLT_32 value to a red/green/blue/alpha pixel value
*
*   @return
*       Return code
*
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API ElemFlt32ToColorPixel(
    ADDR_HANDLE                            hLib,
    const ELEM_FLT32TOCOLORPIXEL_INPUT*    pIn,
    ELEM_FLT32TOCOLORPIXEL_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ElemSize
*
*   @brief
*       Get bits-per-element for specified format
*
*   @return
*       Bits-per-element of specified format
*
****************************************************************************************************
*/
UINT_32 ADDR_API ElemSize(
    ADDR_HANDLE     hLib,
    AddrFormat      format);

/**
****************************************************************************************************
*   ADDR_CONVERT_TILEINFOTOHW_INPUT
*
*   @brief
*       Input structure for AddrConvertTileInfoToHW
*   @note
*       When reverse is TRUE, indices are ignored
****************************************************************************************************
*/
typedef struct _ADDR_CONVERT_TILEINFOTOHW_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    BOOL_32             reverse;            ///< Convert control flag.
                                            ///  FALSE: convert from real value to HW value;
                                            ///  TRUE: convert from HW value to real value.
    /// r800 and later HWL parameters
    ADDR_TILEINFO*      pTileInfo;          ///< Tile parameters with real value
    INT_32              tileIndex;          ///< Tile index, MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
                                            ///< README: When tileIndex is not -1, this must be valid
    UINT_32             bpp;                ///< Bits per pixel
} ADDR_CONVERT_TILEINFOTOHW_INPUT;

/**
****************************************************************************************************
*   ADDR_CONVERT_TILEINFOTOHW_OUTPUT
*
*   @brief
*       Output structure for AddrConvertTileInfoToHW
****************************************************************************************************
*/
typedef struct _ADDR_CONVERT_TILEINFOTOHW_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    /// r800 and later HWL parameters
    ADDR_TILEINFO*      pTileInfo;          ///< Tile parameters with hardware register value
} ADDR_CONVERT_TILEINFOTOHW_OUTPUT;

/**
****************************************************************************************************
*   AddrConvertTileInfoToHW
*
*   @brief
*       Convert tile info from real value to hardware register value, or from hardware
*       register value to real value when reverse is TRUE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrConvertTileInfoToHW(
    ADDR_HANDLE                               hLib,
    const ADDR_CONVERT_TILEINFOTOHW_INPUT*    pIn,
    ADDR_CONVERT_TILEINFOTOHW_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_CONVERT_TILEINDEX_INPUT
*
*   @brief
*       Input structure for AddrConvertTileIndex
****************************************************************************************************
*/
typedef struct _ADDR_CONVERT_TILEINDEX_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    INT_32              tileIndex;          ///< Tile index
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
    UINT_32             bpp;                ///< Bits per pixel
    BOOL_32             tileInfoHw;         ///< Set to TRUE if the client wants HW enum values,
                                            ///  otherwise actual values
} ADDR_CONVERT_TILEINDEX_INPUT;

/**
****************************************************************************************************
*   ADDR_CONVERT_TILEINDEX_OUTPUT
*
*   @brief
*       Output structure for AddrConvertTileIndex
****************************************************************************************************
*/
typedef struct _ADDR_CONVERT_TILEINDEX_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    AddrTileMode        tileMode;           ///< Tile mode
    AddrTileType        tileType;           ///< Tile type
    ADDR_TILEINFO*      pTileInfo;          ///< Tile info
} ADDR_CONVERT_TILEINDEX_OUTPUT;

/**
****************************************************************************************************
*   AddrConvertTileIndex
*
*   @brief
*       Convert tile index to tile mode/type/info
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrConvertTileIndex(
    ADDR_HANDLE                            hLib,
    const ADDR_CONVERT_TILEINDEX_INPUT*    pIn,
    ADDR_CONVERT_TILEINDEX_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_GET_MACROMODEINDEX_INPUT
*
*   @brief
*       Input structure for AddrGetMacroModeIndex
****************************************************************************************************
*/
typedef struct _ADDR_GET_MACROMODEINDEX_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    ADDR_SURFACE_FLAGS  flags;              ///< Surface flag
    INT_32              tileIndex;          ///< Tile index
    UINT_32             bpp;                ///< Bits per pixel
    UINT_32             numFrags;           ///< Number of color fragments
} ADDR_GET_MACROMODEINDEX_INPUT;

/**
****************************************************************************************************
*   ADDR_GET_MACROMODEINDEX_OUTPUT
*
*   @brief
*       Output structure for AddrGetMacroModeIndex
****************************************************************************************************
*/
typedef struct _ADDR_GET_MACROMODEINDEX_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
} ADDR_GET_MACROMODEINDEX_OUTPUT;

/**
****************************************************************************************************
*   AddrGetMacroModeIndex
*
*   @brief
*       Get macro mode index based on input parameters
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrGetMacroModeIndex(
    ADDR_HANDLE                             hLib,
    const ADDR_GET_MACROMODEINDEX_INPUT*    pIn,
    ADDR_GET_MACROMODEINDEX_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_CONVERT_TILEINDEX1_INPUT
*
*   @brief
*       Input structure for AddrConvertTileIndex1 (without macro mode index)
****************************************************************************************************
*/
typedef struct _ADDR_CONVERT_TILEINDEX1_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    INT_32              tileIndex;          ///< Tile index
    UINT_32             bpp;                ///< Bits per pixel
    UINT_32             numSamples;         ///< Number of samples
    BOOL_32             tileInfoHw;         ///< Set to TRUE if the client wants HW enum values,
                                            ///  otherwise actual values
} ADDR_CONVERT_TILEINDEX1_INPUT;

/**
****************************************************************************************************
*   AddrConvertTileIndex1
*
*   @brief
*       Convert tile index to tile mode/type/info
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrConvertTileIndex1(
    ADDR_HANDLE                             hLib,
    const ADDR_CONVERT_TILEINDEX1_INPUT*    pIn,
    ADDR_CONVERT_TILEINDEX_OUTPUT*          pOut);

/**
****************************************************************************************************
*   ADDR_GET_TILEINDEX_INPUT
*
*   @brief
*       Input structure for AddrGetTileIndex
****************************************************************************************************
*/
typedef struct _ADDR_GET_TILEINDEX_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    AddrTileMode        tileMode;           ///< Tile mode
    AddrTileType        tileType;           ///< Tile-type: disp/non-disp/...
    ADDR_TILEINFO*      pTileInfo;          ///< Pointer to tile-info structure, can be NULL for linear/1D
} ADDR_GET_TILEINDEX_INPUT;

/**
****************************************************************************************************
*   ADDR_GET_TILEINDEX_OUTPUT
*
*   @brief
*       Output structure for AddrGetTileIndex
****************************************************************************************************
*/
typedef struct _ADDR_GET_TILEINDEX_OUTPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    INT_32              index;              ///< Index in table
} ADDR_GET_TILEINDEX_OUTPUT;

/**
****************************************************************************************************
*   AddrGetTileIndex
*
*   @brief
*       Get the tiling mode index in table
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrGetTileIndex(
    ADDR_HANDLE                        hLib,
    const ADDR_GET_TILEINDEX_INPUT*    pIn,
    ADDR_GET_TILEINDEX_OUTPUT*         pOut);

/**
****************************************************************************************************
*   ADDR_PRT_INFO_INPUT
*
*   @brief
*       Input structure for AddrComputePrtInfo
****************************************************************************************************
*/
typedef struct _ADDR_PRT_INFO_INPUT
{
    AddrFormat          format;             ///< Surface format
    UINT_32             baseMipWidth;       ///< Base mipmap width
    UINT_32             baseMipHeight;      ///< Base mipmap height
    UINT_32             baseMipDepth;       ///< Base mipmap depth
    UINT_32             numFrags;           ///< Number of fragments
} ADDR_PRT_INFO_INPUT;

/**
****************************************************************************************************
*   ADDR_PRT_INFO_OUTPUT
*
*   @brief
*       Output structure for AddrComputePrtInfo
****************************************************************************************************
*/
typedef struct _ADDR_PRT_INFO_OUTPUT
{
    UINT_32             prtTileWidth;
    UINT_32             prtTileHeight;
} ADDR_PRT_INFO_OUTPUT;

/**
****************************************************************************************************
*   AddrComputePrtInfo
*
*   @brief
*       Compute PRT surface related information
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputePrtInfo(
    ADDR_HANDLE                   hLib,
    const ADDR_PRT_INFO_INPUT*    pIn,
    ADDR_PRT_INFO_OUTPUT*         pOut);



////////////////////////////////////////////////////////////////////////////////////////////////////
// DCC key functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   ADDR_COMPUTE_DCCINFO_INPUT
*
*   @brief
*       Input structure of AddrComputeDccInfo
****************************************************************************************************
*/
typedef struct _ADDR_COMPUTE_DCCINFO_INPUT
{
    UINT_32             size;               ///< Size of this structure in bytes
    UINT_32             bpp;                ///< Bits per pixel of color surface
    UINT_32             numSamples;         ///< Number of samples of color surface
    UINT_64             colorSurfSize;      ///< Size of color surface to which dcc key is bound
    AddrTileMode        tileMode;           ///< Tile mode of color surface
    ADDR_TILEINFO       tileInfo;           ///< Tile info of color surface
    UINT_32             tileSwizzle;        ///< Tile swizzle
    INT_32              tileIndex;          ///< Tile index of color surface,
                                            ///  MUST be -1 if you don't want to use it
                                            ///  while the global useTileIndex is set to 1
    INT_32              macroModeIndex;     ///< Index in macro tile mode table if there is one (CI)
                                            ///< README: When tileIndex is not -1, this must be valid
} ADDR_COMPUTE_DCCINFO_INPUT;

/**
****************************************************************************************************
ADDR_COMPUTE_DCCINFO_OUTPUT * * @brief * Output structure of AddrComputeDccInfo **************************************************************************************************** */ typedef struct _ADDR_COMPUTE_DCCINFO_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 dccRamBaseAlign; ///< Base alignment of dcc key UINT_64 dccRamSize; ///< Size of dcc key UINT_64 dccFastClearSize; ///< Size of dcc key portion that can be fast cleared BOOL_32 subLvlCompressible; ///< Whether sub resource is compressible BOOL_32 dccRamSizeAligned; ///< Whether the dcc key size is aligned } ADDR_COMPUTE_DCCINFO_OUTPUT; /** **************************************************************************************************** * AddrComputeDccInfo * * @brief * Compute DCC key size, base alignment * info **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrComputeDccInfo( ADDR_HANDLE hLib, const ADDR_COMPUTE_DCCINFO_INPUT* pIn, ADDR_COMPUTE_DCCINFO_OUTPUT* pOut); /** **************************************************************************************************** * ADDR_GET_MAX_ALIGNMENTS_OUTPUT * * @brief * Output structure of AddrGetMaxAlignments **************************************************************************************************** */ typedef struct _ADDR_GET_MAX_ALIGNMENTS_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 baseAlign; ///< Maximum base alignment in bytes } ADDR_GET_MAX_ALIGNMENTS_OUTPUT; /** **************************************************************************************************** * AddrGetMaxAlignments * * @brief * Gets maximum alignments **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrGetMaxAlignments( ADDR_HANDLE hLib, ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut); /**
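AddrGetMaxAlignments and AddrGetMaxMetaAlignments report a maximum base alignment in bytes (`baseAlign`). A client that sub-allocates several surfaces from one buffer could round each placement offset up to that value. The sketch below shows one way to do that. `AlignUp64` and `PlaceSurface` are hypothetical helpers and are not part of the addrlib API. The rounding uses division rather than a bit mask, so it does not assume `baseAlign` is a power of two.

```c
#include <assert.h>
#include <stdint.h>

// Round `offset` up to the next multiple of `align` bytes.
// Division-based, so `align` need not be a power of two.
static inline uint64_t AlignUp64(uint64_t offset, uint32_t align)
{
    assert(align != 0);  // an alignment of zero is meaningless
    return ((offset + align - 1) / align) * align;
}

// Hypothetical helper: place a surface of `surfSize` bytes at the first
// offset at or after `heapOffset` that satisfies `baseAlign`, return that
// offset, and advance *pNextFree to the first byte past the surface.
static inline uint64_t PlaceSurface(uint64_t heapOffset, uint64_t surfSize,
                                    uint32_t baseAlign, uint64_t* pNextFree)
{
    uint64_t start = AlignUp64(heapOffset, baseAlign);
    *pNextFree = start + surfSize;
    return start;
}
```

For example, with a `baseAlign` of 65536, an offset of 70000 is rounded up to 131072.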
**************************************************************************************************** * AddrGetMaxMetaAlignments * * @brief * Gets maximum alignments for metadata **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrGetMaxMetaAlignments( ADDR_HANDLE hLib, ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut); /** **************************************************************************************************** * Address library interface version 2 * available from Gfx9 hardware **************************************************************************************************** * Addr2ComputeSurfaceInfo() * Addr2ComputeSurfaceAddrFromCoord() * Addr2ComputeSurfaceCoordFromAddr() * Addr2ComputeHtileInfo() * Addr2ComputeHtileAddrFromCoord() * Addr2ComputeHtileCoordFromAddr() * * Addr2ComputeCmaskInfo() * Addr2ComputeCmaskAddrFromCoord() * Addr2ComputeCmaskCoordFromAddr() * * Addr2ComputeFmaskInfo() * Addr2ComputeFmaskAddrFromCoord() * Addr2ComputeFmaskCoordFromAddr() * * Addr2ComputeDccInfo() * **/ //////////////////////////////////////////////////////////////////////////////////////////////////// // Surface functions for Gfx9 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * ADDR2_SURFACE_FLAGS * * @brief * Surface flags **************************************************************************************************** */ typedef union _ADDR2_SURFACE_FLAGS { struct { UINT_32 color : 1; ///< This resource is a color buffer, can be used with RTV UINT_32 depth : 1; ///< This resource is a depth buffer, can be used with DSV UINT_32 stencil : 1; ///< This resource is a stencil buffer, can be used with DSV UINT_32 fmask : 1; ///< This is an fmask surface UINT_32 overlay : 1; ///< This is an overlay surface UINT_32 display : 1; ///<
This resource is displayable, can be used with DRV UINT_32 prt : 1; ///< This is a partially resident texture UINT_32 qbStereo : 1; ///< This is a quad buffer stereo surface UINT_32 interleaved : 1; ///< Special flag for interleaved YUV surface padding UINT_32 texture : 1; ///< This resource can be used with SRV UINT_32 unordered : 1; ///< This resource can be used with UAV UINT_32 rotated : 1; ///< This resource is rotated and displayable UINT_32 needEquation : 1; ///< This resource needs equation to be generated if possible UINT_32 opt4space : 1; ///< This resource should be optimized for space UINT_32 minimizeAlign : 1; ///< This resource should use minimum alignment UINT_32 noMetadata : 1; ///< This resource has no metadata UINT_32 metaRbUnaligned : 1; ///< This resource has rb unaligned metadata UINT_32 metaPipeUnaligned : 1; ///< This resource has pipe unaligned metadata UINT_32 view3dAs2dArray : 1; ///< This resource is a 3D resource viewed as 2D array UINT_32 reserved : 13; ///< Reserved bits }; UINT_32 value; } ADDR2_SURFACE_FLAGS; /** **************************************************************************************************** * ADDR2_COMPUTE_SURFACE_INFO_INPUT * * @brief * Input structure for Addr2ComputeSurfaceInfo **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SURFACE_INFO_INPUT { UINT_32 size; ///< Size of this structure in bytes ADDR2_SURFACE_FLAGS flags; ///< Surface flags AddrSwizzleMode swizzleMode; ///< Swizzle Mode for Gfx9 AddrResourceType resourceType; ///< Surface type AddrFormat format; ///< Surface format UINT_32 bpp; ///< bits per pixel UINT_32 width; ///< Width (of mip0), in pixels UINT_32 height; ///< Height (of mip0), in pixels UINT_32 numSlices; ///< Number of surface slices/depth (of mip0) UINT_32 numMipLevels; ///< Total mipmap levels.
UINT_32 numSamples; ///< Number of samples UINT_32 numFrags; ///< Number of fragments, leave it zero or the same as /// number of samples for normal AA; set it to the /// number of fragments for EQAA UINT_32 pitchInElement; ///< Pitch in elements (blocks for compressed formats) UINT_32 sliceAlign; ///< Required slice size in bytes } ADDR2_COMPUTE_SURFACE_INFO_INPUT; /** **************************************************************************************************** * ADDR2_MIP_INFO * * @brief * Structure that contains information for a mip level * **************************************************************************************************** */ typedef struct _ADDR2_MIP_INFO { UINT_32 pitch; ///< Pitch in elements UINT_32 height; ///< Padded height in elements UINT_32 depth; ///< Padded depth UINT_32 pixelPitch; ///< Pitch in pixels UINT_32 pixelHeight; ///< Padded height in pixels UINT_32 equationIndex; ///< Equation index in the equation table UINT_64 offset; ///< Offset in bytes from mip base, should only be used ///< to set up vam surface descriptor, can't be used ///< to set up swizzle pattern UINT_64 macroBlockOffset; ///< macro block offset in bytes from mip base UINT_32 mipTailOffset; ///< mip tail offset in bytes UINT_32 mipTailCoordX; ///< mip tail coord x UINT_32 mipTailCoordY; ///< mip tail coord y UINT_32 mipTailCoordZ; ///< mip tail coord z } ADDR2_MIP_INFO; /** **************************************************************************************************** * ADDR2_COMPUTE_SURFACE_INFO_OUTPUT * * @brief * Output structure for Addr2ComputeSurfaceInfo * @note Element: AddrLib unit for computing. e.g.
BCn: 4x4 blocks; R32G32B32: 32bit with 3x pitch Pixel: Original pixel **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SURFACE_INFO_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 pitch; ///< Pitch in elements (blocks for compressed formats) UINT_32 height; ///< Padded height (of mip0) in elements UINT_32 numSlices; ///< Padded depth for 3d resource ///< or padded number of slices for 2d array resource UINT_32 mipChainPitch; ///< Pitch (of total mip chain) in elements UINT_32 mipChainHeight; ///< Padded height (of total mip chain) in elements UINT_32 mipChainSlice; ///< Padded depth (of total mip chain) UINT_64 sliceSize; ///< Slice (total mip chain) size in bytes UINT_64 surfSize; ///< Surface (total mip chain) size in bytes UINT_32 baseAlign; ///< Base address alignment UINT_32 bpp; ///< Bits per element /// (e.g. blocks for BCn, 1/3 for 96bit) UINT_32 pixelMipChainPitch; ///< Mip chain pitch in original pixels UINT_32 pixelMipChainHeight; ///< Mip chain height in original pixels UINT_32 pixelPitch; ///< Pitch in original pixels UINT_32 pixelHeight; ///< Height in original pixels UINT_32 pixelBits; ///< Original bits per pixel, passed from input UINT_32 blockWidth; ///< Width in elements inside one block UINT_32 blockHeight; ///< Height in elements inside one block UINT_32 blockSlices; ///< Slice number inside one block ///< Prt tile is one block, its width/height/slice ///< equal the block width/height/slice BOOL_32 epitchIsHeight; ///< Whether to use height to program epitch register /// Stereo info ADDR_QBSTEREOINFO* pStereoInfo; ///< Stereo info, needed if qbStereo flag is TRUE /// Mip info ADDR2_MIP_INFO* pMipInfo; ///< Pointer to mip information array /// if it is not NULL, the array is assumed to /// contain numMipLevels entries UINT_32 equationIndex; ///< Equation index in the equation table of mip0 BOOL_32 mipChainInTail; ///< If whole mipchain falls
into mip tail block UINT_32 firstMipIdInTail; ///< The id of first mip in tail, if there is no mip /// in tail, it will be set to number of mip levels } ADDR2_COMPUTE_SURFACE_INFO_OUTPUT; /** **************************************************************************************************** * Addr2ComputeSurfaceInfo * * @brief * Compute surface width/height/slices/alignments and suitable tiling mode **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSurfaceInfo( ADDR_HANDLE hLib, const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT * * @brief * Input structure for Addr2ComputeSurfaceAddrFromCoord **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 x; ///< X coordinate UINT_32 y; ///< Y coordinate UINT_32 slice; ///< Slice index UINT_32 sample; ///< Sample index, use fragment index for EQAA UINT_32 mipId; ///< the mip ID in mip chain AddrSwizzleMode swizzleMode; ///< Swizzle mode for Gfx9 ADDR2_SURFACE_FLAGS flags; ///< Surface flags AddrResourceType resourceType; ///< Surface type UINT_32 bpp; ///< Bits per pixel UINT_32 unalignedWidth; ///< Surface original width (of mip0) UINT_32 unalignedHeight; ///< Surface original height (of mip0) UINT_32 numSlices; ///< Surface original slices (of mip0) UINT_32 numMipLevels; ///< Total mipmap levels UINT_32 numSamples; ///< Number of samples UINT_32 numFrags; ///< Number of fragments, leave it zero or the same as /// number of samples for normal AA; Set it to the /// number of fragments for EQAA UINT_32 pipeBankXor; ///< Combined swizzle used to do bank/pipe rotation UINT_32 
pitchInElement; ///< Pitch in elements (blocks for compressed formats) } ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT * * @brief * Output structure for Addr2ComputeSurfaceAddrFromCoord **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< Byte address UINT_32 bitPosition; ///< Bit position within surfaceAddr, 0-7. /// For surface bpp < 8, e.g. FMT_1. UINT_32 prtBlockIndex; ///< Index of a PRT tile (64K block) } ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT; /** **************************************************************************************************** * Addr2ComputeSurfaceAddrFromCoord * * @brief * Compute surface address from a given coordinate. **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSurfaceAddrFromCoord( ADDR_HANDLE hLib, const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT * * @brief * Input structure for Addr2ComputeSurfaceCoordFromAddr **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< Address in bytes UINT_32 bitPosition; ///< Bit position in addr. 0-7. for surface bpp < 8, /// e.g. 
FMT_1; AddrSwizzleMode swizzleMode; ///< Swizzle mode for Gfx9 ADDR2_SURFACE_FLAGS flags; ///< Surface flags AddrResourceType resourceType; ///< Surface type UINT_32 bpp; ///< Bits per pixel UINT_32 unalignedWidth; ///< Surface original width (of mip0) UINT_32 unalignedHeight; ///< Surface original height (of mip0) UINT_32 numSlices; ///< Surface original slices (of mip0) UINT_32 numMipLevels; ///< Total mipmap levels. UINT_32 numSamples; ///< Number of samples UINT_32 numFrags; ///< Number of fragments, leave it zero or the same as /// number of samples for normal AA; Set it to the /// number of fragments for EQAA UINT_32 pipeBankXor; ///< Combined swizzle used to do bank/pipe rotation UINT_32 pitchInElement; ///< Pitch in elements (blocks for compressed formats) } ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT * * @brief * Output structure for Addr2ComputeSurfaceCoordFromAddr **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 x; ///< X coordinate UINT_32 y; ///< Y coordinate UINT_32 slice; ///< Index of slices UINT_32 sample; ///< Index of samples, means fragment index for EQAA UINT_32 mipId; ///< mipmap level id } ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT; /** **************************************************************************************************** * Addr2ComputeSurfaceCoordFromAddr * * @brief * Compute coordinate from a given surface address **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSurfaceCoordFromAddr( ADDR_HANDLE hLib, const ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut); 
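For surfaces with fewer than 8 bits per pixel (e.g. FMT_1), Addr2ComputeSurfaceAddrFromCoord reports a byte address plus a bit position of 0-7. Addr2ComputeSurfaceCoordFromAddr takes the same pair as input. The sketch below reads and writes such a sub-byte element in a CPU-visible copy of the surface. `ReadSubByteElement` and `WriteSubByteElement` are hypothetical helpers and are not part of addrlib. The sketch makes two assumptions that the header does not state. First, bit position 0 is the least significant bit of the byte. Second, an element of 1, 2 or 4 bits never straddles a byte boundary.

```c
#include <assert.h>
#include <stdint.h>

// Read a sub-byte element (bpp = 1, 2 or 4) located at byte `addr`, bit
// `bitPosition`. Assumption: bit 0 is the least significant bit.
static uint8_t ReadSubByteElement(const uint8_t* pSurface, uint64_t addr,
                                  uint32_t bitPosition, uint32_t bpp)
{
    assert(bpp == 1 || bpp == 2 || bpp == 4);
    assert(bitPosition + bpp <= 8);  // element must fit inside the byte
    uint8_t mask = (uint8_t)((1u << bpp) - 1u);
    return (uint8_t)((pSurface[addr] >> bitPosition) & mask);
}

// Write counterpart: replace only the addressed bits, keep the rest.
static void WriteSubByteElement(uint8_t* pSurface, uint64_t addr,
                                uint32_t bitPosition, uint32_t bpp,
                                uint8_t value)
{
    assert(bpp == 1 || bpp == 2 || bpp == 4);
    assert(bitPosition + bpp <= 8);
    uint8_t mask = (uint8_t)(((1u << bpp) - 1u) << bitPosition);
    pSurface[addr] = (uint8_t)((pSurface[addr] & ~mask) |
                               ((uint32_t)(value << bitPosition) & mask));
}
```

Masking on write keeps neighbouring elements in the same byte intact. That matters because several pixels of a 1-, 2- or 4-bit surface share one byte address.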
//////////////////////////////////////////////////////////////////////////////////////////////////// // HTile functions for Gfx9 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * ADDR2_META_FLAGS * * @brief * Metadata flags **************************************************************************************************** */ typedef union _ADDR2_META_FLAGS { struct { UINT_32 pipeAligned : 1; ///< Whether metadata is pipe aligned UINT_32 rbAligned : 1; ///< Whether metadata is RB aligned UINT_32 linear : 1; ///< Whether metadata is linear; GFX9 does not support this! UINT_32 reserved : 29; ///< Reserved bits }; UINT_32 value; } ADDR2_META_FLAGS; /** **************************************************************************************************** * ADDR2_META_MIP_INFO * * @brief * Structure to store per mip metadata information **************************************************************************************************** */ typedef struct _ADDR2_META_MIP_INFO { BOOL_32 inMiptail; union { struct { UINT_32 startX; UINT_32 startY; UINT_32 startZ; UINT_32 width; UINT_32 height; UINT_32 depth; }; struct { UINT_32 offset; ///< Metadata offset within one slice, /// the thickness of a slice is meta block depth. UINT_32 sliceSize; ///< Metadata size within one slice, /// the thickness of a slice is meta block depth.
}; }; } ADDR2_META_MIP_INFO; /** **************************************************************************************************** * ADDR2_COMPUTE_HTILE_INFO_INPUT * * @brief * Input structure of Addr2ComputeHtileInfo **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_HTILE_INFO_INPUT { UINT_32 size; ///< Size of this structure in bytes ADDR2_META_FLAGS hTileFlags; ///< HTILE flags ADDR2_SURFACE_FLAGS depthFlags; ///< Depth surface flags AddrSwizzleMode swizzleMode; ///< Depth surface swizzle mode UINT_32 unalignedWidth; ///< Depth surface original width (of mip0) UINT_32 unalignedHeight; ///< Depth surface original height (of mip0) UINT_32 numSlices; ///< Number of slices of depth surface (of mip0) UINT_32 numMipLevels; ///< Total mipmap levels of depth surface UINT_32 firstMipIdInTail; ///< Id of the first mip in tail, /// if no mip is in tail, it should be set to /// number of mip levels } ADDR2_COMPUTE_HTILE_INFO_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_HTILE_INFO_OUTPUT * * @brief * Output structure of Addr2ComputeHtileInfo **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_HTILE_INFO_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 pitch; ///< Pitch in pixels of depth buffer represented in this /// HTile buffer. This might be larger than original depth /// buffer pitch when called with an unaligned pitch. UINT_32 height; ///< Height in pixels, as above UINT_32 baseAlign; ///< Base alignment UINT_32 sliceSize; ///< Slice size, in bytes.
UINT_32 htileBytes; ///< Size of HTILE buffer, in bytes UINT_32 metaBlkWidth; ///< Meta block width UINT_32 metaBlkHeight; ///< Meta block height UINT_32 metaBlkNumPerSlice; ///< Number of metablock within one slice ADDR2_META_MIP_INFO* pMipInfo; ///< HTILE mip information } ADDR2_COMPUTE_HTILE_INFO_OUTPUT; /** **************************************************************************************************** * Addr2ComputeHtileInfo * * @brief * Compute Htile pitch, height, base alignment and size in bytes **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeHtileInfo( ADDR_HANDLE hLib, const ADDR2_COMPUTE_HTILE_INFO_INPUT* pIn, ADDR2_COMPUTE_HTILE_INFO_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT * * @brief * Input structure for Addr2ComputeHtileAddrFromCoord **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 x; ///< X coordinate UINT_32 y; ///< Y coordinate UINT_32 slice; ///< Index of slices UINT_32 mipId; ///< mipmap level id ADDR2_META_FLAGS hTileFlags; ///< HTILE flags ADDR2_SURFACE_FLAGS depthflags; ///< Depth surface flags AddrSwizzleMode swizzleMode; ///< Depth surface swizzle mode UINT_32 bpp; ///< Depth surface bits per pixel UINT_32 unalignedWidth; ///< Depth surface original width (of mip0) UINT_32 unalignedHeight; ///< Depth surface original height (of mip0) UINT_32 numSlices; ///< Depth surface original depth (of mip0) UINT_32 numMipLevels; ///< Depth surface total mipmap levels UINT_32 numSamples; ///< Depth surface number of samples UINT_32 pipeXor; ///< Pipe xor setting } ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT; /** 
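Addr2ComputeHtileInfo and Addr2ComputeCmaskInfo take a `firstMipIdInTail` value. By convention this is the id of the first mip in the tail, or `numMipLevels` when no mip is in the tail. The surface output reports the same value. The sketch below shows helpers that follow from that convention. `IsMipInTail`, `NumMipsInTail` and `WholeChainInTail` are hypothetical names and are not addrlib functions. The sketch assumes that the tail holds every level from `firstMipIdInTail` through the last, smallest mip. `WholeChainInTail` is an inference: it expects agreement with the surface output's `mipChainInTail` flag, which the header does not guarantee explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// A mip is in the tail when its id is at or beyond the first tail mip.
static bool IsMipInTail(uint32_t mipId, uint32_t firstMipIdInTail)
{
    return mipId >= firstMipIdInTail;
}

// Number of levels packed into the tail; 0 when
// firstMipIdInTail == numMipLevels (the "no mip in tail" convention).
static uint32_t NumMipsInTail(uint32_t numMipLevels, uint32_t firstMipIdInTail)
{
    assert(firstMipIdInTail <= numMipLevels);
    return numMipLevels - firstMipIdInTail;
}

// Inference: the whole chain is in the tail when mip0 is the first tail mip.
static bool WholeChainInTail(uint32_t numMipLevels, uint32_t firstMipIdInTail)
{
    return numMipLevels > 0 && firstMipIdInTail == 0;
}
```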
**************************************************************************************************** * ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT * * @brief * Output structure for Addr2ComputeHtileAddrFromCoord **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< Address in bytes } ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT; /** **************************************************************************************************** * Addr2ComputeHtileAddrFromCoord * * @brief * Compute Htile address according to coordinates (of depth buffer) **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeHtileAddrFromCoord( ADDR_HANDLE hLib, const ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT * * @brief * Input structure for Addr2ComputeHtileCoordFromAddr **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< Address ADDR2_META_FLAGS hTileFlags; ///< HTILE flags ADDR2_SURFACE_FLAGS depthFlags; ///< Depth surface flags AddrSwizzleMode swizzleMode; ///< Depth surface swizzle mode UINT_32 bpp; ///< Depth surface bits per pixel UINT_32 unalignedWidth; ///< Depth surface original width (of mip0) UINT_32 unalignedHeight; ///< Depth surface original height (of mip0) UINT_32 numSlices; ///< Depth surface original depth (of mip0) UINT_32 numMipLevels; ///< Depth surface total mipmap levels UINT_32 numSamples; ///< Depth surface number of samples UINT_32 pipeXor; 
///< Pipe xor setting } ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT * * @brief * Output structure for Addr2ComputeHtileCoordFromAddr **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 x; ///< X coordinate UINT_32 y; ///< Y coordinate UINT_32 slice; ///< Index of slices UINT_32 mipId; ///< mipmap level id } ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT; /** **************************************************************************************************** * Addr2ComputeHtileCoordFromAddr * * @brief * Compute coordinates within depth buffer (1st pixel of a micro tile) according to * Htile address **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeHtileCoordFromAddr( ADDR_HANDLE hLib, const ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn, ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT* pOut); //////////////////////////////////////////////////////////////////////////////////////////////////// // C-mask functions for Gfx9 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * ADDR2_COMPUTE_CMASK_INFO_INPUT * * @brief * Input structure of Addr2ComputeCmaskInfo **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_CMASKINFO_INPUT { UINT_32 size; ///< Size of this structure in bytes ADDR2_META_FLAGS cMaskFlags; ///< CMASK flags ADDR2_SURFACE_FLAGS colorFlags; ///< Color surface flags AddrResourceType resourceType; ///< Color surface type AddrSwizzleMode 
swizzleMode; ///< FMask surface swizzle mode UINT_32 unalignedWidth; ///< Color surface original width UINT_32 unalignedHeight; ///< Color surface original height UINT_32 numSlices; ///< Number of slices of color buffer UINT_32 numMipLevels; ///< Number of mip levels UINT_32 firstMipIdInTail; ///< The id of first mip in tail, if no mip is in tail, /// it should be number of mip levels } ADDR2_COMPUTE_CMASK_INFO_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_CMASK_INFO_OUTPUT * * @brief * Output structure of Addr2ComputeCmaskInfo **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_CMASK_INFO_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 pitch; ///< Pitch in pixels of color buffer which /// this Cmask matches. The size might be larger than /// original color buffer pitch when called with /// an unaligned pitch. UINT_32 height; ///< Height in pixels, as above UINT_32 baseAlign; ///< Base alignment UINT_32 sliceSize; ///< Slice size, in bytes. 
UINT_32 cmaskBytes; ///< Size in bytes of CMask buffer UINT_32 metaBlkWidth; ///< Meta block width UINT_32 metaBlkHeight; ///< Meta block height UINT_32 metaBlkNumPerSlice; ///< Number of metablock within one slice ADDR2_META_MIP_INFO* pMipInfo; ///< CMASK mip information } ADDR2_COMPUTE_CMASK_INFO_OUTPUT; /** **************************************************************************************************** * Addr2ComputeCmaskInfo * * @brief * Compute Cmask pitch, height, base alignment and size in bytes from color buffer * info **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeCmaskInfo( ADDR_HANDLE hLib, const ADDR2_COMPUTE_CMASK_INFO_INPUT* pIn, ADDR2_COMPUTE_CMASK_INFO_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT * * @brief * Input structure for Addr2ComputeCmaskAddrFromCoord * **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 x; ///< X coordinate UINT_32 y; ///< Y coordinate UINT_32 slice; ///< Index of slices ADDR2_META_FLAGS cMaskFlags; ///< CMASK flags ADDR2_SURFACE_FLAGS colorFlags; ///< Color surface flags AddrResourceType resourceType; ///< Color surface type AddrSwizzleMode swizzleMode; ///< FMask surface swizzle mode UINT_32 unalignedWidth; ///< Color surface original width (of mip0) UINT_32 unalignedHeight; ///< Color surface original height (of mip0) UINT_32 numSlices; ///< Color surface original slices (of mip0) UINT_32 numSamples; ///< Color surface sample number UINT_32 numFrags; ///< Color surface fragment number UINT_32 pipeXor; ///< pipe Xor setting } ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT; /**
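The CMASK address structures (ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT and ADDR2_COMPUTE_CMASK_COORDFROMADDR_INPUT) locate an entry by byte address plus a bit position of 0 or 4. This implies 4-bit entries packed two per byte. The sketch below reads and updates one such nibble in a CPU-visible CMASK copy. `ReadCmaskNibble` and `WriteCmaskNibble` are hypothetical helpers. The sketch assumes that bit position 0 selects the low nibble and 4 the high nibble. It does not interpret the meaning of the nibble's contents.

```c
#include <assert.h>
#include <stdint.h>

// Read the 4-bit CMASK entry at byte `addr`, bit position 0 or 4.
static uint8_t ReadCmaskNibble(const uint8_t* pCmask, uint64_t addr,
                               uint32_t bitPosition)
{
    assert(bitPosition == 0 || bitPosition == 4);
    return (uint8_t)((pCmask[addr] >> bitPosition) & 0xFu);
}

// Overwrite one 4-bit entry while preserving the other entry in the byte.
static void WriteCmaskNibble(uint8_t* pCmask, uint64_t addr,
                             uint32_t bitPosition, uint8_t value)
{
    assert(bitPosition == 0 || bitPosition == 4);
    assert(value <= 0xF);
    uint8_t keepMask = (uint8_t)(0xF0u >> bitPosition);  // the other nibble
    pCmask[addr] = (uint8_t)((pCmask[addr] & keepMask) | (value << bitPosition));
}
```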
**************************************************************************************************** * ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT * * @brief * Output structure for Addr2ComputeCmaskAddrFromCoord **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< CMASK address in bytes UINT_32 bitPosition; ///< Bit position within addr, 0 or 4 } ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT; /** **************************************************************************************************** * Addr2ComputeCmaskAddrFromCoord * * @brief * Compute Cmask address according to coordinates (of MSAA color buffer) **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeCmaskAddrFromCoord( ADDR_HANDLE hLib, const ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_CMASK_COORDFROMADDR_INPUT * * @brief * Input structure for Addr2ComputeCmaskCoordFromAddr **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_CMASK_COORDFROMADDR_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< CMASK address in bytes UINT_32 bitPosition; ///< Bit position within addr, 0 or 4 ADDR2_META_FLAGS cMaskFlags; ///< CMASK flags ADDR2_SURFACE_FLAGS colorFlags; ///< Color surface flags AddrResourceType resourceType; ///< Color surface type AddrSwizzleMode swizzleMode; ///< FMask surface swizzle mode UINT_32 unalignedWidth; ///< Color surface original width (of mip0) UINT_32 unalignedHeight; ///< Color surface original height (of mip0) UINT_32 numSlices; ///< Color surface original 
slices (of mip0) UINT_32 numMipLevels; ///< Color surface total mipmap levels. } ADDR2_COMPUTE_CMASK_COORDFROMADDR_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_CMASK_COORDFROMADDR_OUTPUT * * @brief * Output structure for Addr2ComputeCmaskCoordFromAddr **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_CMASK_COORDFROMADDR_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 x; ///< X coordinate UINT_32 y; ///< Y coordinate UINT_32 slice; ///< Index of slices UINT_32 mipId; ///< mipmap level id } ADDR2_COMPUTE_CMASK_COORDFROMADDR_OUTPUT; /** **************************************************************************************************** * Addr2ComputeCmaskCoordFromAddr * * @brief * Compute coordinates within color buffer (1st pixel of a micro tile) according to * Cmask address **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeCmaskCoordFromAddr( ADDR_HANDLE hLib, const ADDR2_COMPUTE_CMASK_COORDFROMADDR_INPUT* pIn, ADDR2_COMPUTE_CMASK_COORDFROMADDR_OUTPUT* pOut); //////////////////////////////////////////////////////////////////////////////////////////////////// // F-mask functions for Gfx9 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * ADDR2_FMASK_FLAGS * * @brief * FMASK flags **************************************************************************************************** */ typedef union _ADDR2_FMASK_FLAGS { struct { UINT_32 resolved : 1; ///< TRUE if this is a resolved fmask, used /// by H/W clients. S/W should always set it to FALSE. UINT_32 reserved : 31; ///< Reserved for future use.
}; UINT_32 value; } ADDR2_FMASK_FLAGS; /** **************************************************************************************************** * ADDR2_COMPUTE_FMASK_INFO_INPUT * * @brief * Input structure for Addr2ComputeFmaskInfo **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_FMASK_INFO_INPUT { UINT_32 size; ///< Size of this structure in bytes AddrSwizzleMode swizzleMode; ///< FMask surface swizzle mode UINT_32 unalignedWidth; ///< Color surface original width UINT_32 unalignedHeight; ///< Color surface original height UINT_32 numSlices; ///< Number of slices/depth UINT_32 numSamples; ///< Number of samples UINT_32 numFrags; ///< Number of fragments, leave it zero or the same as /// number of samples for normal AA; set it to the /// number of fragments for EQAA ADDR2_FMASK_FLAGS fMaskFlags; ///< FMASK flags } ADDR2_COMPUTE_FMASK_INFO_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_FMASK_INFO_OUTPUT * * @brief * Output structure for Addr2ComputeFmaskInfo **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_FMASK_INFO_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 pitch; ///< Pitch of fmask in pixels UINT_32 height; ///< Height of fmask in pixels UINT_32 baseAlign; ///< Base alignment UINT_32 numSlices; ///< Slices of fmask UINT_32 fmaskBytes; ///< Size of fmask in bytes UINT_32 bpp; ///< Bits per pixel of FMASK, i.e. the number of bit planes UINT_32 numSamples; ///< Number of samples UINT_32 sliceSize; ///< Size of slice in bytes } ADDR2_COMPUTE_FMASK_INFO_OUTPUT; /** **************************************************************************************************** * Addr2ComputeFmaskInfo * * @brief * Compute Fmask pitch/height/slices/alignments and size in bytes
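Example (a minimal sketch, not part of addrlib): the numFrags input is documented as "leave it zero or the same as number of samples for normal AA; set it to the number of fragments for EQAA". The hypothetical helpers FmaskEffectiveFrags and FmaskIsEqaa below apply that convention on the client side, assuming EQAA always means fewer color fragments than coverage samples.

```c
#include <stdbool.h>

// Hypothetical helper, not an addrlib API: applies the documented numFrags
// convention (0 means "same as numSamples", i.e. normal AA).
static unsigned int FmaskEffectiveFrags(unsigned int numSamples, unsigned int numFrags)
{
    return (numFrags == 0) ? numSamples : numFrags;
}

// Hypothetical helper: EQAA is assumed to mean fewer color fragments than
// coverage samples; normal AA stores one fragment per sample.
static bool FmaskIsEqaa(unsigned int numSamples, unsigned int numFrags)
{
    return FmaskEffectiveFrags(numSamples, numFrags) < numSamples;
}
```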
**************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeFmaskInfo( ADDR_HANDLE hLib, const ADDR2_COMPUTE_FMASK_INFO_INPUT* pIn, ADDR2_COMPUTE_FMASK_INFO_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_INPUT * * @brief * Input structure for Addr2ComputeFmaskAddrFromCoord **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_INPUT { UINT_32 size; ///< Size of this structure in bytes AddrSwizzleMode swizzleMode; ///< FMask surface swizzle mode UINT_32 x; ///< X coordinate UINT_32 y; ///< Y coordinate UINT_32 slice; ///< Slice index UINT_32 sample; ///< Sample index (fragment index for EQAA) UINT_32 plane; ///< Plane number UINT_32 unalignedWidth; ///< Color surface original width UINT_32 unalignedHeight; ///< Color surface original height UINT_32 numSamples; ///< Number of samples UINT_32 numFrags; ///< Number of fragments, leave it zero or the same as /// number of samples for normal AA; set it to the /// number of fragments for EQAA UINT_32 tileSwizzle; ///< Combined swizzle used to do bank/pipe rotation ADDR2_FMASK_FLAGS fMaskFlags; ///< FMASK flags } ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT * * @brief * Output structure for Addr2ComputeFmaskAddrFromCoord **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< Fmask address UINT_32 bitPosition; ///< Bit position within addr, 0-7.
} ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT; /** **************************************************************************************************** * Addr2ComputeFmaskAddrFromCoord * * @brief * Compute Fmask address according to coordinates (x,y,slice,sample,plane) **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeFmaskAddrFromCoord( ADDR_HANDLE hLib, const ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_FMASK_COORDFROMADDR_INPUT * * @brief * Input structure for Addr2ComputeFmaskCoordFromAddr **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_FMASK_COORDFROMADDR_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< Address UINT_32 bitPosition; ///< Bit position within addr, 0-7. 
AddrSwizzleMode swizzleMode; ///< FMask surface swizzle mode UINT_32 unalignedWidth; ///< Color surface original width UINT_32 unalignedHeight; ///< Color surface original height UINT_32 numSamples; ///< Number of samples UINT_32 numFrags; ///< Number of fragments UINT_32 tileSwizzle; ///< Combined swizzle used to do bank/pipe rotation ADDR2_FMASK_FLAGS fMaskFlags; ///< FMASK flags } ADDR2_COMPUTE_FMASK_COORDFROMADDR_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_FMASK_COORDFROMADDR_OUTPUT * * @brief * Output structure for Addr2ComputeFmaskCoordFromAddr **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_FMASK_COORDFROMADDR_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 x; ///< X coordinate UINT_32 y; ///< Y coordinate UINT_32 slice; ///< Slice index UINT_32 sample; ///< Sample index (fragment index for EQAA) UINT_32 plane; ///< Plane number } ADDR2_COMPUTE_FMASK_COORDFROMADDR_OUTPUT; /** **************************************************************************************************** * Addr2ComputeFmaskCoordFromAddr * * @brief * Compute FMASK coordinates from a given address **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeFmaskCoordFromAddr( ADDR_HANDLE hLib, const ADDR2_COMPUTE_FMASK_COORDFROMADDR_INPUT* pIn, ADDR2_COMPUTE_FMASK_COORDFROMADDR_OUTPUT* pOut); //////////////////////////////////////////////////////////////////////////////////////////////////// // DCC key functions for Gfx9 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * ADDR2_COMPUTE_DCCINFO_INPUT * * @brief * Input structure of Addr2ComputeDccInfo
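Example (a sketch under stated assumptions): firstMipIdInTail must be the index of the first mip level that lives in the mip tail, or numMipLevels when no level is in the tail. The hypothetical FirstMipIdInTail below derives that value from a per-level inTail[] array; where the caller gets such per-level information (for instance from an earlier surface-info query) is an assumption, not something this header specifies.

```c
#include <stdbool.h>

// Hypothetical helper, not an addrlib API. inTail[i] is assumed to be true
// when mip level i is packed into the mip tail.
static unsigned int FirstMipIdInTail(const bool* inTail, unsigned int numMipLevels)
{
    for (unsigned int mip = 0; mip < numMipLevels; mip++)
    {
        if (inTail[mip])
        {
            return mip;
        }
    }
    // No level is in the tail: the documented value is the mip level count.
    return numMipLevels;
}
```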
**************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_DCCINFO_INPUT { UINT_32 size; ///< Size of this structure in bytes ADDR2_META_FLAGS dccKeyFlags; ///< DCC key flags ADDR2_SURFACE_FLAGS colorFlags; ///< Color surface flags AddrResourceType resourceType; ///< Color surface type AddrSwizzleMode swizzleMode; ///< Color surface swizzle mode UINT_32 bpp; ///< bits per pixel UINT_32 unalignedWidth; ///< Color surface original width (of mip0) UINT_32 unalignedHeight; ///< Color surface original height (of mip0) UINT_32 numSlices; ///< Number of slices, of color surface (of mip0) UINT_32 numFrags; ///< Fragment number of color surface UINT_32 numMipLevels; ///< Total mipmap levels of color surface UINT_32 dataSurfaceSize; ///< The padded size of all slices and mip levels ///< useful in meta linear case UINT_32 firstMipIdInTail; ///< The id of first mip in tail, if no mip is in tail, /// it should be number of mip levels } ADDR2_COMPUTE_DCCINFO_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_DCCINFO_OUTPUT * * @brief * Output structure of Addr2ComputeDccInfo **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_DCCINFO_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 dccRamBaseAlign; ///< Base alignment of dcc key UINT_32 dccRamSize; ///< Size of dcc key UINT_32 pitch; ///< DCC surface mip chain pitch UINT_32 height; ///< DCC surface mip chain height UINT_32 depth; ///< DCC surface mip chain depth UINT_32 compressBlkWidth; ///< DCC compress block width UINT_32 compressBlkHeight; ///< DCC compress block height UINT_32 compressBlkDepth; ///< DCC compress block depth UINT_32 metaBlkWidth; ///< DCC meta block width UINT_32 metaBlkHeight; ///< DCC meta block height UINT_32 metaBlkDepth; ///< DCC meta block depth UINT_32 
metaBlkNumPerSlice; ///< Number of metablocks within one slice union { UINT_32 fastClearSizePerSlice; ///< Size of DCC within a slice that should be fast cleared UINT_32 dccRamSliceSize; ///< DCC RAM size per slice. For mipmap, it's /// the slice size of a mip chain; the thickness of /// a slice is the meta block depth }; ADDR2_META_MIP_INFO* pMipInfo; ///< DCC mip information } ADDR2_COMPUTE_DCCINFO_OUTPUT; /** **************************************************************************************************** * Addr2ComputeDccInfo * * @brief * Compute DCC key size, base alignment and related * info **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeDccInfo( ADDR_HANDLE hLib, const ADDR2_COMPUTE_DCCINFO_INPUT* pIn, ADDR2_COMPUTE_DCCINFO_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT * * @brief * Input structure for Addr2ComputeDccAddrFromCoord * **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 x; ///< X coordinate UINT_32 y; ///< Y coordinate UINT_32 slice; ///< Index of slices UINT_32 sample; ///< Sample index (fragment index for EQAA) UINT_32 mipId; ///< mipmap level id ADDR2_META_FLAGS dccKeyFlags; ///< DCC flags AddrResourceType resourceType; ///< Color surface type AddrSwizzleMode swizzleMode; ///< Color surface swizzle mode UINT_32 bpp; ///< Color surface bits per pixel UINT_32 numSlices; ///< Color surface original slices (of mip0) UINT_32 numMipLevels; ///< Color surface mipmap levels UINT_32 numFrags; ///< Color surface fragment number UINT_32 pipeXor; ///< Pipe xor setting UINT_32 pitch; ///< ADDR2_COMPUTE_DCCINFO_OUTPUT::pitch UINT_32 height; ///< ADDR2_COMPUTE_DCCINFO_OUTPUT::height UINT_32
compressBlkWidth; ///< ADDR2_COMPUTE_DCCINFO_OUTPUT::compressBlkWidth UINT_32 compressBlkHeight; ///< ADDR2_COMPUTE_DCCINFO_OUTPUT::compressBlkHeight UINT_32 compressBlkDepth; ///< ADDR2_COMPUTE_DCCINFO_OUTPUT::compressBlkDepth UINT_32 metaBlkWidth; ///< ADDR2_COMPUTE_DCCINFO_OUTPUT::metaBlkWidth UINT_32 metaBlkHeight; ///< ADDR2_COMPUTE_DCCINFO_OUTPUT::metaBlkHeight UINT_32 metaBlkDepth; ///< ADDR2_COMPUTE_DCCINFO_OUTPUT::metaBlkDepth UINT_32 dccRamSliceSize; ///< ADDR2_COMPUTE_DCCINFO_OUTPUT::dccRamSliceSize } ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT * * @brief * Output structure for Addr2ComputeDccAddrFromCoord **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 addr; ///< DCC address in bytes } ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT; /** **************************************************************************************************** * Addr2ComputeDccAddrFromCoord * * @brief * Compute DCC address according to coordinates (of MSAA color buffer) **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeDccAddrFromCoord( ADDR_HANDLE hLib, const ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT* pOut); //////////////////////////////////////////////////////////////////////////////////////////////////// // Misc functions for Gfx9 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * ADDR2_COMPUTE_PIPEBANKXOR_INPUT * * @brief * Input structure of Addr2ComputePipeBankXor
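Example (a minimal sketch): like every ADDR2_* input and output structure in this header, this input begins with a size field holding the structure size in bytes, and the ADDR_PARAMSIZEMISMATCH return code suggests the library checks it. PipeBankXorInputSketch below is a hypothetical stand-in that models the enum and union members as plain unsigned ints so the snippet compiles on its own; with the real header the same zero-then-set-size pattern would apply to ADDR2_COMPUTE_PIPEBANKXOR_INPUT.

```c
#include <string.h>

// Hypothetical stand-in for ADDR2_COMPUTE_PIPEBANKXOR_INPUT; enum and union
// members of the real structure are modeled as plain unsigned ints here.
typedef struct
{
    unsigned int size;
    unsigned int surfIndex;
    unsigned int flags;
    unsigned int swizzleMode;
    unsigned int resourceType;
    unsigned int format;
    unsigned int numSamples;
    unsigned int numFrags;
} PipeBankXorInputSketch;

// Zero every field (numFrags == 0 then means "same as numSamples"), then
// record the structure size in bytes.
static void InitPipeBankXorInput(PipeBankXorInputSketch* pIn)
{
    memset(pIn, 0, sizeof(*pIn));
    pIn->size = (unsigned int)sizeof(*pIn);
}
```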
**************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_PIPEBANKXOR_INPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 surfIndex; ///< Input surface index ADDR2_SURFACE_FLAGS flags; ///< Surface flags AddrSwizzleMode swizzleMode; ///< Surface swizzle mode AddrResourceType resourceType; ///< Surface resource type AddrFormat format; ///< Surface format UINT_32 numSamples; ///< Number of samples UINT_32 numFrags; ///< Number of fragments, leave it zero or the same as /// number of samples for normal AA; set it to the /// number of fragments for EQAA } ADDR2_COMPUTE_PIPEBANKXOR_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT * * @brief * Output structure of Addr2ComputePipeBankXor **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 pipeBankXor; ///< Pipe bank xor } ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT; /** **************************************************************************************************** * Addr2ComputePipeBankXor * * @brief * Calculate a valid pipe bank xor value for the client to use.
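Example (a sketch inferred from the enumerator names only): a pipe bank xor value is presumably meaningful only for the XOR-capable swizzle modes, i.e. the _T and _X variants, which are AddrSwizzleMode values 16 to 28 and 31. SwizzleModeIsXorSketch is a hypothetical client-side helper, not an addrlib API; the numeric ranges are copied from the AddrSwizzleMode enum in addrtypes.h.

```c
#include <stdbool.h>

// Hypothetical helper, not an addrlib API. Values mirror AddrSwizzleMode:
// ADDR_SW_64KB_Z_T = 16 ... ADDR_SW_VAR_Z_X = 28, ADDR_SW_VAR_R_X = 31;
// 29 and 30 are ADDR_SW_RESERVED4/5.
static bool SwizzleModeIsXorSketch(unsigned int swizzleMode)
{
    if (swizzleMode >= 16 && swizzleMode <= 28)
    {
        return true;  // the _T and _X modes, including ADDR_SW_VAR_Z_X
    }
    return swizzleMode == 31;  // ADDR_SW_VAR_R_X
}
```

Under this assumption, a client would only need a nonzero pipeBankXor for modes where the helper returns true.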
**************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputePipeBankXor( ADDR_HANDLE hLib, const ADDR2_COMPUTE_PIPEBANKXOR_INPUT* pIn, ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT * * @brief * Input structure of Addr2ComputeSlicePipeBankXor **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT { UINT_32 size; ///< Size of this structure in bytes AddrSwizzleMode swizzleMode; ///< Surface swizzle mode AddrResourceType resourceType; ///< Surface resource type UINT_32 basePipeBankXor; ///< Base pipe bank xor UINT_32 slice; ///< Slice id UINT_32 numSamples; ///< Number of samples } ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT * * @brief * Output structure of Addr2ComputeSlicePipeBankXor **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_32 pipeBankXor; ///< Pipe bank xor } ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT; /** **************************************************************************************************** * Addr2ComputeSlicePipeBankXor * * @brief * Calculate slice pipe bank xor value based on base pipe bank xor and slice id. 
**************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSlicePipeBankXor( ADDR_HANDLE hLib, const ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT* pIn, ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT * * @brief * Input structure of Addr2ComputeSubResourceOffsetForSwizzlePattern **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT { UINT_32 size; ///< Size of this structure in bytes AddrSwizzleMode swizzleMode; ///< Surface swizzle mode AddrResourceType resourceType; ///< Surface resource type UINT_32 pipeBankXor; ///< Per resource xor UINT_32 slice; ///< Slice id UINT_64 sliceSize; ///< Slice size of a mip chain UINT_64 macroBlockOffset; ///< Macro block offset, returned in ADDR2_MIP_INFO UINT_32 mipTailOffset; ///< Mip tail offset, returned in ADDR2_MIP_INFO } ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT; /** **************************************************************************************************** * ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT * * @brief * Output structure of Addr2ComputeSubResourceOffsetForSwizzlePattern **************************************************************************************************** */ typedef struct _ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT { UINT_32 size; ///< Size of this structure in bytes UINT_64 offset; ///< offset } ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT; /** **************************************************************************************************** * Addr2ComputeSubResourceOffsetForSwizzlePattern * * @brief * Calculate sub resource offset to support swizzle pattern. 
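Example (a hedged sketch): the input fields suggest the sub-resource offset is assembled from the slice index, the per-slice mip-chain size and the ADDR2_MIP_INFO offsets. The hypothetical SubResourceOffsetNoXor below covers only the case pipeBankXor == 0 and assumes the three terms simply add up; the library may additionally fold pipeBankXor into the mip-tail part, which this sketch deliberately does not model, so real offsets should come from Addr2ComputeSubResourceOffsetForSwizzlePattern itself.

```c
#include <stdint.h>

// Hypothetical helper, not an addrlib API. Assumes pipeBankXor == 0 and that
// offset = slice * sliceSize + macroBlockOffset + mipTailOffset.
static uint64_t SubResourceOffsetNoXor(uint32_t slice, uint64_t sliceSize,
                                       uint64_t macroBlockOffset, uint32_t mipTailOffset)
{
    return (uint64_t)slice * sliceSize + macroBlockOffset + mipTailOffset;
}
```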
**************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSubResourceOffsetForSwizzlePattern( ADDR_HANDLE hLib, const ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT* pIn, ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT* pOut); /** **************************************************************************************************** * ADDR2_BLOCK_SET * * @brief * Bit field that defines block type **************************************************************************************************** */ typedef union _ADDR2_BLOCK_SET { struct { UINT_32 micro : 1; // 256B block for 2D resource UINT_32 macroThin4KB : 1; // Thin 4KB for 2D/3D resource UINT_32 macroThick4KB : 1; // Thick 4KB for 3D resource UINT_32 macroThin64KB : 1; // Thin 64KB for 2D/3D resource UINT_32 macroThick64KB : 1; // Thick 64KB for 3D resource UINT_32 var : 1; // VAR block UINT_32 linear : 1; // Linear block UINT_32 reserved : 25; }; UINT_32 value; } ADDR2_BLOCK_SET; /** **************************************************************************************************** * ADDR2_SWTYPE_SET * * @brief * Bit field that defines swizzle type **************************************************************************************************** */ typedef union _ADDR2_SWTYPE_SET { struct { UINT_32 sw_Z : 1; // SW_*_Z_* UINT_32 sw_S : 1; // SW_*_S_* UINT_32 sw_D : 1; // SW_*_D_* UINT_32 sw_R : 1; // SW_*_R_* UINT_32 reserved : 28; }; UINT_32 value; } ADDR2_SWTYPE_SET; /** **************************************************************************************************** * ADDR2_SWMODE_SET * * @brief * Bit field that defines swizzle mode; bit N corresponds to AddrSwizzleMode value N **************************************************************************************************** */ typedef union _ADDR2_SWMODE_SET { struct { UINT_32 swLinear : 1; UINT_32 sw256B_S : 1; UINT_32 sw256B_D : 1; UINT_32 sw256B_R : 1; UINT_32 sw4KB_Z : 1; UINT_32 sw4KB_S
: 1; UINT_32 sw4KB_D : 1; UINT_32 sw4KB_R : 1; UINT_32 sw64KB_Z : 1; UINT_32 sw64KB_S : 1; UINT_32 sw64KB_D : 1; UINT_32 sw64KB_R : 1; UINT_32 swReserved0 : 1; UINT_32 swReserved1 : 1; UINT_32 swReserved2 : 1; UINT_32 swReserved3 : 1; UINT_32 sw64KB_Z_T : 1; UINT_32 sw64KB_S_T : 1; UINT_32 sw64KB_D_T : 1; UINT_32 sw64KB_R_T : 1; UINT_32 sw4KB_Z_X : 1; UINT_32 sw4KB_S_X : 1; UINT_32 sw4KB_D_X : 1; UINT_32 sw4KB_R_X : 1; UINT_32 sw64KB_Z_X : 1; UINT_32 sw64KB_S_X : 1; UINT_32 sw64KB_D_X : 1; UINT_32 sw64KB_R_X : 1; UINT_32 swVar_Z_X : 1; UINT_32 swReserved4 : 1; UINT_32 swReserved5 : 1; UINT_32 swVar_R_X : 1; }; UINT_32 value; } ADDR2_SWMODE_SET; /** **************************************************************************************************** * ADDR2_GET_PREFERRED_SURF_SETTING_INPUT * * @brief * Input structure of Addr2GetPreferredSurfaceSetting **************************************************************************************************** */ typedef struct _ADDR2_GET_PREFERRED_SURF_SETTING_INPUT { UINT_32 size; ///< Size of this structure in bytes ADDR2_SURFACE_FLAGS flags; ///< Surface flags AddrResourceType resourceType; ///< Surface type AddrFormat format; ///< Surface format AddrResrouceLocation resourceLoction; ///< Surface heap choice ADDR2_BLOCK_SET forbiddenBlock; ///< Client can use it to disable some block settings ///< such as linear for DXTn, tiled for YUV ADDR2_SWTYPE_SET preferredSwSet; ///< Client can use it to specify sw type(s) wanted BOOL_32 noXor; ///< Do not use xor mode for this resource UINT_32 bpp; ///< Bits per pixel UINT_32 width; ///< Width (of mip0), in pixels UINT_32 height; ///< Height (of mip0), in pixels UINT_32 numSlices; ///< Number of surface slices/depth (of mip0) UINT_32 numMipLevels; ///< Total mipmap levels.
UINT_32 numSamples; ///< Number of samples UINT_32 numFrags; ///< Number of fragments, leave it zero or the same as /// number of samples for normal AA; set it to the /// number of fragments for EQAA UINT_32 maxAlign; ///< Maximum base/size alignment requested by client UINT_32 minSizeAlign; ///< Memory allocated for surface in client driver will /// be padded to a multiple of this value (in bytes) } ADDR2_GET_PREFERRED_SURF_SETTING_INPUT; /** **************************************************************************************************** * ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT * * @brief * Output structure of Addr2GetPreferredSurfaceSetting **************************************************************************************************** */ typedef struct _ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT { UINT_32 size; ///< Size of this structure in bytes AddrSwizzleMode swizzleMode; ///< Suggested swizzle mode to be used AddrResourceType resourceType; ///< Suggested resource type to program HW ADDR2_BLOCK_SET validBlockSet; ///< Valid block type bit combination BOOL_32 canXor; ///< Whether the client can use xor on a valid macro block /// type ADDR2_SWTYPE_SET validSwTypeSet; ///< Valid swizzle type bit combination ADDR2_SWTYPE_SET clientPreferredSwSet; ///< Client-preferred swizzle type bit combination ADDR2_SWMODE_SET validSwModeSet; ///< Valid swizzle mode bit combination } ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT; /** **************************************************************************************************** * Addr2GetPreferredSurfaceSetting * * @brief * Suggest a preferred setting for the client driver to program HW registers **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2GetPreferredSurfaceSetting( ADDR_HANDLE hLib, const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT* pIn, ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT* pOut); /**
**************************************************************************************************** * Addr2IsValidDisplaySwizzleMode * * @brief * Return whether the swizzle mode is supported by DCE / DCN. **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2IsValidDisplaySwizzleMode( ADDR_HANDLE hLib, AddrSwizzleMode swizzleMode, UINT_32 bpp, bool *result); } // rocr #endif // __ADDR_INTERFACE_H__ ROCR-Runtime-rocm-5.0.0/src/image/addrlib/inc/addrtypes.h000066400000000000000000000722321420110115200230240ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. 
*/ /** **************************************************************************************************** * @file addrtypes.h * @brief Contains the helper functions and constants **************************************************************************************************** */ #ifndef __ADDR_TYPES_H__ #define __ADDR_TYPES_H__ #if defined(__APPLE__) && !defined(HAVE_TSERVER) // External definitions header maintained by Apple driver team, but not for diag team under Mac. // Helps address compilation issues & reduces code covered by NDA #include "addrExtDef.h" #else // Windows and/or Linux #if !defined(VOID) typedef void VOID; #endif #if !defined(FLOAT) typedef float FLOAT; #endif #if !defined(CHAR) typedef char CHAR; #endif #if !defined(INT) typedef int INT; #endif #include <stdarg.h> // va_list, etc. need this header #endif // defined (__APPLE__) && !defined(HAVE_TSERVER) /** **************************************************************************************************** * Calling conventions **************************************************************************************************** */ #ifndef ADDR_CDECL #if defined(__GNUC__) #define ADDR_CDECL __attribute__((cdecl)) #else #define ADDR_CDECL __cdecl #endif #endif #ifndef ADDR_STDCALL #if defined(__GNUC__) #if defined(__amd64__) || defined(__x86_64__) #define ADDR_STDCALL #else #define ADDR_STDCALL __attribute__((stdcall)) #endif #else #define ADDR_STDCALL __stdcall #endif #endif #ifndef ADDR_FASTCALL #if defined(__GNUC__) #define ADDR_FASTCALL __attribute__((regparm(0))) #else #define ADDR_FASTCALL __fastcall #endif #endif #ifndef GC_CDECL #define GC_CDECL ADDR_CDECL #endif #ifndef GC_STDCALL #define GC_STDCALL ADDR_STDCALL #endif #ifndef GC_FASTCALL #define GC_FASTCALL ADDR_FASTCALL #endif #if defined(__GNUC__) #define ADDR_INLINE static inline // inline needs to be static to link #else // win32, win64, other platforms #define ADDR_INLINE __inline #endif // #if defined(__GNUC__) #if defined(__amd64__) ||
defined(__x86_64__) || defined(__i386__) #define ADDR_API ADDR_FASTCALL // default call convention is fast call #else #define ADDR_API #endif /** **************************************************************************************************** * Global defines used by other modules **************************************************************************************************** */ #if !defined(TILEINDEX_INVALID) #define TILEINDEX_INVALID -1 #endif #if !defined(TILEINDEX_LINEAR_GENERAL) #define TILEINDEX_LINEAR_GENERAL -2 #endif #if !defined(TILEINDEX_LINEAR_ALIGNED) #define TILEINDEX_LINEAR_ALIGNED 8 #endif /** **************************************************************************************************** * Return codes **************************************************************************************************** */ typedef enum _ADDR_E_RETURNCODE { // General Return ADDR_OK = 0, ADDR_ERROR = 1, // Specific Errors ADDR_OUTOFMEMORY, ADDR_INVALIDPARAMS, ADDR_NOTSUPPORTED, ADDR_NOTIMPLEMENTED, ADDR_PARAMSIZEMISMATCH, ADDR_INVALIDGBREGVALUES, } ADDR_E_RETURNCODE; /** **************************************************************************************************** * @brief * Neutral enums that define tile modes for all H/W * @note * R600/R800 tiling modes can be cast to HW enums directly, but never cast into HW enum from * ADDR_TM_2D_TILED_XTHICK * **************************************************************************************************** */ typedef enum _AddrTileMode { ADDR_TM_LINEAR_GENERAL = 0, ///< Least restrictions, pitch: multiple of 8 if not buffer ADDR_TM_LINEAR_ALIGNED = 1, ///< Requests pitch or slice to be multiple of 64 pixels ADDR_TM_1D_TILED_THIN1 = 2, ///< Linear array of 8x8 tiles ADDR_TM_1D_TILED_THICK = 3, ///< Linear array of 8x8x4 tiles ADDR_TM_2D_TILED_THIN1 = 4, ///< A set of macro tiles consisting of 8x8 tiles ADDR_TM_2D_TILED_THIN2 = 5, ///< 600 HWL only, macro tile ratio is 1:4 ADDR_TM_2D_TILED_THIN4 = 6, ///< 600 HWL
only, macro tile ratio is 1:16 ADDR_TM_2D_TILED_THICK = 7, ///< A set of macro tiles consisting of 8x8x4 tiles ADDR_TM_2B_TILED_THIN1 = 8, ///< 600 HWL only, with bank swap ADDR_TM_2B_TILED_THIN2 = 9, ///< 600 HWL only, with bank swap and ratio is 1:4 ADDR_TM_2B_TILED_THIN4 = 10, ///< 600 HWL only, with bank swap and ratio is 1:16 ADDR_TM_2B_TILED_THICK = 11, ///< 600 HWL only, with bank swap, consists of 8x8x4 tiles ADDR_TM_3D_TILED_THIN1 = 12, ///< Macro tiling w/ pipe rotation between slices ADDR_TM_3D_TILED_THICK = 13, ///< Macro tiling w/ pipe rotation between slices, thick ADDR_TM_3B_TILED_THIN1 = 14, ///< 600 HWL only, with bank swap ADDR_TM_3B_TILED_THICK = 15, ///< 600 HWL only, with bank swap, thick ADDR_TM_2D_TILED_XTHICK = 16, ///< Tile is 8x8x8, valid from NI ADDR_TM_3D_TILED_XTHICK = 17, ///< Tile is 8x8x8, valid from NI ADDR_TM_POWER_SAVE = 18, ///< Power save mode, only used by KMD on NI ADDR_TM_PRT_TILED_THIN1 = 19, ///< No bank/pipe rotation or hashing beyond macrotile size ADDR_TM_PRT_2D_TILED_THIN1 = 20, ///< Same as 2D_TILED_THIN1, PRT only ADDR_TM_PRT_3D_TILED_THIN1 = 21, ///< Same as 3D_TILED_THIN1, PRT only ADDR_TM_PRT_TILED_THICK = 22, ///< No bank/pipe rotation or hashing beyond macrotile size ADDR_TM_PRT_2D_TILED_THICK = 23, ///< Same as 2D_TILED_THICK, PRT only ADDR_TM_PRT_3D_TILED_THICK = 24, ///< Same as 3D_TILED_THICK, PRT only ADDR_TM_UNKNOWN = 25, ///< Unknown tile mode, to be decided by the address lib ADDR_TM_COUNT = 26, ///< Must be the last enumerator (number of tile modes) } AddrTileMode; /** **************************************************************************************************** * @brief * Neutral enums that define swizzle modes for Gfx9+ ASIC * @note * * ADDR_SW_LINEAR linear aligned addressing mode, for 1D/2D/3D resource * ADDR_SW_256B_* addressing block aligned size is 256B, for 2D/3D resource * ADDR_SW_4KB_* addressing block aligned size is 4KB, for 2D/3D resource * ADDR_SW_64KB_* addressing block aligned size is 64KB, for
2D/3D resource * * ADDR_SW_*_Z For GFX9: - for 2D resource, represents Z-order swizzle mode for depth/stencil/FMask - for 3D resource, represents a swizzle mode similar to legacy thick tile mode For GFX10: - represents Z-order swizzle mode for depth/stencil/FMask * ADDR_SW_*_S For GFX9+: - represents standard swizzle mode defined by MS * ADDR_SW_*_D For GFX9: - for 2D resource, represents a swizzle mode for displayable resource * - for 3D resource, represents a swizzle mode which places each slice in order; pixels within a slice are placed as 2D ADDR_SW_*_S. Don't use this combination if possible! For GFX10: - for 2D resource, represents a swizzle mode for displayable resource - for 3D resource, represents a swizzle mode similar to legacy thick tile mode * ADDR_SW_*_R For GFX9: - 2D resource only, represents a swizzle mode for rotated displayable resource For GFX10: - represents a swizzle mode for render target resource * **************************************************************************************************** */ typedef enum _AddrSwizzleMode { ADDR_SW_LINEAR = 0, ADDR_SW_256B_S = 1, ADDR_SW_256B_D = 2, ADDR_SW_256B_R = 3, ADDR_SW_4KB_Z = 4, ADDR_SW_4KB_S = 5, ADDR_SW_4KB_D = 6, ADDR_SW_4KB_R = 7, ADDR_SW_64KB_Z = 8, ADDR_SW_64KB_S = 9, ADDR_SW_64KB_D = 10, ADDR_SW_64KB_R = 11, ADDR_SW_RESERVED0 = 12, ADDR_SW_RESERVED1 = 13, ADDR_SW_RESERVED2 = 14, ADDR_SW_RESERVED3 = 15, ADDR_SW_64KB_Z_T = 16, ADDR_SW_64KB_S_T = 17, ADDR_SW_64KB_D_T = 18, ADDR_SW_64KB_R_T = 19, ADDR_SW_4KB_Z_X = 20, ADDR_SW_4KB_S_X = 21, ADDR_SW_4KB_D_X = 22, ADDR_SW_4KB_R_X = 23, ADDR_SW_64KB_Z_X = 24, ADDR_SW_64KB_S_X = 25, ADDR_SW_64KB_D_X = 26, ADDR_SW_64KB_R_X = 27, ADDR_SW_VAR_Z_X = 28, ADDR_SW_RESERVED4 = 29, ADDR_SW_RESERVED5 = 30, ADDR_SW_VAR_R_X = 31, ADDR_SW_LINEAR_GENERAL = 32, ADDR_SW_MAX_TYPE = 33, } AddrSwizzleMode; /** **************************************************************************************************** * @brief * Neutral enums that define image type * @note
*   this is new for address library interface version 2
*
****************************************************************************************************
*/
typedef enum _AddrResourceType
{
    ADDR_RSRC_TEX_1D   = 0,
    ADDR_RSRC_TEX_2D   = 1,
    ADDR_RSRC_TEX_3D   = 2,
    ADDR_RSRC_MAX_TYPE = 3,
} AddrResourceType;

/**
****************************************************************************************************
* @brief
*   Neutral enums that define resource heap location
* @note
*   this is new for address library interface version 2
*
****************************************************************************************************
*/
typedef enum _AddrResrouceLocation
{
    ADDR_RSRC_LOC_UNDEF    = 0,  // Resource heap is undefined/unknown
    ADDR_RSRC_LOC_LOCAL    = 1,  // CPU-visible and CPU-invisible local heap
    ADDR_RSRC_LOC_USWC     = 2,  // CPU write-combined non-cached nonlocal heap
    ADDR_RSRC_LOC_CACHED   = 3,  // CPU cached nonlocal heap
    ADDR_RSRC_LOC_INVIS    = 4,  // CPU-invisible local heap only
    ADDR_RSRC_LOC_MAX_TYPE = 5,
} AddrResrouceLocation;

/**
****************************************************************************************************
* @brief
*   Neutral enums that define resource basic swizzle mode
* @note
*   this is new for address library interface version 2
*
****************************************************************************************************
*/
typedef enum _AddrSwType
{
    ADDR_SW_Z = 0,   // Resource basic swizzle mode is ZOrder
    ADDR_SW_S = 1,   // Resource basic swizzle mode is Standard
    ADDR_SW_D = 2,   // Resource basic swizzle mode is Display
    ADDR_SW_R = 3,   // Resource basic swizzle mode is Rotated/Render optimized
    ADDR_SW_L = 4,   // Resource basic swizzle mode is Linear
    ADDR_SW_MAX_SWTYPE
} AddrSwType;

/**
****************************************************************************************************
* @brief
*   Neutral enums that define mipmap major mode
* @note
*   this is new for address library interface version 2
*
****************************************************************************************************
*/
typedef enum _AddrMajorMode
{
    ADDR_MAJOR_X        = 0,
    ADDR_MAJOR_Y        = 1,
    ADDR_MAJOR_Z        = 2,
    ADDR_MAJOR_MAX_TYPE = 3,
} AddrMajorMode;

/**
****************************************************************************************************
*   AddrFormat
*
*   @brief
*       Neutral enum for SurfaceFormat
*
****************************************************************************************************
*/
typedef enum _AddrFormat
{
    ADDR_FMT_INVALID               = 0x00000000,
    ADDR_FMT_8                     = 0x00000001,
    ADDR_FMT_4_4                   = 0x00000002,
    ADDR_FMT_3_3_2                 = 0x00000003,
    ADDR_FMT_RESERVED_4            = 0x00000004,
    ADDR_FMT_16                    = 0x00000005,
    ADDR_FMT_16_FLOAT              = ADDR_FMT_16,
    ADDR_FMT_8_8                   = 0x00000007,
    ADDR_FMT_5_6_5                 = 0x00000008,
    ADDR_FMT_6_5_5                 = 0x00000009,
    ADDR_FMT_1_5_5_5               = 0x0000000a,
    ADDR_FMT_4_4_4_4               = 0x0000000b,
    ADDR_FMT_5_5_5_1               = 0x0000000c,
    ADDR_FMT_32                    = 0x0000000d,
    ADDR_FMT_32_FLOAT              = ADDR_FMT_32,
    ADDR_FMT_16_16                 = 0x0000000f,
    ADDR_FMT_16_16_FLOAT           = ADDR_FMT_16_16,
    ADDR_FMT_8_24                  = 0x00000011,
    ADDR_FMT_8_24_FLOAT            = ADDR_FMT_8_24,
    ADDR_FMT_24_8                  = 0x00000013,
    ADDR_FMT_24_8_FLOAT            = ADDR_FMT_24_8,
    ADDR_FMT_10_11_11              = 0x00000015,
    ADDR_FMT_10_11_11_FLOAT        = ADDR_FMT_10_11_11,
    ADDR_FMT_11_11_10              = 0x00000017,
    ADDR_FMT_11_11_10_FLOAT        = ADDR_FMT_11_11_10,
    ADDR_FMT_2_10_10_10            = 0x00000019,
    ADDR_FMT_8_8_8_8               = 0x0000001a,
    ADDR_FMT_10_10_10_2            = 0x0000001b,
    ADDR_FMT_X24_8_32_FLOAT        = 0x0000001c,
    ADDR_FMT_32_32                 = 0x0000001d,
    ADDR_FMT_32_32_FLOAT           = ADDR_FMT_32_32,
    ADDR_FMT_16_16_16_16           = 0x0000001f,
    ADDR_FMT_16_16_16_16_FLOAT     = ADDR_FMT_16_16_16_16,
    ADDR_FMT_RESERVED_33           = 0x00000021,
    ADDR_FMT_32_32_32_32           = 0x00000022,
    ADDR_FMT_32_32_32_32_FLOAT     = ADDR_FMT_32_32_32_32,
    ADDR_FMT_RESERVED_36           = 0x00000024,
    ADDR_FMT_1                     = 0x00000025,
    ADDR_FMT_1_REVERSED            = 0x00000026,
    ADDR_FMT_GB_GR                 = 0x00000027,
    ADDR_FMT_BG_RG                 = 0x00000028,
    ADDR_FMT_32_AS_8               = 0x00000029,
    ADDR_FMT_32_AS_8_8             = 0x0000002a,
    ADDR_FMT_5_9_9_9_SHAREDEXP     = 0x0000002b,
    ADDR_FMT_8_8_8                 = 0x0000002c,
    ADDR_FMT_16_16_16              = 0x0000002d,
    ADDR_FMT_16_16_16_FLOAT        = ADDR_FMT_16_16_16,
    ADDR_FMT_32_32_32              = 0x0000002f,
    ADDR_FMT_32_32_32_FLOAT        = ADDR_FMT_32_32_32,
    ADDR_FMT_BC1                   = 0x00000031,
    ADDR_FMT_BC2                   = 0x00000032,
    ADDR_FMT_BC3                   = 0x00000033,
    ADDR_FMT_BC4                   = 0x00000034,
    ADDR_FMT_BC5                   = 0x00000035,
    ADDR_FMT_BC6                   = 0x00000036,
    ADDR_FMT_BC7                   = 0x00000037,
    ADDR_FMT_32_AS_32_32_32_32     = 0x00000038,
    ADDR_FMT_APC3                  = 0x00000039,
    ADDR_FMT_APC4                  = 0x0000003a,
    ADDR_FMT_APC5                  = 0x0000003b,
    ADDR_FMT_APC6                  = 0x0000003c,
    ADDR_FMT_APC7                  = 0x0000003d,
    ADDR_FMT_CTX1                  = 0x0000003e,
    ADDR_FMT_RESERVED_63           = 0x0000003f,
    ADDR_FMT_ASTC_4x4              = 0x00000040,
    ADDR_FMT_ASTC_5x4              = 0x00000041,
    ADDR_FMT_ASTC_5x5              = 0x00000042,
    ADDR_FMT_ASTC_6x5              = 0x00000043,
    ADDR_FMT_ASTC_6x6              = 0x00000044,
    ADDR_FMT_ASTC_8x5              = 0x00000045,
    ADDR_FMT_ASTC_8x6              = 0x00000046,
    ADDR_FMT_ASTC_8x8              = 0x00000047,
    ADDR_FMT_ASTC_10x5             = 0x00000048,
    ADDR_FMT_ASTC_10x6             = 0x00000049,
    ADDR_FMT_ASTC_10x8             = 0x0000004a,
    ADDR_FMT_ASTC_10x10            = 0x0000004b,
    ADDR_FMT_ASTC_12x10            = 0x0000004c,
    ADDR_FMT_ASTC_12x12            = 0x0000004d,
    ADDR_FMT_ETC2_64BPP            = 0x0000004e,
    ADDR_FMT_ETC2_128BPP           = 0x0000004f,
} AddrFormat;

/**
****************************************************************************************************
*   AddrDepthFormat
*
*   @brief
*       Neutral enum for addrFlt32ToDepthPixel
*
****************************************************************************************************
*/
typedef enum _AddrDepthFormat
{
    ADDR_DEPTH_INVALID             = 0x00000000,
    ADDR_DEPTH_16                  = 0x00000001,
    ADDR_DEPTH_X8_24               = 0x00000002,
    ADDR_DEPTH_8_24                = 0x00000003,
    ADDR_DEPTH_X8_24_FLOAT         = 0x00000004,
    ADDR_DEPTH_8_24_FLOAT          = 0x00000005,
    ADDR_DEPTH_32_FLOAT            = 0x00000006,
    ADDR_DEPTH_X24_8_32_FLOAT      = 0x00000007,
} AddrDepthFormat;

/**
****************************************************************************************************
*   AddrColorFormat
*
*   @brief
*       Neutral enum for ColorFormat
*
****************************************************************************************************
*/
typedef enum _AddrColorFormat
{
    ADDR_COLOR_INVALID             = 0x00000000,
    ADDR_COLOR_8                   = 0x00000001,
    ADDR_COLOR_4_4                 = 0x00000002,
    ADDR_COLOR_3_3_2               = 0x00000003,
    ADDR_COLOR_RESERVED_4          = 0x00000004,
    ADDR_COLOR_16                  = 0x00000005,
    ADDR_COLOR_16_FLOAT            = 0x00000006,
    ADDR_COLOR_8_8                 = 0x00000007,
    ADDR_COLOR_5_6_5               = 0x00000008,
    ADDR_COLOR_6_5_5               = 0x00000009,
    ADDR_COLOR_1_5_5_5             = 0x0000000a,
    ADDR_COLOR_4_4_4_4             = 0x0000000b,
    ADDR_COLOR_5_5_5_1             = 0x0000000c,
    ADDR_COLOR_32                  = 0x0000000d,
    ADDR_COLOR_32_FLOAT            = 0x0000000e,
    ADDR_COLOR_16_16               = 0x0000000f,
    ADDR_COLOR_16_16_FLOAT         = 0x00000010,
    ADDR_COLOR_8_24                = 0x00000011,
    ADDR_COLOR_8_24_FLOAT          = 0x00000012,
    ADDR_COLOR_24_8                = 0x00000013,
    ADDR_COLOR_24_8_FLOAT          = 0x00000014,
    ADDR_COLOR_10_11_11            = 0x00000015,
    ADDR_COLOR_10_11_11_FLOAT      = 0x00000016,
    ADDR_COLOR_11_11_10            = 0x00000017,
    ADDR_COLOR_11_11_10_FLOAT      = 0x00000018,
    ADDR_COLOR_2_10_10_10          = 0x00000019,
    ADDR_COLOR_8_8_8_8             = 0x0000001a,
    ADDR_COLOR_10_10_10_2          = 0x0000001b,
    ADDR_COLOR_X24_8_32_FLOAT      = 0x0000001c,
    ADDR_COLOR_32_32               = 0x0000001d,
    ADDR_COLOR_32_32_FLOAT         = 0x0000001e,
    ADDR_COLOR_16_16_16_16         = 0x0000001f,
    ADDR_COLOR_16_16_16_16_FLOAT   = 0x00000020,
    ADDR_COLOR_RESERVED_33         = 0x00000021,
    ADDR_COLOR_32_32_32_32         = 0x00000022,
    ADDR_COLOR_32_32_32_32_FLOAT   = 0x00000023,
} AddrColorFormat;

/**
****************************************************************************************************
*   AddrSurfaceNumber
*
*   @brief
*       Neutral enum for SurfaceNumber
*
****************************************************************************************************
*/
typedef enum _AddrSurfaceNumber
{
    ADDR_NUMBER_UNORM              = 0x00000000,
    ADDR_NUMBER_SNORM              = 0x00000001,
    ADDR_NUMBER_USCALED            = 0x00000002,
    ADDR_NUMBER_SSCALED            = 0x00000003,
    ADDR_NUMBER_UINT               = 0x00000004,
    ADDR_NUMBER_SINT               = 0x00000005,
    ADDR_NUMBER_SRGB               = 0x00000006,
    ADDR_NUMBER_FLOAT              = 0x00000007,
} AddrSurfaceNumber;

/**
****************************************************************************************************
*   AddrSurfaceSwap
*
*   @brief
*       Neutral enum for SurfaceSwap
*
****************************************************************************************************
*/
typedef enum _AddrSurfaceSwap
{
    ADDR_SWAP_STD                  = 0x00000000,
    ADDR_SWAP_ALT                  = 0x00000001,
    ADDR_SWAP_STD_REV              = 0x00000002,
    ADDR_SWAP_ALT_REV              = 0x00000003,
} AddrSurfaceSwap;

/**
****************************************************************************************************
*   AddrHtileBlockSize
*
*   @brief
*       Size of HTILE blocks; valid values are currently 4 or 8
****************************************************************************************************
*/
typedef enum _AddrHtileBlockSize
{
    ADDR_HTILE_BLOCKSIZE_4 = 4,
    ADDR_HTILE_BLOCKSIZE_8 = 8,
} AddrHtileBlockSize;

/**
****************************************************************************************************
*   AddrPipeCfg
*
*   @brief
*       The pipe configuration field specifies both the number of pipes and
*       how pipes are interleaved on the surface.
*       The number of pipes, the shader engine tile size, and the packer tile size
*       are encoded together in the PIPE_CONFIG register field.
*       The number of pipes usually matches the number of memory channels of the
*       hardware configuration.
*       For hardware configurations with a non-power-of-2 number of memory channels, it
*       usually matches the number of ROP units (? TODO: which registers??)
*       The enum value = hw enum + 1, which reserves 0 for requesting the default.
****************************************************************************************************
*/
typedef enum _AddrPipeCfg
{
    ADDR_PIPECFG_INVALID         = 0,
    ADDR_PIPECFG_P2              = 1, /// 2 pipes,
    ADDR_PIPECFG_P4_8x16         = 5, /// 4 pipes,
    ADDR_PIPECFG_P4_16x16        = 6,
    ADDR_PIPECFG_P4_16x32        = 7,
    ADDR_PIPECFG_P4_32x32        = 8,
    ADDR_PIPECFG_P8_16x16_8x16   = 9, /// 8 pipes
    ADDR_PIPECFG_P8_16x32_8x16   = 10,
    ADDR_PIPECFG_P8_32x32_8x16   = 11,
    ADDR_PIPECFG_P8_16x32_16x16  = 12,
    ADDR_PIPECFG_P8_32x32_16x16  = 13,
    ADDR_PIPECFG_P8_32x32_16x32  = 14,
    ADDR_PIPECFG_P8_32x64_32x32  = 15,
    ADDR_PIPECFG_P16_32x32_8x16  = 17, /// 16 pipes
    ADDR_PIPECFG_P16_32x32_16x16 = 18,
    ADDR_PIPECFG_UNUSED          = 19,
    ADDR_PIPECFG_MAX             = 20,
} AddrPipeCfg;

/**
****************************************************************************************************
*   AddrTileType
*
*   @brief
*       Neutral enum that specifies the micro tile type (MICRO_TILE_MODE)
****************************************************************************************************
*/
typedef enum _AddrTileType
{
    ADDR_DISPLAYABLE        = 0,    ///< Displayable tiling
    ADDR_NON_DISPLAYABLE    = 1,    ///< Non-displayable tiling, a.k.a thin micro tiling
    ADDR_DEPTH_SAMPLE_ORDER = 2,    ///< Same as non-displayable plus depth-sample-order
    ADDR_ROTATED            = 3,    ///< Rotated displayable tiling
    ADDR_THICK              = 4,    ///< Thick micro-tiling, only valid for THICK and XTHICK
} AddrTileType;

////////////////////////////////////////////////////////////////////////////////////////////////////
//
//  Type definitions: short system-independent names for address library types
//
////////////////////////////////////////////////////////////////////////////////////////////////////

#if !defined(__APPLE__) || defined(HAVE_TSERVER)

#ifndef BOOL_32        // no bool type in C
/// @brief Boolean type, since none is defined in C
/// @ingroup type
#define BOOL_32 int
#endif

#ifndef INT_32
#define INT_32 int
#endif

#ifndef UINT_32
#define UINT_32 unsigned int
#endif

#ifndef INT_16
#define INT_16 short
#endif

#ifndef UINT_16
#define UINT_16 unsigned short
#endif

#ifndef INT_8
#define INT_8 char
#endif

#ifndef UINT_8
#define UINT_8 unsigned char
#endif

#ifndef NULL
#define NULL 0
#endif

#ifndef TRUE
#define TRUE 1
#endif

#ifndef FALSE
#define FALSE 0
#endif

//
//  64-bit integer types depend on the compiler
//
#if defined( __GNUC__ ) || defined( __WATCOMC__ )
#define INT_64 long long
#define UINT_64 unsigned long long

#elif defined( _WIN32 )
#define INT_64 __int64
#define UINT_64 unsigned __int64

#else
#error Unsupported compiler and/or operating system for 64-bit integers

/// @brief 64-bit signed integer type (compiler dependent)
/// @ingroup type
///
/// The addrlib defines a 64-bit signed integer type for either
/// Gnu/Watcom compilers (which use the first syntax) or for
/// the Windows VCC compiler (which uses the second syntax).
#define INT_64 long long OR __int64

/// @brief 64-bit unsigned integer type (compiler dependent)
/// @ingroup type
///
/// The addrlib defines a 64-bit unsigned integer type for either
/// Gnu/Watcom compilers (which use the first syntax) or for
/// the Windows VCC compiler (which uses the second syntax).
///
#define UINT_64 unsigned long long OR unsigned __int64
#endif

#endif // #if !defined(__APPLE__) || defined(HAVE_TSERVER)

//  ADDR64X is used to print addresses in hex form on both Windows and Linux
//
#if defined( __GNUC__ ) || defined( __WATCOMC__ )
#define ADDR64X "llx"
#define ADDR64D "lld"

#elif defined( _WIN32 )
#define ADDR64X "I64x"
#define ADDR64D "I64d"

#else
#error Unsupported compiler and/or operating system for 64-bit integers

/// @brief Addrlib device address 64-bit printf tag (compiler dependent)
/// @ingroup type
///
/// This allows printf to display an ADDR_64 for either the Windows VCC compiler
/// (which uses this value) or the Gnu/Watcom compilers (which use "llx").
/// An example of use is printf("addr 0x%" ADDR64X "\n", address);
///
#define ADDR64X "llx" OR "I64x"
#define ADDR64D "lld" OR "I64d"
#endif


/// @brief Union for storing a 32-bit float or 32-bit integer
/// @ingroup type
///
/// This union provides a simple way to convert between a 32-bit float
/// and a 32-bit integer. It also prevents the compiler from producing
/// code that alters NaN values when assigning or copying floats.
/// Therefore, all address library routines that pass or return 32-bit
/// floating point data do so by passing or returning an ADDR_FLT_32.
///
typedef union {
    INT_32   i;
    UINT_32  u;
    float    f;
} ADDR_FLT_32;


////////////////////////////////////////////////////////////////////////////////////////////////////
//
//  Macros for controlling linking and building on multiple systems
//
////////////////////////////////////////////////////////////////////////////////////////////////////
#if defined(_MSC_VER)
#if defined(va_copy)
#undef va_copy  //redefine va_copy to support VC2013
#endif
#endif

#if !defined(va_copy)
#define va_copy(dst, src) \
    ((void) memcpy(&(dst), &(src), sizeof(va_list)))
#endif

#endif // __ADDR_TYPES_H__

/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
****************************************************************************************************
* @file  addrinterface.cpp
* @brief Contains the addrlib interface functions
****************************************************************************************************
*/
#include "addrinterface.h"
#include "addrlib1.h"
#include "addrlib2.h"

#include "addrcommon.h"

#include "util/macros.h"

namespace rocr {
using namespace Addr;

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               Create/Destroy/Config functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   AddrCreate
*
*   @brief
*       Create address lib object
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrCreate(
    const ADDR_CREATE_INPUT* pAddrCreateIn,  ///< [in] information for creating address lib object
    ADDR_CREATE_OUTPUT*      pAddrCreateOut) ///< [out] address lib handle
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;
    {
        returnCode = Lib::Create(pAddrCreateIn, pAddrCreateOut);
    }

    return returnCode;
}

/**
****************************************************************************************************
*   AddrDestroy
*
*   @brief
*       Destroy address lib object
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrDestroy(
    ADDR_HANDLE hLib) ///< address lib handle
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (hLib)
    {
        Lib* pLib = Lib::GetLib(hLib);
        pLib->Destroy();
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}
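
/*
Illustrative sketch (not part of addrlib): the entry points in this file share one pattern.
AddrCreate hands out an opaque ADDR_HANDLE, most other wrappers resolve that handle with
GetLib, forward the call when the lookup succeeds and return ADDR_ERROR when it does not,
and AddrDestroy releases the object. The self-contained C below mimics that lifecycle under
stated assumptions: DemoReturnCode, DemoHandle, DemoLib, DemoCreate, DemoGetLib,
DemoGetVersion and DemoDestroy are hypothetical stand-ins, and DemoGetLib is a plain cast,
a simplification rather than a copy of Lib::GetLib.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical stand-ins for ADDR_E_RETURNCODE and ADDR_HANDLE.
typedef enum { DEMO_OK = 0, DEMO_ERROR = 1 } DemoReturnCode;
typedef void* DemoHandle;

// Hypothetical library object hidden behind the opaque handle.
typedef struct
{
    unsigned int version;
} DemoLib;

// Create: allocate the object and return it to the caller as an opaque handle.
DemoReturnCode DemoCreate(unsigned int version, DemoHandle* pHandle)
{
    assert(pHandle != NULL);  // the caller must supply somewhere to store the handle
    DemoLib* pLib = (DemoLib*)malloc(sizeof(DemoLib));
    if (pLib == NULL)
    {
        *pHandle = NULL;
        return DEMO_ERROR;
    }
    pLib->version = version;
    *pHandle = pLib;
    return DEMO_OK;
}

// Resolve a handle back to the object; a plain cast in this sketch.
static DemoLib* DemoGetLib(DemoHandle hLib)
{
    return (DemoLib*)hLib;
}

// A wrapper with the same shape as AddrComputeSurfaceInfo and its siblings.
DemoReturnCode DemoGetVersion(DemoHandle hLib, unsigned int* pVersion)
{
    DemoLib* pLib = DemoGetLib(hLib);
    DemoReturnCode returnCode = DEMO_OK;

    if (pLib != NULL)
    {
        *pVersion = pLib->version;
    }
    else
    {
        returnCode = DEMO_ERROR;
    }
    return returnCode;
}

// Destroy: reject a NULL handle, otherwise free the object.
DemoReturnCode DemoDestroy(DemoHandle hLib)
{
    if (hLib == NULL)
    {
        return DEMO_ERROR;
    }
    free(DemoGetLib(hLib));
    return DEMO_OK;
}
```

A NULL handle passed to DemoGetVersion or DemoDestroy yields DEMO_ERROR instead of being
dereferenced, which is the behavior the NULL checks in AddrDestroy and the compute wrappers
provide.
*/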
////////////////////////////////////////////////////////////////////////////////////////////////////
//                               Surface functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   AddrComputeSurfaceInfo
*
*   @brief
*       Calculate surface width/height/depth/alignments and suitable tiling mode
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeSurfaceInfo(
    ADDR_HANDLE                            hLib, ///< address lib handle
    const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn,  ///< [in] surface information
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) ///< [out] surface parameters and alignments
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeSurfaceInfo(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   AddrComputeSurfaceAddrFromCoord
*
*   @brief
*       Compute surface address according to coordinates
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeSurfaceAddrFromCoord(
    ADDR_HANDLE                                     hLib, ///< address lib handle
    const ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn,  ///< [in] surface info and coordinates
    ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT*      pOut) ///< [out] surface address
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeSurfaceAddrFromCoord(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   AddrComputeSurfaceCoordFromAddr
*
*   @brief
*       Compute coordinates according to surface address
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeSurfaceCoordFromAddr(
    ADDR_HANDLE                                     hLib, ///< address lib handle
    const ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn,  ///< [in] surface info and address
    ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT*      pOut) ///< [out] coordinates
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeSurfaceCoordFromAddr(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               HTile functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   AddrComputeHtileInfo
*
*   @brief
*       Compute Htile pitch, height, base alignment and size in bytes
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeHtileInfo(
    ADDR_HANDLE                          hLib, ///< address lib handle
    const ADDR_COMPUTE_HTILE_INFO_INPUT* pIn,  ///< [in] Htile information
    ADDR_COMPUTE_HTILE_INFO_OUTPUT*      pOut) ///< [out] Htile pitch, height and size in bytes
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeHtileInfo(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   AddrComputeHtileAddrFromCoord
*
*   @brief
*       Compute Htile address according to coordinates (of depth buffer)
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeHtileAddrFromCoord(
    ADDR_HANDLE                                   hLib, ///< address lib handle
    const ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn,  ///< [in] Htile info and coordinates
    ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT*      pOut) ///< [out] Htile address
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeHtileAddrFromCoord(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   AddrComputeHtileCoordFromAddr
*
*   @brief
*       Compute coordinates within depth buffer (1st pixel of a micro tile) according to
*       Htile address
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeHtileCoordFromAddr(
    ADDR_HANDLE                                   hLib, ///< address lib handle
    const ADDR_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn,  ///< [in] Htile info and address
    ADDR_COMPUTE_HTILE_COORDFROMADDR_OUTPUT*      pOut) ///< [out] Htile coordinates
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeHtileCoordFromAddr(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               C-mask functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   AddrComputeCmaskInfo
*
*   @brief
*       Compute Cmask pitch, height, base alignment and size in bytes from color buffer
*       info
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeCmaskInfo(
    ADDR_HANDLE                          hLib, ///< address lib handle
    const ADDR_COMPUTE_CMASK_INFO_INPUT* pIn,  ///< [in] Cmask pitch and height
    ADDR_COMPUTE_CMASK_INFO_OUTPUT*      pOut) ///< [out] Cmask pitch, height and size in bytes
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeCmaskInfo(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   AddrComputeCmaskAddrFromCoord
*
*   @brief
*       Compute Cmask address according to coordinates (of MSAA color buffer)
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeCmaskAddrFromCoord(
    ADDR_HANDLE                                   hLib, ///< address lib handle
    const ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn,  ///< [in] Cmask info and coordinates
    ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT*      pOut) ///< [out] Cmask address
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeCmaskAddrFromCoord(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   AddrComputeCmaskCoordFromAddr
*
*   @brief
*       Compute coordinates within color buffer (1st pixel of a micro tile) according to
*       Cmask address
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeCmaskCoordFromAddr(
    ADDR_HANDLE                                   hLib, ///< address lib handle
    const ADDR_COMPUTE_CMASK_COORDFROMADDR_INPUT* pIn,  ///< [in] Cmask info and address
    ADDR_COMPUTE_CMASK_COORDFROMADDR_OUTPUT*      pOut) ///< [out] Cmask coordinates
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeCmaskCoordFromAddr(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               F-mask functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   AddrComputeFmaskInfo
*
*   @brief
*       Compute Fmask pitch/height/depth/alignments and size in bytes
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeFmaskInfo(
    ADDR_HANDLE                          hLib, ///< address lib handle
    const ADDR_COMPUTE_FMASK_INFO_INPUT* pIn,  ///< [in] Fmask information
    ADDR_COMPUTE_FMASK_INFO_OUTPUT*      pOut) ///< [out] Fmask pitch and height
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeFmaskInfo(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   AddrComputeFmaskAddrFromCoord
*
*   @brief
*       Compute Fmask address according
*       to coordinates (x,y,slice,sample,plane)
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeFmaskAddrFromCoord(
    ADDR_HANDLE                                   hLib, ///< address lib handle
    const ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT* pIn,  ///< [in] Fmask info and coordinates
    ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT*      pOut) ///< [out] Fmask address
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeFmaskAddrFromCoord(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   AddrComputeFmaskCoordFromAddr
*
*   @brief
*       Compute coordinates (x,y,slice,sample,plane) according to Fmask address
*
*   @return
*       ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeFmaskCoordFromAddr(
    ADDR_HANDLE                                   hLib, ///< address lib handle
    const ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT* pIn,  ///< [in] Fmask info and address
    ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT*      pOut) ///< [out] Fmask coordinates
{
    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeFmaskCoordFromAddr(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               DCC key functions
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   AddrComputeDccInfo
*
*   @brief
*       Compute DCC key size, base alignment based on color
*       surface size, tile info or tile index
*
****************************************************************************************************
*/
ADDR_E_RETURNCODE ADDR_API AddrComputeDccInfo(
    ADDR_HANDLE                       hLib, ///< handle of addrlib
    const ADDR_COMPUTE_DCCINFO_INPUT* pIn,  ///< [in] input
    ADDR_COMPUTE_DCCINFO_OUTPUT*      pOut) ///< [out] output
{
    ADDR_E_RETURNCODE returnCode;

    V1::Lib* pLib = V1::Lib::GetLib(hLib);

    if (pLib != NULL)
    {
        returnCode = pLib->ComputeDccInfo(pIn, pOut);
    }
    else
    {
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

///////////////////////////////////////////////////////////////////////////////
// Below functions are element related or helper functions
///////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   AddrGetVersion
*
*   @brief
*       Get AddrLib version number. Client may check this return value against ADDRLIB_VERSION
*       defined in addrinterface.h to see if there is a mismatch.
**************************************************************************************************** */ UINT_32 ADDR_API AddrGetVersion(ADDR_HANDLE hLib) { UINT_32 version = 0; Addr::Lib* pLib = Lib::GetLib(hLib); ADDR_ASSERT(pLib != NULL); if (pLib) { version = pLib->GetVersion(); } return version; } /** **************************************************************************************************** * AddrUseTileIndex * * @brief * Return TRUE if tileIndex is enabled in this address library **************************************************************************************************** */ BOOL_32 ADDR_API AddrUseTileIndex(ADDR_HANDLE hLib) { BOOL_32 useTileIndex = FALSE; V1::Lib* pLib = V1::Lib::GetLib(hLib); ADDR_ASSERT(pLib != NULL); if (pLib) { useTileIndex = pLib->UseTileIndex(0); } return useTileIndex; } /** **************************************************************************************************** * AddrUseCombinedSwizzle * * @brief * Return TRUE if combined swizzle is enabled in this address library **************************************************************************************************** */ BOOL_32 ADDR_API AddrUseCombinedSwizzle(ADDR_HANDLE hLib) { BOOL_32 useCombinedSwizzle = FALSE; V1::Lib* pLib = V1::Lib::GetLib(hLib); ADDR_ASSERT(pLib != NULL); if (pLib) { useCombinedSwizzle = pLib->UseCombinedSwizzle(); } return useCombinedSwizzle; } /** **************************************************************************************************** * AddrExtractBankPipeSwizzle * * @brief * Extract Bank and Pipe swizzle from base256b * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrExtractBankPipeSwizzle( ADDR_HANDLE hLib, ///< addrlib handle const ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT* pIn, ///< [in] input structure ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT* pOut) ///< [out] 
output structure { ADDR_E_RETURNCODE returnCode = ADDR_OK; V1::Lib* pLib = V1::Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->ExtractBankPipeSwizzle(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * AddrCombineBankPipeSwizzle * * @brief * Combine Bank and Pipe swizzle * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrCombineBankPipeSwizzle( ADDR_HANDLE hLib, const ADDR_COMBINE_BANKPIPE_SWIZZLE_INPUT* pIn, ADDR_COMBINE_BANKPIPE_SWIZZLE_OUTPUT* pOut) { ADDR_E_RETURNCODE returnCode = ADDR_OK; V1::Lib* pLib = V1::Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->CombineBankPipeSwizzle(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * AddrComputeSliceSwizzle * * @brief * Compute a swizzle for slice from a base swizzle * @return * ADDR_OK if no error **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrComputeSliceSwizzle( ADDR_HANDLE hLib, const ADDR_COMPUTE_SLICESWIZZLE_INPUT* pIn, ADDR_COMPUTE_SLICESWIZZLE_OUTPUT* pOut) { ADDR_E_RETURNCODE returnCode = ADDR_OK; V1::Lib* pLib = V1::Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->ComputeSliceTileSwizzle(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * AddrComputeBaseSwizzle * * @brief * Return a combined Bank and Pipe swizzle based on surface type/index * @return * ADDR_OK if no error **************************************************************************************************** */ ADDR_E_RETURNCODE
ADDR_API AddrComputeBaseSwizzle( ADDR_HANDLE hLib, const ADDR_COMPUTE_BASE_SWIZZLE_INPUT* pIn, ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT* pOut) { ADDR_E_RETURNCODE returnCode = ADDR_OK; V1::Lib* pLib = V1::Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->ComputeBaseSwizzle(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * ElemFlt32ToDepthPixel * * @brief * Convert a FLT_32 value to a depth/stencil pixel value * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE * **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API ElemFlt32ToDepthPixel( ADDR_HANDLE hLib, ///< addrlib handle const ELEM_FLT32TODEPTHPIXEL_INPUT* pIn, ///< [in] per-component value ELEM_FLT32TODEPTHPIXEL_OUTPUT* pOut) ///< [out] final pixel value { ADDR_E_RETURNCODE returnCode = ADDR_OK; Lib* pLib = Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->Flt32ToDepthPixel(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * ElemFlt32ToColorPixel * * @brief * Convert a FLT_32 value to a red/green/blue/alpha pixel value * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE * **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API ElemFlt32ToColorPixel( ADDR_HANDLE hLib, ///< addrlib handle const ELEM_FLT32TOCOLORPIXEL_INPUT* pIn, ///< [in] format, surface number and swap value ELEM_FLT32TOCOLORPIXEL_OUTPUT* pOut) ///< [out] final pixel value { ADDR_E_RETURNCODE returnCode = ADDR_OK; Lib* pLib = Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->Flt32ToColorPixel(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /**
**************************************************************************************************** * ElemGetExportNorm * * @brief * Helper function to check whether a format can be EXPORT_NORM, * which is a register CB_COLOR_INFO.SURFACE_FORMAT. * FP16 can be reported as EXPORT_NORM for rv770 in r600 * family * **************************************************************************************************** */ BOOL_32 ADDR_API ElemGetExportNorm( ADDR_HANDLE hLib, ///< addrlib handle const ELEM_GETEXPORTNORM_INPUT* pIn) ///< [in] input structure { Addr::Lib* pLib = Lib::GetLib(hLib); BOOL_32 enabled = FALSE; ASSERTED ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { enabled = pLib->GetExportNorm(pIn); } else { returnCode = ADDR_ERROR; } ADDR_ASSERT(returnCode == ADDR_OK); return enabled; } /** **************************************************************************************************** * ElemSize * * @brief * Get bits-per-element for specified format * * @return * Bits-per-element of specified format * **************************************************************************************************** */ UINT_32 ADDR_API ElemSize( ADDR_HANDLE hLib, AddrFormat format) { UINT_32 bpe = 0; Addr::Lib* pLib = Lib::GetLib(hLib); if (pLib != NULL) { bpe = pLib->GetBpe(format); } return bpe; } /** **************************************************************************************************** * AddrConvertTileInfoToHW * * @brief * Convert tile info from real value to hardware register value * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrConvertTileInfoToHW( ADDR_HANDLE hLib, ///< address lib handle const ADDR_CONVERT_TILEINFOTOHW_INPUT* pIn, ///< [in] tile info with real value ADDR_CONVERT_TILEINFOTOHW_OUTPUT* pOut) ///< [out] tile info with HW register value { V1::Lib* pLib =
V1::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ConvertTileInfoToHW(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * AddrConvertTileIndex * * @brief * Convert tile index to tile mode/type/info * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrConvertTileIndex( ADDR_HANDLE hLib, ///< address lib handle const ADDR_CONVERT_TILEINDEX_INPUT* pIn, ///< [in] input - tile index ADDR_CONVERT_TILEINDEX_OUTPUT* pOut) ///< [out] tile mode/type/info { V1::Lib* pLib = V1::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ConvertTileIndex(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * AddrGetMacroModeIndex * * @brief * Get macro mode index based on input parameters * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrGetMacroModeIndex( ADDR_HANDLE hLib, ///< address lib handle const ADDR_GET_MACROMODEINDEX_INPUT* pIn, ///< [in] input ADDR_GET_MACROMODEINDEX_OUTPUT* pOut) ///< [out] macro mode index { V1::Lib* pLib = V1::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode; if (pLib != NULL) { returnCode = pLib->GetMacroModeIndex(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * AddrConvertTileIndex1 * * @brief * Convert tile index to tile mode/type/info * * @return * ADDR_OK if 
successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrConvertTileIndex1( ADDR_HANDLE hLib, ///< address lib handle const ADDR_CONVERT_TILEINDEX1_INPUT* pIn, ///< [in] input - tile index ADDR_CONVERT_TILEINDEX_OUTPUT* pOut) ///< [out] tile mode/type/info { V1::Lib* pLib = V1::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ConvertTileIndex1(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * AddrGetTileIndex * * @brief * Get tile index from tile mode/type/info * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE * * @note * Only meaningful for SI (and above) **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrGetTileIndex( ADDR_HANDLE hLib, const ADDR_GET_TILEINDEX_INPUT* pIn, ADDR_GET_TILEINDEX_OUTPUT* pOut) { V1::Lib* pLib = V1::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->GetTileIndex(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * AddrComputePrtInfo * * @brief * Interface function for ComputePrtInfo * **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrComputePrtInfo( ADDR_HANDLE hLib, const ADDR_PRT_INFO_INPUT* pIn, ADDR_PRT_INFO_OUTPUT* pOut) { ADDR_E_RETURNCODE returnCode = ADDR_OK; V1::Lib* pLib = V1::Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->ComputePrtInfo(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** 
**************************************************************************************************** * AddrGetMaxAlignments * * @brief * Get maximum alignments * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrGetMaxAlignments( ADDR_HANDLE hLib, ///< address lib handle ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) ///< [out] output structure { Addr::Lib* pLib = Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->GetMaxAlignments(pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * AddrGetMaxMetaAlignments * * @brief * Get maximum alignments for metadata * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API AddrGetMaxMetaAlignments( ADDR_HANDLE hLib, ///< address lib handle ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) ///< [out] output structure { Addr::Lib* pLib = Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->GetMaxMetaAlignments(pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } //////////////////////////////////////////////////////////////////////////////////////////////////// // Surface functions for Addr2 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Addr2ComputeSurfaceInfo * * @brief * Calculate surface width/height/depth/alignments and suitable tiling mode * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE
**************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSurfaceInfo( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] surface information ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut) ///< [out] surface parameters and alignments { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeSurfaceInfo(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeSurfaceAddrFromCoord * * @brief * Compute surface address according to coordinates * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSurfaceAddrFromCoord( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ///< [in] surface info and coordinates ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut) ///< [out] surface address { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeSurfaceAddrFromCoord(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeSurfaceCoordFromAddr * * @brief * Compute coordinates according to surface address * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSurfaceCoordFromAddr( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ///< [in] 
surface info and address ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut) ///< [out] coordinates { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeSurfaceCoordFromAddr(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } //////////////////////////////////////////////////////////////////////////////////////////////////// // HTile functions for Addr2 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Addr2ComputeHtileInfo * * @brief * Compute Htile pitch, height, base alignment and size in bytes * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeHtileInfo( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_HTILE_INFO_INPUT* pIn, ///< [in] Htile information ADDR2_COMPUTE_HTILE_INFO_OUTPUT* pOut) ///< [out] Htile pitch, height and size in bytes { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeHtileInfo(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeHtileAddrFromCoord * * @brief * Compute Htile address according to coordinates (of depth buffer) * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeHtileAddrFromCoord( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn, ///< [in] Htile info and coordinates 
ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT* pOut) ///< [out] Htile address { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeHtileAddrFromCoord(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeHtileCoordFromAddr * * @brief * Compute coordinates within depth buffer (1st pixel of a micro tile) according to * Htile address * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeHtileCoordFromAddr( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn, ///< [in] Htile info and address ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT* pOut) ///< [out] Htile coordinates { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeHtileCoordFromAddr(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } //////////////////////////////////////////////////////////////////////////////////////////////////// // C-mask functions for Addr2 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Addr2ComputeCmaskInfo * * @brief * Compute Cmask pitch, height, base alignment and size in bytes from color buffer * info * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeCmaskInfo( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_CMASK_INFO_INPUT* pIn, ///< [in] Cmask 
pitch and height ADDR2_COMPUTE_CMASK_INFO_OUTPUT* pOut) ///< [out] Cmask pitch, height and size in bytes { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeCmaskInfo(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeCmaskAddrFromCoord * * @brief * Compute Cmask address according to coordinates (of MSAA color buffer) * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeCmaskAddrFromCoord( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn, ///< [in] Cmask info and coordinates ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT* pOut) ///< [out] Cmask address { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeCmaskAddrFromCoord(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeCmaskCoordFromAddr * * @brief * Compute coordinates within color buffer (1st pixel of a micro tile) according to * Cmask address * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeCmaskCoordFromAddr( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_CMASK_COORDFROMADDR_INPUT* pIn, ///< [in] Cmask info and address ADDR2_COMPUTE_CMASK_COORDFROMADDR_OUTPUT* pOut) ///< [out] Cmask coordinates { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { 
returnCode = pLib->ComputeCmaskCoordFromAddr(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } //////////////////////////////////////////////////////////////////////////////////////////////////// // F-mask functions for Addr2 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Addr2ComputeFmaskInfo * * @brief * Compute Fmask pitch/height/depth/alignments and size in bytes * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeFmaskInfo( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_FMASK_INFO_INPUT* pIn, ///< [in] Fmask information ADDR2_COMPUTE_FMASK_INFO_OUTPUT* pOut) ///< [out] Fmask pitch and height { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeFmaskInfo(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeFmaskAddrFromCoord * * @brief * Compute Fmask address according to coordinates (x,y,slice,sample,plane) * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeFmaskAddrFromCoord( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_INPUT* pIn, ///< [in] Fmask info and coordinates ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT* pOut) ///< [out] Fmask address { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = 
pLib->ComputeFmaskAddrFromCoord(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeFmaskCoordFromAddr * * @brief * Compute coordinates (x,y,slice,sample,plane) according to Fmask address * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeFmaskCoordFromAddr( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_FMASK_COORDFROMADDR_INPUT* pIn, ///< [in] Fmask info and address ADDR2_COMPUTE_FMASK_COORDFROMADDR_OUTPUT* pOut) ///< [out] Fmask coordinates { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeFmaskCoordFromAddr(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } //////////////////////////////////////////////////////////////////////////////////////////////////// // DCC key functions for Addr2 //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Addr2ComputeDccInfo * * @brief * Compute DCC key size, base alignment based on color surface size, tile info or tile index * **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeDccInfo( ADDR_HANDLE hLib, ///< handle of addrlib const ADDR2_COMPUTE_DCCINFO_INPUT* pIn, ///< [in] input ADDR2_COMPUTE_DCCINFO_OUTPUT* pOut) ///< [out] output { ADDR_E_RETURNCODE returnCode; V2::Lib* pLib = V2::Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->ComputeDccInfo(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** 
**************************************************************************************************** * Addr2ComputeDccAddrFromCoord * * @brief * Compute DCC key address according to coordinates * * @return * ADDR_OK if successful, otherwise an error code of ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeDccAddrFromCoord( ADDR_HANDLE hLib, ///< address lib handle const ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT* pIn, ///< [in] Dcc info and coordinates ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT* pOut) ///< [out] Dcc address { V2::Lib* pLib = V2::Lib::GetLib(hLib); ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pLib != NULL) { returnCode = pLib->ComputeDccAddrFromCoord(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputePipeBankXor * * @brief * Calculate a valid bank pipe xor value for client to use. **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputePipeBankXor( ADDR_HANDLE hLib, ///< handle of addrlib const ADDR2_COMPUTE_PIPEBANKXOR_INPUT* pIn, ///< [in] input ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT* pOut) ///< [out] output { ADDR_E_RETURNCODE returnCode; V2::Lib* pLib = V2::Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->ComputePipeBankXor(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeSlicePipeBankXor * * @brief * Calculate slice pipe bank xor value based on base pipe bank xor and slice id. 
**************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSlicePipeBankXor( ADDR_HANDLE hLib, ///< handle of addrlib const ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT* pIn, ///< [in] input ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT* pOut) ///< [out] output { ADDR_E_RETURNCODE returnCode; V2::Lib* pLib = V2::Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->ComputeSlicePipeBankXor(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2ComputeSubResourceOffsetForSwizzlePattern * * @brief * Calculate sub resource offset for swizzle pattern. **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2ComputeSubResourceOffsetForSwizzlePattern( ADDR_HANDLE hLib, ///< handle of addrlib const ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT* pIn, ///< [in] input ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT* pOut) ///< [out] output { ADDR_E_RETURNCODE returnCode; V2::Lib* pLib = V2::Lib::GetLib(hLib); if (pLib != NULL) { returnCode = pLib->ComputeSubResourceOffsetForSwizzlePattern(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2GetPreferredSurfaceSetting * * @brief * Suggest a preferred setting for client driver to program HW register **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2GetPreferredSurfaceSetting( ADDR_HANDLE hLib, ///< handle of addrlib const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT* pIn, ///< [in] input ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT* pOut) ///< [out] output { ADDR_E_RETURNCODE returnCode; V2::Lib* pLib = V2::Lib::GetLib(hLib); if (pLib 
!= NULL) { returnCode = pLib->Addr2GetPreferredSurfaceSetting(pIn, pOut); } else { returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Addr2IsValidDisplaySwizzleMode * * @brief * Return whether the swizzle mode is supported by DCE / DCN. **************************************************************************************************** */ ADDR_E_RETURNCODE ADDR_API Addr2IsValidDisplaySwizzleMode( ADDR_HANDLE hLib, AddrSwizzleMode swizzleMode, UINT_32 bpp, bool *result) { ADDR_E_RETURNCODE returnCode; V2::Lib* pLib = V2::Lib::GetLib(hLib); if (pLib != NULL) { ADDR2_COMPUTE_SURFACE_INFO_INPUT in; in.resourceType = ADDR_RSRC_TEX_2D; in.swizzleMode = swizzleMode; in.bpp = bpp; *result = pLib->IsValidDisplaySwizzleMode(&in); returnCode = ADDR_OK; } else { returnCode = ADDR_ERROR; } return returnCode; } } // rocr ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/amdgpu_asic_addr.h000066400000000000000000000137241420110115200243120ustar00rootroot00000000000000/* * Copyright © 2017-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. 
IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ #ifndef _AMDGPU_ASIC_ADDR_H #define _AMDGPU_ASIC_ADDR_H #define ATI_VENDOR_ID 0x1002 #define AMD_VENDOR_ID 0x1022 // AMDGPU_VENDOR_IS_AMD(vendorId) #define AMDGPU_VENDOR_IS_AMD(v) ((v == ATI_VENDOR_ID) || (v == AMD_VENDOR_ID)) #define FAMILY_UNKNOWN 0x00 #define FAMILY_TN 0x69 #define FAMILY_SI 0x6E #define FAMILY_CI 0x78 #define FAMILY_KV 0x7D #define FAMILY_VI 0x82 #define FAMILY_POLARIS 0x82 #define FAMILY_CZ 0x87 #define FAMILY_AI 0x8D #define FAMILY_RV 0x8E #define FAMILY_NV 0x8F // AMDGPU_FAMILY_IS(familyId, familyName) #define FAMILY_IS(f, fn) (f == FAMILY_##fn) #define FAMILY_IS_TN(f) FAMILY_IS(f, TN) #define FAMILY_IS_SI(f) FAMILY_IS(f, SI) #define FAMILY_IS_CI(f) FAMILY_IS(f, CI) #define FAMILY_IS_KV(f) FAMILY_IS(f, KV) #define FAMILY_IS_VI(f) FAMILY_IS(f, VI) #define FAMILY_IS_POLARIS(f) FAMILY_IS(f, POLARIS) #define FAMILY_IS_CZ(f) FAMILY_IS(f, CZ) #define FAMILY_IS_AI(f) FAMILY_IS(f, AI) #define FAMILY_IS_RV(f) FAMILY_IS(f, RV) #define FAMILY_IS_NV(f) FAMILY_IS(f, NV) #define AMDGPU_UNKNOWN 0xFF #define AMDGPU_TAHITI_RANGE 0x05, 0x14 #define AMDGPU_PITCAIRN_RANGE 0x15, 0x28 #define AMDGPU_CAPEVERDE_RANGE 0x29, 0x3C #define AMDGPU_OLAND_RANGE 0x3C, 0x46 #define AMDGPU_HAINAN_RANGE 0x46, 0xFF #define AMDGPU_BONAIRE_RANGE 0x14, 0x28 #define AMDGPU_HAWAII_RANGE 0x28, 0x3C #define AMDGPU_SPECTRE_RANGE 0x01, 0x41 #define AMDGPU_SPOOKY_RANGE 0x41, 0x81 #define AMDGPU_KALINDI_RANGE 0x81, 0xA1 #define AMDGPU_GODAVARI_RANGE 0xA1, 0xFF #define AMDGPU_ICELAND_RANGE 0x01, 0x14 #define AMDGPU_TONGA_RANGE 0x14, 0x28 
#define AMDGPU_FIJI_RANGE 0x3C, 0x50 #define AMDGPU_POLARIS10_RANGE 0x50, 0x5A #define AMDGPU_POLARIS11_RANGE 0x5A, 0x64 #define AMDGPU_POLARIS12_RANGE 0x64, 0x6E #define AMDGPU_VEGAM_RANGE 0x6E, 0xFF #define AMDGPU_CARRIZO_RANGE 0x01, 0x21 #define AMDGPU_STONEY_RANGE 0x61, 0xFF #define AMDGPU_VEGA10_RANGE 0x01, 0x14 #define AMDGPU_VEGA12_RANGE 0x14, 0x28 #define AMDGPU_VEGA20_RANGE 0x28, 0x32 #define AMDGPU_ARCTURUS_RANGE 0x32, 0xFF #define AMDGPU_RAVEN_RANGE 0x01, 0x81 #define AMDGPU_RAVEN2_RANGE 0x81, 0x91 #define AMDGPU_RENOIR_RANGE 0x91, 0xFF #define AMDGPU_NAVI10_RANGE 0x01, 0x0A #define AMDGPU_NAVI12_RANGE 0x0A, 0x14 #define AMDGPU_NAVI14_RANGE 0x14, 0x28 #define AMDGPU_SIENNA_RANGE 0x28, 0x32 #define AMDGPU_EXPAND_FIX(x) x #define AMDGPU_RANGE_HELPER(val, min, max) ((val >= min) && (val < max)) #define AMDGPU_IN_RANGE(val, ...) AMDGPU_EXPAND_FIX(AMDGPU_RANGE_HELPER(val, __VA_ARGS__)) // ASICREV_IS(eRevisionId, revisionName) #define ASICREV_IS(r, rn) AMDGPU_IN_RANGE(r, AMDGPU_##rn##_RANGE) #define ASICREV_IS_TAHITI_P(r) ASICREV_IS(r, TAHITI) #define ASICREV_IS_PITCAIRN_PM(r) ASICREV_IS(r, PITCAIRN) #define ASICREV_IS_CAPEVERDE_M(r) ASICREV_IS(r, CAPEVERDE) #define ASICREV_IS_OLAND_M(r) ASICREV_IS(r, OLAND) #define ASICREV_IS_HAINAN_V(r) ASICREV_IS(r, HAINAN) #define ASICREV_IS_BONAIRE_M(r) ASICREV_IS(r, BONAIRE) #define ASICREV_IS_HAWAII_P(r) ASICREV_IS(r, HAWAII) #define ASICREV_IS_SPECTRE(r) ASICREV_IS(r, SPECTRE) #define ASICREV_IS_SPOOKY(r) ASICREV_IS(r, SPOOKY) #define ASICREV_IS_KALINDI(r) ASICREV_IS(r, KALINDI) #define ASICREV_IS_KALINDI_GODAVARI(r) ASICREV_IS(r, GODAVARI) #define ASICREV_IS_ICELAND_M(r) ASICREV_IS(r, ICELAND) #define ASICREV_IS_TONGA_P(r) ASICREV_IS(r, TONGA) #define ASICREV_IS_FIJI_P(r) ASICREV_IS(r, FIJI) #define ASICREV_IS_POLARIS10_P(r) ASICREV_IS(r, POLARIS10) #define ASICREV_IS_POLARIS11_M(r) ASICREV_IS(r, POLARIS11) #define ASICREV_IS_POLARIS12_V(r) ASICREV_IS(r, POLARIS12) #define ASICREV_IS_VEGAM_P(r) ASICREV_IS(r, VEGAM) 
#define ASICREV_IS_CARRIZO(r) ASICREV_IS(r, CARRIZO) #define ASICREV_IS_STONEY(r) ASICREV_IS(r, STONEY) #define ASICREV_IS_VEGA10_M(r) ASICREV_IS(r, VEGA10) #define ASICREV_IS_VEGA10_P(r) ASICREV_IS(r, VEGA10) #define ASICREV_IS_VEGA12_P(r) ASICREV_IS(r, VEGA12) #define ASICREV_IS_VEGA12_p(r) ASICREV_IS(r, VEGA12) #define ASICREV_IS_VEGA20_P(r) ASICREV_IS(r, VEGA20) #define ASICREV_IS_ARCTURUS(r) ASICREV_IS(r, ARCTURUS) #define ASICREV_IS_RAVEN(r) ASICREV_IS(r, RAVEN) #define ASICREV_IS_RAVEN2(r) ASICREV_IS(r, RAVEN2) #define ASICREV_IS_RENOIR(r) ASICREV_IS(r, RENOIR) #define ASICREV_IS_NAVI10_P(r) ASICREV_IS(r, NAVI10) #define ASICREV_IS_NAVI12(r) ASICREV_IS(r, NAVI12) #define ASICREV_IS_NAVI14(r) ASICREV_IS(r, NAVI14) #define ASICREV_IS_SIENNA_M(r) ASICREV_IS(r, SIENNA) #endif // _AMDGPU_ASIC_ADDR_H ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/chip/000077500000000000000000000000001420110115200216075ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/chip/gfx10/000077500000000000000000000000001420110115200225345ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/chip/gfx10/gfx10_gb_reg.h000066400000000000000000000041541420110115200251430ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. 
IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ #if !defined (__GFX10_GB_REG_H__) #define __GFX10_GB_REG_H__ /* * gfx10_gb_reg.h * * Register Spec Release: 1.0 * */ union GB_ADDR_CONFIG { struct { #if defined(LITTLEENDIAN_CPU) unsigned int NUM_PIPES : 3; unsigned int PIPE_INTERLEAVE_SIZE : 3; unsigned int MAX_COMPRESSED_FRAGS : 2; unsigned int NUM_PKRS : 3; unsigned int : 21; #elif defined(BIGENDIAN_CPU) unsigned int : 21; unsigned int NUM_PKRS : 3; unsigned int MAX_COMPRESSED_FRAGS : 2; unsigned int PIPE_INTERLEAVE_SIZE : 3; unsigned int NUM_PIPES : 3; #endif } bitfields, bits; unsigned int u32All; int i32All; float f32All; }; #endif ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/chip/gfx9/000077500000000000000000000000001420110115200224645ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/chip/gfx9/gfx9_gb_reg.h000066400000000000000000000063651420110115200250310ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. 
* * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. 
*/ #if !defined (__GFX9_GB_REG_H__) #define __GFX9_GB_REG_H__ /* * gfx9_gb_reg.h * * Register Spec Release: 1.0 * */ union GB_ADDR_CONFIG_gfx9 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int NUM_PIPES : 3; unsigned int PIPE_INTERLEAVE_SIZE : 3; unsigned int MAX_COMPRESSED_FRAGS : 2; unsigned int BANK_INTERLEAVE_SIZE : 3; unsigned int : 1; unsigned int NUM_BANKS : 3; unsigned int : 1; unsigned int SHADER_ENGINE_TILE_SIZE : 3; unsigned int NUM_SHADER_ENGINES : 2; unsigned int NUM_GPUS : 3; unsigned int MULTI_GPU_TILE_SIZE : 2; unsigned int NUM_RB_PER_SE : 2; unsigned int ROW_SIZE : 2; unsigned int NUM_LOWER_PIPES : 1; unsigned int SE_ENABLE : 1; #elif defined(BIGENDIAN_CPU) unsigned int SE_ENABLE : 1; unsigned int NUM_LOWER_PIPES : 1; unsigned int ROW_SIZE : 2; unsigned int NUM_RB_PER_SE : 2; unsigned int MULTI_GPU_TILE_SIZE : 2; unsigned int NUM_GPUS : 3; unsigned int NUM_SHADER_ENGINES : 2; unsigned int SHADER_ENGINE_TILE_SIZE : 3; unsigned int : 1; unsigned int NUM_BANKS : 3; unsigned int : 1; unsigned int BANK_INTERLEAVE_SIZE : 3; unsigned int MAX_COMPRESSED_FRAGS : 2; unsigned int PIPE_INTERLEAVE_SIZE : 3; unsigned int NUM_PIPES : 3; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; #endif ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/chip/r800/000077500000000000000000000000001420110115200223005ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/chip/r800/si_gb_reg.h000066400000000000000000000143111420110115200243710ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. 
* * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. 
*/ #if !defined (__SI_GB_REG_H__) #define __SI_GB_REG_H__ /***************************************************************************************************************** * * si_gb_reg.h * * Register Spec Release: Chip Spec 0.28 * *****************************************************************************************************************/ /* * GB_ADDR_CONFIG struct */ #if defined(LITTLEENDIAN_CPU) typedef struct _GB_ADDR_CONFIG_T { unsigned int num_pipes : 3; unsigned int : 1; unsigned int pipe_interleave_size : 3; unsigned int : 1; unsigned int bank_interleave_size : 3; unsigned int : 1; unsigned int num_shader_engines : 2; unsigned int : 2; unsigned int shader_engine_tile_size : 3; unsigned int : 1; unsigned int num_gpus : 3; unsigned int : 1; unsigned int multi_gpu_tile_size : 2; unsigned int : 2; unsigned int row_size : 2; unsigned int num_lower_pipes : 1; unsigned int : 1; } GB_ADDR_CONFIG_T; #elif defined(BIGENDIAN_CPU) typedef struct _GB_ADDR_CONFIG_T { unsigned int : 1; unsigned int num_lower_pipes : 1; unsigned int row_size : 2; unsigned int : 2; unsigned int multi_gpu_tile_size : 2; unsigned int : 1; unsigned int num_gpus : 3; unsigned int : 1; unsigned int shader_engine_tile_size : 3; unsigned int : 2; unsigned int num_shader_engines : 2; unsigned int : 1; unsigned int bank_interleave_size : 3; unsigned int : 1; unsigned int pipe_interleave_size : 3; unsigned int : 1; unsigned int num_pipes : 3; } GB_ADDR_CONFIG_T; #endif typedef union { unsigned int val : 32; GB_ADDR_CONFIG_T f; } GB_ADDR_CONFIG; #if defined(LITTLEENDIAN_CPU) typedef struct _GB_TILE_MODE_T { unsigned int micro_tile_mode : 2; unsigned int array_mode : 4; unsigned int pipe_config : 5; unsigned int tile_split : 3; unsigned int bank_width : 2; unsigned int bank_height : 2; unsigned int macro_tile_aspect : 2; unsigned int num_banks : 2; unsigned int micro_tile_mode_new : 3; unsigned int sample_split : 2; unsigned int : 5; } GB_TILE_MODE_T; typedef struct _GB_MACROTILE_MODE_T { 
unsigned int bank_width : 2; unsigned int bank_height : 2; unsigned int macro_tile_aspect : 2; unsigned int num_banks : 2; unsigned int : 24; } GB_MACROTILE_MODE_T; #elif defined(BIGENDIAN_CPU) typedef struct _GB_TILE_MODE_T { unsigned int : 5; unsigned int sample_split : 2; unsigned int micro_tile_mode_new : 3; unsigned int num_banks : 2; unsigned int macro_tile_aspect : 2; unsigned int bank_height : 2; unsigned int bank_width : 2; unsigned int tile_split : 3; unsigned int pipe_config : 5; unsigned int array_mode : 4; unsigned int micro_tile_mode : 2; } GB_TILE_MODE_T; typedef struct _GB_MACROTILE_MODE_T { unsigned int : 24; unsigned int num_banks : 2; unsigned int macro_tile_aspect : 2; unsigned int bank_height : 2; unsigned int bank_width : 2; } GB_MACROTILE_MODE_T; #endif typedef union { unsigned int val : 32; GB_TILE_MODE_T f; } GB_TILE_MODE; typedef union { unsigned int val : 32; GB_MACROTILE_MODE_T f; } GB_MACROTILE_MODE; #endif ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/000077500000000000000000000000001420110115200216145ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrcommon.h000066400000000000000000000736111420110115200241200ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. 
IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** **************************************************************************************************** * @file addrcommon.h * @brief Contains the helper function and constants. **************************************************************************************************** */ #ifndef __ADDR_COMMON_H__ #define __ADDR_COMMON_H__ #include "addrinterface.h" #if !defined(DEBUG) #ifdef NDEBUG #define DEBUG 0 #else #define DEBUG 1 #endif #endif // ADDR_LNX_KERNEL_BUILD is for internal build // Moved from addrinterface.h so __KERNEL__ is not needed any more #if ADDR_LNX_KERNEL_BUILD // || (defined(__GNUC__) && defined(__KERNEL__)) #include #elif !defined(__APPLE__) || defined(HAVE_TSERVER) #include #include #endif #include #include "util/macros.h" //////////////////////////////////////////////////////////////////////////////////////////////////// // Platform specific debug break defines //////////////////////////////////////////////////////////////////////////////////////////////////// #if DEBUG #if defined(__GNUC__) #define ADDR_DBG_BREAK() assert(false) #elif defined(__APPLE__) #define ADDR_DBG_BREAK() { IOPanic("");} #else #define ADDR_DBG_BREAK() { __debugbreak(); } #endif #else #define ADDR_DBG_BREAK() do {} while(0) #endif //////////////////////////////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////////////////////////////// // Debug assertions used in AddrLib 
//////////////////////////////////////////////////////////////////////////////////////////////////// #if defined(_WIN32) && (_MSC_VER >= 1400) #define ADDR_ANALYSIS_ASSUME(expr) __analysis_assume(expr) #else #define ADDR_ANALYSIS_ASSUME(expr) do { (void)(expr); } while (0) #endif #define ADDR_ASSERT(__e) assert(__e) #define ADDR_ASSERT_ALWAYS() ADDR_DBG_BREAK() #define ADDR_UNHANDLED_CASE() ADDR_ASSERT(!"Unhandled case") #define ADDR_NOT_IMPLEMENTED() ADDR_ASSERT(!"Not implemented"); //////////////////////////////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////////////////////////////// // Debug print macro from legacy address library //////////////////////////////////////////////////////////////////////////////////////////////////// #if DEBUG #define ADDR_PRNT(a) Object::DebugPrint a /// @brief Macro for reporting informational messages /// @ingroup util /// /// This macro optionally prints an informational message to stdout. /// The first parameter is a condition -- if it is true, nothing is done. /// The second parameter MUST be a parenthesis-enclosed list of arguments, /// starting with a string. This is passed to printf() or an equivalent /// in order to format the informational message. For example, /// ADDR_INFO(0, ("test %d",3) ); prints out "test 3". /// #define ADDR_INFO(cond, a) \ { if (!(cond)) { ADDR_PRNT(a); } } /// @brief Macro for reporting error warning messages /// @ingroup util /// /// This macro optionally prints an error warning message to stdout, /// followed by the file name and line number where the macro was called. /// The first parameter is a condition -- if it is true, nothing is done. /// The second parameter MUST be a parenthesis-enclosed list of arguments, /// starting with a string. This is passed to printf() or an equivalent /// in order to format the informational message.
For example, /// ADDR_WARN(0, ("test %d",3) ); prints out "test 3" followed by /// a second line with the file name and line number. /// #define ADDR_WARN(cond, a) \ { if (!(cond)) \ { ADDR_PRNT(a); \ ADDR_PRNT((" WARNING in file %s, line %d\n", __FILE__, __LINE__)); \ } } /// @brief Macro for reporting fatal error conditions /// @ingroup util /// /// This macro optionally stops execution of the current routine /// after printing an error message to stdout. /// The first parameter is a condition -- if it is true, nothing is done. /// The second parameter MUST be a parenthesis-enclosed list of arguments, /// starting with a string. This is passed to printf() or an equivalent /// in order to format the error message. For example, /// ADDR_EXIT(0, ("test %d",3) ); prints out "test 3" /// and then stops execution. /// #define ADDR_EXIT(cond, a) \ { if (!(cond)) \ { ADDR_PRNT(a); ADDR_DBG_BREAK();\ } } #else // DEBUG #define ADDRDPF 1 ?
(void)0 : (void) #define ADDR_PRNT(a) do {} while(0) #define ADDR_DBG_BREAK() do {} while(0) #define ADDR_INFO(cond, a) do {} while(0) #define ADDR_WARN(cond, a) do {} while(0) #define ADDR_EXIT(cond, a) do {} while(0) #endif // DEBUG //////////////////////////////////////////////////////////////////////////////////////////////////// #define ADDR_C_ASSERT(__e) STATIC_ASSERT(__e) namespace rocr { namespace Addr { namespace V1 { //////////////////////////////////////////////////////////////////////////////////////////////////// // Common constants //////////////////////////////////////////////////////////////////////////////////////////////////// static const UINT_32 MicroTileWidth = 8; ///< Micro tile width, for 1D and 2D tiling static const UINT_32 MicroTileHeight = 8; ///< Micro tile height, for 1D and 2D tiling static const UINT_32 ThickTileThickness = 4; ///< Micro tile thickness, for THICK modes static const UINT_32 XThickTileThickness = 8; ///< Extra thick tiling thickness static const UINT_32 PowerSaveTileBytes = 64; ///< Number of bytes per tile for power save 64 static const UINT_32 CmaskCacheBits = 1024; ///< Number of bits for CMASK cache static const UINT_32 CmaskElemBits = 4; ///< Number of bits for CMASK element static const UINT_32 HtileCacheBits = 16384; ///< Number of bits for HTILE cache 512*32 static const UINT_32 MicroTilePixels = MicroTileWidth * MicroTileHeight; static const INT_32 TileIndexInvalid = TILEINDEX_INVALID; static const INT_32 TileIndexLinearGeneral = TILEINDEX_LINEAR_GENERAL; static const INT_32 TileIndexNoMacroIndex = -3; } // V1 namespace V2 { //////////////////////////////////////////////////////////////////////////////////////////////////// // Common constants //////////////////////////////////////////////////////////////////////////////////////////////////// static const UINT_32 MaxSurfaceHeight = 16384; } // V2 //////////////////////////////////////////////////////////////////////////////////////////////////// // Common macros
//////////////////////////////////////////////////////////////////////////////////////////////////// #define BITS_PER_BYTE 8 #define BITS_TO_BYTES(x) ( ((x) + (BITS_PER_BYTE-1)) / BITS_PER_BYTE ) #define BYTES_TO_BITS(x) ( (x) * BITS_PER_BYTE ) /// Helper macro to select a single bit from an int (undefined later in section) #define _BIT(v,b) (((v) >> (b) ) & 1) /** **************************************************************************************************** * @brief Enums to identify AddrLib type **************************************************************************************************** */ enum LibClass { BASE_ADDRLIB = 0x0, R600_ADDRLIB = 0x6, R800_ADDRLIB = 0x8, SI_ADDRLIB = 0xa, CI_ADDRLIB = 0xb, AI_ADDRLIB = 0xd, }; /** **************************************************************************************************** * ChipFamily * * @brief * Neutral enums that specify the chip family. * **************************************************************************************************** */ enum ChipFamily { ADDR_CHIP_FAMILY_IVLD, ///< Invalid family ADDR_CHIP_FAMILY_R6XX, ADDR_CHIP_FAMILY_R7XX, ADDR_CHIP_FAMILY_R8XX, ADDR_CHIP_FAMILY_NI, ADDR_CHIP_FAMILY_SI, ADDR_CHIP_FAMILY_CI, ADDR_CHIP_FAMILY_VI, ADDR_CHIP_FAMILY_AI, ADDR_CHIP_FAMILY_NAVI, }; /** **************************************************************************************************** * ConfigFlags * * @brief * This structure is used to set configuration flags.
**************************************************************************************************** */ union ConfigFlags { struct { /// These flags are set up internally through AddrLib::Create() based on ADDR_CREATE_FLAGS UINT_32 optimalBankSwap : 1; ///< New bank tiling for RV770 only UINT_32 noCubeMipSlicesPad : 1; ///< Disables face padding for cubemap mipmaps UINT_32 fillSizeFields : 1; ///< If clients fill size fields in all input and /// output structures UINT_32 ignoreTileInfo : 1; ///< Don't use tile info structure UINT_32 useTileIndex : 1; ///< Make tileIndex field in input valid UINT_32 useCombinedSwizzle : 1; ///< Use combined swizzle UINT_32 checkLast2DLevel : 1; ///< Check the last 2D mip sub level UINT_32 useHtileSliceAlign : 1; ///< Do htile single slice alignment UINT_32 allowLargeThickTile : 1; ///< Allow 64*thickness*bytesPerPixel > rowSize UINT_32 disableLinearOpt : 1; ///< Disallow tile modes to be optimized to linear UINT_32 use32bppFor422Fmt : 1; ///< View 422 formats as 32 bits per pixel element UINT_32 forceDccAndTcCompat : 1; ///< Force enable DCC and TC compatibility UINT_32 nonPower2MemConfig : 1; ///< Physical video memory size is not power of 2 UINT_32 reserved : 19; ///< Reserved bits for future use }; UINT_32 value; }; //////////////////////////////////////////////////////////////////////////////////////////////////// // Misc helper functions //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * XorReduce * * @brief * Compute the XOR (parity) of the lowest numberOfBits bits of x.
**************************************************************************************************** */ static inline UINT_32 XorReduce( UINT_32 x, UINT_32 numberOfBits) { UINT_32 i; UINT_32 result = x & 1; for (i=1; i<numberOfBits; i++) { result ^= ((x>>i) & 1); } return result; } /** **************************************************************************************************** * IsPow2 * * @brief * Check if the size (UINT_32) is a power of 2 **************************************************************************************************** */ static inline UINT_32 IsPow2( UINT_32 dim) ///< [in] dimension of miplevel { ADDR_ASSERT(dim > 0); return !(dim & (dim - 1)); } /** **************************************************************************************************** * IsPow2 * * @brief * Check if the size (UINT_64) is a power of 2 **************************************************************************************************** */ static inline UINT_64 IsPow2( UINT_64 dim) ///< [in] dimension of miplevel { ADDR_ASSERT(dim > 0); return !(dim & (dim - 1)); } /** **************************************************************************************************** * PowTwoAlign * * @brief * Align UINT_32 "x" up to a multiple of "align"; "align" must be a power of 2 **************************************************************************************************** */ static inline UINT_32 PowTwoAlign( UINT_32 x, UINT_32 align) { // // Assert that align is a power of two. // ADDR_ASSERT(IsPow2(align)); return (x + (align - 1)) & (~(align - 1)); } /** **************************************************************************************************** * PowTwoAlign * * @brief * Align UINT_64 "x" up to a multiple of "align"; "align" must be a power of 2 **************************************************************************************************** */ static inline UINT_64 PowTwoAlign( UINT_64 x, UINT_64 align) { // // Assert that align is a power of two.
// ADDR_ASSERT(IsPow2(align)); return (x + (align - 1)) & (~(align - 1)); } /** **************************************************************************************************** * Min * * @brief * Get the min value between two unsigned values **************************************************************************************************** */ static inline UINT_32 Min( UINT_32 value1, UINT_32 value2) { return ((value1 < (value2)) ? (value1) : value2); } /** **************************************************************************************************** * Min * * @brief * Get the min value between two signed values **************************************************************************************************** */ static inline INT_32 Min( INT_32 value1, INT_32 value2) { return ((value1 < (value2)) ? (value1) : value2); } /** **************************************************************************************************** * Max * * @brief * Get the max value between two unsigned values **************************************************************************************************** */ static inline UINT_32 Max( UINT_32 value1, UINT_32 value2) { return ((value1 > (value2)) ? (value1) : value2); } /** **************************************************************************************************** * Max * * @brief * Get the max value between two signed values **************************************************************************************************** */ static inline INT_32 Max( INT_32 value1, INT_32 value2) { return ((value1 > (value2)) ? 
(value1) : value2); } /** **************************************************************************************************** * NextPow2 * * @brief * Return the smallest power of 2 that is >= dim; values above 0x7fffffff assert and yield 0x80000000 **************************************************************************************************** */ static inline UINT_32 NextPow2( UINT_32 dim) ///< [in] dimension of miplevel { UINT_32 newDim = 1; if (dim > 0x7fffffff) { ADDR_ASSERT_ALWAYS(); newDim = 0x80000000; } else { while (newDim < dim) { newDim <<= 1; } } return newDim; } /** **************************************************************************************************** * Log2NonPow2 * * @brief * Compute floor(log2(x)); x does not need to be a power of 2 **************************************************************************************************** */ static inline UINT_32 Log2NonPow2( UINT_32 x) ///< [in] the value should calculate log based 2 { UINT_32 y; y = 0; while (x > 1) { x >>= 1; y++; } return y; } /** **************************************************************************************************** * Log2 * * @brief * Compute log of base 2 **************************************************************************************************** */ static inline UINT_32 Log2( UINT_32 x) ///< [in] the value should calculate log based 2 { // Assert that x is a power of two.
ADDR_ASSERT(IsPow2(x)); return Log2NonPow2(x); } /** **************************************************************************************************** * QLog2 * * @brief * Compute log of base 2 quickly (<= 16) **************************************************************************************************** */ static inline UINT_32 QLog2( UINT_32 x) ///< [in] the value should calculate log based 2 { ADDR_ASSERT(x <= 16); UINT_32 y = 0; switch (x) { case 1: y = 0; break; case 2: y = 1; break; case 4: y = 2; break; case 8: y = 3; break; case 16: y = 4; break; default: ADDR_ASSERT_ALWAYS(); } return y; } /** **************************************************************************************************** * SafeAssign * * @brief * NULL pointer safe assignment **************************************************************************************************** */ static inline VOID SafeAssign( UINT_32* pLVal, ///< [in] Pointer to left val UINT_32 rVal) ///< [in] Right value { if (pLVal) { *pLVal = rVal; } } /** **************************************************************************************************** * SafeAssign * * @brief * NULL pointer safe assignment for 64bit values **************************************************************************************************** */ static inline VOID SafeAssign( UINT_64* pLVal, ///< [in] Pointer to left val UINT_64 rVal) ///< [in] Right value { if (pLVal) { *pLVal = rVal; } } /** **************************************************************************************************** * SafeAssign * * @brief * NULL pointer safe assignment for AddrTileMode **************************************************************************************************** */ static inline VOID SafeAssign( AddrTileMode* pLVal, ///< [in] Pointer to left val AddrTileMode rVal) ///< [in] Right value { if (pLVal) { *pLVal = rVal; } } /** **************************************************************************************************** * 
RoundHalf * * @brief * return (x + 1) / 2 **************************************************************************************************** */ static inline UINT_32 RoundHalf( UINT_32 x) ///< [in] input value { ADDR_ASSERT(x != 0); #if 1 return (x >> 1) + (x & 1); #else return (x + 1) >> 1; #endif } /** **************************************************************************************************** * SumGeo * * @brief * Calculate the sum of num terms of a geometric progression with ratio 1/2, where each term is * the previous one halved and rounded up (via RoundHalf); once a term reaches 1, each remaining * term contributes 1 **************************************************************************************************** */ static inline UINT_32 SumGeo( UINT_32 base, ///< [in] First term in the geometric progression UINT_32 num) ///< [in] Number of terms to be added into sum { ADDR_ASSERT(base > 0); UINT_32 sum = 0; UINT_32 i = 0; for (; (i < num) && (base > 1); i++) { sum += base; base = RoundHalf(base); } sum += num - i; return sum; } /** **************************************************************************************************** * GetBit * * @brief * Extract bit N value (0 or 1) of a UINT32 value. **************************************************************************************************** */ static inline UINT_32 GetBit( UINT_32 u32, ///< [in] UINT32 value UINT_32 pos) ///< [in] bit position from LSB, valid range is [0..31] { ADDR_ASSERT(pos <= 31); return (u32 >> pos) & 0x1; } /** **************************************************************************************************** * GetBits * * @brief * Copy 'bitsNum' bits of src, starting at bit srcStartPos, into the result starting at bit dstStartPos * srcStartPos: 0~31 for UINT_32 * bitsNum : 1~32 for UINT_32 * dstStartPos: 0~31 for UINT_32 * src start position * | * src : b[31] b[30] b[29] ... ... ... ... ... ... ... ... b[end]..b[beg] ... b[1] b[0] * || Bits num || copy length || Bits num || * dst : b[31] b[30] b[29] ... b[end]..b[beg] ... ... ... ... ... ... ... ...
b[1] b[0] * | * dst start position **************************************************************************************************** */ static inline UINT_32 GetBits( UINT_32 src, UINT_32 srcStartPos, UINT_32 bitsNum, UINT_32 dstStartPos) { ADDR_ASSERT((srcStartPos < 32) && (dstStartPos < 32) && (bitsNum > 0)); ADDR_ASSERT((bitsNum + dstStartPos <= 32) && (bitsNum + srcStartPos <= 32)); return ((src >> srcStartPos) << (32 - bitsNum)) >> (32 - bitsNum - dstStartPos); } /** **************************************************************************************************** * MortonGen2d * * @brief * Generate 2D Morton interleave code with num lowest bits in each channel **************************************************************************************************** */ static inline UINT_32 MortonGen2d( UINT_32 x, ///< [in] First channel UINT_32 y, ///< [in] Second channel UINT_32 num) ///< [in] Number of bits extracted from each channel { UINT_32 mort = 0; for (UINT_32 i = 0; i < num; i++) { mort |= (GetBit(y, i) << (2 * i)); mort |= (GetBit(x, i) << (2 * i + 1)); } return mort; } /** **************************************************************************************************** * MortonGen3d * * @brief * Generate 3D Morton interleave code with num lowest bits in each channel **************************************************************************************************** */ static inline UINT_32 MortonGen3d( UINT_32 x, ///< [in] First channel UINT_32 y, ///< [in] Second channel UINT_32 z, ///< [in] Third channel UINT_32 num) ///< [in] Number of bits extracted from each channel { UINT_32 mort = 0; for (UINT_32 i = 0; i < num; i++) { mort |= (GetBit(z, i) << (3 * i)); mort |= (GetBit(y, i) << (3 * i + 1)); mort |= (GetBit(x, i) << (3 * i + 2)); } return mort; } /** **************************************************************************************************** * ReverseBitVector * * @brief * Return reversed lowest num bits of v: 
v[0]v[1]...v[num-2]v[num-1] **************************************************************************************************** */ static inline UINT_32 ReverseBitVector( UINT_32 v, ///< [in] Reverse operation base value UINT_32 num) ///< [in] Number of bits used in reverse operation { UINT_32 reverse = 0; for (UINT_32 i = 0; i < num; i++) { reverse |= (GetBit(v, num - 1 - i) << i); } return reverse; } /** **************************************************************************************************** * FoldXor2d * * @brief * Xor bit vector v[num-1]v[num-2]...v[1]v[0] with v[num]v[num+1]...v[2*num-2]v[2*num-1] **************************************************************************************************** */ static inline UINT_32 FoldXor2d( UINT_32 v, ///< [in] Xor operation base value UINT_32 num) ///< [in] Number of bits used in fold xor operation { return (v & ((1 << num) - 1)) ^ ReverseBitVector(v >> num, num); } /** **************************************************************************************************** * DeMort * * @brief * Return v[0] | v[2] | v[4] | v[6]... 
| v[2*num - 2] **************************************************************************************************** */ static inline UINT_32 DeMort( UINT_32 v, ///< [in] DeMort operation base value UINT_32 num) ///< [in] Number of bits used in fold DeMort operation { UINT_32 d = 0; for (UINT_32 i = 0; i < num; i++) { d |= ((v & (1 << (i << 1))) >> i); } return d; } /** **************************************************************************************************** * FoldXor3d * * @brief * v[0]...v[num-1] ^ v[3*num-1]v[3*num-3]...v[num+2]v[num] ^ v[3*num-2]...v[num+1]v[num-1] **************************************************************************************************** */ static inline UINT_32 FoldXor3d( UINT_32 v, ///< [in] Xor operation base value UINT_32 num) ///< [in] Number of bits used in fold xor operation { UINT_32 t = v & ((1 << num) - 1); t ^= ReverseBitVector(DeMort(v >> num, num), num); t ^= ReverseBitVector(DeMort(v >> (num + 1), num), num); return t; } /** **************************************************************************************************** * InitChannel * * @brief * Set channel initialization value via a return value **************************************************************************************************** */ static inline ADDR_CHANNEL_SETTING InitChannel( UINT_32 valid, ///< [in] valid setting UINT_32 channel, ///< [in] channel setting UINT_32 index) ///< [in] index setting { ADDR_CHANNEL_SETTING t; t.valid = valid; t.channel = channel; t.index = index; return t; } /** **************************************************************************************************** * InitChannel * * @brief * Set channel initialization value via channel pointer **************************************************************************************************** */ static inline VOID InitChannel( UINT_32 valid, ///< [in] valid setting UINT_32 channel, ///< [in] channel setting UINT_32 index, ///< [in] index setting 
ADDR_CHANNEL_SETTING *pChanSet) ///< [out] channel setting to be initialized { pChanSet->valid = valid; pChanSet->channel = channel; pChanSet->index = index; } /** **************************************************************************************************** * InitChannel * * @brief * Set channel initialization value via another channel **************************************************************************************************** */ static inline VOID InitChannel( ADDR_CHANNEL_SETTING *pChanDst, ///< [out] channel setting to be initialized ADDR_CHANNEL_SETTING *pChanSrc) ///< [in] channel setting to be copied from { pChanDst->valid = pChanSrc->valid; pChanDst->channel = pChanSrc->channel; pChanDst->index = pChanSrc->index; } /** **************************************************************************************************** * GetMaxValidChannelIndex * * @brief * Get the maximum index among the valid entries for a specific channel **************************************************************************************************** */ static inline UINT_32 GetMaxValidChannelIndex( const ADDR_CHANNEL_SETTING *pChanSet, ///< [in] channel settings to be searched UINT_32 searchCount,///< [in] number of channel settings to be searched UINT_32 channel) ///< [in] channel to be searched { UINT_32 index = 0; for (UINT_32 i = 0; i < searchCount; i++) { if (pChanSet[i].valid && (pChanSet[i].channel == channel)) { index = Max(index, static_cast<UINT_32>(pChanSet[i].index)); } } return index; } /** **************************************************************************************************** * GetCoordActiveMask * * @brief * Get bit mask which indicates which positions in the equation match the target coord **************************************************************************************************** */ static inline UINT_32 GetCoordActiveMask( const ADDR_CHANNEL_SETTING *pChanSet, ///< [in] channel settings to be searched UINT_32 searchCount,///< [in] number of channel settings to be
searched UINT_32 channel, ///< [in] channel to be searched UINT_32 index) ///< [in] index to be searched { UINT_32 mask = 0; for (UINT_32 i = 0; i < searchCount; i++) { if ((pChanSet[i].valid == TRUE) && (pChanSet[i].channel == channel) && (pChanSet[i].index == index)) { mask |= (1 << i); } } return mask; } /** **************************************************************************************************** * ShiftCeil * * @brief * Apply right-shift with ceiling, i.e. compute ceil(a / 2^b) **************************************************************************************************** */ static inline UINT_32 ShiftCeil( UINT_32 a, ///< [in] value to be right-shifted UINT_32 b) ///< [in] number of bits to shift { return (a >> b) + (((a & ((1 << b) - 1)) != 0) ? 1 : 0); } } // Addr } // rocr #endif // __ADDR_COMMON_H__ ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrelemlib.cpp /* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE.
* * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** **************************************************************************************************** * @file addrelemlib.cpp * @brief Contains the class implementation for element/pixel related functions. **************************************************************************************************** */ #include "addrelemlib.h" #include "addrlib.h" namespace rocr { namespace Addr { /** **************************************************************************************************** * ElemLib::ElemLib * * @brief * constructor * * @return * N/A **************************************************************************************************** */ ElemLib::ElemLib( Lib* pAddrLib) ///< [in] Parent addrlib instance pointer : Object(pAddrLib->GetClient()), m_pAddrLib(pAddrLib) { switch (m_pAddrLib->GetChipFamily()) { case ADDR_CHIP_FAMILY_R6XX: m_depthPlanarType = ADDR_DEPTH_PLANAR_R600; m_fp16ExportNorm = 0; break; case ADDR_CHIP_FAMILY_R7XX: m_depthPlanarType = ADDR_DEPTH_PLANAR_R600; m_fp16ExportNorm = 1; break; case ADDR_CHIP_FAMILY_R8XX: case ADDR_CHIP_FAMILY_NI: // Same as 8xx m_depthPlanarType = ADDR_DEPTH_PLANAR_R800; m_fp16ExportNorm = 1; break; default: m_fp16ExportNorm = 1; m_depthPlanarType = ADDR_DEPTH_PLANAR_R800; break; } m_configFlags.value = 0; } /** **************************************************************************************************** * ElemLib::~ElemLib * * @brief * destructor * * @return * N/A **************************************************************************************************** */ ElemLib::~ElemLib() { } /** **************************************************************************************************** * ElemLib::Create * * @brief * Creates and initializes an ElemLib object. * * @return * Returns a pointer to the new ElemLib object if successful, or NULL otherwise.
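@note ElemLib::Create does not use a plain new expression. It first obtains raw storage of sizeof(ElemLib) bytes through Object::ClientAlloc, which goes through the parent library's client. It then constructs the object in that storage with placement new. It returns NULL if the parent pointer is NULL or the allocation fails. The self-contained sketch below shows the same two-step pattern, assuming a malloc-backed allocator. HypotheticalClientAlloc, Widget, CreateWidget and DestroyWidget are illustrative names, not addrlib APIs.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a client allocation callback; it simply wraps std::malloc.
static void* HypotheticalClientAlloc(std::size_t size)
{
    return std::malloc(size);
}

// Illustrative class standing in for ElemLib.
struct Widget
{
    explicit Widget(int id) : m_id(id) {}
    int m_id;
};

// Same shape as ElemLib::Create: allocate raw storage first, then
// construct the object in place. Returns nullptr if allocation fails.
Widget* CreateWidget(int id)
{
    Widget* pWidget = nullptr;
    void*   pObj    = HypotheticalClientAlloc(sizeof(Widget));

    if (pObj != nullptr)
    {
        pWidget = new (pObj) Widget(id);   // placement new: no second allocation
    }

    return pWidget;
}

// Teardown mirrors creation: run the destructor explicitly, then free the raw storage.
void DestroyWidget(Widget* pWidget)
{
    if (pWidget != nullptr)
    {
        pWidget->~Widget();
        std::free(pWidget);
    }
}
```

In this sketch, an object built from raw client memory must be released by the matching pair of calls: an explicit destructor call, then the allocator's free routine.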
**************************************************************************************************** */ ElemLib* ElemLib::Create( const Lib* pAddrLib) ///< [in] Pointer to parent AddrLib instance { ElemLib* pElemLib = NULL; if (pAddrLib) { VOID* pObj = Object::ClientAlloc(sizeof(ElemLib), pAddrLib->GetClient()); if (pObj) { pElemLib = new(pObj) ElemLib(const_cast<Lib*>(pAddrLib)); } } return pElemLib; } /************************************************************************************************** * ElemLib::Flt32sToInt32s * * @brief * Convert an ADDR_FLT_32 value to an Int32 value * * @return * N/A **************************************************************************************************** */ VOID ElemLib::Flt32sToInt32s( ADDR_FLT_32 value, ///< [in] ADDR_FLT_32 value UINT_32 bits, ///< [in] number of bits in value NumberType numberType, ///< [in] the type of number UINT_32* pResult) ///< [out] Int32 value { UINT_8 round = 128; //ADDR_ROUND_BY_HALF UINT_32 uscale; UINT_32 sign; //convert each component to an INT_32 switch ( numberType ) { case ADDR_NO_NUMBER: //fall through case ADDR_ZERO: //fall through case ADDR_ONE: //fall through case ADDR_EPSILON: //fall through return; // these are zero-bit components, so don't set result case ADDR_UINT_BITS: // unsigned integer bit field, clamped to range uscale = (1< uscale)) { *pResult = uscale; } else { *pResult = value.i; } return; } // The algorithm used in the DB and TX differs at one value for 24-bit unorms case ADDR_UNORM_R6XXDB: // unsigned repeating fraction if ((bits==24) && (value.i == 0x33000000)) { *pResult = 1; return; } // Else treat like ADDR_UNORM_R6XX case ADDR_UNORM_R6XX: // unsigned repeating fraction if (value.f <= 0) { *pResult = 0; // first clamp to [0..1] } else { if (value.f >= 1) { *pResult = (1<(f + (round/256.0f)); } #endif else { ADDR_FLT_32 scaled; ADDR_FLT_32 shifted; UINT_64 truncated, rounded; UINT_32 altShift; UINT_32 mask = (1 << bits) - 1; UINT_32 half = 1 << (bits - 1); UINT_32 mant24
= (value.i & 0x7FFFFF) + 0x800000; UINT_64 temp = mant24 - (mant24>>bits) - static_cast((mant24 & mask) > half); UINT_32 exp8 = value.i >> 23; UINT_32 shift = 126 - exp8 + 24 - bits; UINT_64 final; if (shift >= 32) // This is zero, even with maximum dither add { final = 0; } else { final = ((temp<<8) + (static_cast(round)<> (shift+8); } //ADDR_EXIT( *pResult == final, // ("Float %x converted to %d-bit Unorm %x != bitwise %x", // value.u, bits, (UINT_32)*pResult, (UINT_32)final) ); if (final > mask) { final = mask; } scaled.f = value.f * ((1<>23)&0xFF); truncated = (altShift > 60) ? 0 : truncated >> altShift; rounded = static_cast((round + truncated) >> 8); //if (rounded > ((1<(rounded); //(INT_32)final; } } } return; case ADDR_S8FLOAT32: // 32-bit IEEE float, passes through NaN values *pResult = value.i; return; // @@ FIX ROUNDING in this code, fix the denorm case case ADDR_U4FLOATC: // Unsigned float, 4-bit exponent. bias 15, clamped [0..1] sign = (value.i >> 31) & 1; if ((value.i&0x7F800000) == 0x7F800000) // If NaN or INF: { if ((value.i&0x007FFFFF) != 0) // then if NaN { *pResult = 0; // return 0 } else { *pResult = (sign)?0:0xF00000; // else +INF->+1, -INF->0 } return; } if (value.f <= 0) { *pResult = 0; } else { if (value.f>=1) { *pResult = 0xF << (bits-4); } else { if ((value.i>>23) > 112 ) { // 24-bit float: normalized // value.i += 1 << (22-bits+4); // round the IEEE mantissa to mantissa size // @@ NOTE: add code to support rounding value.u &= 0x7FFFFFF; // mask off high 4 exponent bits *pResult = value.i >> (23-bits+4);// shift off unused mantissa bits } else { // 24-bit float: denormalized value.f = value.f / (1<<28) / (1<<28); value.f = value.f / (1<<28) / (1<<28); // convert to IEEE denorm // value.i += 1 << (22-bits+4); // round the IEEE mantissa to mantissa size // @@ NOTE: add code to support rounding *pResult = value.i >> (23-bits+4); // shift off unused mantissa bits } } } return; default: // invalid number mode //ADDR_EXIT(0, ("Invalid AddrNumber 
%d", numberType) ); break; } } /** **************************************************************************************************** * ElemLib::Int32sToPixel * * @brief * Pack 32-bit integer values into an uncompressed pixel, * in the proper order * * @return * N/A * * @note * This entry point packs numComps (up to four) 32-bit integer values into * an uncompressed pixel. The component values are specified in * standard order, e.g. depth/stencil. This routine must not be called * on a compressed pixel. **************************************************************************************************** */ VOID ElemLib::Int32sToPixel( UINT_32 numComps, ///< [in] number of components UINT_32* pComps, ///< [in] components UINT_32* pCompBits, ///< [in] total bits in each component UINT_32* pCompStart, ///< [in] the first bit position of each component ComponentFlags properties, ///< [in] properties about byteAligned, exportNorm UINT_32 resultBits, ///< [in] result bits: total bpp after decompression UINT_8* pPixel) ///< [out] packed pixel value (depth/stencil or color) { UINT_32 i; UINT_32 j; UINT_32 start; UINT_32 size; UINT_32 byte; UINT_32 value = 0; UINT_32 compMask; UINT_32 elemMask=0; UINT_32 elementXor = 0; // address xor when reading bytes from elements // @@ NOTE: assert if called on a compressed format!
if (properties.byteAligned) // Components are all byte-sized { for (i = 0; i < numComps; i++) // Then for each component { // Copy the bytes of the component into the element start = pCompStart[i] / 8; size = pCompBits[i] / 8; for (j = 0; j < size; j++) { pPixel[(j+start)^elementXor] = static_cast<UINT_8>(pComps[i] >> (8*j)); } } } else // Element is 32-bits or less, components are bit fields { // First, extract each component in turn and combine it into a 32-bit value for (i = 0; i < numComps; i++) { compMask = (1 << pCompBits[i]) - 1; elemMask |= compMask << pCompStart[i]; value |= (pComps[i] & compMask) << pCompStart[i]; } // Next, copy the masked value into the element size = (resultBits + 7) / 8; for (i = 0; i < size; i++) { byte = pPixel[i^elementXor] & ~(elemMask >> (8*i)); pPixel[i^elementXor] = static_cast<UINT_8>(byte | ((elemMask & value) >> (8*i))); } } } /** **************************************************************************************************** * Flt32ToDepthPixel * * @brief * Convert two FLT_32 components to a depth/stencil pixel value * * @return * N/A **************************************************************************************************** */ VOID ElemLib::Flt32ToDepthPixel( AddrDepthFormat format, ///< [in] Depth format const ADDR_FLT_32 comps[2], ///< [in] two components of depth UINT_8* pPixel ///< [out] depth pixel value ) const { UINT_32 i; UINT_32 values[2]; ComponentFlags properties; // byteAligned, exportNorm UINT_32 resultBits = 0; // result bits: total bits per pixel after decompression PixelFormatInfo fmt; // get type for each component PixGetDepthCompInfo(format, &fmt); //initialize properties properties.byteAligned = TRUE; properties.exportNorm = TRUE; properties.floatComp = FALSE; //set properties and result bits for (i = 0; i < 2; i++) { if ((fmt.compBit[i] & 7) || (fmt.compStart[i] & 7)) { properties.byteAligned = FALSE; } if (resultBits < fmt.compStart[i] + fmt.compBit[i]) { resultBits = fmt.compStart[i] + fmt.compBit[i]; } //
Clear ADDR_EXPORT_NORM if can't be represented as 11-bit or smaller [-1..+1] format if (fmt.compBit[i] > 11 || fmt.numType[i] >= ADDR_USCALED) { properties.exportNorm = FALSE; } // Mark if there are any floating point components if ((fmt.numType[i] == ADDR_U4FLOATC) || (fmt.numType[i] >= ADDR_S8FLOAT) ) { properties.floatComp = TRUE; } } // Convert the two input floats to integer values for (i = 0; i < 2; i++) { Flt32sToInt32s(comps[i], fmt.compBit[i], fmt.numType[i], &values[i]); } // Then pack the two integer components, in the proper order Int32sToPixel(2, values, fmt.compBit, fmt.compStart, properties, resultBits, pPixel ); } /** **************************************************************************************************** * Flt32ToColorPixel * * @brief * Convert a FLT_32 value to a red/green/blue/alpha pixel value * * @return * N/A **************************************************************************************************** */ VOID ElemLib::Flt32ToColorPixel( AddrColorFormat format, ///< [in] Color format AddrSurfaceNumber surfNum, ///< [in] Surface number AddrSurfaceSwap surfSwap, ///< [in] Surface swap const ADDR_FLT_32 comps[4], ///< [in] four components of color UINT_8* pPixel ///< [out] a red/green/blue/alpha pixel value ) const { PixelFormatInfo pixelInfo; UINT_32 i; UINT_32 values[4]; ComponentFlags properties; // byteAligned, exportNorm UINT_32 resultBits = 0; // result bits: total bits per pixel after decompression memset(&pixelInfo, 0, sizeof(PixelFormatInfo)); PixGetColorCompInfo(format, surfNum, surfSwap, &pixelInfo); //initialize properties properties.byteAligned = TRUE; properties.exportNorm = TRUE; properties.floatComp = FALSE; //set properties and result bits for (i = 0; i < 4; i++) { if ( (pixelInfo.compBit[i] & 7) || (pixelInfo.compStart[i] & 7) ) { properties.byteAligned = FALSE; } if (resultBits < pixelInfo.compStart[i] + pixelInfo.compBit[i]) { resultBits = pixelInfo.compStart[i] + pixelInfo.compBit[i]; } if (m_fp16ExportNorm) 
{ // Clear ADDR_EXPORT_NORM if can't be represented as 11-bit or smaller [-1..+1] format // or if it's not FP and <=16 bits if (((pixelInfo.compBit[i] > 11) || (pixelInfo.numType[i] >= ADDR_USCALED)) && (pixelInfo.numType[i] !=ADDR_U4FLOATC)) { properties.exportNorm = FALSE; } } else { // Clear ADDR_EXPORT_NORM if can't be represented as 11-bit or smaller [-1..+1] format if (pixelInfo.compBit[i] > 11 || pixelInfo.numType[i] >= ADDR_USCALED) { properties.exportNorm = FALSE; } } // Mark if there are any floating point components if ( (pixelInfo.numType[i] == ADDR_U4FLOATC) || (pixelInfo.numType[i] >= ADDR_S8FLOAT) ) { properties.floatComp = TRUE; } } // Convert the four input floats to integer values for (i = 0; i < 4; i++) { Flt32sToInt32s(comps[i], pixelInfo.compBit[i], pixelInfo.numType[i], &values[i]); } // Then pack the four integer components, in the proper order Int32sToPixel(4, values, &pixelInfo.compBit[0], &pixelInfo.compStart[0], properties, resultBits, pPixel); } /** **************************************************************************************************** * ElemLib::GetCompType * * @brief * Fill per component info * * @return * N/A * **************************************************************************************************** */ VOID ElemLib::GetCompType( AddrColorFormat format, ///< [in] surface format AddrSurfaceNumber numType, ///< [in] number type PixelFormatInfo* pInfo) ///< [in][out] per component info out { BOOL_32 handled = FALSE; // Floating point formats override the number format switch (format) { case ADDR_COLOR_16_FLOAT: // fall through for all pure floating point format case ADDR_COLOR_16_16_FLOAT: case ADDR_COLOR_16_16_16_16_FLOAT: case ADDR_COLOR_32_FLOAT: case ADDR_COLOR_32_32_FLOAT: case ADDR_COLOR_32_32_32_32_FLOAT: case ADDR_COLOR_10_11_11_FLOAT: case ADDR_COLOR_11_11_10_FLOAT: numType = ADDR_NUMBER_FLOAT; break; // Special handling for the depth formats case ADDR_COLOR_8_24: // fall through for these 2 similar format 
case ADDR_COLOR_24_8: for (UINT_32 c = 0; c < 4; c++) { if (pInfo->compBit[c] == 8) { pInfo->numType[c] = ADDR_UINT_BITS; } else if (pInfo->compBit[c] == 24) { pInfo->numType[c] = ADDR_UNORM_R6XX; } else { pInfo->numType[c] = ADDR_NO_NUMBER; } } handled = TRUE; break; case ADDR_COLOR_8_24_FLOAT: // fall through for these 3 similar formats case ADDR_COLOR_24_8_FLOAT: case ADDR_COLOR_X24_8_32_FLOAT: for (UINT_32 c = 0; c < 4; c++) { if (pInfo->compBit[c] == 8) { pInfo->numType[c] = ADDR_UINT_BITS; } else if (pInfo->compBit[c] == 24) { pInfo->numType[c] = ADDR_U4FLOATC; } else if (pInfo->compBit[c] == 32) { pInfo->numType[c] = ADDR_S8FLOAT32; } else { pInfo->numType[c] = ADDR_NO_NUMBER; } } handled = TRUE; break; default: break; } if (!handled) { for (UINT_32 c = 0; c < 4; c++) { // Assign a number type for each component AddrSurfaceNumber cnum; // First handle default component values if (pInfo->compBit[c] == 0) { if (c < 3) { pInfo->numType[c] = ADDR_ZERO; // Default is zero for RGB } else if (numType == ADDR_NUMBER_UINT || numType == ADDR_NUMBER_SINT) { pInfo->numType[c] = ADDR_EPSILON; // Alpha INT_32 bits default is 0x01 } else { pInfo->numType[c] = ADDR_ONE; // Alpha normal default is float 1.0 } continue; } // Now handle small components else if (pInfo->compBit[c] == 1) { if (numType == ADDR_NUMBER_UINT || numType == ADDR_NUMBER_SINT) { cnum = ADDR_NUMBER_UINT; } else { cnum = ADDR_NUMBER_UNORM; } } else { cnum = numType; } // If no default applies, set the number type from cnum, the component bits, and the architecture switch (cnum) { case ADDR_NUMBER_SRGB: pInfo->numType[c] = (c < 3) ? ADDR_GAMMA8_R6XX : ADDR_UNORM_R6XX; break; case ADDR_NUMBER_UNORM: pInfo->numType[c] = ADDR_UNORM_R6XX; break; case ADDR_NUMBER_SNORM: pInfo->numType[c] = ADDR_SNORM_R6XX; break; case ADDR_NUMBER_USCALED: pInfo->numType[c] = ADDR_USCALED; // @@ Do we need separate Pele routine? break; case ADDR_NUMBER_SSCALED: pInfo->numType[c] = ADDR_SSCALED; // @@ Do we need separate Pele routine?
break; case ADDR_NUMBER_FLOAT: if (pInfo->compBit[c] == 32) { pInfo->numType[c] = ADDR_S8FLOAT32; } else if (pInfo->compBit[c] == 16) { pInfo->numType[c] = ADDR_S5FLOAT; } else if (pInfo->compBit[c] >= 10) { pInfo->numType[c] = ADDR_U5FLOAT; } else { ADDR_ASSERT_ALWAYS(); } break; case ADDR_NUMBER_SINT: pInfo->numType[c] = ADDR_SINT_BITS; break; case ADDR_NUMBER_UINT: pInfo->numType[c] = ADDR_UINT_BITS; break; default: ADDR_ASSERT(!"Invalid number type"); pInfo->numType[c] = ADDR_NO_NUMBER; break; } } } } /** **************************************************************************************************** * ElemLib::GetCompSwap * * @brief * Get components swapped for color surface * * @return * N/A * **************************************************************************************************** */ VOID ElemLib::GetCompSwap( AddrSurfaceSwap swap, ///< [in] swap mode PixelFormatInfo* pInfo) ///< [in,out] output per component info { switch (pInfo->comps) { case 4: switch (swap) { case ADDR_SWAP_ALT: SwapComps( 0, 2, pInfo ); break; // BGRA case ADDR_SWAP_STD_REV: SwapComps( 0, 3, pInfo ); SwapComps( 1, 2, pInfo ); break; // ABGR case ADDR_SWAP_ALT_REV: SwapComps( 0, 3, pInfo ); SwapComps( 0, 2, pInfo ); SwapComps( 0, 1, pInfo ); break; // ARGB default: break; } break; case 3: switch (swap) { case ADDR_SWAP_ALT_REV: SwapComps( 0, 3, pInfo ); SwapComps( 0, 2, pInfo ); break; // AGR case ADDR_SWAP_STD_REV: SwapComps( 0, 2, pInfo ); break; // BGR case ADDR_SWAP_ALT: SwapComps( 2, 3, pInfo ); break; // RGA default: break; // RGB } break; case 2: switch (swap) { case ADDR_SWAP_ALT_REV: SwapComps( 0, 1, pInfo ); SwapComps( 1, 3, pInfo ); break; // AR case ADDR_SWAP_STD_REV: SwapComps( 0, 1, pInfo ); break; // GR case ADDR_SWAP_ALT: SwapComps( 1, 3, pInfo ); break; // RA default: break; // RG } break; case 1: switch (swap) { case ADDR_SWAP_ALT_REV: SwapComps( 0, 3, pInfo ); break; // A case ADDR_SWAP_STD_REV: SwapComps( 0, 2, pInfo ); break; // B case ADDR_SWAP_ALT: 
SwapComps( 0, 1, pInfo ); break; // G default: break; // R } break; } } /** **************************************************************************************************** * ElemLib::SwapComps * * @brief * Swap the start bit position and bit width of two components * * @return * N/A * **************************************************************************************************** */ VOID ElemLib::SwapComps( UINT_32 c0, ///< [in] component index 0 UINT_32 c1, ///< [in] component index 1 PixelFormatInfo* pInfo) ///< [in,out] per component info to be updated { UINT_32 start; UINT_32 bits; start = pInfo->compStart[c0]; pInfo->compStart[c0] = pInfo->compStart[c1]; pInfo->compStart[c1] = start; bits = pInfo->compBit[c0]; pInfo->compBit[c0] = pInfo->compBit[c1]; pInfo->compBit[c1] = bits; } /** **************************************************************************************************** * ElemLib::PixGetColorCompInfo * * @brief * Get per component info for color surface * * @return * N/A * **************************************************************************************************** */ VOID ElemLib::PixGetColorCompInfo( AddrColorFormat format, ///< [in] surface format, read from register AddrSurfaceNumber number, ///< [in] pixel number type AddrSurfaceSwap swap, ///< [in] component swap mode PixelFormatInfo* pInfo ///< [out] output per component info ) const { // 1.
Get component bits switch (format) { case ADDR_COLOR_8: GetCompBits(8, 0, 0, 0, pInfo); break; case ADDR_COLOR_1_5_5_5: GetCompBits(5, 5, 5, 1, pInfo); break; case ADDR_COLOR_5_6_5: GetCompBits(5, 6, 5, 0, pInfo); break; case ADDR_COLOR_6_5_5: GetCompBits(5, 5, 6, 0, pInfo); break; case ADDR_COLOR_8_8: GetCompBits(8, 8, 0, 0, pInfo); break; case ADDR_COLOR_4_4_4_4: GetCompBits(4, 4, 4, 4, pInfo); break; case ADDR_COLOR_16: GetCompBits(16, 0, 0, 0, pInfo); break; case ADDR_COLOR_8_8_8_8: GetCompBits(8, 8, 8, 8, pInfo); break; case ADDR_COLOR_2_10_10_10: GetCompBits(10, 10, 10, 2, pInfo); break; case ADDR_COLOR_10_11_11: GetCompBits(11, 11, 10, 0, pInfo); break; case ADDR_COLOR_11_11_10: GetCompBits(10, 11, 11, 0, pInfo); break; case ADDR_COLOR_16_16: GetCompBits(16, 16, 0, 0, pInfo); break; case ADDR_COLOR_16_16_16_16: GetCompBits(16, 16, 16, 16, pInfo); break; case ADDR_COLOR_16_FLOAT: GetCompBits(16, 0, 0, 0, pInfo); break; case ADDR_COLOR_16_16_FLOAT: GetCompBits(16, 16, 0, 0, pInfo); break; case ADDR_COLOR_32_FLOAT: GetCompBits(32, 0, 0, 0, pInfo); break; case ADDR_COLOR_32_32_FLOAT: GetCompBits(32, 32, 0, 0, pInfo); break; case ADDR_COLOR_16_16_16_16_FLOAT: GetCompBits(16, 16, 16, 16, pInfo); break; case ADDR_COLOR_32_32_32_32_FLOAT: GetCompBits(32, 32, 32, 32, pInfo); break; case ADDR_COLOR_32: GetCompBits(32, 0, 0, 0, pInfo); break; case ADDR_COLOR_32_32: GetCompBits(32, 32, 0, 0, pInfo); break; case ADDR_COLOR_32_32_32_32: GetCompBits(32, 32, 32, 32, pInfo); break; case ADDR_COLOR_10_10_10_2: GetCompBits(2, 10, 10, 10, pInfo); break; case ADDR_COLOR_10_11_11_FLOAT: GetCompBits(11, 11, 10, 0, pInfo); break; case ADDR_COLOR_11_11_10_FLOAT: GetCompBits(10, 11, 11, 0, pInfo); break; case ADDR_COLOR_5_5_5_1: GetCompBits(1, 5, 5, 5, pInfo); break; case ADDR_COLOR_3_3_2: GetCompBits(2, 3, 3, 0, pInfo); break; case ADDR_COLOR_4_4: GetCompBits(4, 4, 0, 0, pInfo); break; case ADDR_COLOR_8_24: case ADDR_COLOR_8_24_FLOAT: // same bit count, fall through GetCompBits(24,
8, 0, 0, pInfo); break; case ADDR_COLOR_24_8: case ADDR_COLOR_24_8_FLOAT: // same bit count, fall through GetCompBits(8, 24, 0, 0, pInfo); break; case ADDR_COLOR_X24_8_32_FLOAT: GetCompBits(32, 8, 0, 0, pInfo); break; case ADDR_COLOR_INVALID: GetCompBits(0, 0, 0, 0, pInfo); break; default: ADDR_ASSERT(0); GetCompBits(0, 0, 0, 0, pInfo); break; } // 2. Get component number type GetCompType(format, number, pInfo); // 3. Swap components if needed GetCompSwap(swap, pInfo); } /** **************************************************************************************************** * ElemLib::PixGetDepthCompInfo * * @brief * Get per component info for depth surface * * @return * N/A * **************************************************************************************************** */ VOID ElemLib::PixGetDepthCompInfo( AddrDepthFormat format, ///< [in] surface format, read from register PixelFormatInfo* pInfo ///< [out] output per component bits and type ) const { if (m_depthPlanarType == ADDR_DEPTH_PLANAR_R800) { if (format == ADDR_DEPTH_8_24_FLOAT) { format = ADDR_DEPTH_X24_8_32_FLOAT; // Use this format to represent R800's D24FS8 } if (format == ADDR_DEPTH_X8_24_FLOAT) { format = ADDR_DEPTH_32_FLOAT; } } switch (format) { case ADDR_DEPTH_16: GetCompBits(16, 0, 0, 0, pInfo); break; case ADDR_DEPTH_8_24: case ADDR_DEPTH_8_24_FLOAT: // similar format, fall through GetCompBits(24, 8, 0, 0, pInfo); break; case ADDR_DEPTH_X8_24: case ADDR_DEPTH_X8_24_FLOAT: // similar format, fall through GetCompBits(24, 0, 0, 0, pInfo); break; case ADDR_DEPTH_32_FLOAT: GetCompBits(32, 0, 0, 0, pInfo); break; case ADDR_DEPTH_X24_8_32_FLOAT: GetCompBits(32, 8, 0, 0, pInfo); break; case ADDR_DEPTH_INVALID: GetCompBits(0, 0, 0, 0, pInfo); break; default: ADDR_ASSERT(0); GetCompBits(0, 0, 0, 0, pInfo); break; } switch (format) { case ADDR_DEPTH_16: pInfo->numType [0] = ADDR_UNORM_R6XX; pInfo->numType [1] = ADDR_ZERO; break; case ADDR_DEPTH_8_24: pInfo->numType [0] = ADDR_UNORM_R6XXDB; 
pInfo->numType [1] = ADDR_UINT_BITS; break; case ADDR_DEPTH_8_24_FLOAT: pInfo->numType [0] = ADDR_U4FLOATC; pInfo->numType [1] = ADDR_UINT_BITS; break; case ADDR_DEPTH_X8_24: pInfo->numType [0] = ADDR_UNORM_R6XXDB; pInfo->numType [1] = ADDR_ZERO; break; case ADDR_DEPTH_X8_24_FLOAT: pInfo->numType [0] = ADDR_U4FLOATC; pInfo->numType [1] = ADDR_ZERO; break; case ADDR_DEPTH_32_FLOAT: pInfo->numType [0] = ADDR_S8FLOAT32; pInfo->numType [1] = ADDR_ZERO; break; case ADDR_DEPTH_X24_8_32_FLOAT: pInfo->numType [0] = ADDR_S8FLOAT32; pInfo->numType [1] = ADDR_UINT_BITS; break; default: pInfo->numType [0] = ADDR_NO_NUMBER; pInfo->numType [1] = ADDR_NO_NUMBER; break; } pInfo->numType [2] = ADDR_NO_NUMBER; pInfo->numType [3] = ADDR_NO_NUMBER; } /** **************************************************************************************************** * ElemLib::PixGetExportNorm * * @brief * Check if fp16 export norm can be enabled. * * @return * TRUE if this can be enabled. * **************************************************************************************************** */ BOOL_32 ElemLib::PixGetExportNorm( AddrColorFormat colorFmt, ///< [in] surface format, read from register AddrSurfaceNumber numberFmt, ///< [in] pixel number type AddrSurfaceSwap swap ///< [in] components swap type ) const { BOOL_32 enabled = TRUE; PixelFormatInfo formatInfo; PixGetColorCompInfo(colorFmt, numberFmt, swap, &formatInfo); for (UINT_32 c = 0; c < 4; c++) { if (m_fp16ExportNorm) { if (((formatInfo.compBit[c] > 11) || (formatInfo.numType[c] > ADDR_USCALED)) && (formatInfo.numType[c] != ADDR_U4FLOATC) && (formatInfo.numType[c] != ADDR_S5FLOAT) && (formatInfo.numType[c] != ADDR_S5FLOATM) && (formatInfo.numType[c] != ADDR_U5FLOAT) && (formatInfo.numType[c] != ADDR_U3FLOATM)) { enabled = FALSE; break; } } else { if ((formatInfo.compBit[c] > 11) || (formatInfo.numType[c] > ADDR_USCALED)) { enabled = FALSE; break; } } } return enabled; } /** 
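@note The sketch below is a simplified model of the arithmetic in ElemLib::AdjustSurfaceInfo, documented below, and covers two element modes only:
- ADDR_EXPANDED divides bpp by expandX * expandY and multiplies pitch, width and height by the expansion factors.
- ADDR_PACKED_STD and ADDR_PACKED_REV multiply bpp by the same product and divide the dimensions, rounding up.

In both cases the dimensions are rewritten only when expandX or expandY exceeds 1, and a zero width or height is clamped to 1. The sketch omits the BCn path on R8XX, which truncates, and the fixed-size compressed modes. HypotheticalElemMode, SurfDims and Adjust are illustrative names, not addrlib types.

```cpp
#include <cstdint>

// Hypothetical subset of addrlib's element modes used by this sketch.
enum class HypotheticalElemMode
{
    Expanded,   // models ADDR_EXPANDED
    PackedStd,  // models ADDR_PACKED_STD / ADDR_PACKED_REV
};

struct SurfDims
{
    std::uint32_t bpp;
    std::uint32_t pitch;
    std::uint32_t width;
    std::uint32_t height;
};

// Simplified model of ElemLib::AdjustSurfaceInfo for the two modes above.
SurfDims Adjust(HypotheticalElemMode mode, std::uint32_t expandX, std::uint32_t expandY, SurfDims in)
{
    SurfDims out = in;

    // Bits per element: an expanded element is smaller, a packed one is larger.
    if (mode == HypotheticalElemMode::Expanded)
    {
        out.bpp = in.bpp / expandX / expandY;
    }
    else
    {
        out.bpp = in.bpp * expandX * expandY;
    }

    // Dimensions are rewritten only when there is an expansion factor.
    if ((expandX > 1) || (expandY > 1))
    {
        if (mode == HypotheticalElemMode::Expanded)
        {
            out.pitch  = in.pitch * expandX;
            out.width  = in.width * expandX;
            out.height = in.height * expandY;
        }
        else
        {
            // Round up so a partial block still occupies one element.
            out.pitch  = (in.pitch + expandX - 1) / expandX;
            out.width  = (in.width + expandX - 1) / expandX;
            out.height = (in.height + expandY - 1) / expandY;
        }

        // Pitch may legitimately be 0; width and height are clamped to 1.
        out.width  = (out.width == 0) ? 1 : out.width;
        out.height = (out.height == 0) ? 1 : out.height;
    }

    return out;
}
```

For example, under this model a 10 x 3 surface of 8-bit texels packed in 4 x 4 blocks becomes 3 x 1 elements of 128 bits each.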
****************************************************************************************************
*   ElemLib::AdjustSurfaceInfo
*
*   @brief
*       Adjust bpp/base pitch/width/height according to elemMode and expandX/Y
*
*   @return
*       N/A
****************************************************************************************************
*/
VOID ElemLib::AdjustSurfaceInfo(
    ElemMode        elemMode,       ///< [in] element mode
    UINT_32         expandX,        ///< [in] decompression expansion factor in X
    UINT_32         expandY,        ///< [in] decompression expansion factor in Y
    UINT_32*        pBpp,           ///< [in,out] bpp
    UINT_32*        pBasePitch,     ///< [in,out] base pitch
    UINT_32*        pWidth,         ///< [in,out] width
    UINT_32*        pHeight)        ///< [in,out] height
{
    UINT_32 packedBits;
    UINT_32 basePitch;
    UINT_32 width;
    UINT_32 height;
    UINT_32 bpp;
    BOOL_32 bBCnFormat = FALSE;

    ADDR_ASSERT(pBpp != NULL);
    ADDR_ASSERT(pWidth != NULL && pHeight != NULL && pBasePitch != NULL);

    if (pBpp)
    {
        bpp = *pBpp;

        switch (elemMode)
        {
            case ADDR_EXPANDED:
                packedBits = bpp / expandX / expandY;
                break;

            case ADDR_PACKED_STD: // Different bit order
            case ADDR_PACKED_REV:
                packedBits = bpp * expandX * expandY;
                break;

            case ADDR_PACKED_GBGR:
            case ADDR_PACKED_BGRG:
                packedBits = bpp; // 32-bit packed ==> 2 32-bit result
                break;

            case ADDR_PACKED_BC1: // Fall through
            case ADDR_PACKED_BC4:
                packedBits = 64;
                bBCnFormat = TRUE;
                break;

            case ADDR_PACKED_BC2: // Fall through
            case ADDR_PACKED_BC3: // Fall through
            case ADDR_PACKED_BC5: // Fall through
                bBCnFormat = TRUE;
                // fall through
            case ADDR_PACKED_ASTC:
            case ADDR_PACKED_ETC2_128BPP:
                packedBits = 128;
                break;

            case ADDR_PACKED_ETC2_64BPP:
                packedBits = 64;
                break;

            case ADDR_ROUND_BY_HALF:  // Fall through
            case ADDR_ROUND_TRUNCATE: // Fall through
            case ADDR_ROUND_DITHER:   // Fall through
            case ADDR_UNCOMPRESSED:
                packedBits = bpp;
                break;

            default:
                packedBits = bpp;
                ADDR_ASSERT_ALWAYS();
                break;
        }

        *pBpp = packedBits;
    }

    if (pWidth && pHeight && pBasePitch)
    {
        basePitch = *pBasePitch;
        width     = *pWidth;
        height    = *pHeight;

        if ((expandX > 1) || (expandY > 1))
        {
            if (elemMode == ADDR_EXPANDED)
            {
                basePitch *= expandX;
                width     *= expandX;
                height    *= expandY;
            }
            else
            {
                // Evergreen family workaround
                if (bBCnFormat && (m_pAddrLib->GetChipFamily() == ADDR_CHIP_FAMILY_R8XX))
                {
                    // For BCn we now pad it to POW2 at the beginning so it is safe to
                    // divide by 4 directly
                    basePitch = basePitch / expandX;
                    width     = width / expandX;
                    height    = height / expandY;
#if DEBUG
                    width     = (width == 0) ? 1 : width;
                    height    = (height == 0) ? 1 : height;

                    if ((*pWidth > PowTwoAlign(width, 8) * expandX) ||
                        (*pHeight > PowTwoAlign(height, 8) * expandY)) // 8 is 1D tiling alignment
                    {
                        // If this assertion is hit, we may have issues if the app samples
                        // the rightmost/bottommost pixels
                        ADDR_ASSERT_ALWAYS();
                    }
#endif
                }
                else // Not a BCn format: keep the old rounding-up method (FMT_1? No real test yet)
                {
                    basePitch = (basePitch + expandX - 1) / expandX;
                    width     = (width + expandX - 1) / expandX;
                    height    = (height + expandY - 1) / expandY;
                }
            }

            *pBasePitch = basePitch; // 0 is a legal value for base pitch.
            *pWidth     = (width == 0) ? 1 : width;
            *pHeight    = (height == 0) ? 1 : height;
        } // if ((expandX > 1) || (expandY > 1))
    }
}

/**
****************************************************************************************************
*   ElemLib::RestoreSurfaceInfo
*
*   @brief
*       Reverse operation of AdjustSurfaceInfo
*
*   @return
*       N/A
****************************************************************************************************
*/
VOID ElemLib::RestoreSurfaceInfo(
    ElemMode        elemMode,       ///< [in] element mode
    UINT_32         expandX,        ///< [in] decompression expansion factor in X
    UINT_32         expandY,        ///< [in] decompression expansion factor in Y
    UINT_32*        pBpp,           ///< [in,out] bpp
    UINT_32*        pWidth,         ///< [in,out] width
    UINT_32*        pHeight)        ///< [in,out] height
{
    UINT_32 originalBits;
    UINT_32 width;
    UINT_32 height;
    UINT_32 bpp;

    BOOL_32 bBCnFormat = FALSE;
    (void)bBCnFormat;

    ADDR_ASSERT(pBpp != NULL);
    ADDR_ASSERT(pWidth != NULL && pHeight != NULL);

    if (pBpp)
    {
        bpp = *pBpp;

        switch (elemMode)
        {
            case ADDR_EXPANDED:
                originalBits = bpp * expandX * expandY;
                break;

            case ADDR_PACKED_STD: // Different bit order
            case ADDR_PACKED_REV:
                originalBits = bpp / expandX / expandY;
                break;

            case ADDR_PACKED_GBGR:
            case ADDR_PACKED_BGRG:
                originalBits = bpp; // 32-bit packed ==> 2 32-bit result
                break;

            case ADDR_PACKED_BC1: // Fall through
            case ADDR_PACKED_BC4:
                originalBits = 64;
                bBCnFormat = TRUE;
                break;

            case ADDR_PACKED_BC2: // Fall through
            case ADDR_PACKED_BC3: // Fall through
            case ADDR_PACKED_BC5:
                bBCnFormat = TRUE;
                // fall through
            case ADDR_PACKED_ASTC:
            case ADDR_PACKED_ETC2_128BPP:
                originalBits = 128;
                break;

            case ADDR_PACKED_ETC2_64BPP:
                originalBits = 64;
                break;

            case ADDR_ROUND_BY_HALF:  // Fall through
            case ADDR_ROUND_TRUNCATE: // Fall through
            case ADDR_ROUND_DITHER:   // Fall through
            case ADDR_UNCOMPRESSED:
                originalBits = bpp;
                break;

            default:
                originalBits = bpp;
                ADDR_ASSERT_ALWAYS();
                break;
        }

        *pBpp = originalBits;
    }

    if (pWidth && pHeight)
    {
        width  = *pWidth;
        height = *pHeight;

        if ((expandX > 1) || (expandY > 1))
        {
            if (elemMode == ADDR_EXPANDED)
            {
                width  /= expandX;
                height /= expandY;
            }
            else
            {
                width  *= expandX;
                height *= expandY;
            }
        }

        *pWidth  = (width == 0) ? 1 : width;
        *pHeight = (height == 0) ? 1 : height;
    }
}

/**
****************************************************************************************************
*   ElemLib::GetBitsPerPixel
*
*   @brief
*       Compute the total bits per element according to a format
*       code. For compressed formats, this is not the same as
*       the number of bits per decompressed element.
*
*   @return
*       Bits per pixel
****************************************************************************************************
*/
UINT_32 ElemLib::GetBitsPerPixel(
    AddrFormat          format,         ///< [in] surface format code
    ElemMode*           pElemMode,      ///< [out] element mode
    UINT_32*            pExpandX,       ///< [out] decompression expansion factor in X
    UINT_32*            pExpandY,       ///< [out] decompression expansion factor in Y
    UINT_32*            pUnusedBits)    ///< [out] bits unused
{
    UINT_32 bpp;
    UINT_32 expandX = 1;
    UINT_32 expandY = 1;
    UINT_32 bitUnused = 0;
    ElemMode elemMode = ADDR_UNCOMPRESSED; // default value

    switch (format)
    {
        case ADDR_FMT_8:
            bpp = 8;
            break;
        case ADDR_FMT_1_5_5_5:
        case ADDR_FMT_5_6_5:
        case ADDR_FMT_6_5_5:
        case ADDR_FMT_8_8:
        case ADDR_FMT_4_4_4_4:
        case ADDR_FMT_16:
            bpp = 16;
            break;
        case ADDR_FMT_GB_GR:
            elemMode = ADDR_PACKED_GBGR;
            bpp      = m_configFlags.use32bppFor422Fmt ? 32 : 16;
            expandX  = m_configFlags.use32bppFor422Fmt ? 2 : 1;
            break;
        case ADDR_FMT_BG_RG:
            elemMode = ADDR_PACKED_BGRG;
            bpp      = m_configFlags.use32bppFor422Fmt ? 32 : 16;
            expandX  = m_configFlags.use32bppFor422Fmt ? 2 : 1;
            break;
        case ADDR_FMT_8_8_8_8:
        case ADDR_FMT_2_10_10_10:
        case ADDR_FMT_10_11_11:
        case ADDR_FMT_11_11_10:
        case ADDR_FMT_16_16:
        case ADDR_FMT_32:
        case ADDR_FMT_24_8:
            bpp = 32;
            break;
        case ADDR_FMT_16_16_16_16:
        case ADDR_FMT_32_32:
        case ADDR_FMT_CTX1:
            bpp = 64;
            break;
        case ADDR_FMT_32_32_32_32:
            bpp = 128;
            break;
        case ADDR_FMT_INVALID:
            bpp = 0;
            break;
        case ADDR_FMT_1_REVERSED:
            elemMode = ADDR_PACKED_REV;
            expandX = 8;
            bpp = 1;
            break;
        case ADDR_FMT_1:
            elemMode = ADDR_PACKED_STD;
            expandX = 8;
            bpp = 1;
            break;
        case ADDR_FMT_4_4:
        case ADDR_FMT_3_3_2:
            bpp = 8;
            break;
        case ADDR_FMT_5_5_5_1:
            bpp = 16;
            break;
        case ADDR_FMT_32_AS_8:
        case ADDR_FMT_32_AS_8_8:
        case ADDR_FMT_8_24:
        case ADDR_FMT_10_10_10_2:
        case ADDR_FMT_5_9_9_9_SHAREDEXP:
            bpp = 32;
            break;
        case ADDR_FMT_X24_8_32_FLOAT:
            bpp = 64;
            bitUnused = 24;
            break;
        case ADDR_FMT_8_8_8:
            elemMode = ADDR_EXPANDED;
            bpp = 24;//@@ 8;      // read 3 elements per pixel
            expandX = 3;
            break;
        case ADDR_FMT_16_16_16:
            elemMode = ADDR_EXPANDED;
            bpp = 48;//@@ 16;      // read 3 elements per pixel
            expandX = 3;
            break;
        case ADDR_FMT_32_32_32:
            elemMode = ADDR_EXPANDED;
            expandX = 3;
            bpp = 96;//@@ 32;      // read 3 elements per pixel
            break;
        case ADDR_FMT_BC1:
            elemMode = ADDR_PACKED_BC1;
            expandX = 4;
            expandY = 4;
            bpp = 64;
            break;
        case ADDR_FMT_BC4:
            elemMode = ADDR_PACKED_BC4;
            expandX = 4;
            expandY = 4;
            bpp = 64;
            break;
        case ADDR_FMT_BC2:
            elemMode = ADDR_PACKED_BC2;
            expandX = 4;
            expandY = 4;
            bpp = 128;
            break;
        case ADDR_FMT_BC3:
            elemMode = ADDR_PACKED_BC3;
            expandX = 4;
            expandY = 4;
            bpp = 128;
            break;
        case ADDR_FMT_BC5:
        case ADDR_FMT_BC6: // reuse ADDR_PACKED_BC5
        case ADDR_FMT_BC7: // reuse ADDR_PACKED_BC5
            elemMode = ADDR_PACKED_BC5;
            expandX = 4;
            expandY = 4;
            bpp = 128;
            break;

        case ADDR_FMT_ETC2_64BPP:
            elemMode = ADDR_PACKED_ETC2_64BPP;
            expandX  = 4;
            expandY  = 4;
            bpp      = 64;
            break;

        case ADDR_FMT_ETC2_128BPP:
            elemMode = ADDR_PACKED_ETC2_128BPP;
            expandX  = 4;
            expandY  = 4;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_4x4:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 4;
            expandY  = 4;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_5x4:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 5;
            expandY  = 4;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_5x5:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 5;
            expandY  = 5;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_6x5:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 6;
            expandY  = 5;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_6x6:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 6;
            expandY  = 6;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_8x5:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 8;
            expandY  = 5;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_8x6:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 8;
            expandY  = 6;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_8x8:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 8;
            expandY  = 8;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_10x5:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 10;
            expandY  = 5;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_10x6:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 10;
            expandY  = 6;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_10x8:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 10;
            expandY  = 8;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_10x10:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 10;
            expandY  = 10;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_12x10:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 12;
            expandY  = 10;
            bpp      = 128;
            break;

        case ADDR_FMT_ASTC_12x12:
            elemMode = ADDR_PACKED_ASTC;
            expandX  = 12;
            expandY  = 12;
            bpp      = 128;
            break;

        default:
            bpp = 0;
            ADDR_ASSERT_ALWAYS();
            break;
            // @@ or should this be an error?
    }

    SafeAssign(pExpandX, expandX);
    SafeAssign(pExpandY, expandY);
    SafeAssign(pUnusedBits, bitUnused);
    SafeAssign(reinterpret_cast<UINT_32*>(pElemMode), elemMode);

    return bpp;
}

/**
****************************************************************************************************
*   ElemLib::GetCompBits
*
*   @brief
*       Set each component's bit size and bit start, set the element mode,
*       and mark absent (zero-width) components as having no number type
*
*   @return
*       N/A
****************************************************************************************************
*/
VOID ElemLib::GetCompBits(
    UINT_32          c0,        ///< [in] bits of component 0
    UINT_32          c1,        ///< [in] bits of component 1
    UINT_32          c2,        ///< [in] bits of component 2
    UINT_32          c3,        ///< [in] bits of component 3
    PixelFormatInfo* pInfo,     ///< [out] per component info out
    ElemMode         elemMode)  ///< [in] element mode
{
    pInfo->comps = 0;

    pInfo->compBit[0] = c0;
    pInfo->compBit[1] = c1;
    pInfo->compBit[2] = c2;
    pInfo->compBit[3] = c3;

    pInfo->compStart[0] = 0;
    pInfo->compStart[1] = c0;
    pInfo->compStart[2] = c0+c1;
    pInfo->compStart[3] = c0+c1+c2;

    pInfo->elemMode = elemMode;

    // still needed since component swap may depend on number of components
    for (INT i=0; i<4; i++)
    {
        if (pInfo->compBit[i] == 0)
        {
            pInfo->compStart[i]  = 0;              // all null components start at bit 0
            pInfo->numType[i]    = ADDR_NO_NUMBER; // and have no number type
        }
        else
        {
            pInfo->comps++;
        }
    }
}

/**
****************************************************************************************************
*   ElemLib::SetClearComps
*
*   @brief
*       Set the clear color (or clear depth/stencil) for a surface
*
*   @note
*       If clearColor is FALSE, the default clear value (0, 0, 0, 1) is written to comps[4].
*       If float32 is set, full precision is used; otherwise a NaN is replaced by the standard
*       NaN value and any other value has the low 12 bits of its mantissa cleared.
*
*   @return
*       N/A
****************************************************************************************************
*/
VOID ElemLib::SetClearComps(
    ADDR_FLT_32 comps[4],   ///< [in,out] components
    BOOL_32 clearColor,     ///< [in] TRUE if clear color is set (CLEAR_COLOR)
    BOOL_32 float32)        ///< [in] TRUE if float32 component (BLEND_FLOAT32)
{
    INT_32 i;

    // Use default clear values if clearColor is disabled
    if (clearColor == FALSE)
    {
        for (i=0; i<3; i++)
        {
            comps[i].f = 0.0;
        }
        comps[3].f = 1.0;
    }

    // Otherwise use the (modified) clear value
    else
    {
        for (i=0; i<4; i++)
        {
            // If full precision, use clear value unchanged
            if (float32)
            {
                // Do nothing
                //comps[i] = comps[i];
            }
            // Else if it is a NaN, use the standard NaN value
            else if ((comps[i].u & 0x7FFFFFFF) > 0x7F800000)
            {
                comps[i].u = 0xFFC00000;
            }
            // Else reduce the mantissa precision
            else
            {
                comps[i].u = comps[i].u & 0xFFFFF000;
            }
        }
    }
}

/**
****************************************************************************************************
*   ElemLib::IsBlockCompressed
*
*   @brief
*       TRUE if this is block compressed format
*
*   @note
*
*   @return
*       BOOL_32
****************************************************************************************************
*/
BOOL_32 ElemLib::IsBlockCompressed(
    AddrFormat format)  ///< [in] Format
{
    return (((format >= ADDR_FMT_BC1) && (format <= ADDR_FMT_BC7)) ||
            ((format >= ADDR_FMT_ASTC_4x4) && (format <= ADDR_FMT_ETC2_128BPP)));
}

/**
****************************************************************************************************
*   ElemLib::IsCompressed
*
*   @brief
*       TRUE if this is block compressed format or 1 bit format
*
*   @note
*
*   @return
*       BOOL_32
****************************************************************************************************
*/
BOOL_32 ElemLib::IsCompressed(
    AddrFormat format)  ///< [in] Format
{
    return IsBlockCompressed(format) || format == ADDR_FMT_BC1 || format == ADDR_FMT_BC7;
}
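
/*
 * Illustrative worked example (a sketch derived by reading the functions above, not taken from
 * AMD documentation). It traces a BC1 surface through GetBitsPerPixel(), AdjustSurfaceInfo() and
 * RestoreSurfaceInfo(). The 250x130 surface with a 256-pixel base pitch is hypothetical, and the
 * chip family is assumed to be anything other than ADDR_CHIP_FAMILY_R8XX, so AdjustSurfaceInfo()
 * takes the round-up path rather than the Evergreen workaround.
 *
 *   GetBitsPerPixel(ADDR_FMT_BC1, ...)
 *       -> returns bpp = 64, elemMode = ADDR_PACKED_BC1, expandX = 4, expandY = 4
 *
 *   AdjustSurfaceInfo(ADDR_PACKED_BC1, 4, 4, &bpp, &basePitch, &width, &height)
 *       bpp:       ADDR_PACKED_BC1 -> packedBits = 64
 *       basePitch: (256 + 4 - 1) / 4 = 64
 *       width:     (250 + 4 - 1) / 4 = 63
 *       height:    (130 + 4 - 1) / 4 = 33
 *
 *   RestoreSurfaceInfo(ADDR_PACKED_BC1, 4, 4, &bpp, &width, &height)
 *       bpp:       ADDR_PACKED_BC1 -> originalBits = 64
 *       width:     63 * 4 = 252
 *       height:    33 * 4 = 132
 *
 * The restored size is rounded up to whole 4x4 blocks (252x132), so RestoreSurfaceInfo() does
 * not reproduce the original 250x130 exactly.
 *
 * For SetClearComps() with float32 == FALSE, the same bit-level reasoning gives:
 *   0.1f (0x3DCCCCCD) & 0xFFFFF000 -> 0x3DCCC000   (low 12 mantissa bits cleared)
 *   quiet NaN (0x7FC00000)         -> 0xFFC00000   (replaced by the standard NaN value)
 *   +Inf (0x7F800000)              -> 0x7F800000   (not a NaN; masking leaves it unchanged)
 */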
/**
****************************************************************************************************
*   ElemLib::IsExpand3x
*
*   @brief
*       TRUE if this is 3x expand format
*
*   @note
*
*   @return
*       BOOL_32
****************************************************************************************************
*/
BOOL_32 ElemLib::IsExpand3x(
    AddrFormat format)  ///< [in] Format
{
    BOOL_32 is3x = FALSE;

    switch (format)
    {
        case ADDR_FMT_8_8_8:
        case ADDR_FMT_16_16_16:
        case ADDR_FMT_32_32_32:
            is3x = TRUE;
            break;
        default:
            break;
    }

    return is3x;
}

/**
****************************************************************************************************
*   ElemLib::IsMacroPixelPacked
*
*   @brief
*       TRUE if this is a macro-pixel-packed format.
*
*   @note
*
*   @return
*       BOOL_32
****************************************************************************************************
*/
BOOL_32 ElemLib::IsMacroPixelPacked(
    AddrFormat format)  ///< [in] Format
{
    BOOL_32 isMacroPixelPacked = FALSE;

    switch (format)
    {
        case ADDR_FMT_BG_RG:
        case ADDR_FMT_GB_GR:
            isMacroPixelPacked = TRUE;
            break;
        default:
            break;
    }

    return isMacroPixelPacked;
}

} // Addr
} // rocr
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrelemlib.h000066400000000000000000000272371420110115200242440ustar00rootroot00000000000000/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
****************************************************************************************************
* @file  addrelemlib.h
* @brief Contains the class for element/pixel related functions.
****************************************************************************************************
*/

#ifndef __ELEM_LIB_H__
#define __ELEM_LIB_H__

#include "addrinterface.h"
#include "addrobject.h"
#include "addrcommon.h"

namespace rocr {
namespace Addr
{

class Lib;

// The masks for property bits within the Properties INT_32
union ComponentFlags
{
    struct
    {
        UINT_32 byteAligned    : 1;    ///< all components are byte aligned
        UINT_32 exportNorm     : 1;    ///< components support R6xx NORM compression
        UINT_32 floatComp      : 1;    ///< there is at least one floating point component
    };

    UINT_32 value;
};

// Copy from legacy lib's NumberType
enum NumberType
{
    // The following number types have the range [-1..1]
    ADDR_NO_NUMBER,         // This component doesn't exist and has no default value
    ADDR_EPSILON,           // Force component value to integer 0x00000001
    ADDR_ZERO,              // Force component value to integer 0x00000000
    ADDR_ONE,               // Force component value to floating point 1.0
    // Above values don't have any bits per component (keep ADDR_ONE the last of these)

    ADDR_UNORM,             // Unsigned normalized (repeating fraction) full precision
    ADDR_SNORM,             // Signed normalized (repeating fraction) full precision
    ADDR_GAMMA,             // Gamma-corrected, full precision

    ADDR_UNORM_R5XXRB,      // Unsigned normalized (repeating fraction) for r5xx RB
    ADDR_SNORM_R5XXRB,      // Signed normalized (repeating fraction) for r5xx RB
    ADDR_GAMMA_R5XXRB,      // Gamma-corrected for r5xx RB (note: unnormalized value)
    ADDR_UNORM_R5XXBC,      // Unsigned normalized (repeating fraction) for r5xx BC
    ADDR_SNORM_R5XXBC,      // Signed normalized (repeating fraction) for r5xx BC
    ADDR_GAMMA_R5XXBC,      // Gamma-corrected for r5xx BC (note: unnormalized value)

    ADDR_UNORM_R6XX,        // Unsigned normalized (repeating fraction) for R6xx
    ADDR_UNORM_R6XXDB,      // Unorms for 24-bit depth: one value differs from ADDR_UNORM_R6XX
    ADDR_SNORM_R6XX,        // Signed normalized (repeating fraction) for R6xx
    ADDR_GAMMA8_R6XX,       // Gamma-corrected for r6xx
    ADDR_GAMMA8_R7XX_TP,    // Gamma-corrected for r7xx TP 12bit unorm 8.4.

    ADDR_U4FLOATC,          // Unsigned float: 4-bit exponent, bias=15, no NaN, clamp [0..1]
    ADDR_GAMMA_4SEG,        // Gamma-corrected, four segment approximation
    ADDR_U0FIXED,           // Unsigned 0.N-bit fixed point

    // The following number types have large ranges (LEAVE ADDR_USCALED first or fix Finish routine)
    ADDR_USCALED,           // Unsigned integer converted to/from floating point
    ADDR_SSCALED,           // Signed integer converted to/from floating point
    ADDR_USCALED_R5XXRB,    // Unsigned integer to/from floating point for r5xx RB
    ADDR_SSCALED_R5XXRB,    // Signed integer to/from floating point for r5xx RB
    ADDR_UINT_BITS,         // Keep in unsigned integer form, clamped to specified range
    ADDR_SINT_BITS,         // Keep in signed integer form, clamped to specified range
    ADDR_UINTBITS,          // @@ remove Keep in unsigned integer form, use modulus to reduce bits
    ADDR_SINTBITS,          // @@ remove Keep in signed integer form, use modulus to reduce bits

    // The following number types and ADDR_U4FLOATC have exponents
    // (LEAVE ADDR_S8FLOAT first or fix Finish routine)
    ADDR_S8FLOAT,           // Signed floating point with 8-bit exponent, bias=127
    ADDR_S8FLOAT32,         // 32-bit IEEE float, passes through NaN values
    ADDR_S5FLOAT,           // Signed floating point with 5-bit exponent, bias=15
    ADDR_S5FLOATM,          // Signed floating point with 5-bit exponent, bias=15, no NaN/Inf
    ADDR_U5FLOAT,           // Unsigned floating point with 5-bit exponent, bias=15
    ADDR_U3FLOATM,          // Unsigned floating point with 3-bit exponent, bias=3

    ADDR_S5FIXED,           // Signed 5.N-bit fixed point, with rounding

    ADDR_END_NUMBER         // Used for range comparisons
};

// Copy from legacy lib's AddrElement
enum ElemMode
{
    // These formats allow both packing and unpacking
    ADDR_ROUND_BY_HALF,      // add 1/2 and truncate when packing this element
    ADDR_ROUND_TRUNCATE,     // truncate toward 0 for sign/mag, else toward neg
    ADDR_ROUND_DITHER,       // Pack by dithering -- requires (x,y) position

    // These formats only allow unpacking, no packing
    ADDR_UNCOMPRESSED,       // Elements are not compressed: one data element per pixel/texel
    ADDR_EXPANDED,           // Elements are split up and stored in multiple data elements
    ADDR_PACKED_STD,         // Elements are compressed into ExpandX by ExpandY data elements
    ADDR_PACKED_REV,         // Like ADDR_PACKED_STD, but X order of pixels is reversed
    ADDR_PACKED_GBGR,        // Elements are compressed 4:2:2 in G1B_G0R order (high to low)
    ADDR_PACKED_BGRG,        // Elements are compressed 4:2:2 in BG1_RG0 order (high to low)
    ADDR_PACKED_BC1,         // Each data element is uncompressed to a 4x4 pixel/texel array
    ADDR_PACKED_BC2,         // Each data element is uncompressed to a 4x4 pixel/texel array
    ADDR_PACKED_BC3,         // Each data element is uncompressed to a 4x4 pixel/texel array
    ADDR_PACKED_BC4,         // Each data element is uncompressed to a 4x4 pixel/texel array
    ADDR_PACKED_BC5,         // Each data element is uncompressed to a 4x4 pixel/texel array
    ADDR_PACKED_ETC2_64BPP,  // ETC2 formats that use 64bpp to represent each 4x4 block
    ADDR_PACKED_ETC2_128BPP, // ETC2 formats that use 128bpp to represent each 4x4 block
    ADDR_PACKED_ASTC,        // Various ASTC formats, all are 128bpp with varying block sizes

    // These formats provide various kinds of compression
    ADDR_ZPLANE_R5XX,        // Compressed Zplane using r5xx architecture format
    ADDR_ZPLANE_R6XX,        // Compressed Zplane using r6xx architecture format
    //@@ Fill in the compression modes

    ADDR_END_ELEMENT         // Used for range comparisons
};

enum DepthPlanarType
{
    ADDR_DEPTH_PLANAR_NONE = 0, // No plane z/stencil
    ADDR_DEPTH_PLANAR_R600 = 1, // R600 z and stencil planes are stored within a tile
    ADDR_DEPTH_PLANAR_R800 = 2, // R800 has separate z and stencil planes
};

/**
****************************************************************************************************
*   PixelFormatInfo
*
*   @brief
*       Per component info
*
****************************************************************************************************
*/
struct PixelFormatInfo
{
    UINT_32             compBit[4];
    NumberType          numType[4];
    UINT_32             compStart[4];
    ElemMode            elemMode;
    UINT_32             comps;          ///< Number of components
};

/**
****************************************************************************************************
* @brief This class contains ASIC-independent element related attributes and operations
****************************************************************************************************
*/
class ElemLib : public Object
{
protected:
    ElemLib(Lib* pAddrLib);

public:

    /// Makes this class virtual
    virtual ~ElemLib();

    static ElemLib* Create(
        const Lib* pAddrLib);

    /// The implementation only covers R6xx/R7xx; make it virtual if R8xx needs different behavior
    BOOL_32 PixGetExportNorm(
        AddrColorFormat colorFmt,
        AddrSurfaceNumber numberFmt, AddrSurfaceSwap swap) const;

    /// The methods below are ASIC-independent, so most of them are static.
    /// Remove static if a hwl needs different behavior.
    VOID Flt32ToDepthPixel(
        AddrDepthFormat format, const ADDR_FLT_32 comps[2], UINT_8 *pPixel) const;

    VOID Flt32ToColorPixel(
        AddrColorFormat format, AddrSurfaceNumber surfNum, AddrSurfaceSwap surfSwap,
        const ADDR_FLT_32 comps[4], UINT_8 *pPixel) const;

    static VOID Flt32sToInt32s(
        ADDR_FLT_32 value, UINT_32 bits, NumberType numberType, UINT_32* pResult);

    static VOID Int32sToPixel(
        UINT_32 numComps, UINT_32* pComps, UINT_32* pCompBits, UINT_32* pCompStart,
        ComponentFlags properties, UINT_32 resultBits, UINT_8* pPixel);

    VOID PixGetColorCompInfo(
        AddrColorFormat format, AddrSurfaceNumber number, AddrSurfaceSwap swap,
        PixelFormatInfo* pInfo) const;

    VOID PixGetDepthCompInfo(
        AddrDepthFormat format, PixelFormatInfo* pInfo) const;

    UINT_32 GetBitsPerPixel(
        AddrFormat format, ElemMode* pElemMode = NULL,
        UINT_32* pExpandX = NULL, UINT_32* pExpandY = NULL, UINT_32* pBitsUnused = NULL);

    static VOID SetClearComps(
        ADDR_FLT_32 comps[4], BOOL_32 clearColor, BOOL_32 float32);

    VOID AdjustSurfaceInfo(
        ElemMode elemMode, UINT_32 expandX, UINT_32 expandY,
        UINT_32* pBpp, UINT_32* pBasePitch, UINT_32* pWidth, UINT_32* pHeight);

    VOID RestoreSurfaceInfo(
        ElemMode elemMode, UINT_32 expandX, UINT_32 expandY,
        UINT_32* pBpp, UINT_32* pWidth, UINT_32* pHeight);

    /// Checks if depth and stencil are planar inside a tile
    BOOL_32 IsDepthStencilTilePlanar()
    {
        return (m_depthPlanarType == ADDR_DEPTH_PLANAR_R600) ? TRUE : FALSE;
    }

    /// Sets m_configFlags, copied from AddrLib
    VOID SetConfigFlags(ConfigFlags flags)
    {
        m_configFlags = flags;
    }

    static BOOL_32 IsCompressed(AddrFormat format);
    static BOOL_32 IsBlockCompressed(AddrFormat format);
    static BOOL_32 IsExpand3x(AddrFormat format);
    static BOOL_32 IsMacroPixelPacked(AddrFormat format);

protected:

    static VOID GetCompBits(
        UINT_32 c0, UINT_32 c1, UINT_32 c2, UINT_32 c3,
        PixelFormatInfo* pInfo,
        ElemMode elemMode = ADDR_ROUND_BY_HALF);

    static VOID GetCompType(
        AddrColorFormat format, AddrSurfaceNumber numType,
        PixelFormatInfo* pInfo);

    static VOID GetCompSwap(
        AddrSurfaceSwap swap, PixelFormatInfo* pInfo);

    static VOID SwapComps(
        UINT_32 c0, UINT_32 c1, PixelFormatInfo* pInfo);

private:

    UINT_32             m_fp16ExportNorm;   ///< Whether FP16 may be reported as EXPORT_NORM
    DepthPlanarType     m_depthPlanarType;

    ConfigFlags         m_configFlags;      ///< Copy of AddrLib's configFlags
    Addr::Lib* const    m_pAddrLib;         ///< Pointer to parent addrlib instance
};

} // Addr
} // rocr

#endif
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrlib.cpp000066400000000000000000000447671420110115200237430ustar00rootroot00000000000000/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
****************************************************************************************************
* @file  addrlib.cpp
* @brief Contains the implementation for the Addr::Lib class.
**************************************************************************************************** */ #include "addrinterface.h" #include "addrlib.h" #include "addrcommon.h" #if defined(__APPLE__) UINT_32 div64_32(UINT_64 n, UINT_32 base) { UINT_64 rem = n; UINT_64 b = base; UINT_64 res, d = 1; UINT_32 high = rem >> 32; res = 0; if (high >= base) { high /= base; res = (UINT_64) high << 32; rem -= (UINT_64) (high * base) << 32; } while (((INT_64)b > 0) && (b < rem)) { b = b + b; d = d + d; } do { if (rem >= b) { rem -= b; res += d; } b >>= 1; d >>= 1; } while (d); n = res; return rem; } extern "C" UINT_32 __umoddi3(UINT_64 n, UINT_32 base) { return div64_32(n, base); } #endif // __APPLE__ namespace rocr { namespace Addr { //////////////////////////////////////////////////////////////////////////////////////////////////// // Constructor/Destructor //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Lib::Lib * * @brief * Constructor for the AddrLib class * **************************************************************************************************** */ Lib::Lib() : m_class(BASE_ADDRLIB), m_chipFamily(ADDR_CHIP_FAMILY_IVLD), m_chipRevision(0), m_version(ADDRLIB_VERSION), m_pipes(0), m_banks(0), m_pipeInterleaveBytes(0), m_rowSize(0), m_minPitchAlignPixels(1), m_maxSamples(8), m_pElemLib(NULL) { m_configFlags.value = 0; } /** **************************************************************************************************** * Lib::Lib * * @brief * Constructor for the AddrLib class with hClient as parameter * **************************************************************************************************** */ Lib::Lib(const Client* pClient) : Object(pClient), m_class(BASE_ADDRLIB), m_chipFamily(ADDR_CHIP_FAMILY_IVLD), m_chipRevision(0), m_version(ADDRLIB_VERSION), m_pipes(0), m_banks(0), 
m_pipeInterleaveBytes(0), m_rowSize(0), m_minPitchAlignPixels(1), m_maxSamples(8), m_pElemLib(NULL) { m_configFlags.value = 0; } /** **************************************************************************************************** * Lib::~AddrLib * * @brief * Destructor for the AddrLib class * **************************************************************************************************** */ Lib::~Lib() { if (m_pElemLib) { delete m_pElemLib; m_pElemLib = NULL; } } //////////////////////////////////////////////////////////////////////////////////////////////////// // Initialization/Helper //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Lib::Create * * @brief * Creates and initializes AddrLib object. * * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE Lib::Create( const ADDR_CREATE_INPUT* pCreateIn, ///< [in] pointer to ADDR_CREATE_INPUT ADDR_CREATE_OUTPUT* pCreateOut) ///< [out] pointer to ADDR_CREATE_OUTPUT { Lib* pLib = NULL; ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pCreateIn->createFlags.fillSizeFields == TRUE) { if ((pCreateIn->size != sizeof(ADDR_CREATE_INPUT)) || (pCreateOut->size != sizeof(ADDR_CREATE_OUTPUT))) { returnCode = ADDR_PARAMSIZEMISMATCH; } } if ((returnCode == ADDR_OK) && (pCreateIn->callbacks.allocSysMem != NULL) && (pCreateIn->callbacks.freeSysMem != NULL)) { Client client = { pCreateIn->hClient, pCreateIn->callbacks }; switch (pCreateIn->chipEngine) { case CIASICIDGFXENGINE_SOUTHERNISLAND: switch (pCreateIn->chipFamily) { case FAMILY_SI: pLib = SiHwlInit(&client); break; case FAMILY_VI: case FAMILY_CZ: case FAMILY_CI: case FAMILY_KV: // CI based fusion pLib = CiHwlInit(&client); break; default: ADDR_ASSERT_ALWAYS(); break; } break; case CIASICIDGFXENGINE_ARCTICISLAND: switch 
(pCreateIn->chipFamily) { case FAMILY_AI: case FAMILY_RV: pLib = Gfx9HwlInit(&client); break; case FAMILY_NV: pLib = Gfx10HwlInit(&client); break; default: ADDR_ASSERT_ALWAYS(); break; } break; default: ADDR_ASSERT_ALWAYS(); break; } } if (pLib != NULL) { BOOL_32 initValid; // Pass createFlags to configFlags first since these flags may be overwritten pLib->m_configFlags.noCubeMipSlicesPad = pCreateIn->createFlags.noCubeMipSlicesPad; pLib->m_configFlags.fillSizeFields = pCreateIn->createFlags.fillSizeFields; pLib->m_configFlags.useTileIndex = pCreateIn->createFlags.useTileIndex; pLib->m_configFlags.useCombinedSwizzle = pCreateIn->createFlags.useCombinedSwizzle; pLib->m_configFlags.checkLast2DLevel = pCreateIn->createFlags.checkLast2DLevel; pLib->m_configFlags.useHtileSliceAlign = pCreateIn->createFlags.useHtileSliceAlign; pLib->m_configFlags.allowLargeThickTile = pCreateIn->createFlags.allowLargeThickTile; pLib->m_configFlags.forceDccAndTcCompat = pCreateIn->createFlags.forceDccAndTcCompat; pLib->m_configFlags.nonPower2MemConfig = pCreateIn->createFlags.nonPower2MemConfig; pLib->m_configFlags.disableLinearOpt = FALSE; pLib->SetChipFamily(pCreateIn->chipFamily, pCreateIn->chipRevision); pLib->SetMinPitchAlignPixels(pCreateIn->minPitchAlignPixels); // Global parameters initialized and remaining configFlags bits are set as well initValid = pLib->HwlInitGlobalParams(pCreateIn); if (initValid) { pLib->m_pElemLib = ElemLib::Create(pLib); } else { pLib->m_pElemLib = NULL; // Don't go on allocating element lib returnCode = ADDR_INVALIDGBREGVALUES; } if (pLib->m_pElemLib == NULL) { delete pLib; pLib = NULL; ADDR_ASSERT_ALWAYS(); } else { pLib->m_pElemLib->SetConfigFlags(pLib->m_configFlags); } } pCreateOut->hLib = pLib; if ((pLib != NULL) && (returnCode == ADDR_OK)) { pCreateOut->numEquations = pLib->HwlGetEquationTableInfo(&pCreateOut->pEquationTable); pLib->SetMaxAlignments(); } else if ((pLib == NULL) && (returnCode == ADDR_OK)) { // Unknown failures, we return the 
general error code returnCode = ADDR_ERROR; } return returnCode; } /** **************************************************************************************************** * Lib::SetChipFamily * * @brief * Convert familyID defined in atiid.h to ChipFamily and set m_chipFamily/m_chipRevision * @return * N/A **************************************************************************************************** */ VOID Lib::SetChipFamily( UINT_32 uChipFamily, ///< [in] chip family defined in atiih.h UINT_32 uChipRevision) ///< [in] chip revision defined in "asic_family"_id.h { ChipFamily family = HwlConvertChipFamily(uChipFamily, uChipRevision); ADDR_ASSERT(family != ADDR_CHIP_FAMILY_IVLD); m_chipFamily = family; m_chipRevision = uChipRevision; } /** **************************************************************************************************** * Lib::SetMinPitchAlignPixels * * @brief * Set m_minPitchAlignPixels with input param * * @return * N/A **************************************************************************************************** */ VOID Lib::SetMinPitchAlignPixels( UINT_32 minPitchAlignPixels) ///< [in] minmum pitch alignment in pixels { m_minPitchAlignPixels = (minPitchAlignPixels == 0) ? 
1 : minPitchAlignPixels; } /** **************************************************************************************************** * Lib::SetMaxAlignments * * @brief * Set max alignments * * @return * N/A **************************************************************************************************** */ VOID Lib::SetMaxAlignments() { m_maxBaseAlign = HwlComputeMaxBaseAlignments(); m_maxMetaBaseAlign = HwlComputeMaxMetaBaseAlignments(); } /** **************************************************************************************************** * Lib::GetLib * * @brief * Get AddrLib pointer * * @return * An AddrLib class pointer **************************************************************************************************** */ Lib* Lib::GetLib( ADDR_HANDLE hLib) ///< [in] handle of ADDR_HANDLE { return static_cast<Lib*>(hLib); } /** **************************************************************************************************** * Lib::GetMaxAlignments * * @brief * Gets maximum alignments for data surface (including FMask) * * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE Lib::GetMaxAlignments( ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (GetFillSizeFieldsFlags() == TRUE) { if (pOut->size != sizeof(ADDR_GET_MAX_ALIGNMENTS_OUTPUT)) { returnCode = ADDR_PARAMSIZEMISMATCH; } } if (returnCode == ADDR_OK) { if (m_maxBaseAlign != 0) { pOut->baseAlign = m_maxBaseAlign; } else { returnCode = ADDR_NOTIMPLEMENTED; } } return returnCode; } /** **************************************************************************************************** * Lib::GetMaxMetaAlignments * * @brief * Gets maximum alignments for metadata (CMask, DCC and HTile) * * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE
Lib::GetMaxMetaAlignments( ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (GetFillSizeFieldsFlags() == TRUE) { if (pOut->size != sizeof(ADDR_GET_MAX_ALIGNMENTS_OUTPUT)) { returnCode = ADDR_PARAMSIZEMISMATCH; } } if (returnCode == ADDR_OK) { if (m_maxMetaBaseAlign != 0) { pOut->baseAlign = m_maxMetaBaseAlign; } else { returnCode = ADDR_NOTIMPLEMENTED; } } return returnCode; } /** **************************************************************************************************** * Lib::Bits2Number * * @brief * Concatenate an array of binary bits into a number * * @return * The number combined from the array of bits **************************************************************************************************** */ UINT_32 Lib::Bits2Number( UINT_32 bitNum, ///< [in] how many bits ...) ///< [in] variable bit values, starting from the MSB { UINT_32 number = 0; UINT_32 i; va_list bits_ptr; va_start(bits_ptr, bitNum); for(i = 0; i < bitNum; i++) { number |= va_arg(bits_ptr, UINT_32); number <<= 1; } number >>= 1; va_end(bits_ptr); return number; } //////////////////////////////////////////////////////////////////////////////////////////////////// // Element lib //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Lib::Flt32ToDepthPixel * * @brief * Convert a FLT_32 value to a depth/stencil pixel value * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE Lib::Flt32ToDepthPixel( const ELEM_FLT32TODEPTHPIXEL_INPUT* pIn, ELEM_FLT32TODEPTHPIXEL_OUTPUT* pOut) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (GetFillSizeFieldsFlags() == TRUE) { if ((pIn->size != sizeof(ELEM_FLT32TODEPTHPIXEL_INPUT)) || (pOut->size != sizeof(ELEM_FLT32TODEPTHPIXEL_OUTPUT))) { returnCode =
ADDR_PARAMSIZEMISMATCH; } } if (returnCode == ADDR_OK) { GetElemLib()->Flt32ToDepthPixel(pIn->format, pIn->comps, pOut->pPixel); UINT_32 depthBase = 0; UINT_32 stencilBase = 0; UINT_32 depthBits = 0; UINT_32 stencilBits = 0; switch (pIn->format) { case ADDR_DEPTH_16: depthBits = 16; break; case ADDR_DEPTH_X8_24: case ADDR_DEPTH_8_24: case ADDR_DEPTH_X8_24_FLOAT: case ADDR_DEPTH_8_24_FLOAT: depthBase = 8; depthBits = 24; stencilBits = 8; break; case ADDR_DEPTH_32_FLOAT: depthBits = 32; break; case ADDR_DEPTH_X24_8_32_FLOAT: depthBase = 8; depthBits = 32; stencilBits = 8; break; default: break; } // Overwrite base since R800 has no "tileBase" if (GetElemLib()->IsDepthStencilTilePlanar() == FALSE) { depthBase = 0; stencilBase = 0; } depthBase *= 64; stencilBase *= 64; pOut->stencilBase = stencilBase; pOut->depthBase = depthBase; pOut->depthBits = depthBits; pOut->stencilBits = stencilBits; } return returnCode; } /** **************************************************************************************************** * Lib::Flt32ToColorPixel * * @brief * Convert a FLT_32 value to a red/green/blue/alpha pixel value * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE Lib::Flt32ToColorPixel( const ELEM_FLT32TOCOLORPIXEL_INPUT* pIn, ELEM_FLT32TOCOLORPIXEL_OUTPUT* pOut) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (GetFillSizeFieldsFlags() == TRUE) { if ((pIn->size != sizeof(ELEM_FLT32TOCOLORPIXEL_INPUT)) || (pOut->size != sizeof(ELEM_FLT32TOCOLORPIXEL_OUTPUT))) { returnCode = ADDR_PARAMSIZEMISMATCH; } } if (returnCode == ADDR_OK) { GetElemLib()->Flt32ToColorPixel(pIn->format, pIn->surfNum, pIn->surfSwap, pIn->comps, pOut->pPixel); } return returnCode; } /** **************************************************************************************************** * Lib::GetExportNorm * * @brief * Check whether a format can use EXPORT_NORM * @return * TRUE if EXPORT_NORM can be used
**************************************************************************************************** */ BOOL_32 Lib::GetExportNorm( const ELEM_GETEXPORTNORM_INPUT* pIn) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; BOOL_32 enabled = FALSE; if (GetFillSizeFieldsFlags() == TRUE) { if (pIn->size != sizeof(ELEM_GETEXPORTNORM_INPUT)) { returnCode = ADDR_PARAMSIZEMISMATCH; } } if (returnCode == ADDR_OK) { enabled = GetElemLib()->PixGetExportNorm(pIn->format, pIn->num, pIn->swap); } return enabled; } /** **************************************************************************************************** * Lib::GetBpe * * @brief * Get bits-per-element for specified format * @return * bits-per-element of specified format **************************************************************************************************** */ UINT_32 Lib::GetBpe(AddrFormat format) const { return GetElemLib()->GetBitsPerPixel(format); } } // Addr } // rocr ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrlib.h000066400000000000000000000333141420110115200233720ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. 
IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** **************************************************************************************************** * @file addrlib.h * @brief Contains the Addr::Lib base class definition. **************************************************************************************************** */ #ifndef __ADDR_LIB_H__ #define __ADDR_LIB_H__ #include "addrinterface.h" #include "addrobject.h" #include "addrelemlib.h" #include "amdgpu_asic_addr.h" #ifndef CIASICIDGFXENGINE_R600 #define CIASICIDGFXENGINE_R600 0x00000006 #endif #ifndef CIASICIDGFXENGINE_R800 #define CIASICIDGFXENGINE_R800 0x00000008 #endif #ifndef CIASICIDGFXENGINE_SOUTHERNISLAND #define CIASICIDGFXENGINE_SOUTHERNISLAND 0x0000000A #endif #ifndef CIASICIDGFXENGINE_ARCTICISLAND #define CIASICIDGFXENGINE_ARCTICISLAND 0x0000000D #endif namespace rocr { namespace Addr { /** **************************************************************************************************** * @brief Neutral enums that define pipeinterleave **************************************************************************************************** */ enum PipeInterleave { ADDR_PIPEINTERLEAVE_256B = 256, ADDR_PIPEINTERLEAVE_512B = 512, ADDR_PIPEINTERLEAVE_1KB = 1024, ADDR_PIPEINTERLEAVE_2KB = 2048, }; /** **************************************************************************************************** * @brief Neutral enums that define DRAM row size **************************************************************************************************** */ enum RowSize { ADDR_ROWSIZE_1KB = 1024, 
ADDR_ROWSIZE_2KB = 2048, ADDR_ROWSIZE_4KB = 4096, ADDR_ROWSIZE_8KB = 8192, }; /** **************************************************************************************************** * @brief Neutral enums that define bank interleave **************************************************************************************************** */ enum BankInterleave { ADDR_BANKINTERLEAVE_1 = 1, ADDR_BANKINTERLEAVE_2 = 2, ADDR_BANKINTERLEAVE_4 = 4, ADDR_BANKINTERLEAVE_8 = 8, }; /** **************************************************************************************************** * @brief Neutral enums that define shader engine tile size **************************************************************************************************** */ enum ShaderEngineTileSize { ADDR_SE_TILESIZE_16 = 16, ADDR_SE_TILESIZE_32 = 32, }; /** **************************************************************************************************** * @brief Neutral enums that define bank swap size **************************************************************************************************** */ enum BankSwapSize { ADDR_BANKSWAP_128B = 128, ADDR_BANKSWAP_256B = 256, ADDR_BANKSWAP_512B = 512, ADDR_BANKSWAP_1KB = 1024, }; /** **************************************************************************************************** * @brief Enums that define max compressed fragments config **************************************************************************************************** */ enum NumMaxCompressedFragmentsConfig { ADDR_CONFIG_1_MAX_COMPRESSED_FRAGMENTS = 0x00000000, ADDR_CONFIG_2_MAX_COMPRESSED_FRAGMENTS = 0x00000001, ADDR_CONFIG_4_MAX_COMPRESSED_FRAGMENTS = 0x00000002, ADDR_CONFIG_8_MAX_COMPRESSED_FRAGMENTS = 0x00000003, }; /** **************************************************************************************************** * @brief Enums that define num pipes config **************************************************************************************************** */ enum 
NumPipesConfig { ADDR_CONFIG_1_PIPE = 0x00000000, ADDR_CONFIG_2_PIPE = 0x00000001, ADDR_CONFIG_4_PIPE = 0x00000002, ADDR_CONFIG_8_PIPE = 0x00000003, ADDR_CONFIG_16_PIPE = 0x00000004, ADDR_CONFIG_32_PIPE = 0x00000005, ADDR_CONFIG_64_PIPE = 0x00000006, }; /** **************************************************************************************************** * @brief Enums that define num banks config **************************************************************************************************** */ enum NumBanksConfig { ADDR_CONFIG_1_BANK = 0x00000000, ADDR_CONFIG_2_BANK = 0x00000001, ADDR_CONFIG_4_BANK = 0x00000002, ADDR_CONFIG_8_BANK = 0x00000003, ADDR_CONFIG_16_BANK = 0x00000004, }; /** **************************************************************************************************** * @brief Enums that define num rb per shader engine config **************************************************************************************************** */ enum NumRbPerShaderEngineConfig { ADDR_CONFIG_1_RB_PER_SHADER_ENGINE = 0x00000000, ADDR_CONFIG_2_RB_PER_SHADER_ENGINE = 0x00000001, ADDR_CONFIG_4_RB_PER_SHADER_ENGINE = 0x00000002, }; /** **************************************************************************************************** * @brief Enums that define num shader engines config **************************************************************************************************** */ enum NumShaderEnginesConfig { ADDR_CONFIG_1_SHADER_ENGINE = 0x00000000, ADDR_CONFIG_2_SHADER_ENGINE = 0x00000001, ADDR_CONFIG_4_SHADER_ENGINE = 0x00000002, ADDR_CONFIG_8_SHADER_ENGINE = 0x00000003, }; /** **************************************************************************************************** * @brief Enums that define pipe interleave size config **************************************************************************************************** */ enum PipeInterleaveSizeConfig { ADDR_CONFIG_PIPE_INTERLEAVE_256B = 0x00000000, ADDR_CONFIG_PIPE_INTERLEAVE_512B = 
0x00000001, ADDR_CONFIG_PIPE_INTERLEAVE_1KB = 0x00000002, ADDR_CONFIG_PIPE_INTERLEAVE_2KB = 0x00000003, }; /** **************************************************************************************************** * @brief Enums that define row size config **************************************************************************************************** */ enum RowSizeConfig { ADDR_CONFIG_1KB_ROW = 0x00000000, ADDR_CONFIG_2KB_ROW = 0x00000001, ADDR_CONFIG_4KB_ROW = 0x00000002, }; /** **************************************************************************************************** * @brief Enums that define bank interleave size config **************************************************************************************************** */ enum BankInterleaveSizeConfig { ADDR_CONFIG_BANK_INTERLEAVE_1 = 0x00000000, ADDR_CONFIG_BANK_INTERLEAVE_2 = 0x00000001, ADDR_CONFIG_BANK_INTERLEAVE_4 = 0x00000002, ADDR_CONFIG_BANK_INTERLEAVE_8 = 0x00000003, }; /** **************************************************************************************************** * @brief Enums that define shader engine tile size config **************************************************************************************************** */ enum ShaderEngineTileSizeConfig { ADDR_CONFIG_SE_TILE_16 = 0x00000000, ADDR_CONFIG_SE_TILE_32 = 0x00000001, }; /** **************************************************************************************************** * @brief This class contains ASIC-independent address library functionality **************************************************************************************************** */ class Lib : public Object { public: virtual ~Lib(); static ADDR_E_RETURNCODE Create( const ADDR_CREATE_INPUT* pCreateInfo, ADDR_CREATE_OUTPUT* pCreateOut); /// Pair of Create VOID Destroy() { delete this; } static Lib* GetLib(ADDR_HANDLE hLib); /// Returns AddrLib version (from the compiled binary instead of the include file) UINT_32 GetVersion() { return m_version; } /// Returns asic chip
family name defined by AddrLib ChipFamily GetChipFamily() { return m_chipFamily; } ADDR_E_RETURNCODE Flt32ToDepthPixel( const ELEM_FLT32TODEPTHPIXEL_INPUT* pIn, ELEM_FLT32TODEPTHPIXEL_OUTPUT* pOut) const; ADDR_E_RETURNCODE Flt32ToColorPixel( const ELEM_FLT32TOCOLORPIXEL_INPUT* pIn, ELEM_FLT32TOCOLORPIXEL_OUTPUT* pOut) const; BOOL_32 GetExportNorm(const ELEM_GETEXPORTNORM_INPUT* pIn) const; ADDR_E_RETURNCODE GetMaxAlignments(ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) const; ADDR_E_RETURNCODE GetMaxMetaAlignments(ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) const; UINT_32 GetBpe(AddrFormat format) const; protected: Lib(); // Constructor is protected Lib(const Client* pClient); /// Pure virtual function to get max base alignments virtual UINT_32 HwlComputeMaxBaseAlignments() const = 0; /// Gets maximum alignments for metadata virtual UINT_32 HwlComputeMaxMetaBaseAlignments() const { ADDR_NOT_IMPLEMENTED(); return 0; } VOID ValidBaseAlignments(UINT_32 alignment) const { #if DEBUG ADDR_ASSERT(alignment <= m_maxBaseAlign); #endif } VOID ValidMetaBaseAlignments(UINT_32 metaAlignment) const { #if DEBUG ADDR_ASSERT(metaAlignment <= m_maxMetaBaseAlign); #endif } // // Initialization // /// Pure Virtual function for Hwl computing internal global parameters from h/w registers virtual BOOL_32 HwlInitGlobalParams(const ADDR_CREATE_INPUT* pCreateIn) = 0; /// Pure Virtual function for Hwl converting chip family virtual ChipFamily HwlConvertChipFamily(UINT_32 uChipFamily, UINT_32 uChipRevision) = 0; /// Get equation table pointer and number of equations virtual UINT_32 HwlGetEquationTableInfo(const ADDR_EQUATION** ppEquationTable) const { *ppEquationTable = NULL; return 0; } // // Misc helper // static UINT_32 Bits2Number(UINT_32 bitNum, ...); static UINT_32 GetNumFragments(UINT_32 numSamples, UINT_32 numFrags) { return (numFrags != 0) ?
numFrags : Max(1u, numSamples); } /// Returns pointer to ElemLib ElemLib* GetElemLib() const { return m_pElemLib; } /// Returns fillSizeFields flag UINT_32 GetFillSizeFieldsFlags() const { return m_configFlags.fillSizeFields; } private: // Disallow the copy constructor Lib(const Lib& a); // Disallow the assignment operator Lib& operator=(const Lib& a); VOID SetChipFamily(UINT_32 uChipFamily, UINT_32 uChipRevision); VOID SetMinPitchAlignPixels(UINT_32 minPitchAlignPixels); VOID SetMaxAlignments(); protected: LibClass m_class; ///< Store class type (HWL type) ChipFamily m_chipFamily; ///< Chip family translated from the one in atiid.h UINT_32 m_chipRevision; ///< Revision id from xxx_id.h UINT_32 m_version; ///< Current version // // Global parameters // ConfigFlags m_configFlags; ///< Global configuration flags. Note this is set up by /// AddrLib instead of Client except forceLinearAligned UINT_32 m_pipes; ///< Number of pipes UINT_32 m_banks; ///< Number of banks /// For r800 this is MC_ARB_RAMCFG.NOOFBANK /// Keep it here to do default parameter calculation UINT_32 m_pipeInterleaveBytes; ///< Specifies the size of contiguous address space /// within each tiling pipe when making linear /// accesses. (Formerly Group Size) UINT_32 m_rowSize; ///< DRAM row size, in bytes UINT_32 m_minPitchAlignPixels; ///< Minimum pitch alignment in pixels UINT_32 m_maxSamples; ///< Max numSamples UINT_32 m_maxBaseAlign; ///< Max base alignment for data surface UINT_32 m_maxMetaBaseAlign; ///< Max base alignment for metadata private: ElemLib* m_pElemLib; ///< Element Lib pointer }; Lib* SiHwlInit (const Client* pClient); Lib* CiHwlInit (const Client* pClient); Lib* Gfx9HwlInit (const Client* pClient); Lib* Gfx10HwlInit(const Client* pClient); } // Addr } // rocr #endif ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrlib1.cpp000066400000000000000000003762111420110115200240140ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc.
* All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** **************************************************************************************************** * @file addrlib1.cpp * @brief Contains the implementation for the Addr::V1::Lib base class.
**************************************************************************************************** */ #include "addrinterface.h" #include "addrlib1.h" #include "addrcommon.h" namespace rocr { namespace Addr { namespace V1 { //////////////////////////////////////////////////////////////////////////////////////////////////// // Static Const Member //////////////////////////////////////////////////////////////////////////////////////////////////// const TileModeFlags Lib::ModeFlags[ADDR_TM_COUNT] = {// T L 1 2 3 P Pr B {1, 1, 0, 0, 0, 0, 0, 0}, // ADDR_TM_LINEAR_GENERAL {1, 1, 0, 0, 0, 0, 0, 0}, // ADDR_TM_LINEAR_ALIGNED {1, 0, 1, 0, 0, 0, 0, 0}, // ADDR_TM_1D_TILED_THIN1 {4, 0, 1, 0, 0, 0, 0, 0}, // ADDR_TM_1D_TILED_THICK {1, 0, 0, 1, 0, 0, 0, 0}, // ADDR_TM_2D_TILED_THIN1 {1, 0, 0, 1, 0, 0, 0, 0}, // ADDR_TM_2D_TILED_THIN2 {1, 0, 0, 1, 0, 0, 0, 0}, // ADDR_TM_2D_TILED_THIN4 {4, 0, 0, 1, 0, 0, 0, 0}, // ADDR_TM_2D_TILED_THICK {1, 0, 0, 1, 0, 0, 0, 1}, // ADDR_TM_2B_TILED_THIN1 {1, 0, 0, 1, 0, 0, 0, 1}, // ADDR_TM_2B_TILED_THIN2 {1, 0, 0, 1, 0, 0, 0, 1}, // ADDR_TM_2B_TILED_THIN4 {4, 0, 0, 1, 0, 0, 0, 1}, // ADDR_TM_2B_TILED_THICK {1, 0, 0, 1, 1, 0, 0, 0}, // ADDR_TM_3D_TILED_THIN1 {4, 0, 0, 1, 1, 0, 0, 0}, // ADDR_TM_3D_TILED_THICK {1, 0, 0, 1, 1, 0, 0, 1}, // ADDR_TM_3B_TILED_THIN1 {4, 0, 0, 1, 1, 0, 0, 1}, // ADDR_TM_3B_TILED_THICK {8, 0, 0, 1, 0, 0, 0, 0}, // ADDR_TM_2D_TILED_XTHICK {8, 0, 0, 1, 1, 0, 0, 0}, // ADDR_TM_3D_TILED_XTHICK {1, 0, 0, 0, 0, 0, 0, 0}, // ADDR_TM_POWER_SAVE {1, 0, 0, 1, 0, 1, 1, 0}, // ADDR_TM_PRT_TILED_THIN1 {1, 0, 0, 1, 0, 1, 0, 0}, // ADDR_TM_PRT_2D_TILED_THIN1 {1, 0, 0, 1, 1, 1, 0, 0}, // ADDR_TM_PRT_3D_TILED_THIN1 {4, 0, 0, 1, 0, 1, 1, 0}, // ADDR_TM_PRT_TILED_THICK {4, 0, 0, 1, 0, 1, 0, 0}, // ADDR_TM_PRT_2D_TILED_THICK {4, 0, 0, 1, 1, 1, 0, 0}, // ADDR_TM_PRT_3D_TILED_THICK {0, 0, 0, 0, 0, 0, 0, 0}, // ADDR_TM_UNKNOWN }; //////////////////////////////////////////////////////////////////////////////////////////////////// // 
Constructor/Destructor //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Lib::Lib * * @brief * Default constructor for the Addr::V1::Lib class * **************************************************************************************************** */ Lib::Lib() : Addr::Lib() { } /** **************************************************************************************************** * Lib::Lib * * @brief * Constructor for the Addr::V1::Lib class with pClient as parameter * **************************************************************************************************** */ Lib::Lib(const Client* pClient) : Addr::Lib(pClient) { } /** **************************************************************************************************** * Lib::~Lib * * @brief * Destructor for the Addr::V1::Lib class * **************************************************************************************************** */ Lib::~Lib() { } /** **************************************************************************************************** * Lib::GetLib * * @brief * Get AddrLib1 pointer * * @return * An Addr::V1::Lib class pointer **************************************************************************************************** */ Lib* Lib::GetLib( ADDR_HANDLE hLib) ///< [in] handle of ADDR_HANDLE { Addr::Lib* pAddrLib = Addr::Lib::GetLib(hLib); if ((pAddrLib != NULL) && ((pAddrLib->GetChipFamily() == ADDR_CHIP_FAMILY_IVLD) || (pAddrLib->GetChipFamily() > ADDR_CHIP_FAMILY_VI))) { // Only valid ASICs up to and including VI can use AddrLib1 functions.
ADDR_ASSERT_ALWAYS(); hLib = NULL; } return static_cast<Lib*>(hLib); } //////////////////////////////////////////////////////////////////////////////////////////////////// // Surface Methods //////////////////////////////////////////////////////////////////////////////////////////////////// /** **************************************************************************************************** * Lib::ComputeSurfaceInfo * * @brief * Interface function stub of AddrComputeSurfaceInfo. * * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE Lib::ComputeSurfaceInfo( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (GetFillSizeFieldsFlags() == TRUE) { if ((pIn->size != sizeof(ADDR_COMPUTE_SURFACE_INFO_INPUT)) || (pOut->size != sizeof(ADDR_COMPUTE_SURFACE_INFO_OUTPUT))) { returnCode = ADDR_PARAMSIZEMISMATCH; } } // We suggest the client do a sanity check, but checking here as well is good if (pIn->bpp > 128) { returnCode = ADDR_INVALIDPARAMS; } if ((pIn->tileMode == ADDR_TM_UNKNOWN) && (pIn->mipLevel > 0)) { returnCode = ADDR_INVALIDPARAMS; } // Thick modes don't support multisample if ((Thickness(pIn->tileMode) > 1) && (pIn->numSamples > 1)) { returnCode = ADDR_INVALIDPARAMS; } if (returnCode == ADDR_OK) { // Get a local copy of the input structure and only reference pIn for unadjusted values ADDR_COMPUTE_SURFACE_INFO_INPUT localIn = *pIn; ADDR_TILEINFO tileInfoNull = {0}; if (UseTileInfo()) { // If the original input has a valid ADDR_TILEINFO pointer then copy its contents. // Otherwise the default 0's in tileInfoNull are used. if (pIn->pTileInfo) { tileInfoNull = *pIn->pTileInfo; } localIn.pTileInfo = &tileInfoNull; } localIn.numSamples = (pIn->numSamples == 0) ?
1 : pIn->numSamples; // Do mipmap check first // If format is BCn, pre-pad dimension to power-of-two according to HWL ComputeMipLevel(&localIn); if (m_configFlags.checkLast2DLevel) { // Save this level's original height in pixels pOut->height = pIn->height; } UINT_32 expandX = 1; UINT_32 expandY = 1; ElemMode elemMode; // Save outputs that may not go through HWL pOut->pixelBits = localIn.bpp; pOut->numSamples = localIn.numSamples; pOut->last2DLevel = FALSE; pOut->tcCompatible = FALSE; #if !ALT_TEST if (localIn.numSamples > 1) { ADDR_ASSERT(localIn.mipLevel == 0); } #endif if (localIn.format != ADDR_FMT_INVALID) // Setting format to INVALID skips this conversion { // Get compression/expansion factors and element mode // (which indicates compression/expansion) localIn.bpp = GetElemLib()->GetBitsPerPixel(localIn.format, &elemMode, &expandX, &expandY); // Special flag for 96-bit surfaces. A 96-bit (or 48-bit, if supported) surface's width is // pre-multiplied by 3 and its bpp is divided by 3, so the linear-aligned pitch // alignment does not actually meet 64 pixels. We keep special handling in hwl since hw // restrictions are different. // Also Mip 1+ needs an element pitch of 32 bits so we do not need this workaround // but we use this flag to skip RestoreSurfaceInfo below if ((elemMode == ADDR_EXPANDED) && (expandX > 1)) { ADDR_ASSERT(IsLinear(localIn.tileMode)); } GetElemLib()->AdjustSurfaceInfo(elemMode, expandX, expandY, &localIn.bpp, &localIn.basePitch, &localIn.width, &localIn.height); // Overwrite these parameters if we have a valid format } else if (localIn.bpp != 0) { localIn.width = (localIn.width != 0) ? localIn.width : 1; localIn.height = (localIn.height != 0) ?
localIn.height : 1; } else // Rule out some invalid parameters { ADDR_ASSERT_ALWAYS(); returnCode = ADDR_INVALIDPARAMS; } // Check mipmap after surface expansion if (returnCode == ADDR_OK) { returnCode = PostComputeMipLevel(&localIn, pOut); } if (returnCode == ADDR_OK) { if (UseTileIndex(localIn.tileIndex)) { // Make sure pTileInfo is not NULL ADDR_ASSERT(localIn.pTileInfo); UINT_32 numSamples = GetNumFragments(localIn.numSamples, localIn.numFrags); INT_32 macroModeIndex = TileIndexNoMacroIndex; if (localIn.tileIndex != TileIndexLinearGeneral) { // Try finding a macroModeIndex macroModeIndex = HwlComputeMacroModeIndex(localIn.tileIndex, localIn.flags, localIn.bpp, numSamples, localIn.pTileInfo, &localIn.tileMode, &localIn.tileType); } // If macroModeIndex is not needed, then call HwlSetupTileCfg to get tile info if (macroModeIndex == TileIndexNoMacroIndex) { returnCode = HwlSetupTileCfg(localIn.bpp, localIn.tileIndex, macroModeIndex, localIn.pTileInfo, &localIn.tileMode, &localIn.tileType); } // If macroModeIndex is invalid, then assert this is not macro tiled else if (macroModeIndex == TileIndexInvalid) { ADDR_ASSERT(!IsMacroTiled(localIn.tileMode)); } pOut->macroModeIndex = macroModeIndex; } } if (returnCode == ADDR_OK) { localIn.flags.dccPipeWorkaround = localIn.flags.dccCompatible; if (localIn.tileMode == ADDR_TM_UNKNOWN) { // HWL layer may override tile mode if necessary HwlSelectTileMode(&localIn); } else { // HWL layer may override tile mode if necessary HwlOverrideTileMode(&localIn); // Optimize tile mode if possible OptimizeTileMode(&localIn); } } // Call main function to compute surface info if (returnCode == ADDR_OK) { returnCode = HwlComputeSurfaceInfo(&localIn, pOut); } if (returnCode == ADDR_OK) { // Since bpp might be changed we just pass it through pOut->bpp = localIn.bpp; // Also original width/height/bpp pOut->pixelPitch = pOut->pitch; pOut->pixelHeight = pOut->height; #if DEBUG if (localIn.flags.display) { ADDR_ASSERT((pOut->pitchAlign % 32) == 
0); } #endif //DEBUG if (localIn.format != ADDR_FMT_INVALID) { // // Note: For a 96-bit surface, the pixelPitch returned might be an odd number, but it // is okay to program texture pitch as HW's mip calculator would multiply by 3 first, // then do the appropriate paddings (linear alignment requirement and possibly the // nearest power-of-two for mipmaps), which results in the original pitch. // GetElemLib()->RestoreSurfaceInfo(elemMode, expandX, expandY, &localIn.bpp, &pOut->pixelPitch, &pOut->pixelHeight); } if (localIn.flags.qbStereo) { if (pOut->pStereoInfo) { ComputeQbStereoInfo(pOut); } } if (localIn.flags.volume) // For volume, sliceSize covers all z-slices { pOut->sliceSize = pOut->surfSize; } else // For array: sliceSize is likely to have slice-padding (the last one) { pOut->sliceSize = pOut->surfSize / pOut->depth; // array or cubemap if (pIn->numSlices > 1) { // If this is the last slice then add the padding size to this slice if (pIn->slice == (pIn->numSlices - 1)) { pOut->sliceSize += pOut->sliceSize * (pOut->depth - pIn->numSlices); } else if (m_configFlags.checkLast2DLevel) { // Reset last2DLevel flag if this is not the last array slice pOut->last2DLevel = FALSE; } } } pOut->pitchTileMax = pOut->pitch / 8 - 1; pOut->heightTileMax = pOut->height / 8 - 1; pOut->sliceTileMax = pOut->pitch * pOut->height / 64 - 1; } } ValidBaseAlignments(pOut->baseAlign); return returnCode; } /** **************************************************************************************************** * Lib::ComputeSurfaceAddrFromCoord * * @brief * Interface function stub of AddrComputeSurfaceAddrFromCoord.
* * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE Lib::ComputeSurfaceAddrFromCoord( const ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ///< [in] input structure ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (GetFillSizeFieldsFlags() == TRUE) { if ((pIn->size != sizeof(ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT)) || (pOut->size != sizeof(ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT))) { returnCode = ADDR_PARAMSIZEMISMATCH; } } if (returnCode == ADDR_OK) { ADDR_TILEINFO tileInfoNull; ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT input; if (UseTileIndex(pIn->tileIndex)) { input = *pIn; // Use temp tile info for calcalation input.pTileInfo = &tileInfoNull; const ADDR_SURFACE_FLAGS flags = {{0}}; UINT_32 numSamples = GetNumFragments(pIn->numSamples, pIn->numFrags); // Try finding a macroModeIndex INT_32 macroModeIndex = HwlComputeMacroModeIndex(input.tileIndex, flags, input.bpp, numSamples, input.pTileInfo, &input.tileMode, &input.tileType); // If macroModeIndex is not needed, then call HwlSetupTileCfg to get tile info if (macroModeIndex == TileIndexNoMacroIndex) { returnCode = HwlSetupTileCfg(input.bpp, input.tileIndex, macroModeIndex, input.pTileInfo, &input.tileMode, &input.tileType); } // If macroModeIndex is invalid, then assert this is not macro tiled else if (macroModeIndex == TileIndexInvalid) { ADDR_ASSERT(!IsMacroTiled(input.tileMode)); } // Change the input structure pIn = &input; } if (returnCode == ADDR_OK) { returnCode = HwlComputeSurfaceAddrFromCoord(pIn, pOut); if (returnCode == ADDR_OK) { pOut->prtBlockIndex = static_cast(pOut->addr / (64 * 1024)); } } } return returnCode; } /** **************************************************************************************************** * Lib::ComputeSurfaceCoordFromAddr * * @brief * Interface function stub of 
*       ComputeSurfaceCoordFromAddr.
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeSurfaceCoordFromAddr(
    const ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn,    ///< [in] input structure
    ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            const ADDR_SURFACE_FLAGS flags = {{0}};
            UINT_32 numSamples = GetNumFragments(pIn->numSamples, pIn->numFrags);

            // Try finding a macroModeIndex
            INT_32 macroModeIndex = HwlComputeMacroModeIndex(input.tileIndex,
                                                             flags,
                                                             input.bpp,
                                                             numSamples,
                                                             input.pTileInfo,
                                                             &input.tileMode,
                                                             &input.tileType);

            // If macroModeIndex is not needed, then call HwlSetupTileCfg to get tile info
            if (macroModeIndex == TileIndexNoMacroIndex)
            {
                returnCode = HwlSetupTileCfg(input.bpp, input.tileIndex, macroModeIndex,
                                             input.pTileInfo, &input.tileMode, &input.tileType);
            }
            // If macroModeIndex is invalid, then assert this is not macro tiled
            else if (macroModeIndex == TileIndexInvalid)
            {
                ADDR_ASSERT(!IsMacroTiled(input.tileMode));
            }

            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            returnCode = HwlComputeSurfaceCoordFromAddr(pIn, pOut);
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeSliceTileSwizzle
*
*   @brief
*       Interface function stub of ComputeSliceTileSwizzle.
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeSliceTileSwizzle(
    const ADDR_COMPUTE_SLICESWIZZLE_INPUT*  pIn,    ///< [in] input structure
    ADDR_COMPUTE_SLICESWIZZLE_OUTPUT*       pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_SLICESWIZZLE_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_SLICESWIZZLE_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_SLICESWIZZLE_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex,
                                         input.pTileInfo, &input.tileMode);
            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            returnCode = HwlComputeSliceTileSwizzle(pIn, pOut);
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ExtractBankPipeSwizzle
*
*   @brief
*       Interface function stub of AddrExtractBankPipeSwizzle.
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ExtractBankPipeSwizzle(
    const ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT*  pIn,    ///< [in] input structure
    ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT*       pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT)) ||
            (pOut->size != sizeof(ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex, input.pTileInfo);
            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            returnCode = HwlExtractBankPipeSwizzle(pIn, pOut);
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::CombineBankPipeSwizzle
*
*   @brief
*       Interface function stub of AddrCombineBankPipeSwizzle.
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::CombineBankPipeSwizzle(
    const ADDR_COMBINE_BANKPIPE_SWIZZLE_INPUT*  pIn,    ///< [in] input structure
    ADDR_COMBINE_BANKPIPE_SWIZZLE_OUTPUT*       pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMBINE_BANKPIPE_SWIZZLE_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMBINE_BANKPIPE_SWIZZLE_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMBINE_BANKPIPE_SWIZZLE_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex, input.pTileInfo);
            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            returnCode = HwlCombineBankPipeSwizzle(pIn->bankSwizzle,
                                                   pIn->pipeSwizzle,
                                                   pIn->pTileInfo,
                                                   pIn->baseAddr,
                                                   &pOut->tileSwizzle);
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeBaseSwizzle
*
*   @brief
*       Interface function stub of AddrComputeBaseSwizzle.
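Like the other entry points in this file, ComputeBaseSwizzle first checks, when GetFillSizeFieldsFlags() returns TRUE, that the caller-filled size fields equal the sizes of its own input and output structures. The following is a minimal sketch of that check. The structs, the ReturnCode enum and the CheckSizeFields template are hypothetical stand-ins, not addrlib types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for an ADDR_*_INPUT / ADDR_*_OUTPUT pair; each one
// carries a caller-filled size field.
struct ExampleInput
{
    uint32_t size;
    int32_t  tileIndex;
};

struct ExampleOutput
{
    uint32_t size;
    uint32_t tileSwizzle;
};

enum class ReturnCode
{
    Ok,
    ParamSizeMismatch,
};

// When size checking is enabled (the role GetFillSizeFieldsFlags() plays),
// both size fields must equal the sizes of the structs actually passed in.
template <typename In, typename Out>
ReturnCode CheckSizeFields(const In& in, const Out& out, bool checkSizes)
{
    if (checkSizes && ((in.size != sizeof(In)) || (out.size != sizeof(Out))))
    {
        return ReturnCode::ParamSizeMismatch;
    }
    return ReturnCode::Ok;
}
```

Because the template deduces the struct types from its arguments, the comparison always refers to the structures actually passed in, so it cannot drift to an unrelated pair of types.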
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeBaseSwizzle(
    const ADDR_COMPUTE_BASE_SWIZZLE_INPUT*  pIn,    ///< [in] input structure
    ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT*       pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_BASE_SWIZZLE_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_BASE_SWIZZLE_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex, input.pTileInfo);
            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            if (IsMacroTiled(pIn->tileMode))
            {
                returnCode = HwlComputeBaseSwizzle(pIn, pOut);
            }
            else
            {
                pOut->tileSwizzle = 0;
            }
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeFmaskInfo
*
*   @brief
*       Interface function stub of ComputeFmaskInfo.
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeFmaskInfo(
    const ADDR_COMPUTE_FMASK_INFO_INPUT*    pIn,    ///< [in] input structure
    ADDR_COMPUTE_FMASK_INFO_OUTPUT*         pOut    ///< [out] output structure
    )
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_FMASK_INFO_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_FMASK_INFO_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    // No thick MSAA
    if (Thickness(pIn->tileMode) > 1)
    {
        returnCode = ADDR_INVALIDPARAMS;
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_FMASK_INFO_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;

            if (pOut->pTileInfo)
            {
                // Use the caller-provided output tile info for the calculation
                input.pTileInfo = pOut->pTileInfo;
            }
            else
            {
                // Otherwise use temp tile info for the calculation
                input.pTileInfo = &tileInfoNull;
            }

            ADDR_SURFACE_FLAGS flags = {{0}};
            flags.fmask = 1;

            // Try finding a macroModeIndex
            INT_32 macroModeIndex = HwlComputeMacroModeIndex(pIn->tileIndex,
                                                             flags,
                                                             HwlComputeFmaskBits(pIn, NULL),
                                                             pIn->numSamples,
                                                             input.pTileInfo,
                                                             &input.tileMode);

            // If macroModeIndex is not needed, then call HwlSetupTileCfg to get tile info
            if (macroModeIndex == TileIndexNoMacroIndex)
            {
                returnCode = HwlSetupTileCfg(0, input.tileIndex, macroModeIndex,
                                             input.pTileInfo, &input.tileMode);
            }

            ADDR_ASSERT(macroModeIndex != TileIndexInvalid);

            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            if (pIn->numSamples > 1)
            {
                returnCode = HwlComputeFmaskInfo(pIn, pOut);
            }
            else
            {
                memset(pOut, 0, sizeof(ADDR_COMPUTE_FMASK_INFO_OUTPUT));

                returnCode = ADDR_INVALIDPARAMS;
            }
        }
    }

    ValidBaseAlignments(pOut->baseAlign);

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeFmaskAddrFromCoord
*
*   @brief
*       Interface function stub of ComputeFmaskAddrFromCoord.
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeFmaskAddrFromCoord(
    const ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT*   pIn,    ///< [in] input structure
    ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT*        pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_ASSERT(pIn->numSamples > 1);

        if (pIn->numSamples > 1)
        {
            returnCode = HwlComputeFmaskAddrFromCoord(pIn, pOut);
        }
        else
        {
            returnCode = ADDR_INVALIDPARAMS;
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeFmaskCoordFromAddr
*
*   @brief
*       Interface function stub of ComputeFmaskCoordFromAddr.
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeFmaskCoordFromAddr(
    const ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT*   pIn,    ///< [in] input structure
    ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT*        pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_ASSERT(pIn->numSamples > 1);

        if (pIn->numSamples > 1)
        {
            returnCode = HwlComputeFmaskCoordFromAddr(pIn, pOut);
        }
        else
        {
            returnCode = ADDR_INVALIDPARAMS;
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ConvertTileInfoToHW
*
*   @brief
*       Convert tile info from real value to HW register value in HW layer
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ConvertTileInfoToHW(
    const ADDR_CONVERT_TILEINFOTOHW_INPUT* pIn,     ///< [in] input structure
    ADDR_CONVERT_TILEINFOTOHW_OUTPUT*      pOut     ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_CONVERT_TILEINFOTOHW_INPUT)) ||
            (pOut->size != sizeof(ADDR_CONVERT_TILEINFOTOHW_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_CONVERT_TILEINFOTOHW_INPUT input;
        // if pIn->reverse is TRUE, indices are ignored
        if (pIn->reverse == FALSE && UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(input.bpp, input.tileIndex,
                                         input.macroModeIndex, input.pTileInfo);

            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            returnCode = HwlConvertTileInfoToHW(pIn, pOut);
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ConvertTileIndex
*
*   @brief
*       Convert tile index to tile mode/type/info
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ConvertTileIndex(
    const ADDR_CONVERT_TILEINDEX_INPUT* pIn,    ///< [in] input structure
    ADDR_CONVERT_TILEINDEX_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_CONVERT_TILEINDEX_INPUT)) ||
            (pOut->size != sizeof(ADDR_CONVERT_TILEINDEX_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        returnCode = HwlSetupTileCfg(pIn->bpp, pIn->tileIndex, pIn->macroModeIndex,
                                     pOut->pTileInfo, &pOut->tileMode, &pOut->tileType);

        if (returnCode == ADDR_OK && pIn->tileInfoHw)
        {
            ADDR_CONVERT_TILEINFOTOHW_INPUT hwInput = {0};
            ADDR_CONVERT_TILEINFOTOHW_OUTPUT hwOutput = {0};

            hwInput.pTileInfo = pOut->pTileInfo;
            hwInput.tileIndex = -1;
            hwOutput.pTileInfo = pOut->pTileInfo;

            returnCode = HwlConvertTileInfoToHW(&hwInput, &hwOutput);
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::GetMacroModeIndex
*
*   @brief
*       Get macro mode index based on input info
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::GetMacroModeIndex(
    const ADDR_GET_MACROMODEINDEX_INPUT* pIn,   ///< [in] input structure
    ADDR_GET_MACROMODEINDEX_OUTPUT*      pOut   ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags())
    {
        if ((pIn->size != sizeof(ADDR_GET_MACROMODEINDEX_INPUT)) ||
            (pOut->size != sizeof(ADDR_GET_MACROMODEINDEX_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfo = {0};
        pOut->macroModeIndex = HwlComputeMacroModeIndex(pIn->tileIndex, pIn->flags, pIn->bpp,
                                                        pIn->numFrags, &tileInfo);
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ConvertTileIndex1
*
*   @brief
*       Convert tile index to tile mode/type/info
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ConvertTileIndex1(
    const ADDR_CONVERT_TILEINDEX1_INPUT* pIn,   ///< [in] input structure
    ADDR_CONVERT_TILEINDEX_OUTPUT*       pOut   ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_CONVERT_TILEINDEX1_INPUT)) ||
            (pOut->size != sizeof(ADDR_CONVERT_TILEINDEX_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_SURFACE_FLAGS flags = {{0}};

        HwlComputeMacroModeIndex(pIn->tileIndex, flags, pIn->bpp, pIn->numSamples,
                                 pOut->pTileInfo, &pOut->tileMode, &pOut->tileType);

        if (pIn->tileInfoHw)
        {
            ADDR_CONVERT_TILEINFOTOHW_INPUT hwInput = {0};
            ADDR_CONVERT_TILEINFOTOHW_OUTPUT hwOutput = {0};

            hwInput.pTileInfo = pOut->pTileInfo;
            hwInput.tileIndex = -1;
            hwOutput.pTileInfo = pOut->pTileInfo;

            returnCode = HwlConvertTileInfoToHW(&hwInput, &hwOutput);
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::GetTileIndex
*
*   @brief
*       Get tile index from tile mode/type/info
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::GetTileIndex(
    const ADDR_GET_TILEINDEX_INPUT* pIn,    ///< [in] input structure
    ADDR_GET_TILEINDEX_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if
        (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_GET_TILEINDEX_INPUT)) ||
            (pOut->size != sizeof(ADDR_GET_TILEINDEX_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        returnCode = HwlGetTileIndex(pIn, pOut);
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::Thickness
*
*   @brief
*       Get tile mode thickness
*
*   @return
*       Tile mode thickness
****************************************************************************************************
*/
UINT_32 Lib::Thickness(
    AddrTileMode tileMode)    ///< [in] tile mode
{
    return ModeFlags[tileMode].thickness;
}

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               CMASK/HTILE
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   Lib::ComputeHtileInfo
*
*   @brief
*       Interface function stub of AddrComputeHtileInfo
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeHtileInfo(
    const ADDR_COMPUTE_HTILE_INFO_INPUT*    pIn,    ///< [in] input structure
    ADDR_COMPUTE_HTILE_INFO_OUTPUT*         pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    BOOL_32 isWidth8  = (pIn->blockWidth == 8) ? TRUE : FALSE;
    BOOL_32 isHeight8 = (pIn->blockHeight == 8) ?
                        TRUE : FALSE;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_HTILE_INFO_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_HTILE_INFO_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_HTILE_INFO_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex, input.pTileInfo);

            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            if (pIn->flags.tcCompatible)
            {
                const UINT_32 sliceSize = pIn->pitch * pIn->height * 4 / (8 * 8);
                const UINT_32 align     = HwlGetPipes(pIn->pTileInfo) * pIn->pTileInfo->banks * m_pipeInterleaveBytes;

                if (pIn->numSlices > 1)
                {
                    const UINT_32 surfBytes = (sliceSize * pIn->numSlices);

                    pOut->sliceSize        = sliceSize;
                    pOut->htileBytes       = pIn->flags.skipTcCompatSizeAlign ?
                                             surfBytes : PowTwoAlign(surfBytes, align);
                    pOut->sliceInterleaved = ((sliceSize % align) != 0) ? TRUE : FALSE;
                }
                else
                {
                    pOut->sliceSize        = pIn->flags.skipTcCompatSizeAlign ?
                                             sliceSize : PowTwoAlign(sliceSize, align);
                    pOut->htileBytes       = pOut->sliceSize;
                    pOut->sliceInterleaved = FALSE;
                }

                pOut->nextMipLevelCompressible = ((sliceSize % align) == 0) ?
                                                 TRUE : FALSE;

                pOut->pitch       = pIn->pitch;
                pOut->height      = pIn->height;
                pOut->baseAlign   = align;
                pOut->macroWidth  = 0;
                pOut->macroHeight = 0;
                pOut->bpp         = 32;
            }
            else
            {
                pOut->bpp = ComputeHtileInfo(pIn->flags,
                                             pIn->pitch,
                                             pIn->height,
                                             pIn->numSlices,
                                             pIn->isLinear,
                                             isWidth8,
                                             isHeight8,
                                             pIn->pTileInfo,
                                             &pOut->pitch,
                                             &pOut->height,
                                             &pOut->htileBytes,
                                             &pOut->macroWidth,
                                             &pOut->macroHeight,
                                             &pOut->sliceSize,
                                             &pOut->baseAlign);
            }
        }
    }

    ValidMetaBaseAlignments(pOut->baseAlign);

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeCmaskInfo
*
*   @brief
*       Interface function stub of AddrComputeCmaskInfo
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeCmaskInfo(
    const ADDR_COMPUTE_CMASK_INFO_INPUT*    pIn,    ///< [in] input structure
    ADDR_COMPUTE_CMASK_INFO_OUTPUT*         pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_CMASK_INFO_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_CMASK_INFO_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_CMASK_INFO_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex, input.pTileInfo);

            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            returnCode = ComputeCmaskInfo(pIn->flags,
                                          pIn->pitch,
                                          pIn->height,
                                          pIn->numSlices,
                                          pIn->isLinear,
                                          pIn->pTileInfo,
                                          &pOut->pitch,
                                          &pOut->height,
                                          &pOut->cmaskBytes,
                                          &pOut->macroWidth,
                                          &pOut->macroHeight,
                                          &pOut->sliceSize,
                                          &pOut->baseAlign,
                                          &pOut->blockMax);
        }
    }

    ValidMetaBaseAlignments(pOut->baseAlign);

    return returnCode;
}

/**
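For tcCompatible HTILE, Lib::ComputeHtileInfo above skips the macro-tile path. It allots 4 bytes (32 bits) per 8x8 pixel tile and aligns to the product of the pipe count, the bank count and m_pipeInterleaveBytes. The sketch below restates that branch. As assumptions, it passes the three hardware values in explicitly (standing in for HwlGetPipes(), pTileInfo->banks and m_pipeInterleaveBytes) and computes in 64-bit arithmetic, whereas the original uses UINT_32. All names here are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Round x up to a power-of-two alignment (the arithmetic PowTwoAlign performs).
uint64_t AlignUpPow2(uint64_t x, uint64_t align)
{
    assert((align != 0) && ((align & (align - 1)) == 0));
    return (x + align - 1) & ~(align - 1);
}

struct TcHtileLayout
{
    uint64_t sliceSize;                // bytes per slice as reported
    uint64_t htileBytes;               // total HTILE bytes
    bool     sliceInterleaved;         // multi-slice surface with unaligned slices
    bool     nextMipLevelCompressible; // raw slice size is already aligned
};

// 32 bits (4 bytes) of HTILE per 8x8 tile, aligned to pipes x banks x pipeInterleaveBytes.
TcHtileLayout ComputeTcHtileLayout(uint32_t pitch, uint32_t height, uint32_t numSlices,
                                   uint32_t pipes, uint32_t banks,
                                   uint32_t pipeInterleaveBytes, bool skipSizeAlign)
{
    const uint64_t rawSlice = static_cast<uint64_t>(pitch) * height * 4 / (8 * 8);
    const uint64_t align    = static_cast<uint64_t>(pipes) * banks * pipeInterleaveBytes;

    TcHtileLayout out = {};
    if (numSlices > 1)
    {
        const uint64_t surfBytes = rawSlice * numSlices;
        out.sliceSize        = rawSlice;
        out.htileBytes       = skipSizeAlign ? surfBytes : AlignUpPow2(surfBytes, align);
        out.sliceInterleaved = (rawSlice % align) != 0;
    }
    else
    {
        out.sliceSize        = skipSizeAlign ? rawSlice : AlignUpPow2(rawSlice, align);
        out.htileBytes       = out.sliceSize;
        out.sliceInterleaved = false;
    }
    // Like the original, this tests the unaligned slice size.
    out.nextMipLevelCompressible = (rawSlice % align) == 0;
    return out;
}
```

For example, a 200x100 surface with 3 slices, 2 pipes, 4 banks and a 256-byte pipe interleave gives a 1250-byte slice and 4096 total bytes, and its slices are interleaved because 1250 is not a multiple of 2048.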
****************************************************************************************************
*   Lib::ComputeDccInfo
*
*   @brief
*       Interface function to compute DCC key info
*
*   @return
*       return code of HwlComputeDccInfo
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeDccInfo(
    const ADDR_COMPUTE_DCCINFO_INPUT*    pIn,    ///< [in] input structure
    ADDR_COMPUTE_DCCINFO_OUTPUT*         pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE ret = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_DCCINFO_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_DCCINFO_OUTPUT)))
        {
            ret = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (ret == ADDR_OK)
    {
        ADDR_COMPUTE_DCCINFO_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;

            ret = HwlSetupTileCfg(input.bpp, input.tileIndex, input.macroModeIndex,
                                  &input.tileInfo, &input.tileMode);

            pIn = &input;
        }

        if (ret == ADDR_OK)
        {
            ret = HwlComputeDccInfo(pIn, pOut);

            ValidMetaBaseAlignments(pOut->dccRamBaseAlign);
        }
    }

    return ret;
}

/**
****************************************************************************************************
*   Lib::ComputeHtileAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeHtileAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeHtileAddrFromCoord(
    const ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT*   pIn,    ///< [in] input structure
    ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT*        pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    BOOL_32 isWidth8  = (pIn->blockWidth == 8) ? TRUE : FALSE;
    BOOL_32 isHeight8 = (pIn->blockHeight == 8) ?
                        TRUE : FALSE;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex, input.pTileInfo);

            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            if (pIn->flags.tcCompatible)
            {
                HwlComputeHtileAddrFromCoord(pIn, pOut);
            }
            else
            {
                pOut->addr = HwlComputeXmaskAddrFromCoord(pIn->pitch,
                                                          pIn->height,
                                                          pIn->x,
                                                          pIn->y,
                                                          pIn->slice,
                                                          pIn->numSlices,
                                                          1,
                                                          pIn->isLinear,
                                                          isWidth8,
                                                          isHeight8,
                                                          pIn->pTileInfo,
                                                          &pOut->bitPosition);
            }
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeHtileCoordFromAddr
*
*   @brief
*       Interface function stub of AddrComputeHtileCoordFromAddr
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeHtileCoordFromAddr(
    const ADDR_COMPUTE_HTILE_COORDFROMADDR_INPUT*   pIn,    ///< [in] input structure
    ADDR_COMPUTE_HTILE_COORDFROMADDR_OUTPUT*        pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    BOOL_32 isWidth8  = (pIn->blockWidth == 8) ? TRUE : FALSE;
    BOOL_32 isHeight8 = (pIn->blockHeight == 8) ?
                        TRUE : FALSE;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_HTILE_COORDFROMADDR_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_HTILE_COORDFROMADDR_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_HTILE_COORDFROMADDR_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex, input.pTileInfo);

            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            HwlComputeXmaskCoordFromAddr(pIn->addr,
                                         pIn->bitPosition,
                                         pIn->pitch,
                                         pIn->height,
                                         pIn->numSlices,
                                         1,
                                         pIn->isLinear,
                                         isWidth8,
                                         isHeight8,
                                         pIn->pTileInfo,
                                         &pOut->x,
                                         &pOut->y,
                                         &pOut->slice);
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeCmaskAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeCmaskAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeCmaskAddrFromCoord(
    const ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT*   pIn,    ///< [in] input structure
    ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT*        pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex, input.pTileInfo);

            // Change the input structure
            pIn =
                &input;
        }

        if (returnCode == ADDR_OK)
        {
            if (pIn->flags.tcCompatible == TRUE)
            {
                returnCode = HwlComputeCmaskAddrFromCoord(pIn, pOut);
            }
            else
            {
                pOut->addr = HwlComputeXmaskAddrFromCoord(pIn->pitch,
                                                          pIn->height,
                                                          pIn->x,
                                                          pIn->y,
                                                          pIn->slice,
                                                          pIn->numSlices,
                                                          2,
                                                          pIn->isLinear,
                                                          FALSE, // this is cmask, isWidth8 is not needed
                                                          FALSE, // this is cmask, isHeight8 is not needed
                                                          pIn->pTileInfo,
                                                          &pOut->bitPosition);
            }
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeCmaskCoordFromAddr
*
*   @brief
*       Interface function stub of AddrComputeCmaskCoordFromAddr
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeCmaskCoordFromAddr(
    const ADDR_COMPUTE_CMASK_COORDFROMADDR_INPUT*   pIn,    ///< [in] input structure
    ADDR_COMPUTE_CMASK_COORDFROMADDR_OUTPUT*        pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR_COMPUTE_CMASK_COORDFROMADDR_INPUT)) ||
            (pOut->size != sizeof(ADDR_COMPUTE_CMASK_COORDFROMADDR_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if (returnCode == ADDR_OK)
    {
        ADDR_TILEINFO tileInfoNull;
        ADDR_COMPUTE_CMASK_COORDFROMADDR_INPUT input;

        if (UseTileIndex(pIn->tileIndex))
        {
            input = *pIn;
            // Use temp tile info for calculation
            input.pTileInfo = &tileInfoNull;

            returnCode = HwlSetupTileCfg(0, input.tileIndex, input.macroModeIndex, input.pTileInfo);

            // Change the input structure
            pIn = &input;
        }

        if (returnCode == ADDR_OK)
        {
            HwlComputeXmaskCoordFromAddr(pIn->addr,
                                         pIn->bitPosition,
                                         pIn->pitch,
                                         pIn->height,
                                         pIn->numSlices,
                                         2,
                                         pIn->isLinear,
                                         FALSE,
                                         FALSE,
                                         pIn->pTileInfo,
                                         &pOut->x,
                                         &pOut->y,
                                         &pOut->slice);
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeTileDataWidthAndHeight
*
*   @brief
*       Compute
*       the squared cache shape for per-tile data (CMASK and HTILE)
*
*   @return
*       N/A
*
*   @note
*       MacroWidth and macroHeight are measured in pixels
****************************************************************************************************
*/
VOID Lib::ComputeTileDataWidthAndHeight(
    UINT_32         bpp,             ///< [in] bits per pixel
    UINT_32         cacheBits,       ///< [in] bits of cache
    ADDR_TILEINFO*  pTileInfo,       ///< [in] Tile info
    UINT_32*        pMacroWidth,     ///< [out] macro tile width
    UINT_32*        pMacroHeight     ///< [out] macro tile height
    ) const
{
    UINT_32 height = 1;
    UINT_32 width  = cacheBits / bpp;
    UINT_32 pipes  = HwlGetPipes(pTileInfo);

    // Double height until the macro-tile is close to square
    // Height can only be doubled if width is even

    while ((width > height * 2 * pipes) && !(width & 1))
    {
        width  /= 2;
        height *= 2;
    }

    *pMacroWidth  = 8 * width;
    *pMacroHeight = 8 * height * pipes;

    // Note: The above iterative computation is equivalent to the following
    //
    //int log2_height = ((log2(cacheBits)-log2(bpp)-log2(pipes))/2);
    //int macroHeight = pow2( 3+log2(pipes)+log2_height );
}

/**
****************************************************************************************************
*   Lib::HwlComputeTileDataWidthAndHeightLinear
*
*   @brief
*       Compute the squared cache shape for per-tile data (CMASK and HTILE) for linear layout
*
*   @return
*       N/A
*
*   @note
*       MacroWidth and macroHeight are measured in pixels
****************************************************************************************************
*/
VOID Lib::HwlComputeTileDataWidthAndHeightLinear(
    UINT_32*        pMacroWidth,     ///< [out] macro tile width
    UINT_32*        pMacroHeight,    ///< [out] macro tile height
    UINT_32         bpp,             ///< [in] bits per pixel
    ADDR_TILEINFO*  pTileInfo        ///< [in] tile info
    ) const
{
    ADDR_ASSERT(bpp != 4);              // Cmask does not support linear layout prior to SI
    *pMacroWidth  = 8 * 512 / bpp;      // Align width to 512-bit memory accesses
    *pMacroHeight = 8 * m_pipes;        // Align height to number of pipes
}

/**
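ComputeTileDataWidthAndHeight above repeatedly halves the width and doubles the height of the per-tile metadata block (CMASK or HTILE) until the block is close to square. Its trailing comment gives a closed form for the resulting height. The sketch below runs the same loop standalone and compares it with that closed form. As assumptions, it passes the pipe count in directly instead of calling HwlGetPipes(pTileInfo), and it requires cacheBits, bpp and pipes to be powers of two, as the log2 expressions imply. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Macro-tile dimensions in pixels.
struct MacroTileShape
{
    uint32_t width;
    uint32_t height;
};

// Standalone restatement of the loop in Lib::ComputeTileDataWidthAndHeight;
// `bpp` is the number of metadata bits per 8x8 pixel tile.
MacroTileShape ComputeMacroTileShape(uint32_t bpp, uint32_t cacheBits, uint32_t pipes)
{
    uint32_t height = 1;
    uint32_t width  = cacheBits / bpp;  // number of bpp-bit elements in cacheBits

    // Halve the width and double the height until the block is close to
    // square; the width can only be halved while it is even.
    while ((width > height * 2 * pipes) && !(width & 1))
    {
        width  /= 2;
        height *= 2;
    }

    // Each element covers 8 pixels per side; as in the original, the height
    // is also multiplied by the pipe count.
    return MacroTileShape{ 8 * width, 8 * height * pipes };
}

// log2 of a power of two.
uint32_t Log2Pow2(uint32_t x)
{
    assert((x != 0) && ((x & (x - 1)) == 0));
    uint32_t n = 0;
    while (x > 1)
    {
        x >>= 1;
        ++n;
    }
    return n;
}

// Closed form quoted in the source comment:
//   log2_height = (log2(cacheBits) - log2(bpp) - log2(pipes)) / 2
//   macroHeight = pow2(3 + log2(pipes) + log2_height)
uint32_t MacroHeightClosedForm(uint32_t bpp, uint32_t cacheBits, uint32_t pipes)
{
    const int log2Height = (static_cast<int>(Log2Pow2(cacheBits)) -
                            static_cast<int>(Log2Pow2(bpp)) -
                            static_cast<int>(Log2Pow2(pipes))) / 2;
    assert(log2Height >= 0);
    return 1u << (3 + Log2Pow2(pipes) + static_cast<uint32_t>(log2Height));
}
```

For example, with bpp = 4, cacheBits = 1024 and 8 pipes, the loop stops after two iterations and yields a 512 x 256 pixel block. The closed form gives the same height, 2^(3 + 3 + 2) = 256.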
****************************************************************************************************
*   Lib::ComputeHtileInfo
*
*   @brief
*       Compute htile pitch, height, bytes per 2D slice
*
*   @return
*       Htile bpp, i.e. how many bits for an 8x8 tile
*       Also returns by output parameters:
*       *Htile pitch, height, total size in bytes, macro-tile dimensions and slice size*
****************************************************************************************************
*/
UINT_32 Lib::ComputeHtileInfo(
    ADDR_HTILE_FLAGS flags,             ///< [in] htile flags
    UINT_32          pitchIn,           ///< [in] pitch input
    UINT_32          heightIn,          ///< [in] height input
    UINT_32          numSlices,         ///< [in] number of slices
    BOOL_32          isLinear,          ///< [in] if it is linear mode
    BOOL_32          isWidth8,          ///< [in] if htile block width is 8
    BOOL_32          isHeight8,         ///< [in] if htile block height is 8
    ADDR_TILEINFO*   pTileInfo,         ///< [in] Tile info
    UINT_32*         pPitchOut,         ///< [out] pitch output
    UINT_32*         pHeightOut,        ///< [out] height output
    UINT_64*         pHtileBytes,       ///< [out] total size in bytes
    UINT_32*         pMacroWidth,       ///< [out] macro-tile width in pixels
    UINT_32*         pMacroHeight,      ///< [out] macro-tile height in pixels
    UINT_64*         pSliceSize,        ///< [out] slice size in bytes
    UINT_32*         pBaseAlign         ///< [out] base alignment
    ) const
{
    UINT_32 macroWidth;
    UINT_32 macroHeight;
    UINT_32 baseAlign;
    UINT_64 surfBytes;
    UINT_64 sliceBytes;

    numSlices = Max(1u, numSlices);

    const UINT_32 bpp = HwlComputeHtileBpp(isWidth8, isHeight8);
    const UINT_32 cacheBits = HtileCacheBits;

    if (isLinear)
    {
        HwlComputeTileDataWidthAndHeightLinear(&macroWidth,
                                               &macroHeight,
                                               bpp,
                                               pTileInfo);
    }
    else
    {
        ComputeTileDataWidthAndHeight(bpp,
                                      cacheBits,
                                      pTileInfo,
                                      &macroWidth,
                                      &macroHeight);
    }

    *pPitchOut  = PowTwoAlign(pitchIn,  macroWidth);
    *pHeightOut = PowTwoAlign(heightIn, macroHeight);

    baseAlign = HwlComputeHtileBaseAlign(flags.tcCompatible, isLinear, pTileInfo);

    surfBytes = HwlComputeHtileBytes(*pPitchOut,
                                     *pHeightOut,
                                     bpp,
                                     isLinear,
                                     numSlices,
                                     &sliceBytes,
                                     baseAlign);

    *pHtileBytes = surfBytes;

    //
    // Use SafeAssign since
they are optional // SafeAssign(pMacroWidth, macroWidth); SafeAssign(pMacroHeight, macroHeight); SafeAssign(pSliceSize, sliceBytes); SafeAssign(pBaseAlign, baseAlign); return bpp; } /** **************************************************************************************************** * Lib::ComputeCmaskBaseAlign * * @brief * Compute cmask base alignment * * @return * Cmask base alignment **************************************************************************************************** */ UINT_32 Lib::ComputeCmaskBaseAlign( ADDR_CMASK_FLAGS flags, ///< [in] Cmask flags ADDR_TILEINFO* pTileInfo ///< [in] Tile info ) const { UINT_32 baseAlign = m_pipeInterleaveBytes * HwlGetPipes(pTileInfo); if (flags.tcCompatible) { ADDR_ASSERT(pTileInfo != NULL); if (pTileInfo) { baseAlign *= pTileInfo->banks; } } return baseAlign; } /** **************************************************************************************************** * Lib::ComputeCmaskBytes * * @brief * Compute cmask size in bytes * * @return * Cmask size in bytes **************************************************************************************************** */ UINT_64 Lib::ComputeCmaskBytes( UINT_32 pitch, ///< [in] pitch UINT_32 height, ///< [in] height UINT_32 numSlices ///< [in] number of slices ) const { return BITS_TO_BYTES(static_cast(pitch) * height * numSlices * CmaskElemBits) / MicroTilePixels; } /** **************************************************************************************************** * Lib::ComputeCmaskInfo * * @brief * Compute cmask pitch,width, bytes per 2D slice * * @return * BlockMax. 
*       Also by output parameters: Cmask pitch, height, total size in bytes,
*       macro-tile dimensions
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeCmaskInfo(
    ADDR_CMASK_FLAGS flags,            ///< [in] cmask flags
    UINT_32          pitchIn,          ///< [in] pitch input
    UINT_32          heightIn,         ///< [in] height input
    UINT_32          numSlices,        ///< [in] number of slices
    BOOL_32          isLinear,         ///< [in] is linear mode
    ADDR_TILEINFO*   pTileInfo,        ///< [in] Tile info
    UINT_32*         pPitchOut,        ///< [out] pitch output
    UINT_32*         pHeightOut,       ///< [out] height output
    UINT_64*         pCmaskBytes,      ///< [out] total cmask size in bytes (all slices)
    UINT_32*         pMacroWidth,      ///< [out] macro-tile width in pixels
    UINT_32*         pMacroHeight,     ///< [out] macro-tile height in pixels
    UINT_64*         pSliceSize,       ///< [out] slice size in bytes
    UINT_32*         pBaseAlign,       ///< [out] base alignment
    UINT_32*         pBlockMax         ///< [out] block max == slice / 128 / 128 - 1
    ) const
{
    UINT_32 macroWidth;
    UINT_32 macroHeight;
    UINT_32 baseAlign;
    UINT_64 surfBytes;
    UINT_64 sliceBytes;

    numSlices = Max(1u, numSlices);

    const UINT_32 bpp = CmaskElemBits;
    const UINT_32 cacheBits = CmaskCacheBits;

    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (isLinear)
    {
        HwlComputeTileDataWidthAndHeightLinear(&macroWidth,
                                               &macroHeight,
                                               bpp,
                                               pTileInfo);
    }
    else
    {
        ComputeTileDataWidthAndHeight(bpp,
                                      cacheBits,
                                      pTileInfo,
                                      &macroWidth,
                                      &macroHeight);
    }

    *pPitchOut = (pitchIn + macroWidth - 1) & ~(macroWidth - 1);
    *pHeightOut = (heightIn + macroHeight - 1) & ~(macroHeight - 1);

    sliceBytes = ComputeCmaskBytes(*pPitchOut,
                                   *pHeightOut,
                                   1);

    baseAlign = ComputeCmaskBaseAlign(flags, pTileInfo);

    while (sliceBytes % baseAlign)
    {
        *pHeightOut += macroHeight;

        sliceBytes = ComputeCmaskBytes(*pPitchOut,
                                       *pHeightOut,
                                       1);
    }

    surfBytes = sliceBytes * numSlices;

    *pCmaskBytes = surfBytes;

    //
    // Use SafeAssign since they are optional
    //
    SafeAssign(pMacroWidth, macroWidth);

    SafeAssign(pMacroHeight, macroHeight);

    SafeAssign(pBaseAlign, baseAlign);

    SafeAssign(pSliceSize, sliceBytes);

    UINT_32 slice = (*pPitchOut) * (*pHeightOut);
    UINT_32 blockMax = slice / 128 / 128 - 1;

#if DEBUG
    if (slice % (64*256) != 0)
    {
        ADDR_ASSERT_ALWAYS();
    }
#endif //DEBUG

    UINT_32 maxBlockMax = HwlGetMaxCmaskBlockMax();

    if (blockMax > maxBlockMax)
    {
        blockMax = maxBlockMax;
        returnCode = ADDR_INVALIDPARAMS;
    }

    SafeAssign(pBlockMax, blockMax);

    return returnCode;
}

/**
****************************************************************************************************
*   Lib::ComputeXmaskCoordYFromPipe
*
*   @brief
*       Compute the Y coord from pipe number for cmask/htile
*
*   @return
*       Y coordinate
*
****************************************************************************************************
*/
UINT_32 Lib::ComputeXmaskCoordYFromPipe(
    UINT_32         pipe,       ///< [in] pipe number
    UINT_32         x           ///< [in] x coordinate
    ) const
{
    UINT_32 pipeBit0;
    UINT_32 pipeBit1;
    UINT_32 xBit0;
    UINT_32 xBit1;
    UINT_32 yBit0;
    UINT_32 yBit1;

    UINT_32 y = 0;

    UINT_32 numPipes = m_pipes; // SI has its own implementation

    //
    // Convert pipe + x to y coordinate.
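    // Worked example (illustrative values only, not tied to a particular ASIC):
    // with 4 pipes, pipe = 3 and x = 1 give p0 = 1, p1 = 1, x0 = 1, x1 = 0, so
    //   y0 = p0 ^ x1 = 1 ^ 0 = 1
    //   y1 = p1 ^ x0 = 1 ^ 1 = 0
    //   => y = 1
    // Feeding x = 1, y = 1 back into p0 = x1 ^ y0 and p1 = x0 ^ y1 gives p0 = 1, p1 = 1,
    // i.e. pipe = 3 again, because XOR with the same x bit is its own inverse.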
    //
    switch (numPipes)
    {
        case 1:
            //
            // 1 pipe
            //
            // p0 = 0
            //
            y = 0;
            break;
        case 2:
            //
            // 2 pipes
            //
            // p0 = x0 ^ y0
            //
            // y0 = p0 ^ x0
            //
            pipeBit0 = pipe & 0x1;

            xBit0 = x & 0x1;

            yBit0 = pipeBit0 ^ xBit0;

            y = yBit0;
            break;
        case 4:
            //
            // 4 pipes
            //
            // p0 = x1 ^ y0
            // p1 = x0 ^ y1
            //
            // y0 = p0 ^ x1
            // y1 = p1 ^ x0
            //
            pipeBit0 =  pipe & 0x1;
            pipeBit1 = (pipe & 0x2) >> 1;

            xBit0 =  x & 0x1;
            xBit1 = (x & 0x2) >> 1;

            yBit0 = pipeBit0 ^ xBit1;
            yBit1 = pipeBit1 ^ xBit0;

            y = (yBit0 |
                 (yBit1 << 1));
            break;
        case 8:
            //
            // 8 pipes
            //
            // r600 and r800 use different methods
            //
            y = HwlComputeXmaskCoordYFrom8Pipe(pipe, x);
            break;
        default:
            break;
    }
    return y;
}

/**
****************************************************************************************************
*   Lib::HwlComputeXmaskCoordFromAddr
*
*   @brief
*       Compute the coord from an address of a cmask/htile
*
*   @return
*       N/A
*
*   @note
*       This method is shared with htile, hence the Xmask name
****************************************************************************************************
*/
VOID Lib::HwlComputeXmaskCoordFromAddr(
    UINT_64         addr,           ///< [in] address
    UINT_32         bitPosition,    ///< [in] bitPosition in a byte
    UINT_32         pitch,          ///< [in] pitch
    UINT_32         height,         ///< [in] height
    UINT_32         numSlices,      ///< [in] number of slices
    UINT_32         factor,         ///< [in] factor that indicates cmask(2) or htile(1)
    BOOL_32         isLinear,       ///< [in] linear or tiled HTILE layout
    BOOL_32         isWidth8,       ///< [in] TRUE if width is 8, FALSE means 4 (register value)
    BOOL_32         isHeight8,      ///< [in] TRUE if height is 8, FALSE means 4 (register value)
    ADDR_TILEINFO*  pTileInfo,      ///< [in] Tile info
    UINT_32*        pX,             ///< [out] x coord
    UINT_32*        pY,             ///< [out] y coord
    UINT_32*        pSlice          ///< [out] slice index
    ) const
{
    UINT_32 pipe;
    UINT_32 numPipes;
    UINT_32 numGroupBits;
    (void)numGroupBits;
    UINT_32 numPipeBits;
    UINT_32 macroTilePitch;
    UINT_32 macroTileHeight;

    UINT_64 bitAddr;

    UINT_32 microTileCoordY;

    UINT_32 elemBits;

    UINT_32 pitchAligned = pitch;
    UINT_32 heightAligned = height;
    UINT_64 totalBytes;

    UINT_64 elemOffset;

    UINT_64 macroIndex;
    UINT_32 microIndex;

    UINT_64 macroNumber;
    UINT_32 microNumber;

    UINT_32 macroX;
    UINT_32 macroY;
    UINT_32 macroZ;

    UINT_32 microX;
    UINT_32 microY;

    UINT_32 tilesPerMacro;
    UINT_32 macrosPerPitch;
    UINT_32 macrosPerSlice;

    //
    // Extract pipe.
    //
    numPipes = HwlGetPipes(pTileInfo);
    pipe = ComputePipeFromAddr(addr, numPipes);

    //
    // Compute the number of group and pipe bits.
    //
    numGroupBits = Log2(m_pipeInterleaveBytes);
    numPipeBits  = Log2(numPipes);

    UINT_32 groupBits = 8 * m_pipeInterleaveBytes;
    UINT_32 pipes = numPipes;

    //
    // Compute the micro tile size in bits, and the macro tile pitch and height.
    //
    if (factor == 2) //CMASK
    {
        ADDR_CMASK_FLAGS flags = {{0}};

        elemBits = CmaskElemBits;

        ComputeCmaskInfo(flags,
                         pitch,
                         height,
                         numSlices,
                         isLinear,
                         pTileInfo,
                         &pitchAligned,
                         &heightAligned,
                         &totalBytes,
                         &macroTilePitch,
                         &macroTileHeight);
    }
    else  //HTILE
    {
        ADDR_HTILE_FLAGS flags = {{0}};

        if (factor != 1)
        {
            factor = 1;
        }

        elemBits = HwlComputeHtileBpp(isWidth8, isHeight8);

        ComputeHtileInfo(flags,
                         pitch,
                         height,
                         numSlices,
                         isLinear,
                         isWidth8,
                         isHeight8,
                         pTileInfo,
                         &pitchAligned,
                         &heightAligned,
                         &totalBytes,
                         &macroTilePitch,
                         &macroTileHeight);
    }

    // Should use aligned dims
    //
    pitch = pitchAligned;
    height = heightAligned;

    //
    // Convert byte address to bit address.
    //
    bitAddr = BYTES_TO_BITS(addr) + bitPosition;

    //
    // Remove pipe bits from address.
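    // Worked example (illustrative values only): with m_pipeInterleaveBytes = 256
    // (groupBits = 2048) and 4 pipes, byte address 0x1340 splits into a group offset
    // of 0x40, pipe = (0x1340 >> 8) & 3 = 3 and upper bits 0x1340 >> 10 = 4.
    // Dropping the pipe field leaves the byte offset (4 << 8) | 0x40 = 0x440, which is what
    // the expression below computes in bits:
    //   (39424 % 2048) + ((39424 / 2048 / 4) * 2048) = 512 + 8192 = 8704 = 0x440 * 8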
    //
    bitAddr = (bitAddr % groupBits) + ((bitAddr/groupBits/pipes)*groupBits);

    elemOffset = bitAddr / elemBits;

    tilesPerMacro = (macroTilePitch/factor) * macroTileHeight / MicroTilePixels >> numPipeBits;

    macrosPerPitch = pitch / (macroTilePitch/factor);
    macrosPerSlice = macrosPerPitch * height / macroTileHeight;

    macroIndex = elemOffset / factor / tilesPerMacro;
    microIndex = static_cast<UINT_32>(elemOffset % (tilesPerMacro * factor));

    macroNumber = macroIndex * factor + microIndex % factor;
    microNumber = microIndex / factor;

    macroX = static_cast<UINT_32>((macroNumber % macrosPerPitch));
    macroY = static_cast<UINT_32>((macroNumber % macrosPerSlice) / macrosPerPitch);
    macroZ = static_cast<UINT_32>((macroNumber / macrosPerSlice));

    microX = microNumber % (macroTilePitch / factor / MicroTileWidth);
    microY = (microNumber / (macroTilePitch / factor / MicroTileHeight));

    *pX = macroX * (macroTilePitch/factor) + microX * MicroTileWidth;
    *pY = macroY * macroTileHeight + (microY * MicroTileHeight << numPipeBits);
    *pSlice = macroZ;

    microTileCoordY = ComputeXmaskCoordYFromPipe(pipe,
                                                 *pX/MicroTileWidth);

    //
    // Assemble final coordinates.
    //
    *pY += microTileCoordY * MicroTileHeight;
}

/**
****************************************************************************************************
*   Lib::HwlComputeXmaskAddrFromCoord
*
*   @brief
*       Compute the address of a cmask/htile element from a coordinate (prior to SI)
*
*   @return
*       Address in bytes
*
****************************************************************************************************
*/
UINT_64 Lib::HwlComputeXmaskAddrFromCoord(
    UINT_32        pitch,          ///< [in] pitch
    UINT_32        height,         ///< [in] height
    UINT_32        x,              ///< [in] x coord
    UINT_32        y,              ///< [in] y coord
    UINT_32        slice,          ///< [in] slice/depth index
    UINT_32        numSlices,      ///< [in] number of slices
    UINT_32        factor,         ///< [in] factor that indicates cmask(2) or htile(1)
    BOOL_32        isLinear,       ///< [in] linear or tiled HTILE layout
    BOOL_32        isWidth8,       ///< [in] TRUE if width is 8, FALSE means 4 (register value)
    BOOL_32        isHeight8,      ///< [in] TRUE if height is 8, FALSE means 4 (register value)
    ADDR_TILEINFO* pTileInfo,      ///< [in] Tile info
    UINT_32*       pBitPosition    ///< [out] bit position inside a byte
    ) const
{
    UINT_64 addr;
    UINT_32 numGroupBits;
    UINT_32 numPipeBits;
    UINT_32 newPitch = 0;
    UINT_32 newHeight = 0;
    UINT_64 sliceBytes = 0;
    UINT_64 totalBytes = 0;
    UINT_64 sliceOffset;
    UINT_32 pipe;
    UINT_32 macroTileWidth;
    UINT_32 macroTileHeight;
    UINT_32 macroTilesPerRow;
    UINT_32 macroTileBytes;
    UINT_32 macroTileIndexX;
    UINT_32 macroTileIndexY;
    UINT_64 macroTileOffset;
    UINT_32 pixelBytesPerRow;
    UINT_32 pixelOffsetX;
    UINT_32 pixelOffsetY;
    UINT_32 pixelOffset;
    UINT_64 totalOffset;
    UINT_64 offsetLo;
    UINT_64 offsetHi;
    UINT_64 groupMask;

    UINT_32 elemBits = 0;

    UINT_32 numPipes = m_pipes; // This function is only used for ASICs prior to SI

    if (factor == 2) //CMASK
    {
        elemBits = CmaskElemBits;

        // For asics before SI, cmask is always tiled
        isLinear = FALSE;
    }
    else //HTILE
    {
        if (factor != 1) // Fix compile warning
        {
            factor = 1;
        }

        elemBits = HwlComputeHtileBpp(isWidth8, isHeight8);
    }

    //
    // Compute the number of group bits and pipe bits.
    //
    numGroupBits = Log2(m_pipeInterleaveBytes);
    numPipeBits  = Log2(numPipes);

    //
    // Compute macro tile dimensions.
    //
    if (factor == 2) // CMASK
    {
        ADDR_CMASK_FLAGS flags = {{0}};

        ComputeCmaskInfo(flags,
                         pitch,
                         height,
                         numSlices,
                         isLinear,
                         pTileInfo,
                         &newPitch,
                         &newHeight,
                         &totalBytes,
                         &macroTileWidth,
                         &macroTileHeight);

        sliceBytes = totalBytes / numSlices;
    }
    else // HTILE
    {
        ADDR_HTILE_FLAGS flags = {{0}};

        ComputeHtileInfo(flags,
                         pitch,
                         height,
                         numSlices,
                         isLinear,
                         isWidth8,
                         isHeight8,
                         pTileInfo,
                         &newPitch,
                         &newHeight,
                         &totalBytes,
                         &macroTileWidth,
                         &macroTileHeight,
                         &sliceBytes);
    }

    sliceOffset = slice * sliceBytes;

    //
    // Get the pipe.  Note that neither slice rotation nor pipe swizzling apply for CMASK.
    //
    pipe = ComputePipeFromCoord(x,
                                y,
                                0,
                                ADDR_TM_2D_TILED_THIN1,
                                0,
                                FALSE,
                                pTileInfo);

    //
    // Compute the number of macro tiles per row.
    //
    macroTilesPerRow = newPitch / macroTileWidth;

    //
    // Compute the number of bytes per macro tile.
    //
    macroTileBytes = BITS_TO_BYTES((macroTileWidth * macroTileHeight * elemBits) / MicroTilePixels);

    //
    // Compute the offset to the macro tile containing the specified coordinate.
    //
    macroTileIndexX = x / macroTileWidth;
    macroTileIndexY = y / macroTileHeight;
    macroTileOffset = ((macroTileIndexY * macroTilesPerRow) + macroTileIndexX) * macroTileBytes;

    //
    // Compute the pixel offset within the macro tile.
    //
    pixelBytesPerRow = BITS_TO_BYTES(macroTileWidth * elemBits) / MicroTileWidth;

    //
    // The nibbles are interleaved (see below), so the part of the offset relative to the x
    // coordinate repeats halfway across the row. (Not for HTILE)
    //
    if (factor == 2)
    {
        pixelOffsetX = (x % (macroTileWidth / 2)) / MicroTileWidth;
    }
    else
    {
        pixelOffsetX = (x % (macroTileWidth)) / MicroTileWidth * BITS_TO_BYTES(elemBits);
    }

    //
    // Compute the y offset within the macro tile.
    //
    pixelOffsetY = (((y % macroTileHeight) / MicroTileHeight) / numPipes) * pixelBytesPerRow;

    pixelOffset = pixelOffsetX + pixelOffsetY;

    //
    // Combine the slice offset and macro tile offset with the pixel offset, accounting for the
    // pipe bits in the middle of the address.
    //
    totalOffset = ((sliceOffset + macroTileOffset) >> numPipeBits) + pixelOffset;

    //
    // Split the offset to put some bits below the pipe bits and some above.
    //
    groupMask = (1 << numGroupBits) - 1;
    offsetLo  = totalOffset &  groupMask;
    offsetHi  = (totalOffset & ~groupMask) << numPipeBits;

    //
    // Assemble the address from its components.
    //
    addr = offsetLo;
    addr |= offsetHi;
    // This is to remove warning with /analyze option
    UINT_32 pipeBits = pipe << numGroupBits;
    addr |= pipeBits;

    //
    // Compute the bit position.  The lower nibble is used when the x coordinate within the macro
    // tile is less than half of the macro tile width, and the upper nibble is used when the x
    // coordinate within the macro tile is greater than or equal to half the macro tile width.
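    // Worked example (illustrative values only): for CMASK (factor == 2) with a
    // macroTileWidth of 128, x = 40 gives x % 128 = 40 < 64, so the lower nibble (bit 0) is
    // used, while x = 200 gives x % 128 = 72 >= 64, so the upper nibble (bit 4) is used.
    // For HTILE (factor == 1) the comparison is always true, so the bit position is 0.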
    //
    *pBitPosition = ((x % macroTileWidth) < (macroTileWidth / factor)) ? 0 : 4;

    return addr;
}

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               Surface Addressing Shared
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
****************************************************************************************************
*   Lib::ComputeSurfaceAddrFromCoordLinear
*
*   @brief
*       Compute address from coord for linear surface
*
*   @return
*       Address in bytes
*
****************************************************************************************************
*/
UINT_64 Lib::ComputeSurfaceAddrFromCoordLinear(
    UINT_32  x,              ///< [in] x coord
    UINT_32  y,              ///< [in] y coord
    UINT_32  slice,          ///< [in] slice/depth index
    UINT_32  sample,         ///< [in] sample index
    UINT_32  bpp,            ///< [in] bits per pixel
    UINT_32  pitch,          ///< [in] pitch
    UINT_32  height,         ///< [in] height
    UINT_32  numSlices,      ///< [in] number of slices
    UINT_32* pBitPosition    ///< [out] bit position inside a byte
    ) const
{
    const UINT_64 sliceSize = static_cast<UINT_64>(pitch) * height;

    UINT_64 sliceOffset = (slice + sample * numSlices) * sliceSize;
    UINT_64 rowOffset   = static_cast<UINT_64>(y) * pitch;
    UINT_64 pixOffset   = x;

    UINT_64 addr = (sliceOffset + rowOffset + pixOffset) * bpp;

    *pBitPosition = static_cast<UINT_32>(addr % 8);
    addr /= 8;

    return addr;
}

/**
****************************************************************************************************
*   Lib::ComputeSurfaceCoordFromAddrLinear
*
*   @brief
*       Compute the coord from an address of a linear surface
*
*   @return
*       N/A
****************************************************************************************************
*/
VOID Lib::ComputeSurfaceCoordFromAddrLinear(
    UINT_64  addr,           ///< [in] address
    UINT_32  bitPosition,    ///< [in] bitPosition in a byte
    UINT_32  bpp,            ///< [in] bits per pixel
    UINT_32  pitch,          ///< [in] pitch
    UINT_32  height,         ///< [in] height
    UINT_32  numSlices,      ///< [in] number of slices
    UINT_32* pX,             ///< [out] x coord
    UINT_32* pY,             ///< [out] y coord
    UINT_32* pSlice,         ///< [out] slice/depth index
    UINT_32* pSample         ///< [out] sample index
    ) const
{
    const UINT_64 sliceSize = static_cast<UINT_64>(pitch) * height;
    const UINT_64 linearOffset = (BYTES_TO_BITS(addr) + bitPosition) / bpp;

    *pX = static_cast<UINT_32>((linearOffset % sliceSize) % pitch);
    *pY = static_cast<UINT_32>((linearOffset % sliceSize) / pitch % height);
    *pSlice  = static_cast<UINT_32>((linearOffset / sliceSize) % numSlices);
    *pSample = static_cast<UINT_32>((linearOffset / sliceSize) / numSlices);
}

/**
****************************************************************************************************
*   Lib::ComputeSurfaceCoordFromAddrMicroTiled
*
*   @brief
*       Compute the coord from an address of a micro tiled surface
*
*   @return
*       N/A
****************************************************************************************************
*/
VOID Lib::ComputeSurfaceCoordFromAddrMicroTiled(
    UINT_64         addr,               ///< [in] address
    UINT_32         bitPosition,        ///< [in] bitPosition in a byte
    UINT_32         bpp,                ///< [in] bits per pixel
    UINT_32         pitch,              ///< [in] pitch
    UINT_32         height,             ///< [in] height
    UINT_32         numSamples,         ///< [in] number of samples
    AddrTileMode    tileMode,           ///< [in] tile mode
    UINT_32         tileBase,           ///< [in] base offset within a tile
    UINT_32         compBits,           ///< [in] component bits actually needed (for planar surface)
    UINT_32*        pX,                 ///< [out] x coord
    UINT_32*        pY,                 ///< [out] y coord
    UINT_32*        pSlice,             ///< [out] slice/depth index
    UINT_32*        pSample,            ///< [out] sample index
    AddrTileType    microTileType,      ///< [in] micro tiling order
    BOOL_32         isDepthSampleOrder  ///< [in] TRUE if in depth sample order
    ) const
{
    UINT_64 bitAddr;
    UINT_32 microTileThickness;
    UINT_32 microTileBits;
    UINT_64 sliceBits;
    UINT_64 rowBits;
    UINT_32 sliceIndex;
    UINT_32 microTileCoordX;
    UINT_32 microTileCoordY;
    UINT_32 pixelOffset;
    UINT_32 pixelCoordX = 0;
    UINT_32 pixelCoordY = 0;
    UINT_32 pixelCoordZ = 0;
    UINT_32 pixelCoordS = 0;

    //
    // Convert byte address to bit address.
    //
    bitAddr = BYTES_TO_BITS(addr) + bitPosition;

    //
    // Compute the micro tile size, in bits.
    //
    switch (tileMode)
    {
        case ADDR_TM_1D_TILED_THICK:
            microTileThickness = ThickTileThickness;
            break;
        default:
            microTileThickness = 1;
            break;
    }

    microTileBits = MicroTilePixels * microTileThickness * bpp * numSamples;

    //
    // Compute number of bits per slice and number of bits per row of micro tiles.
    //
    sliceBits = static_cast<UINT_64>(pitch) * height * microTileThickness * bpp * numSamples;

    rowBits   = (pitch / MicroTileWidth) * microTileBits;

    //
    // Extract the slice index.
    //
    sliceIndex = static_cast<UINT_32>(bitAddr / sliceBits);
    bitAddr -= sliceIndex * sliceBits;

    //
    // Extract the y coordinate of the micro tile.
    //
    microTileCoordY = static_cast<UINT_32>(bitAddr / rowBits) * MicroTileHeight;
    bitAddr -= (microTileCoordY / MicroTileHeight) * rowBits;

    //
    // Extract the x coordinate of the micro tile.
    //
    microTileCoordX = static_cast<UINT_32>(bitAddr / microTileBits) * MicroTileWidth;

    //
    // Compute the pixel offset within the micro tile.
    //
    pixelOffset = static_cast<UINT_32>(bitAddr % microTileBits);

    //
    // Extract pixel coordinates from the offset.
    //
    HwlComputePixelCoordFromOffset(pixelOffset,
                                   bpp,
                                   numSamples,
                                   tileMode,
                                   tileBase,
                                   compBits,
                                   &pixelCoordX,
                                   &pixelCoordY,
                                   &pixelCoordZ,
                                   &pixelCoordS,
                                   microTileType,
                                   isDepthSampleOrder);

    //
    // Assemble final coordinates.
    //
    *pX     = microTileCoordX + pixelCoordX;
    *pY     = microTileCoordY + pixelCoordY;
    *pSlice = (sliceIndex * microTileThickness) + pixelCoordZ;
    *pSample = pixelCoordS;

    if (microTileThickness > 1)
    {
        *pSample = 0;
    }
}

/**
****************************************************************************************************
*   Lib::ComputePipeFromAddr
*
*   @brief
*       Compute the pipe number from an address
*
*   @return
*       Pipe number
*
****************************************************************************************************
*/
UINT_32 Lib::ComputePipeFromAddr(
    UINT_64 addr,        ///< [in] address
    UINT_32 numPipes     ///< [in] number of pipes
    ) const
{
    UINT_32 pipe;

    UINT_32 groupBytes = m_pipeInterleaveBytes; // just different terms for the same quantity

    // R600
    // The LSBs of the address are arranged as follows:
    //   bank | pipe | group
    //
    // To get the pipe number, shift off the group bits and mask the pipe bits.
    //

    // R800
    // The LSBs of the address are arranged as follows:
    //   bank | bankInterleave | pipe | pipeInterleave
    //
    // To get the pipe number, shift off the pipe interleave bits and mask the pipe bits.
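    // Worked example (illustrative values only): with m_pipeInterleaveBytes = 256 and
    // numPipes = 4, addr = 0x1300 gives (0x1300 >> 8) & (4 - 1) = 0x13 & 0x3 = 3.
    // The mask (numPipes - 1) only isolates the pipe field when numPipes is a power of two.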
    //
    pipe = static_cast<UINT_32>(addr >> Log2(groupBytes)) & (numPipes - 1);

    return pipe;
}

/**
****************************************************************************************************
*   Lib::ComputeMicroTileEquation
*
*   @brief
*       Compute micro tile equation
*
*   @return
*       ADDR_OK if the equation can be computed, otherwise ADDR_NOTSUPPORTED
*
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeMicroTileEquation(
    UINT_32         log2BytesPP,    ///< [in] log2 of bytes per pixel
    AddrTileMode    tileMode,       ///< [in] tile mode
    AddrTileType    microTileType,  ///< [in] pixel order in display/non-display mode
    ADDR_EQUATION*  pEquation       ///< [out] equation
    ) const
{
    ADDR_E_RETURNCODE retCode = ADDR_OK;

    for (UINT_32 i = 0; i < log2BytesPP; i++)
    {
        pEquation->addr[i].valid    = 1;
        pEquation->addr[i].channel  = 0;
        pEquation->addr[i].index    = i;
    }

    ADDR_CHANNEL_SETTING* pixelBit = &pEquation->addr[log2BytesPP];

    ADDR_CHANNEL_SETTING x0 = InitChannel(1, 0, log2BytesPP + 0);
    ADDR_CHANNEL_SETTING x1 = InitChannel(1, 0, log2BytesPP + 1);
    ADDR_CHANNEL_SETTING x2 = InitChannel(1, 0, log2BytesPP + 2);
    ADDR_CHANNEL_SETTING y0 = InitChannel(1, 1, 0);
    ADDR_CHANNEL_SETTING y1 = InitChannel(1, 1, 1);
    ADDR_CHANNEL_SETTING y2 = InitChannel(1, 1, 2);
    ADDR_CHANNEL_SETTING z0 = InitChannel(1, 2, 0);
    ADDR_CHANNEL_SETTING z1 = InitChannel(1, 2, 1);
    ADDR_CHANNEL_SETTING z2 = InitChannel(1, 2, 2);

    UINT_32 thickness = Thickness(tileMode);
    UINT_32 bpp = 1 << (log2BytesPP + 3);

    if (microTileType != ADDR_THICK)
    {
        if (microTileType == ADDR_DISPLAYABLE)
        {
            switch (bpp)
            {
                case 8:
                    pixelBit[0] = x0;
                    pixelBit[1] = x1;
                    pixelBit[2] = x2;
                    pixelBit[3] = y1;
                    pixelBit[4] = y0;
                    pixelBit[5] = y2;
                    break;
                case 16:
                    pixelBit[0] = x0;
                    pixelBit[1] = x1;
                    pixelBit[2] = x2;
                    pixelBit[3] = y0;
                    pixelBit[4] = y1;
                    pixelBit[5] = y2;
                    break;
                case 32:
                    pixelBit[0] = x0;
                    pixelBit[1] = x1;
                    pixelBit[2] = y0;
                    pixelBit[3] = x2;
                    pixelBit[4] = y1;
                    pixelBit[5] = y2;
                    break;
                case 64:
                    pixelBit[0] = x0;
                    pixelBit[1] = y0;
                    pixelBit[2] = x1;
                    pixelBit[3] = x2;
                    pixelBit[4] = y1;
                    pixelBit[5] = y2;
                    break;
                case 128:
                    pixelBit[0] = y0;
                    pixelBit[1] = x0;
                    pixelBit[2] = x1;
                    pixelBit[3] = x2;
                    pixelBit[4] = y1;
                    pixelBit[5] = y2;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    break;
            }
        }
        else if (microTileType == ADDR_NON_DISPLAYABLE || microTileType == ADDR_DEPTH_SAMPLE_ORDER)
        {
            pixelBit[0] = x0;
            pixelBit[1] = y0;
            pixelBit[2] = x1;
            pixelBit[3] = y1;
            pixelBit[4] = x2;
            pixelBit[5] = y2;
        }
        else if (microTileType == ADDR_ROTATED)
        {
            ADDR_ASSERT(thickness == 1);

            switch (bpp)
            {
                case 8:
                    pixelBit[0] = y0;
                    pixelBit[1] = y1;
                    pixelBit[2] = y2;
                    pixelBit[3] = x1;
                    pixelBit[4] = x0;
                    pixelBit[5] = x2;
                    break;
                case 16:
                    pixelBit[0] = y0;
                    pixelBit[1] = y1;
                    pixelBit[2] = y2;
                    pixelBit[3] = x0;
                    pixelBit[4] = x1;
                    pixelBit[5] = x2;
                    break;
                case 32:
                    pixelBit[0] = y0;
                    pixelBit[1] = y1;
                    pixelBit[2] = x0;
                    pixelBit[3] = y2;
                    pixelBit[4] = x1;
                    pixelBit[5] = x2;
                    break;
                case 64:
                    pixelBit[0] = y0;
                    pixelBit[1] = x0;
                    pixelBit[2] = y1;
                    pixelBit[3] = x1;
                    pixelBit[4] = x2;
                    pixelBit[5] = y2;
                    break;
                default:
                    retCode = ADDR_NOTSUPPORTED;
                    break;
            }
        }

        if (thickness > 1)
        {
            pixelBit[6] = z0;
            pixelBit[7] = z1;
            pEquation->numBits = 8 + log2BytesPP;
        }
        else
        {
            pEquation->numBits = 6 + log2BytesPP;
        }
    }
    else // ADDR_THICK
    {
        ADDR_ASSERT(thickness > 1);

        switch (bpp)
        {
            case 8:
            case 16:
                pixelBit[0] = x0;
                pixelBit[1] = y0;
                pixelBit[2] = x1;
                pixelBit[3] = y1;
                pixelBit[4] = z0;
                pixelBit[5] = z1;
                break;
            case 32:
                pixelBit[0] = x0;
                pixelBit[1] = y0;
                pixelBit[2] = x1;
                pixelBit[3] = z0;
                pixelBit[4] = y1;
                pixelBit[5] = z1;
                break;
            case 64:
            case 128:
                pixelBit[0] = x0;
                pixelBit[1] = y0;
                pixelBit[2] = z0;
                pixelBit[3] = x1;
                pixelBit[4] = y1;
                pixelBit[5] = z1;
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                break;
        }

        pixelBit[6] = x2;
        pixelBit[7] = y2;
        pEquation->numBits = 8 + log2BytesPP;
    }

    if (thickness == 8)
    {
        pixelBit[8] = z2;
        pEquation->numBits = 9 + log2BytesPP;
    }

    // stackedDepthSlices is used for addressing modes in which a tile block contains multiple
    // slices, which is not supported by this address lib
    pEquation->stackedDepthSlices = FALSE;

    return retCode;
}

/**
****************************************************************************************************
*   Lib::ComputePixelIndexWithinMicroTile
*
*   @brief
*       Compute the pixel index inside a micro tile of surface
*
*   @return
*       Pixel index
*
****************************************************************************************************
*/
UINT_32 Lib::ComputePixelIndexWithinMicroTile(
    UINT_32         x,              ///< [in] x coord
    UINT_32         y,              ///< [in] y coord
    UINT_32         z,              ///< [in] slice/depth index
    UINT_32         bpp,            ///< [in] bits per pixel
    AddrTileMode    tileMode,       ///< [in] tile mode
    AddrTileType    microTileType   ///< [in] pixel order in display/non-display mode
    ) const
{
    UINT_32 pixelBit0 = 0;
    UINT_32 pixelBit1 = 0;
    UINT_32 pixelBit2 = 0;
    UINT_32 pixelBit3 = 0;
    UINT_32 pixelBit4 = 0;
    UINT_32 pixelBit5 = 0;
    UINT_32 pixelBit6 = 0;
    UINT_32 pixelBit7 = 0;
    UINT_32 pixelBit8 = 0;
    UINT_32 pixelNumber;

    UINT_32 x0 = _BIT(x, 0);
    UINT_32 x1 = _BIT(x, 1);
    UINT_32 x2 = _BIT(x, 2);
    UINT_32 y0 = _BIT(y, 0);
    UINT_32 y1 = _BIT(y, 1);
    UINT_32 y2 = _BIT(y, 2);
    UINT_32 z0 = _BIT(z, 0);
    UINT_32 z1 = _BIT(z, 1);
    UINT_32 z2 = _BIT(z, 2);

    UINT_32 thickness = Thickness(tileMode);

    // Compute the pixel number within the micro tile.
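    // Worked example (illustrative values only): for ADDR_NON_DISPLAYABLE with thickness 1,
    // x = 5 (x2 x1 x0 = 1 0 1) and y = 3 (y2 y1 y0 = 0 1 1) interleave as
    //   bit:   5  4  3  2  1  0
    //         y2 x2 y1 x1 y0 x0 = 0 1 1 0 1 1  => pixel index 27
    // For ADDR_DISPLAYABLE at 32 bpp the same (x, y) gives
    //         y2 y1 x2 y0 x1 x0 = 0 1 1 1 0 1  => pixel index 29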
    if (microTileType != ADDR_THICK)
    {
        if (microTileType == ADDR_DISPLAYABLE)
        {
            switch (bpp)
            {
                case 8:
                    pixelBit0 = x0;
                    pixelBit1 = x1;
                    pixelBit2 = x2;
                    pixelBit3 = y1;
                    pixelBit4 = y0;
                    pixelBit5 = y2;
                    break;
                case 16:
                    pixelBit0 = x0;
                    pixelBit1 = x1;
                    pixelBit2 = x2;
                    pixelBit3 = y0;
                    pixelBit4 = y1;
                    pixelBit5 = y2;
                    break;
                case 32:
                    pixelBit0 = x0;
                    pixelBit1 = x1;
                    pixelBit2 = y0;
                    pixelBit3 = x2;
                    pixelBit4 = y1;
                    pixelBit5 = y2;
                    break;
                case 64:
                    pixelBit0 = x0;
                    pixelBit1 = y0;
                    pixelBit2 = x1;
                    pixelBit3 = x2;
                    pixelBit4 = y1;
                    pixelBit5 = y2;
                    break;
                case 128:
                    pixelBit0 = y0;
                    pixelBit1 = x0;
                    pixelBit2 = x1;
                    pixelBit3 = x2;
                    pixelBit4 = y1;
                    pixelBit5 = y2;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    break;
            }
        }
        else if (microTileType == ADDR_NON_DISPLAYABLE || microTileType == ADDR_DEPTH_SAMPLE_ORDER)
        {
            pixelBit0 = x0;
            pixelBit1 = y0;
            pixelBit2 = x1;
            pixelBit3 = y1;
            pixelBit4 = x2;
            pixelBit5 = y2;
        }
        else if (microTileType == ADDR_ROTATED)
        {
            ADDR_ASSERT(thickness == 1);

            switch (bpp)
            {
                case 8:
                    pixelBit0 = y0;
                    pixelBit1 = y1;
                    pixelBit2 = y2;
                    pixelBit3 = x1;
                    pixelBit4 = x0;
                    pixelBit5 = x2;
                    break;
                case 16:
                    pixelBit0 = y0;
                    pixelBit1 = y1;
                    pixelBit2 = y2;
                    pixelBit3 = x0;
                    pixelBit4 = x1;
                    pixelBit5 = x2;
                    break;
                case 32:
                    pixelBit0 = y0;
                    pixelBit1 = y1;
                    pixelBit2 = x0;
                    pixelBit3 = y2;
                    pixelBit4 = x1;
                    pixelBit5 = x2;
                    break;
                case 64:
                    pixelBit0 = y0;
                    pixelBit1 = x0;
                    pixelBit2 = y1;
                    pixelBit3 = x1;
                    pixelBit4 = x2;
                    pixelBit5 = y2;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    break;
            }
        }

        if (thickness > 1)
        {
            pixelBit6 = z0;
            pixelBit7 = z1;
        }
    }
    else // ADDR_THICK
    {
        ADDR_ASSERT(thickness > 1);

        switch (bpp)
        {
            case 8:
            case 16:
                pixelBit0 = x0;
                pixelBit1 = y0;
                pixelBit2 = x1;
                pixelBit3 = y1;
                pixelBit4 = z0;
                pixelBit5 = z1;
                break;
            case 32:
                pixelBit0 = x0;
                pixelBit1 = y0;
                pixelBit2 = x1;
                pixelBit3 = z0;
                pixelBit4 = y1;
                pixelBit5 = z1;
                break;
            case 64:
            case 128:
                pixelBit0 = x0;
                pixelBit1 = y0;
                pixelBit2 = z0;
                pixelBit3 = x1;
                pixelBit4 = y1;
                pixelBit5 = z1;
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                break;
        }

        pixelBit6 = x2;
        pixelBit7 = y2;
    }

    if (thickness == 8)
    {
        pixelBit8 = z2;
    }

    pixelNumber = ((pixelBit0     ) |
                   (pixelBit1 << 1) |
                   (pixelBit2 << 2) |
                   (pixelBit3 << 3) |
                   (pixelBit4 << 4) |
                   (pixelBit5 << 5) |
                   (pixelBit6 << 6) |
                   (pixelBit7 << 7) |
                   (pixelBit8 << 8));

    return pixelNumber;
}

/**
****************************************************************************************************
*   Lib::AdjustPitchAlignment
*
*   @brief
*       Adjusts pitch alignment for flipping surface
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID Lib::AdjustPitchAlignment(
    ADDR_SURFACE_FLAGS  flags,      ///< [in] Surface flags
    UINT_32*            pPitchAlign ///< [in,out] Pointer to pitch alignment
    ) const
{
    // Display engine hardwires the lower 5 bits of GRPH_PITCH to zero, which means 32-pixel
    // alignment. This may be fixed in the future, but keep it general for now.
    if (flags.display || flags.overlay)
    {
        *pPitchAlign = PowTwoAlign(*pPitchAlign, 32);

        if(flags.display)
        {
            *pPitchAlign = Max(m_minPitchAlignPixels, *pPitchAlign);
        }
    }
}

/**
****************************************************************************************************
*   Lib::PadDimensions
*
*   @brief
*       Helper function to pad dimensions
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID Lib::PadDimensions(
    AddrTileMode        tileMode,    ///< [in] tile mode
    UINT_32             bpp,         ///< [in] bits per pixel
    ADDR_SURFACE_FLAGS  flags,       ///< [in] surface flags
    UINT_32             numSamples,  ///< [in] number of samples
    ADDR_TILEINFO*      pTileInfo,   ///< [in,out] bank structure.
    UINT_32             padDims,     ///< [in] Dimensions to pad; valid values are 1, 2, 3
    UINT_32             mipLevel,    ///< [in] MipLevel
    UINT_32*            pPitch,      ///< [in,out] pitch in pixels
    UINT_32*            pPitchAlign, ///< [in,out] pitch align could be changed in HwlPadDimensions
    UINT_32*            pHeight,     ///< [in,out] height in pixels
    UINT_32             heightAlign, ///< [in] height alignment
    UINT_32*            pSlices,     ///< [in,out] number of slices
    UINT_32             sliceAlign   ///< [in] slice alignment
    ) const
{
    UINT_32 pitchAlign = *pPitchAlign;
    UINT_32 thickness = Thickness(tileMode);

    ADDR_ASSERT(padDims <= 3);

    //
    // Override padding for mip levels
    //
    if (mipLevel > 0)
    {
        if (flags.cube)
        {
            // For cubemaps, only pad when the client calls with 6 faces as an identity
            if (*pSlices > 1)
            {
                padDims = 3; // Pad cubemap sub-levels when treating the cubemap as a 3D texture
            }
            else
            {
                padDims = 2;
            }
        }
    }

    // Any possibilities that padDims is 0?
    if (padDims == 0)
    {
        padDims = 3;
    }

    if (IsPow2(pitchAlign))
    {
        *pPitch = PowTwoAlign((*pPitch), pitchAlign);
    }
    else // Needed to pass unit tests: r600 linear mode does not align bpp to a power of 2
    {
        *pPitch += pitchAlign - 1;
        *pPitch /= pitchAlign;
        *pPitch *= pitchAlign;
    }

    if (padDims > 1)
    {
        if (IsPow2(heightAlign))
        {
            *pHeight = PowTwoAlign((*pHeight), heightAlign);
        }
        else
        {
            *pHeight += heightAlign - 1;
            *pHeight /= heightAlign;
            *pHeight *= heightAlign;
        }
    }

    if (padDims > 2 || thickness > 1)
    {
        // For a single cubemap face, slices are not padded.
        // If it is padded, the slice count should be set to 6 and the current mip level > 1
        if (flags.cube && (!m_configFlags.noCubeMipSlicesPad || flags.cubeAsArray))
        {
            *pSlices = NextPow2(*pSlices);
        }

        // Normal 3D texture, array, or cubemap with a thick mode? (Just to pass unit tests)
        if (thickness > 1)
        {
            *pSlices = PowTwoAlign((*pSlices), sliceAlign);
        }

    }

    HwlPadDimensions(tileMode,
                     bpp,
                     flags,
                     numSamples,
                     pTileInfo,
                     mipLevel,
                     pPitch,
                     pPitchAlign,
                     *pHeight,
                     heightAlign);
}

/**
****************************************************************************************************
*   Lib::HwlPreHandleBaseLvl3xPitch
*
*   @brief
*       Pre-handler of 3x pitch (96 bit) adjustment
*
*   @return
*       Expected pitch
****************************************************************************************************
*/
UINT_32 Lib::HwlPreHandleBaseLvl3xPitch(
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,        ///< [in] input
    UINT_32                                 expPitch    ///< [in] pitch
    ) const
{
    ADDR_ASSERT(pIn->width == expPitch);
    //
    // If the pitch is pre-multiplied by 3, recover the original one here to get the correct
    // mip level size
    //
    if (ElemLib::IsExpand3x(pIn->format) &&
        pIn->mipLevel == 0 &&
        pIn->tileMode == ADDR_TM_LINEAR_ALIGNED)
    {
        expPitch /= 3;
        expPitch = NextPow2(expPitch);
    }

    return expPitch;
}

/**
****************************************************************************************************
*   Lib::HwlPostHandleBaseLvl3xPitch
*
*   @brief
*       Post-handler of 3x pitch adjustment
*
*   @return
*       Expected pitch
****************************************************************************************************
*/
UINT_32 Lib::HwlPostHandleBaseLvl3xPitch(
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,        ///< [in] input
    UINT_32                                 expPitch    ///< [in] pitch
    ) const
{
    //
    // Sub-levels of a 96-bit surface use an element pitch of 32 bits instead,
    // so only the base level (mipLevel 0) pitch is multiplied by 3 here
    //
    if (ElemLib::IsExpand3x(pIn->format) &&
        pIn->mipLevel == 0 &&
        pIn->tileMode == ADDR_TM_LINEAR_ALIGNED)
    {
        expPitch *= 3;
    }

    return expPitch;
}

/**
****************************************************************************************************
*   Lib::IsMacroTiled
*
*   @brief
*       Check if the tile mode is macro tiled
*
*   @return
*       TRUE if it is macro tiled (2D/2B/3D/3B)
**************************************************************************************************** */ BOOL_32 Lib::IsMacroTiled( AddrTileMode tileMode) ///< [in] tile mode { return ModeFlags[tileMode].isMacro; } /** **************************************************************************************************** * Lib::IsMacro3dTiled * * @brief * Check if the tile mode is 3D macro tiled * * @return * TRUE if it is 3D macro tiled **************************************************************************************************** */ BOOL_32 Lib::IsMacro3dTiled( AddrTileMode tileMode) ///< [in] tile mode { return ModeFlags[tileMode].isMacro3d; } /** **************************************************************************************************** * Lib::IsMicroTiled * * @brief * Check if the tile mode is micro tiled * * @return * TRUE if micro tiled **************************************************************************************************** */ BOOL_32 Lib::IsMicroTiled( AddrTileMode tileMode) ///< [in] tile mode { return ModeFlags[tileMode].isMicro; } /** **************************************************************************************************** * Lib::IsLinear * * @brief * Check if the tile mode is linear * * @return * TRUE if linear **************************************************************************************************** */ BOOL_32 Lib::IsLinear( AddrTileMode tileMode) ///< [in] tile mode { return ModeFlags[tileMode].isLinear; } /** **************************************************************************************************** * Lib::IsPrtNoRotationTileMode * * @brief * Return TRUE if it is a PRT tile mode without rotation * @note * This function is only used by CI **************************************************************************************************** */ BOOL_32 Lib::IsPrtNoRotationTileMode( AddrTileMode tileMode) { return ModeFlags[tileMode].isPrtNoRotation; } /**
**************************************************************************************************** * Lib::IsPrtTileMode * * @brief * Return TRUE if it is a PRT tile mode * @note * This function is only used by CI **************************************************************************************************** */ BOOL_32 Lib::IsPrtTileMode( AddrTileMode tileMode) { return ModeFlags[tileMode].isPrt; } /** **************************************************************************************************** * Lib::ComputeMipLevel * * @brief * Compute mipmap level width/height/slices * @return * N/A **************************************************************************************************** */ VOID Lib::ComputeMipLevel( ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn ///< [in,out] Input structure ) const { // Check if HWL has handled BOOL_32 hwlHandled = FALSE; (void)hwlHandled; if (ElemLib::IsBlockCompressed(pIn->format)) { if (pIn->mipLevel == 0) { // DXTn's level 0 must be a multiple of 4 // But there are exceptions: // 1. Internal surface creation in hostblt/vsblt/etc... // 2.
Runtime doesn't reject ATI1/ATI2 whose width/height are not a multiple of 4 pIn->width = PowTwoAlign(pIn->width, 4); pIn->height = PowTwoAlign(pIn->height, 4); } } hwlHandled = HwlComputeMipLevel(pIn); } /** **************************************************************************************************** * Lib::DegradeTo1D * * @brief * Check if surface can be degraded to 1D * @return * TRUE if degraded **************************************************************************************************** */ BOOL_32 Lib::DegradeTo1D( UINT_32 width, ///< surface width UINT_32 height, ///< surface height UINT_32 macroTilePitchAlign, ///< macro tile pitch align UINT_32 macroTileHeightAlign ///< macro tile height align ) { BOOL_32 degrade = ((width < macroTilePitchAlign) || (height < macroTileHeightAlign)); // Check whether 2D tiling still has too much footprint if (degrade == FALSE) { // Only check width and height as slices are aligned to thickness // Widen to 64 bits before multiplying so the products cannot overflow 32 bits UINT_64 unalignedSize = static_cast<UINT_64>(width) * height; UINT_32 alignedPitch = PowTwoAlign(width, macroTilePitchAlign); UINT_32 alignedHeight = PowTwoAlign(height, macroTileHeightAlign); UINT_64 alignedSize = static_cast<UINT_64>(alignedPitch) * alignedHeight; // alignedSize > 1.5 * unalignedSize if (2 * alignedSize > 3 * unalignedSize) { degrade = TRUE; } } return degrade; } /** **************************************************************************************************** * Lib::OptimizeTileMode * * @brief * Check if base level's tile mode can be optimized (degraded) * @return * N/A **************************************************************************************************** */ VOID Lib::OptimizeTileMode( ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut ///< [in, out] structure for surface info ) const { AddrTileMode tileMode = pInOut->tileMode; BOOL_32 doOpt = (pInOut->flags.opt4Space == TRUE) || (pInOut->flags.minimizeAlignment == TRUE) || (pInOut->maxBaseAlign != 0); BOOL_32 convertToPrt = FALSE; // Optimization can only be done on level 0 and samples <=
1 if ((doOpt == TRUE) && (pInOut->mipLevel == 0) && (IsPrtTileMode(tileMode) == FALSE) && (pInOut->flags.prt == FALSE)) { UINT_32 width = pInOut->width; UINT_32 height = pInOut->height; UINT_32 thickness = Thickness(tileMode); BOOL_32 macroTiledOK = TRUE; UINT_32 macroWidthAlign = 0; UINT_32 macroHeightAlign = 0; UINT_32 macroSizeAlign = 0; if (IsMacroTiled(tileMode)) { macroTiledOK = HwlGetAlignmentInfoMacroTiled(pInOut, &macroWidthAlign, &macroHeightAlign, &macroSizeAlign); } if (macroTiledOK) { if ((pInOut->flags.display == FALSE) && (pInOut->flags.opt4Space == TRUE) && (pInOut->numSamples <= 1)) { // Check if linear mode is optimal if ((pInOut->height == 1) && (IsLinear(tileMode) == FALSE) && (ElemLib::IsBlockCompressed(pInOut->format) == FALSE) && (pInOut->flags.depth == FALSE) && (pInOut->flags.stencil == FALSE) && (m_configFlags.disableLinearOpt == FALSE) && (pInOut->flags.disableLinearOpt == FALSE)) { tileMode = ADDR_TM_LINEAR_ALIGNED; } else if (IsMacroTiled(tileMode) && (pInOut->flags.tcCompatible == FALSE)) { if (DegradeTo1D(width, height, macroWidthAlign, macroHeightAlign)) { tileMode = (thickness == 1) ? ADDR_TM_1D_TILED_THIN1 : ADDR_TM_1D_TILED_THICK; } else if ((thickness > 1) && (pInOut->flags.disallowLargeThickDegrade == 0)) { // Because the following HwlComputeSurfaceInfo may degrade thick modes to // thinner modes, we should re-evaluate whether the corresponding // thinner modes should be degraded. If so, we choose 1D thick mode instead.
tileMode = DegradeLargeThickTile(pInOut->tileMode, pInOut->bpp); if (tileMode != pInOut->tileMode) { // Get thickness again after large thick degrade thickness = Thickness(tileMode); ADDR_COMPUTE_SURFACE_INFO_INPUT input = *pInOut; input.tileMode = tileMode; macroTiledOK = HwlGetAlignmentInfoMacroTiled(&input, &macroWidthAlign, &macroHeightAlign, &macroSizeAlign); if (macroTiledOK && DegradeTo1D(width, height, macroWidthAlign, macroHeightAlign)) { tileMode = ADDR_TM_1D_TILED_THICK; } } } } } if (macroTiledOK) { if ((pInOut->flags.minimizeAlignment == TRUE) && (pInOut->numSamples <= 1) && (IsMacroTiled(tileMode) == TRUE)) { UINT_32 macroSize = PowTwoAlign(width, macroWidthAlign) * PowTwoAlign(height, macroHeightAlign); UINT_32 microSize = PowTwoAlign(width, MicroTileWidth) * PowTwoAlign(height, MicroTileHeight); if (macroSize > microSize) { tileMode = (thickness == 1) ? ADDR_TM_1D_TILED_THIN1 : ADDR_TM_1D_TILED_THICK; } } if ((pInOut->maxBaseAlign != 0) && (IsMacroTiled(tileMode) == TRUE)) { if (macroSizeAlign > pInOut->maxBaseAlign) { if (pInOut->numSamples > 1) { ADDR_ASSERT(pInOut->maxBaseAlign >= Block64K); convertToPrt = TRUE; } else if (pInOut->maxBaseAlign < Block64K) { tileMode = (thickness == 1) ?
ADDR_TM_1D_TILED_THIN1 : ADDR_TM_1D_TILED_THICK; } else { convertToPrt = TRUE; } } } } } } if (convertToPrt) { if ((pInOut->flags.matchStencilTileCfg == TRUE) && (pInOut->numSamples <= 1)) { pInOut->tileMode = ADDR_TM_1D_TILED_THIN1; } else { HwlSetPrtTileMode(pInOut); } } else if (tileMode != pInOut->tileMode) { pInOut->tileMode = tileMode; } HwlOptimizeTileMode(pInOut); } /** **************************************************************************************************** * Lib::DegradeLargeThickTile * * @brief * Check if the thickness needs to be reduced if a tile is too large * @return * The degraded tile mode (unchanged if not degraded) **************************************************************************************************** */ AddrTileMode Lib::DegradeLargeThickTile( AddrTileMode tileMode, UINT_32 bpp) const { // Override tilemode // When tile_width (8) * tile_height (8) * thickness * element_bytes is > row_size, // it is better to just use THIN mode in this case UINT_32 thickness = Thickness(tileMode); if (thickness > 1 && m_configFlags.allowLargeThickTile == 0) { UINT_32 tileSize = MicroTilePixels * thickness * (bpp >> 3); if (tileSize > m_rowSize) { switch (tileMode) { case ADDR_TM_2D_TILED_XTHICK: if ((tileSize >> 1) <= m_rowSize) { tileMode = ADDR_TM_2D_TILED_THICK; break; } // else fall through case ADDR_TM_2D_TILED_THICK: tileMode = ADDR_TM_2D_TILED_THIN1; break; case ADDR_TM_3D_TILED_XTHICK: if ((tileSize >> 1) <= m_rowSize) { tileMode = ADDR_TM_3D_TILED_THICK; break; } // else fall through case ADDR_TM_3D_TILED_THICK: tileMode = ADDR_TM_3D_TILED_THIN1; break; case ADDR_TM_PRT_TILED_THICK: tileMode = ADDR_TM_PRT_TILED_THIN1; break; case ADDR_TM_PRT_2D_TILED_THICK: tileMode = ADDR_TM_PRT_2D_TILED_THIN1; break; case ADDR_TM_PRT_3D_TILED_THICK: tileMode = ADDR_TM_PRT_3D_TILED_THIN1; break; default: break; } } } return tileMode; } /** **************************************************************************************************** * 
Lib::PostComputeMipLevel * @brief * Compute MipLevel info (including level 0) after surface adjustment * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE Lib::PostComputeMipLevel( ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in,out] Input structure ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] Output structure ) const { // When pow2Pad is set, the mipmap including level 0 must be pow2 padded, either because SI hw expects it or because it is // required by CFX for h/w compatibility between NI and SI. Otherwise padding is only needed for // mipLevel > 0. Any h/w with different requirements should implement its own virtual function if (pIn->flags.pow2Pad) { pIn->width = NextPow2(pIn->width); pIn->height = NextPow2(pIn->height); pIn->numSlices = NextPow2(pIn->numSlices); } else if (pIn->mipLevel > 0) { pIn->width = NextPow2(pIn->width); pIn->height = NextPow2(pIn->height); if (!pIn->flags.cube) { pIn->numSlices = NextPow2(pIn->numSlices); } // for a cubemap, we keep the original slice count } return ADDR_OK; } /** **************************************************************************************************** * Lib::HwlSetupTileCfg * * @brief * Map tile index to tile setting.
* @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE Lib::HwlSetupTileCfg( UINT_32 bpp, ///< Bits per pixel INT_32 index, ///< [in] Tile index INT_32 macroModeIndex, ///< [in] Index in macro tile mode table (CI) ADDR_TILEINFO* pInfo, ///< [out] Tile Info AddrTileMode* pMode, ///< [out] Tile mode AddrTileType* pType ///< [out] Tile type ) const { return ADDR_NOTSUPPORTED; } /** **************************************************************************************************** * Lib::HwlGetPipes * * @brief * Get number of pipes * @return * num pipes **************************************************************************************************** */ UINT_32 Lib::HwlGetPipes( const ADDR_TILEINFO* pTileInfo ///< [in] Tile info ) const { // pTileInfo can be NULL when the ASIC is 6xx or 8xx. return m_pipes; } /** **************************************************************************************************** * Lib::ComputeQbStereoInfo * * @brief * Get quad buffer stereo information * @return * N/A **************************************************************************************************** */ VOID Lib::ComputeQbStereoInfo( ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [in,out] updated pOut+pStereoInfo ) const { ADDR_ASSERT(pOut->bpp >= 8); ADDR_ASSERT((pOut->surfSize % pOut->baseAlign) == 0); // Save original height pOut->pStereoInfo->eyeHeight = pOut->height; // Right offset pOut->pStereoInfo->rightOffset = static_cast<UINT_32>(pOut->surfSize); pOut->pStereoInfo->rightSwizzle = HwlComputeQbStereoRightSwizzle(pOut); // Double height pOut->height <<= 1; pOut->pixelHeight <<= 1; // Double size pOut->surfSize <<= 1; // Right start address meets the base align since it is guaranteed by AddrLib1 // 1D surface on SI may break this rule, but we can force it to meet by checking .qbStereo.
} /** **************************************************************************************************** * Lib::ComputePrtInfo * * @brief * Compute PRT surface related info * * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE Lib::ComputePrtInfo( const ADDR_PRT_INFO_INPUT* pIn, ADDR_PRT_INFO_OUTPUT* pOut) const { ADDR_ASSERT(pOut != NULL); ADDR_E_RETURNCODE returnCode = ADDR_OK; UINT_32 expandX = 1; UINT_32 expandY = 1; ElemMode elemMode; UINT_32 bpp = GetElemLib()->GetBitsPerPixel(pIn->format, &elemMode, &expandX, &expandY); if (bpp < 8 || bpp == 24 || bpp == 48 || bpp == 96) { returnCode = ADDR_INVALIDPARAMS; } UINT_32 numFrags = pIn->numFrags; ADDR_ASSERT(numFrags <= 8); UINT_32 tileWidth = 0; UINT_32 tileHeight = 0; if (returnCode == ADDR_OK) { // 2D texture, or 3D texture with depth > 1 if (pIn->baseMipDepth > 1 || pIn->baseMipHeight > 1) { if (bpp == 8) { tileWidth = 256; tileHeight = 256; } else if (bpp == 16) { tileWidth = 256; tileHeight = 128; } else if (bpp == 32) { tileWidth = 128; tileHeight = 128; } else if (bpp == 64) { // assume it is BC1/4 tileWidth = 512; tileHeight = 256; if (elemMode == ADDR_UNCOMPRESSED) { tileWidth = 128; tileHeight = 64; } } else if (bpp == 128) { // assume it is BC2/3/5/6H/7 tileWidth = 256; tileHeight = 256; if (elemMode == ADDR_UNCOMPRESSED) { tileWidth = 64; tileHeight = 64; } } if (numFrags == 2) { tileWidth = tileWidth / 2; } else if (numFrags == 4) { tileWidth = tileWidth / 2; tileHeight = tileHeight / 2; } else if (numFrags == 8) { tileWidth = tileWidth / 4; tileHeight = tileHeight / 2; } } else // 1d { tileHeight = 1; if (bpp == 8) { tileWidth = 65536; } else if (bpp == 16) { tileWidth = 32768; } else if (bpp == 32) { tileWidth = 16384; } else if (bpp == 64) { tileWidth = 8192; } else if (bpp == 128) { tileWidth = 4096; } } } pOut->prtTileWidth = tileWidth; pOut->prtTileHeight = tileHeight; return returnCode; } } // V1
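The tile-size table in `Lib::ComputePrtInfo` can be checked in isolation. The following is a minimal standalone sketch, not part of addrlib: `PrtTileDims` and `ComputePrtTileDims` are hypothetical names, and the `blockCompressed` flag is an assumed stand-in for `elemMode != ADDR_UNCOMPRESSED`. It mirrors the branches of `ComputePrtInfo`; by arithmetic, every entry works out to 64 KiB per tile (for example 128 x 128 pixels x 4 bytes at 32 bpp, or 512 x 256 pixels of BC1/BC4 at 8 bytes per 4x4 block), which matches `PrtTileSize` (`Block64K`) in addrlib1.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone mirror of the PRT tile-size table in Lib::ComputePrtInfo.
// Not an addrlib API; names and parameters are illustrative assumptions.
struct PrtTileDims {
    uint32_t width;   // tile width in pixels (0 if the size is not in the table)
    uint32_t height;  // tile height in pixels
    bool valid;       // false where ComputePrtInfo returns ADDR_INVALIDPARAMS
};

// blockCompressed is an assumed stand-in for (elemMode != ADDR_UNCOMPRESSED).
PrtTileDims ComputePrtTileDims(uint32_t bpp, bool blockCompressed, uint32_t numFrags,
                               uint32_t baseMipHeight, uint32_t baseMipDepth) {
    PrtTileDims d{0, 0, true};
    assert(numFrags <= 8);  // same precondition as ADDR_ASSERT(numFrags <= 8)

    // Sub-byte and 3-component (24/48/96 bpp) formats are rejected.
    if (bpp < 8 || bpp == 24 || bpp == 48 || bpp == 96) {
        d.valid = false;
        return d;
    }

    if (baseMipDepth > 1 || baseMipHeight > 1) {
        // 2D texture, or 3D texture with depth > 1.
        switch (bpp) {
            case 8:   d.width = 256; d.height = 256; break;
            case 16:  d.width = 256; d.height = 128; break;
            case 32:  d.width = 128; d.height = 128; break;
            case 64:  // BC1/BC4 when block compressed
                d.width  = blockCompressed ? 512 : 128;
                d.height = blockCompressed ? 256 : 64;
                break;
            case 128: // BC2/3/5/6H/7 when block compressed
                d.width  = blockCompressed ? 256 : 64;
                d.height = blockCompressed ? 256 : 64;
                break;
            default:  break;  // other sizes stay 0 x 0, as in the original
        }
        // Shrinking by the fragment count keeps
        // width * height * numFrags * bytesPerPixel at 64 KiB.
        if (numFrags == 2) {
            d.width /= 2;
        } else if (numFrags == 4) {
            d.width /= 2;
            d.height /= 2;
        } else if (numFrags == 8) {
            d.width /= 4;
            d.height /= 2;
        }
    } else {
        // 1D texture: one row that spans 64 KiB.
        d.height = 1;
        switch (bpp) {
            case 8:   d.width = 65536; break;
            case 16:  d.width = 32768; break;
            case 32:  d.width = 16384; break;
            case 64:  d.width = 8192;  break;
            case 128: d.width = 4096;  break;
            default:  break;
        }
    }
    return d;
}
```

Writing the table as a pure function of (bpp, compression, fragments, shape) makes the 64 KiB invariant easy to assert per entry, which the inline `if` chain in the library does not expose directly.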
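The footprint test in `Lib::DegradeTo1D` keeps macro (2D) tiling only if the surface spans at least one macro tile in each direction and padding to the macro-tile alignment grows its footprint by no more than 1.5x. The check is done in integers as `2 * aligned > 3 * unaligned`. Below is a minimal standalone restatement; `AlignPow2` and `ShouldDegradeTo1D` are hypothetical names, and `AlignPow2` assumes a power-of-two alignment, which the original's use of `PowTwoAlign` implies. The sketch widens both products to 64 bits before multiplying.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round x up to a multiple of a power-of-two alignment.
inline uint32_t AlignPow2(uint32_t x, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);  // assumed power of two
    return (x + (align - 1)) & ~(align - 1);
}

// Hypothetical restatement of the Lib::DegradeTo1D heuristic.
// Returns true when a macro-tiled layout should fall back to 1D tiling.
bool ShouldDegradeTo1D(uint32_t width, uint32_t height,
                       uint32_t macroPitchAlign, uint32_t macroHeightAlign) {
    // Smaller than one macro tile in either direction: always degrade.
    if (width < macroPitchAlign || height < macroHeightAlign) {
        return true;
    }
    // Widen before multiplying so large surfaces cannot overflow 32 bits.
    const uint64_t unalignedSize = static_cast<uint64_t>(width) * height;
    const uint64_t alignedSize =
        static_cast<uint64_t>(AlignPow2(width, macroPitchAlign)) *
        AlignPow2(height, macroHeightAlign);
    // Degrade when alignedSize > 1.5 * unalignedSize, kept in integer math.
    return 2 * alignedSize > 3 * unalignedSize;
}
```

For example, a 33 x 16 surface with a 32 x 16 macro tile pads to 64 x 16 (1024 vs. 528 elements, more than 1.5x) and degrades, while 48 x 16 pads to 1024 vs. 768 and stays macro tiled.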
} // Addr } // rocr ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrlib1.h /* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** **************************************************************************************************** * @file addrlib1.h * @brief Contains the Addr::V1::Lib class definition.
**************************************************************************************************** */ #ifndef __ADDR_LIB1_H__ #define __ADDR_LIB1_H__ #include "addrlib.h" namespace rocr { namespace Addr { namespace V1 { /** **************************************************************************************************** * @brief Neutral enums that define sample split size **************************************************************************************************** */ enum SampleSplitSize { ADDR_SAMPLESPLIT_1KB = 1024, ADDR_SAMPLESPLIT_2KB = 2048, ADDR_SAMPLESPLIT_4KB = 4096, ADDR_SAMPLESPLIT_8KB = 8192, }; /** **************************************************************************************************** * @brief Flags for AddrTileMode **************************************************************************************************** */ struct TileModeFlags { UINT_32 thickness : 4; UINT_32 isLinear : 1; UINT_32 isMicro : 1; UINT_32 isMacro : 1; UINT_32 isMacro3d : 1; UINT_32 isPrt : 1; UINT_32 isPrtNoRotation : 1; UINT_32 isBankSwapped : 1; }; static const UINT_32 Block64K = 0x10000; static const UINT_32 PrtTileSize = Block64K; /** **************************************************************************************************** * @brief This class contains ASIC-independent address library functionality **************************************************************************************************** */ class Lib : public Addr::Lib { public: virtual ~Lib(); static Lib* GetLib( ADDR_HANDLE hLib); /// Returns tileIndex support BOOL_32 UseTileIndex(INT_32 index) const { return m_configFlags.useTileIndex && (index != TileIndexInvalid); } /// Returns combined swizzle support BOOL_32 UseCombinedSwizzle() const { return m_configFlags.useCombinedSwizzle; } // // Interface stubs // ADDR_E_RETURNCODE ComputeSurfaceInfo( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeSurfaceAddrFromCoord(
const ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeSurfaceCoordFromAddr( const ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeSliceTileSwizzle( const ADDR_COMPUTE_SLICESWIZZLE_INPUT* pIn, ADDR_COMPUTE_SLICESWIZZLE_OUTPUT* pOut) const; ADDR_E_RETURNCODE ExtractBankPipeSwizzle( const ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT* pIn, ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT* pOut) const; ADDR_E_RETURNCODE CombineBankPipeSwizzle( const ADDR_COMBINE_BANKPIPE_SWIZZLE_INPUT* pIn, ADDR_COMBINE_BANKPIPE_SWIZZLE_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeBaseSwizzle( const ADDR_COMPUTE_BASE_SWIZZLE_INPUT* pIn, ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeFmaskInfo( const ADDR_COMPUTE_FMASK_INFO_INPUT* pIn, ADDR_COMPUTE_FMASK_INFO_OUTPUT* pOut); ADDR_E_RETURNCODE ComputeFmaskAddrFromCoord( const ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeFmaskCoordFromAddr( const ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT* pOut) const; ADDR_E_RETURNCODE ConvertTileInfoToHW( const ADDR_CONVERT_TILEINFOTOHW_INPUT* pIn, ADDR_CONVERT_TILEINFOTOHW_OUTPUT* pOut) const; ADDR_E_RETURNCODE ConvertTileIndex( const ADDR_CONVERT_TILEINDEX_INPUT* pIn, ADDR_CONVERT_TILEINDEX_OUTPUT* pOut) const; ADDR_E_RETURNCODE GetMacroModeIndex( const ADDR_GET_MACROMODEINDEX_INPUT* pIn, ADDR_GET_MACROMODEINDEX_OUTPUT* pOut) const; ADDR_E_RETURNCODE ConvertTileIndex1( const ADDR_CONVERT_TILEINDEX1_INPUT* pIn, ADDR_CONVERT_TILEINDEX_OUTPUT* pOut) const; ADDR_E_RETURNCODE GetTileIndex( const ADDR_GET_TILEINDEX_INPUT* pIn, ADDR_GET_TILEINDEX_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeHtileInfo( const ADDR_COMPUTE_HTILE_INFO_INPUT* pIn, ADDR_COMPUTE_HTILE_INFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeCmaskInfo( const 
ADDR_COMPUTE_CMASK_INFO_INPUT* pIn, ADDR_COMPUTE_CMASK_INFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeDccInfo( const ADDR_COMPUTE_DCCINFO_INPUT* pIn, ADDR_COMPUTE_DCCINFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeHtileAddrFromCoord( const ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeCmaskAddrFromCoord( const ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeHtileCoordFromAddr( const ADDR_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_HTILE_COORDFROMADDR_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeCmaskCoordFromAddr( const ADDR_COMPUTE_CMASK_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_CMASK_COORDFROMADDR_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputePrtInfo( const ADDR_PRT_INFO_INPUT* pIn, ADDR_PRT_INFO_OUTPUT* pOut) const; protected: Lib(); // Constructor is protected Lib(const Client* pClient); /// Pure Virtual function for Hwl computing surface info virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfo( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const = 0; /// Pure Virtual function for Hwl computing surface address from coord virtual ADDR_E_RETURNCODE HwlComputeSurfaceAddrFromCoord( const ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut) const = 0; /// Pure Virtual function for Hwl computing surface coord from address virtual ADDR_E_RETURNCODE HwlComputeSurfaceCoordFromAddr( const ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut) const = 0; /// Pure Virtual function for Hwl computing surface tile swizzle virtual ADDR_E_RETURNCODE HwlComputeSliceTileSwizzle( const ADDR_COMPUTE_SLICESWIZZLE_INPUT* pIn, ADDR_COMPUTE_SLICESWIZZLE_OUTPUT* pOut) const = 0; /// Pure Virtual function for Hwl extracting bank/pipe swizzle from base256b virtual ADDR_E_RETURNCODE HwlExtractBankPipeSwizzle( 
const ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT* pIn, ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT* pOut) const = 0; /// Pure Virtual function for Hwl combining bank/pipe swizzle virtual ADDR_E_RETURNCODE HwlCombineBankPipeSwizzle( UINT_32 bankSwizzle, UINT_32 pipeSwizzle, ADDR_TILEINFO* pTileInfo, UINT_64 baseAddr, UINT_32* pTileSwizzle) const = 0; /// Pure Virtual function for Hwl computing base swizzle virtual ADDR_E_RETURNCODE HwlComputeBaseSwizzle( const ADDR_COMPUTE_BASE_SWIZZLE_INPUT* pIn, ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT* pOut) const = 0; /// Pure Virtual function for Hwl computing HTILE base align virtual UINT_32 HwlComputeHtileBaseAlign( BOOL_32 isTcCompatible, BOOL_32 isLinear, ADDR_TILEINFO* pTileInfo) const = 0; /// Pure Virtual function for Hwl computing HTILE bpp virtual UINT_32 HwlComputeHtileBpp( BOOL_32 isWidth8, BOOL_32 isHeight8) const = 0; /// Pure Virtual function for Hwl computing HTILE bytes virtual UINT_64 HwlComputeHtileBytes( UINT_32 pitch, UINT_32 height, UINT_32 bpp, BOOL_32 isLinear, UINT_32 numSlices, UINT_64* pSliceBytes, UINT_32 baseAlign) const = 0; /// Pure Virtual function for Hwl computing FMASK info virtual ADDR_E_RETURNCODE HwlComputeFmaskInfo( const ADDR_COMPUTE_FMASK_INFO_INPUT* pIn, ADDR_COMPUTE_FMASK_INFO_OUTPUT* pOut) = 0; /// Pure Virtual function for Hwl FMASK address from coord virtual ADDR_E_RETURNCODE HwlComputeFmaskAddrFromCoord( const ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT* pOut) const = 0; /// Pure Virtual function for Hwl FMASK coord from address virtual ADDR_E_RETURNCODE HwlComputeFmaskCoordFromAddr( const ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT* pOut) const = 0; /// Pure Virtual function for Hwl convert tile info from real value to HW value virtual ADDR_E_RETURNCODE HwlConvertTileInfoToHW( const ADDR_CONVERT_TILEINFOTOHW_INPUT* pIn, ADDR_CONVERT_TILEINFOTOHW_OUTPUT* pOut) const = 0; /// Pure Virtual function for Hwl compute mipmap info 
virtual BOOL_32 HwlComputeMipLevel( ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn) const = 0; /// Pure Virtual function for Hwl compute max cmask blockMax value virtual BOOL_32 HwlGetMaxCmaskBlockMax() const = 0; /// Pure Virtual function for Hwl compute fmask bits virtual UINT_32 HwlComputeFmaskBits( const ADDR_COMPUTE_FMASK_INFO_INPUT* pIn, UINT_32* pNumSamples) const = 0; /// Virtual function to get index (not pure, so not every HWL needs to implement it) virtual ADDR_E_RETURNCODE HwlGetTileIndex( const ADDR_GET_TILEINDEX_INPUT* pIn, ADDR_GET_TILEINDEX_OUTPUT* pOut) const { return ADDR_NOTSUPPORTED; } /// Virtual function for Hwl to compute Dcc info virtual ADDR_E_RETURNCODE HwlComputeDccInfo( const ADDR_COMPUTE_DCCINFO_INPUT* pIn, ADDR_COMPUTE_DCCINFO_OUTPUT* pOut) const { return ADDR_NOTSUPPORTED; } /// Virtual function to get cmask address for tc compatible cmask virtual ADDR_E_RETURNCODE HwlComputeCmaskAddrFromCoord( const ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT* pOut) const { return ADDR_NOTSUPPORTED; } /// Virtual function to get htile address for tc compatible htile virtual ADDR_E_RETURNCODE HwlComputeHtileAddrFromCoord( const ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT* pOut) const { return ADDR_NOTSUPPORTED; } // Compute attributes // HTILE UINT_32 ComputeHtileInfo( ADDR_HTILE_FLAGS flags, UINT_32 pitchIn, UINT_32 heightIn, UINT_32 numSlices, BOOL_32 isLinear, BOOL_32 isWidth8, BOOL_32 isHeight8, ADDR_TILEINFO* pTileInfo, UINT_32* pPitchOut, UINT_32* pHeightOut, UINT_64* pHtileBytes, UINT_32* pMacroWidth = NULL, UINT_32* pMacroHeight = NULL, UINT_64* pSliceSize = NULL, UINT_32* pBaseAlign = NULL) const; // CMASK ADDR_E_RETURNCODE ComputeCmaskInfo( ADDR_CMASK_FLAGS flags, UINT_32 pitchIn, UINT_32 heightIn, UINT_32 numSlices, BOOL_32 isLinear, ADDR_TILEINFO* pTileInfo, UINT_32* pPitchOut, UINT_32* pHeightOut, UINT_64* pCmaskBytes, UINT_32* pMacroWidth, UINT_32* pMacroHeight,
UINT_64* pSliceSize = NULL, UINT_32* pBaseAlign = NULL, UINT_32* pBlockMax = NULL) const; virtual VOID HwlComputeTileDataWidthAndHeightLinear( UINT_32* pMacroWidth, UINT_32* pMacroHeight, UINT_32 bpp, ADDR_TILEINFO* pTileInfo) const; // CMASK & HTILE addressing virtual UINT_64 HwlComputeXmaskAddrFromCoord( UINT_32 pitch, UINT_32 height, UINT_32 x, UINT_32 y, UINT_32 slice, UINT_32 numSlices, UINT_32 factor, BOOL_32 isLinear, BOOL_32 isWidth8, BOOL_32 isHeight8, ADDR_TILEINFO* pTileInfo, UINT_32* bitPosition) const; virtual VOID HwlComputeXmaskCoordFromAddr( UINT_64 addr, UINT_32 bitPosition, UINT_32 pitch, UINT_32 height, UINT_32 numSlices, UINT_32 factor, BOOL_32 isLinear, BOOL_32 isWidth8, BOOL_32 isHeight8, ADDR_TILEINFO* pTileInfo, UINT_32* pX, UINT_32* pY, UINT_32* pSlice) const; // Surface mipmap VOID ComputeMipLevel( ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn) const; /// Pure Virtual function for Hwl to get macro tiled alignment info virtual BOOL_32 HwlGetAlignmentInfoMacroTiled( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, UINT_32* pPitchAlign, UINT_32* pHeightAlign, UINT_32* pSizeAlign) const = 0; virtual VOID HwlOverrideTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const { // not supported in hwl layer } virtual VOID HwlOptimizeTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const { // not supported in hwl layer } virtual VOID HwlSelectTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const { // not supported in hwl layer } AddrTileMode DegradeLargeThickTile(AddrTileMode tileMode, UINT_32 bpp) const; VOID PadDimensions( AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 numSamples, ADDR_TILEINFO* pTileInfo, UINT_32 padDims, UINT_32 mipLevel, UINT_32* pPitch, UINT_32* pPitchAlign, UINT_32* pHeight, UINT_32 heightAlign, UINT_32* pSlices, UINT_32 sliceAlign) const; virtual VOID HwlPadDimensions( AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 numSamples, ADDR_TILEINFO* pTileInfo, UINT_32 mipLevel, UINT_32* pPitch, 
UINT_32* pPitchAlign, UINT_32 height, UINT_32 heightAlign) const { } // // Addressing shared for linear/1D tiling // UINT_64 ComputeSurfaceAddrFromCoordLinear( UINT_32 x, UINT_32 y, UINT_32 slice, UINT_32 sample, UINT_32 bpp, UINT_32 pitch, UINT_32 height, UINT_32 numSlices, UINT_32* pBitPosition) const; VOID ComputeSurfaceCoordFromAddrLinear( UINT_64 addr, UINT_32 bitPosition, UINT_32 bpp, UINT_32 pitch, UINT_32 height, UINT_32 numSlices, UINT_32* pX, UINT_32* pY, UINT_32* pSlice, UINT_32* pSample) const; VOID ComputeSurfaceCoordFromAddrMicroTiled( UINT_64 addr, UINT_32 bitPosition, UINT_32 bpp, UINT_32 pitch, UINT_32 height, UINT_32 numSamples, AddrTileMode tileMode, UINT_32 tileBase, UINT_32 compBits, UINT_32* pX, UINT_32* pY, UINT_32* pSlice, UINT_32* pSample, AddrTileType microTileType, BOOL_32 isDepthSampleOrder) const; ADDR_E_RETURNCODE ComputeMicroTileEquation( UINT_32 bpp, AddrTileMode tileMode, AddrTileType microTileType, ADDR_EQUATION* pEquation) const; UINT_32 ComputePixelIndexWithinMicroTile( UINT_32 x, UINT_32 y, UINT_32 z, UINT_32 bpp, AddrTileMode tileMode, AddrTileType microTileType) const; /// Pure Virtual function for Hwl computing coord from offset inside micro tile virtual VOID HwlComputePixelCoordFromOffset( UINT_32 offset, UINT_32 bpp, UINT_32 numSamples, AddrTileMode tileMode, UINT_32 tileBase, UINT_32 compBits, UINT_32* pX, UINT_32* pY, UINT_32* pSlice, UINT_32* pSample, AddrTileType microTileType, BOOL_32 isDepthSampleOrder) const = 0; // // Addressing shared by all // virtual UINT_32 HwlGetPipes( const ADDR_TILEINFO* pTileInfo) const; UINT_32 ComputePipeFromAddr( UINT_64 addr, UINT_32 numPipes) const; virtual ADDR_E_RETURNCODE ComputePipeEquation( UINT_32 log2BytesPP, UINT_32 threshX, UINT_32 threshY, ADDR_TILEINFO* pTileInfo, ADDR_EQUATION* pEquation) const { return ADDR_NOTSUPPORTED; } /// Pure Virtual function for Hwl computing pipe from coord virtual UINT_32 ComputePipeFromCoord( UINT_32 x, UINT_32 y, UINT_32 slice, AddrTileMode 
        tileMode,
        UINT_32 pipeSwizzle,
        BOOL_32 flags,
        ADDR_TILEINFO* pTileInfo) const = 0;

    /// Pure virtual function for Hwl computing coord Y for 8 pipe cmask/htile
    virtual UINT_32 HwlComputeXmaskCoordYFrom8Pipe(
        UINT_32 pipe, UINT_32 x) const = 0;

    //
    // Misc helper
    //
    static const TileModeFlags ModeFlags[ADDR_TM_COUNT];

    static UINT_32 Thickness(
        AddrTileMode tileMode);

    // Checking tile mode
    static BOOL_32 IsMacroTiled(AddrTileMode tileMode);
    static BOOL_32 IsMacro3dTiled(AddrTileMode tileMode);
    static BOOL_32 IsLinear(AddrTileMode tileMode);
    static BOOL_32 IsMicroTiled(AddrTileMode tileMode);
    static BOOL_32 IsPrtTileMode(AddrTileMode tileMode);
    static BOOL_32 IsPrtNoRotationTileMode(AddrTileMode tileMode);

    /// Return TRUE if tile info is needed
    BOOL_32 UseTileInfo() const
    {
        return !m_configFlags.ignoreTileInfo;
    }

    /// Adjusts pitch alignment for flipping surface
    VOID AdjustPitchAlignment(
        ADDR_SURFACE_FLAGS flags, UINT_32* pPitchAlign) const;

    /// Overwrite tile config according to tile index
    virtual ADDR_E_RETURNCODE HwlSetupTileCfg(
        UINT_32 bpp, INT_32 index, INT_32 macroModeIndex,
        ADDR_TILEINFO* pInfo, AddrTileMode* mode = NULL, AddrTileType* type = NULL) const;

    /// Overwrite macro tile config according to tile index
    virtual INT_32 HwlComputeMacroModeIndex(
        INT_32 index, ADDR_SURFACE_FLAGS flags, UINT_32 bpp, UINT_32 numSamples,
        ADDR_TILEINFO* pTileInfo, AddrTileMode* pTileMode = NULL, AddrTileType* pTileType = NULL
        ) const
    {
        return TileIndexNoMacroIndex;
    }

    /// Pre-handler of 3x pitch (96 bit) adjustment
    virtual UINT_32 HwlPreHandleBaseLvl3xPitch(
        const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, UINT_32 expPitch) const;
    /// Post-handler of 3x pitch adjustment
    virtual UINT_32 HwlPostHandleBaseLvl3xPitch(
        const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, UINT_32 expPitch) const;
    /// Check miplevel after surface adjustment
    ADDR_E_RETURNCODE PostComputeMipLevel(
        ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,
        ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    /// Quad buffer stereo support; implemented in the hardware-independent layer
    VOID ComputeQbStereoInfo(
        ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    /// Pure virtual function to compute stereo bank swizzle for right eye
    virtual UINT_32 HwlComputeQbStereoRightSwizzle(
        ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const = 0;

    VOID OptimizeTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const;

    /// Overwrite tile setting to PRT
    virtual VOID HwlSetPrtTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const
    {
    }

    static BOOL_32 DegradeTo1D(
        UINT_32 width, UINT_32 height,
        UINT_32 macroTilePitchAlign, UINT_32 macroTileHeightAlign);

private:
    // Disallow the copy constructor
    Lib(const Lib& a);

    // Disallow the assignment operator
    Lib& operator=(const Lib& a);

    UINT_32 ComputeCmaskBaseAlign(
        ADDR_CMASK_FLAGS flags, ADDR_TILEINFO* pTileInfo) const;

    UINT_64 ComputeCmaskBytes(
        UINT_32 pitch, UINT_32 height, UINT_32 numSlices) const;

    //
    // CMASK/HTILE shared methods
    //
    VOID ComputeTileDataWidthAndHeight(
        UINT_32 bpp, UINT_32 cacheBits, ADDR_TILEINFO* pTileInfo,
        UINT_32* pMacroWidth, UINT_32* pMacroHeight) const;

    UINT_32 ComputeXmaskCoordYFromPipe(
        UINT_32 pipe, UINT_32 x) const;
};

} // V1
} // Addr
} // rocr

#endif

ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrlib2.cpp
/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
************************************************************************************************************************
* @file  addrlib2.cpp
* @brief Contains the implementation for the AddrLib2 base class.
************************************************************************************************************************
*/

#include "addrinterface.h"
#include "addrlib2.h"
#include "addrcommon.h"

namespace rocr {
namespace Addr
{
namespace V2
{

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               Static Const Member
////////////////////////////////////////////////////////////////////////////////////////////////////

const Dim2d Lib::Block256_2d[] = {{16, 16}, {16, 8}, {8, 8}, {8, 4}, {4, 4}};

const Dim3d Lib::Block1K_3d[]  = {{16, 8, 8}, {8, 8, 8}, {8, 8, 4}, {8, 4, 4}, {4, 4, 4}};

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               Constructor/Destructor
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
************************************************************************************************************************
*   Lib::Lib
*
*   @brief
*       Constructor for the Addr::V2::Lib class
*
************************************************************************************************************************
*/
Lib::Lib()
    :
    Addr::Lib(),
    m_se(0),
    m_rbPerSe(0),
    m_maxCompFrag(0),
    m_banksLog2(0),
    m_pipesLog2(0),
    m_seLog2(0),
    m_rbPerSeLog2(0),
    m_maxCompFragLog2(0),
    m_pipeInterleaveLog2(0),
    m_blockVarSizeLog2(0),
    m_numEquations(0)
{
}

/**
************************************************************************************************************************
*   Lib::Lib
*
*   @brief
*       Constructor for the AddrLib2 class with pClient as parameter
*
************************************************************************************************************************
*/
Lib::Lib(const Client* pClient)
    :
    Addr::Lib(pClient),
    m_se(0),
    m_rbPerSe(0),
    m_maxCompFrag(0),
    m_banksLog2(0),
    m_pipesLog2(0),
    m_seLog2(0),
    m_rbPerSeLog2(0),
    m_maxCompFragLog2(0),
    m_pipeInterleaveLog2(0),
    m_blockVarSizeLog2(0),
    m_numEquations(0)
{
}

/**
************************************************************************************************************************
*   Lib::~Lib
*
*   @brief
*       Destructor for the AddrLib2 class
*
************************************************************************************************************************
*/
Lib::~Lib()
{
}

/**
************************************************************************************************************************
*   Lib::GetLib
*
*   @brief
*       Get Addr::V2::Lib pointer
*
*   @return
*       An Addr::V2::Lib class pointer
************************************************************************************************************************
*/
Lib* Lib::GetLib(
    ADDR_HANDLE hLib)   ///< [in] handle of ADDR_HANDLE
{
    Addr::Lib* pAddrLib = Addr::Lib::GetLib(hLib);

    if ((pAddrLib != NULL) &&
        (pAddrLib->GetChipFamily() <= ADDR_CHIP_FAMILY_VI))
    {
        // Only valid handles for GFX9+ ASICs can use AddrLib2 functions.
        ADDR_ASSERT_ALWAYS();
        hLib = NULL;
    }

    return static_cast<Lib*>(hLib);
}

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               Surface Methods
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
************************************************************************************************************************
*   Lib::ComputeSurfaceInfo
*
*   @brief
*       Interface function stub of AddrComputeSurfaceInfo.
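*
*       Worked example (illustrative only, derived from the parameter adjustment in the body
*       below): an input with width = 0, numMipLevels = 0, numSlices = 0, numSamples = 4 and
*       numFrags = 0 is processed as width = 1, numMipLevels = 1, numSlices = 1,
*       numSamples = 4 and numFrags = 4. Zero-valued dimensions are clamped to 1, and a zero
*       numFrags defaults to the clamped sample count.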
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeSurfaceInfo(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR2_COMPUTE_SURFACE_INFO_INPUT)) ||
            (pOut->size != sizeof(ADDR2_COMPUTE_SURFACE_INFO_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    // Adjust incoming parameters.
    ADDR2_COMPUTE_SURFACE_INFO_INPUT localIn = *pIn;
    localIn.width        = Max(pIn->width, 1u);
    localIn.height       = Max(pIn->height, 1u);
    localIn.numMipLevels = Max(pIn->numMipLevels, 1u);
    localIn.numSlices    = Max(pIn->numSlices, 1u);
    localIn.numSamples   = Max(pIn->numSamples, 1u);
    localIn.numFrags     = (localIn.numFrags == 0) ? localIn.numSamples : pIn->numFrags;

    UINT_32  expandX  = 1;
    UINT_32  expandY  = 1;
    ElemMode elemMode = ADDR_UNCOMPRESSED;

    if (returnCode == ADDR_OK)
    {
        // Setting format to ADDR_FMT_INVALID skips this conversion
        if (localIn.format != ADDR_FMT_INVALID)
        {
            // Get compression/expansion factors and the element mode, which indicates compression/expansion
            localIn.bpp = GetElemLib()->GetBitsPerPixel(localIn.format,
                                                        &elemMode,
                                                        &expandX,
                                                        &expandY);

            // Special flag for 96-bit surfaces. A 96-bit (or 48-bit, if supported) surface's width is
            // pre-multiplied by 3 and its bpp is divided by 3, so the pitch alignment for linear-aligned
            // surfaces does not actually meet 64 pixels. We keep special handling in the HWL since
            // hardware restrictions differ.
            // Also, mip 1+ needs an element pitch of 32 bits, so we do not need this workaround there,
            // but we use this flag to skip RestoreSurfaceInfo below.

            if ((elemMode == ADDR_EXPANDED) && (expandX > 1))
            {
                ADDR_ASSERT(IsLinear(localIn.swizzleMode));
            }

            UINT_32 basePitch = 0;
            GetElemLib()->AdjustSurfaceInfo(elemMode,
                                            expandX,
                                            expandY,
                                            &localIn.bpp,
                                            &basePitch,
                                            &localIn.width,
                                            &localIn.height);

            // Overwrite these parameters if we have a valid format
        }

        if (localIn.bpp != 0)
        {
            localIn.width  = Max(localIn.width, 1u);
            localIn.height = Max(localIn.height, 1u);
        }
        else // Rule out some invalid parameters
        {
            ADDR_ASSERT_ALWAYS();

            returnCode = ADDR_INVALIDPARAMS;
        }
    }

    if (returnCode == ADDR_OK)
    {
        returnCode = ComputeSurfaceInfoSanityCheck(&localIn);
    }

    if (returnCode == ADDR_OK)
    {
        VerifyMipLevelInfo(pIn);

        if (IsLinear(pIn->swizzleMode))
        {
            // linear mode
            returnCode = ComputeSurfaceInfoLinear(&localIn, pOut);
        }
        else
        {
            // tiled mode
            returnCode = ComputeSurfaceInfoTiled(&localIn, pOut);
        }

        if (returnCode == ADDR_OK)
        {
            pOut->bpp                 = localIn.bpp;
            pOut->pixelPitch          = pOut->pitch;
            pOut->pixelHeight         = pOut->height;
            pOut->pixelMipChainPitch  = pOut->mipChainPitch;
            pOut->pixelMipChainHeight = pOut->mipChainHeight;
            pOut->pixelBits           = localIn.bpp;

            if (localIn.format != ADDR_FMT_INVALID)
            {
                UINT_32 pixelBits = pOut->pixelBits;

                GetElemLib()->RestoreSurfaceInfo(elemMode, expandX, expandY,
                                                 &pOut->pixelBits,
                                                 &pOut->pixelPitch,
                                                 &pOut->pixelHeight);

                GetElemLib()->RestoreSurfaceInfo(elemMode, expandX, expandY,
                                                 &pixelBits,
                                                 &pOut->pixelMipChainPitch,
                                                 &pOut->pixelMipChainHeight);

                if ((localIn.numMipLevels > 1) && (pOut->pMipInfo != NULL))
                {
                    for (UINT_32 i = 0; i < localIn.numMipLevels; i++)
                    {
                        pOut->pMipInfo[i].pixelPitch  = pOut->pMipInfo[i].pitch;
                        pOut->pMipInfo[i].pixelHeight = pOut->pMipInfo[i].height;

                        GetElemLib()->RestoreSurfaceInfo(elemMode, expandX, expandY,
                                                         &pixelBits,
                                                         &pOut->pMipInfo[i].pixelPitch,
                                                         &pOut->pMipInfo[i].pixelHeight);
                    }
                }
            }

            if (localIn.flags.needEquation && (Log2(localIn.numFrags) == 0))
            {
                pOut->equationIndex = GetEquationIndex(&localIn, pOut);
            }

            if (localIn.flags.qbStereo)
            {
                if (pOut->pStereoInfo != NULL)
                {
                    ComputeQbStereoInfo(pOut);
                }
            }
        }
    }

    ValidBaseAlignments(pOut->baseAlign);

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeSurfaceAddrFromCoord
*
*   @brief
*       Interface function stub of ComputeSurfaceAddrFromCoord.
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeSurfaceAddrFromCoord(
    const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT)) ||
            (pOut->size != sizeof(ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT localIn = *pIn;
    localIn.unalignedWidth  = Max(pIn->unalignedWidth, 1u);
    localIn.unalignedHeight = Max(pIn->unalignedHeight, 1u);
    localIn.numMipLevels    = Max(pIn->numMipLevels, 1u);
    localIn.numSlices       = Max(pIn->numSlices, 1u);
    localIn.numSamples      = Max(pIn->numSamples, 1u);
    localIn.numFrags        = Max(pIn->numFrags, 1u);

    if ((localIn.bpp < 8) ||
        (localIn.bpp > 128) ||
        ((localIn.bpp % 8) != 0) ||
        (localIn.sample >= localIn.numSamples) ||
        (localIn.slice >= localIn.numSlices) ||
        (localIn.mipId >= localIn.numMipLevels) ||
        (IsTex3d(localIn.resourceType) &&
         (Valid3DMipSliceIdConstraint(localIn.numSlices, localIn.mipId, localIn.slice) == FALSE)))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }

    if (returnCode == ADDR_OK)
    {
        if (IsLinear(localIn.swizzleMode))
        {
            returnCode = ComputeSurfaceAddrFromCoordLinear(&localIn, pOut);
        }
        else
        {
            returnCode = ComputeSurfaceAddrFromCoordTiled(&localIn, pOut);
        }

        if (returnCode == ADDR_OK)
        {
            pOut->prtBlockIndex = static_cast<UINT_32>(pOut->addr / (64 * 1024));
        }
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeSurfaceCoordFromAddr
*
*   @brief
*       Interface function stub of ComputeSurfaceCoordFromAddr.
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeSurfaceCoordFromAddr(
    const ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (GetFillSizeFieldsFlags() == TRUE)
    {
        if ((pIn->size != sizeof(ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT)) ||
            (pOut->size != sizeof(ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT)))
        {
            returnCode = ADDR_PARAMSIZEMISMATCH;
        }
    }

    if ((pIn->bpp < 8) ||
        (pIn->bpp > 128) ||
        ((pIn->bpp % 8) != 0) ||
        (pIn->bitPosition >= 8))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }

    if (returnCode == ADDR_OK)
    {
        if (IsLinear(pIn->swizzleMode))
        {
            returnCode = ComputeSurfaceCoordFromAddrLinear(pIn, pOut);
        }
        else
        {
            returnCode = ComputeSurfaceCoordFromAddrTiled(pIn, pOut);
        }
    }

    return returnCode;
}

////////////////////////////////////////////////////////////////////////////////////////////////////
//                               CMASK/HTILE
////////////////////////////////////////////////////////////////////////////////////////////////////

/**
************************************************************************************************************************
*   Lib::ComputeHtileInfo
*
*   @brief
*       Interface function stub of AddrComputeHtileInfo
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeHtileInfo(
    const ADDR2_COMPUTE_HTILE_INFO_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_HTILE_INFO_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_HTILE_INFO_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_HTILE_INFO_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputeHtileInfo(pIn, pOut);

        ValidMetaBaseAlignments(pOut->baseAlign);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeHtileAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeHtileAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeHtileAddrFromCoord(
    const ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT*      pOut)   ///< [out] output structure
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputeHtileAddrFromCoord(pIn, pOut);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeHtileCoordFromAddr
*
*   @brief
*       Interface function stub of AddrComputeHtileCoordFromAddr
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeHtileCoordFromAddr(
    const ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT*      pOut)   ///< [out] output structure
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputeHtileCoordFromAddr(pIn, pOut);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeCmaskInfo
*
*   @brief
*       Interface function stub of AddrComputeCmaskInfo
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeCmaskInfo(
    const ADDR2_COMPUTE_CMASK_INFO_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_CMASK_INFO_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_CMASK_INFO_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_CMASK_INFO_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else if (pIn->cMaskFlags.linear)
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputeCmaskInfo(pIn, pOut);

        ValidMetaBaseAlignments(pOut->baseAlign);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeCmaskAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeCmaskAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeCmaskAddrFromCoord(
    const ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT*      pOut)   ///< [out] output structure
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputeCmaskAddrFromCoord(pIn, pOut);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeCmaskCoordFromAddr
*
*   @brief
*       Interface function stub of AddrComputeCmaskCoordFromAddr
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeCmaskCoordFromAddr(
    const ADDR2_COMPUTE_CMASK_COORDFROMADDR_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_CMASK_COORDFROMADDR_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_NOTIMPLEMENTED;

    ADDR_NOT_IMPLEMENTED();

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeFmaskInfo
*
*   @brief
*       Interface function stub of ComputeFmaskInfo.
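*
*       Worked mapping (illustrative only, restating the body below): the FMASK is computed as a
*       single-sample, single-fragment ADDR_RSRC_TEX_2D surface whose bpp comes from
*       GetFmaskBpp(numSamples, numFrags). That bpp selects the surface format:
*           bpp == 8          -> ADDR_FMT_8
*           bpp == 16         -> ADDR_FMT_16
*           bpp == 32         -> ADDR_FMT_32
*           any other value   -> ADDR_FMT_32_32
*       The request returns ADDR_INVALIDPARAMS unless the swizzle mode is a Z-order swizzle and
*       numSamples or numFrags is non-zero.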
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeFmaskInfo(
    const ADDR2_COMPUTE_FMASK_INFO_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_FMASK_INFO_OUTPUT*      pOut    ///< [out] output structure
    )
{
    ADDR_E_RETURNCODE returnCode;

    BOOL_32 valid = (IsZOrderSwizzle(pIn->swizzleMode) == TRUE) &&
                    ((pIn->numSamples > 0) || (pIn->numFrags > 0));

    if (GetFillSizeFieldsFlags())
    {
        if ((pIn->size != sizeof(ADDR2_COMPUTE_FMASK_INFO_INPUT)) ||
            (pOut->size != sizeof(ADDR2_COMPUTE_FMASK_INFO_OUTPUT)))
        {
            valid = FALSE;
        }
    }

    if (valid == FALSE)
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        ADDR2_COMPUTE_SURFACE_INFO_INPUT  localIn  = {0};
        ADDR2_COMPUTE_SURFACE_INFO_OUTPUT localOut = {0};

        localIn.size  = sizeof(ADDR2_COMPUTE_SURFACE_INFO_INPUT);
        localOut.size = sizeof(ADDR2_COMPUTE_SURFACE_INFO_OUTPUT);

        localIn.swizzleMode  = pIn->swizzleMode;
        localIn.numSlices    = Max(pIn->numSlices, 1u);
        localIn.width        = Max(pIn->unalignedWidth, 1u);
        localIn.height       = Max(pIn->unalignedHeight, 1u);
        localIn.bpp          = GetFmaskBpp(pIn->numSamples, pIn->numFrags);
        localIn.flags.fmask  = 1;
        localIn.numFrags     = 1;
        localIn.numSamples   = 1;
        localIn.resourceType = ADDR_RSRC_TEX_2D;

        if (localIn.bpp == 8)
        {
            localIn.format = ADDR_FMT_8;
        }
        else if (localIn.bpp == 16)
        {
            localIn.format = ADDR_FMT_16;
        }
        else if (localIn.bpp == 32)
        {
            localIn.format = ADDR_FMT_32;
        }
        else
        {
            localIn.format = ADDR_FMT_32_32;
        }

        returnCode = ComputeSurfaceInfo(&localIn, &localOut);

        if (returnCode == ADDR_OK)
        {
            pOut->pitch      = localOut.pitch;
            pOut->height     = localOut.height;
            pOut->baseAlign  = localOut.baseAlign;
            pOut->numSlices  = localOut.numSlices;
            pOut->fmaskBytes = static_cast<UINT_32>(localOut.surfSize);
            pOut->sliceSize  = static_cast<UINT_32>(localOut.sliceSize);
            pOut->bpp        = localIn.bpp;
            pOut->numSamples = 1;
        }
    }

    ValidBaseAlignments(pOut->baseAlign);

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeFmaskAddrFromCoord
*
*   @brief
*       Interface function stub of ComputeFmaskAddrFromCoord.
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeFmaskAddrFromCoord(
    const ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_NOTIMPLEMENTED;

    ADDR_NOT_IMPLEMENTED();

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeFmaskCoordFromAddr
*
*   @brief
*       Interface function stub of ComputeFmaskCoordFromAddr.
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeFmaskCoordFromAddr(
    const ADDR2_COMPUTE_FMASK_COORDFROMADDR_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_FMASK_COORDFROMADDR_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_NOTIMPLEMENTED;

    ADDR_NOT_IMPLEMENTED();

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeDccInfo
*
*   @brief
*       Interface function to compute DCC key info
*
*   @return
*       return code of HwlComputeDccInfo
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeDccInfo(
    const ADDR2_COMPUTE_DCCINFO_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_DCCINFO_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_DCCINFO_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_DCCINFO_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputeDccInfo(pIn, pOut);

        ValidMetaBaseAlignments(pOut->dccRamBaseAlign);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeDccAddrFromCoord
*
*   @brief
*       Interface function stub of ComputeDccAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeDccAddrFromCoord(
    const ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT*      pOut)   ///< [out] output structure
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputeDccAddrFromCoord(pIn, pOut);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputePipeBankXor
*
*   @brief
*       Interface function stub of Addr2ComputePipeBankXor.
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputePipeBankXor(
    const ADDR2_COMPUTE_PIPEBANKXOR_INPUT* pIn,
    ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT*      pOut)
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_PIPEBANKXOR_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputePipeBankXor(pIn, pOut);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeSlicePipeBankXor
*
*   @brief
*       Interface function stub of Addr2ComputeSlicePipeBankXor.
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeSlicePipeBankXor(
    const ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT* pIn,
    ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT*      pOut)
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else if ((IsThin(pIn->resourceType, pIn->swizzleMode) == FALSE) ||
             (IsNonPrtXor(pIn->swizzleMode) == FALSE) ||
             (pIn->numSamples > 1))
    {
        returnCode = ADDR_NOTSUPPORTED;
    }
    else
    {
        returnCode = HwlComputeSlicePipeBankXor(pIn, pOut);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeSubResourceOffsetForSwizzlePattern
*
*   @brief
*       Interface function stub of Addr2ComputeSubResourceOffsetForSwizzlePattern.
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeSubResourceOffsetForSwizzlePattern(
    const ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT* pIn,
    ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT*      pOut)
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        ((pIn->size != sizeof(ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT)) ||
         (pOut->size != sizeof(ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT))))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputeSubResourceOffsetForSwizzlePattern(pIn, pOut);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ExtractPipeBankXor
*
*   @brief
*       Internal function to extract bank and pipe xor bits from combined xor bits.
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ExtractPipeBankXor(
    UINT_32  pipeBankXor,
    UINT_32  bankBits,
    UINT_32  pipeBits,
    UINT_32* pBankX,
    UINT_32* pPipeX)
{
    ADDR_E_RETURNCODE returnCode;

    if (pipeBankXor < (1u << (pipeBits + bankBits)))
    {
        *pPipeX = pipeBankXor % (1 << pipeBits);
        *pBankX = pipeBankXor >> pipeBits;
        returnCode = ADDR_OK;
    }
    else
    {
        ADDR_ASSERT_ALWAYS();
        returnCode = ADDR_INVALIDPARAMS;
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeSurfaceInfoSanityCheck
*
*   @brief
*       Internal function to do a basic sanity check before computing surface info
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ComputeSurfaceInfoSanityCheck(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn   ///< [in] input structure
    ) const
{
    ADDR_E_RETURNCODE returnCode;

    if ((GetFillSizeFieldsFlags() == TRUE) &&
        (pIn->size != sizeof(ADDR2_COMPUTE_SURFACE_INFO_INPUT)))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }
    else
    {
        returnCode = HwlComputeSurfaceInfoSanityCheck(pIn);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ApplyCustomizedPitchHeight
*
*   @brief
*       Helper function to override the hardware-required row pitch/slice pitch with a customized one
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::ApplyCustomizedPitchHeight(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,                 ///< [in] input structure
    UINT_32                                 elementBytes,        ///< [in] bytes per element
    UINT_32                                 pitchAlignInElement, ///< [in] pitch alignment in elements
    UINT_32*                                pPitch,              ///< [in/out] pitch
    UINT_32*                                pHeight              ///< [in/out] height
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pIn->numMipLevels <= 1)
    {
        if (pIn->pitchInElement > 0)
        {
            if ((pIn->pitchInElement % pitchAlignInElement) != 0)
            {
                returnCode = ADDR_INVALIDPARAMS;
            }
            else if (pIn->pitchInElement < (*pPitch))
            {
                returnCode = ADDR_INVALIDPARAMS;
            }
            else
            {
                *pPitch = pIn->pitchInElement;
            }
        }

        if (returnCode == ADDR_OK)
        {
            if (pIn->sliceAlign > 0)
            {
                UINT_32 customizedHeight = pIn->sliceAlign / elementBytes / (*pPitch);

                if (customizedHeight * elementBytes * (*pPitch) != pIn->sliceAlign)
                {
                    returnCode = ADDR_INVALIDPARAMS;
                }
                else if ((pIn->numSlices > 1) && ((*pHeight) != customizedHeight))
                {
                    returnCode = ADDR_INVALIDPARAMS;
                }
                else
                {
                    *pHeight = customizedHeight;
                }
            }
        }
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Lib::ComputeSurfaceInfoLinear
*
*   @brief
*       Internal function to calculate alignment for linear
swizzle surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeSurfaceInfoLinear( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { return HwlComputeSurfaceInfoLinear(pIn, pOut); } /** ************************************************************************************************************************ * Lib::ComputeSurfaceInfoTiled * * @brief * Internal function to calculate alignment for tiled swizzle surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeSurfaceInfoTiled( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { return HwlComputeSurfaceInfoTiled(pIn, pOut); } /** ************************************************************************************************************************ * Lib::ComputeSurfaceAddrFromCoordLinear * * @brief * Internal function to calculate address from coord for linear swizzle surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeSurfaceAddrFromCoordLinear( const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; BOOL_32 valid = (pIn->numSamples <= 1) && (pIn->numFrags <= 1) && (pIn->pipeBankXor == 0); if (valid) { if (IsTex1d(pIn->resourceType)) { valid = (pIn->y == 0); } } if (valid) { ADDR2_COMPUTE_SURFACE_INFO_INPUT localIn = {0}; ADDR2_COMPUTE_SURFACE_INFO_OUTPUT 
localOut = {0}; ADDR2_MIP_INFO mipInfo[MaxMipLevels]; localIn.bpp = pIn->bpp; localIn.flags = pIn->flags; localIn.width = Max(pIn->unalignedWidth, 1u); localIn.height = Max(pIn->unalignedHeight, 1u); localIn.numSlices = Max(pIn->numSlices, 1u); localIn.numMipLevels = Max(pIn->numMipLevels, 1u); localIn.resourceType = pIn->resourceType; if (localIn.numMipLevels <= 1) { localIn.pitchInElement = pIn->pitchInElement; } localOut.pMipInfo = mipInfo; returnCode = ComputeSurfaceInfoLinear(&localIn, &localOut); if (returnCode == ADDR_OK) { pOut->addr = (localOut.sliceSize * pIn->slice) + mipInfo[pIn->mipId].offset + (pIn->y * mipInfo[pIn->mipId].pitch + pIn->x) * (pIn->bpp >> 3); pOut->bitPosition = 0; } else { valid = FALSE; } } if (valid == FALSE) { returnCode = ADDR_INVALIDPARAMS; } return returnCode; } /** ************************************************************************************************************************ * Lib::ComputeSurfaceAddrFromCoordTiled * * @brief * Internal function to calculate address from coord for tiled swizzle surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeSurfaceAddrFromCoordTiled( const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut ///< [out] output structure ) const { return HwlComputeSurfaceAddrFromCoordTiled(pIn, pOut); } /** ************************************************************************************************************************ * Lib::ComputeSurfaceCoordFromAddrLinear * * @brief * Internal function to calculate coord from address for linear swizzle surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeSurfaceCoordFromAddrLinear( const 
ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; BOOL_32 valid = (pIn->numSamples <= 1) && (pIn->numFrags <= 1); if (valid) { if (IsTex1d(pIn->resourceType)) { valid = (pIn->unalignedHeight == 1); } } if (valid) { ADDR2_COMPUTE_SURFACE_INFO_INPUT localIn = {0}; ADDR2_COMPUTE_SURFACE_INFO_OUTPUT localOut = {0}; localIn.bpp = pIn->bpp; localIn.flags = pIn->flags; localIn.width = Max(pIn->unalignedWidth, 1u); localIn.height = Max(pIn->unalignedHeight, 1u); localIn.numSlices = Max(pIn->numSlices, 1u); localIn.numMipLevels = Max(pIn->numMipLevels, 1u); localIn.resourceType = pIn->resourceType; if (localIn.numMipLevels <= 1) { localIn.pitchInElement = pIn->pitchInElement; } returnCode = ComputeSurfaceInfoLinear(&localIn, &localOut); if (returnCode == ADDR_OK) { pOut->slice = static_cast<UINT_32>(pIn->addr / localOut.sliceSize); pOut->sample = 0; UINT_32 offsetInSlice = static_cast<UINT_32>(pIn->addr % localOut.sliceSize); UINT_32 elementBytes = pIn->bpp >> 3; UINT_32 mipOffsetInSlice = 0; UINT_32 mipSize = 0; UINT_32 mipId = 0; for (; mipId < pIn->numMipLevels ; mipId++) { if (IsTex1d(pIn->resourceType)) { mipSize = localOut.pitch * elementBytes; } else { UINT_32 currentMipHeight = (PowTwoAlign(localIn.height, (1 << mipId))) >> mipId; mipSize = currentMipHeight * localOut.pitch * elementBytes; } if (mipSize == 0) { valid = FALSE; break; } else if ((mipSize + mipOffsetInSlice) > offsetInSlice) { break; } else { mipOffsetInSlice += mipSize; if ((mipId == (pIn->numMipLevels - 1)) || (mipOffsetInSlice >= localOut.sliceSize)) { valid = FALSE; } } } if (valid) { pOut->mipId = mipId; UINT_32 elemOffsetInMip = (offsetInSlice - mipOffsetInSlice) / elementBytes; if (IsTex1d(pIn->resourceType)) { if (elemOffsetInMip < localOut.pitch) { pOut->x = elemOffsetInMip; pOut->y = 0; } else { valid = FALSE; } } else { pOut->y = elemOffsetInMip /
localOut.pitch; pOut->x = elemOffsetInMip % localOut.pitch; } if ((pOut->slice >= pIn->numSlices) || (pOut->mipId >= pIn->numMipLevels) || (pOut->x >= Max((pIn->unalignedWidth >> pOut->mipId), 1u)) || (pOut->y >= Max((pIn->unalignedHeight >> pOut->mipId), 1u)) || (IsTex3d(pIn->resourceType) && (FALSE == Valid3DMipSliceIdConstraint(pIn->numSlices, pOut->mipId, pOut->slice)))) { valid = FALSE; } } } else { valid = FALSE; } } if (valid == FALSE) { returnCode = ADDR_INVALIDPARAMS; } return returnCode; } /** ************************************************************************************************************************ * Lib::ComputeSurfaceCoordFromAddrTiled * * @brief * Internal function to calculate coord from address for tiled swizzle surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeSurfaceCoordFromAddrTiled( const ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ADDR_NOTIMPLEMENTED; ADDR_NOT_IMPLEMENTED(); return returnCode; } /** ************************************************************************************************************************ * Lib::ComputeBlockDimensionForSurf * * @brief * Internal function to get block width/height/depth in element from surface input params. 
* * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeBlockDimensionForSurf( UINT_32* pWidth, UINT_32* pHeight, UINT_32* pDepth, UINT_32 bpp, UINT_32 numSamples, AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (IsThick(resourceType, swizzleMode)) { ComputeThickBlockDimension(pWidth, pHeight, pDepth, bpp, resourceType, swizzleMode); } else if (IsThin(resourceType, swizzleMode)) { ComputeThinBlockDimension(pWidth, pHeight, pDepth, bpp, numSamples, resourceType, swizzleMode); } else { ADDR_ASSERT_ALWAYS(); returnCode = ADDR_INVALIDPARAMS; } return returnCode; } /** ************************************************************************************************************************ * Lib::ComputeThinBlockDimension * * @brief * Internal function to get thin block width/height/depth in element from surface input params. * * @return * N/A ************************************************************************************************************************ */ VOID Lib::ComputeThinBlockDimension( UINT_32* pWidth, UINT_32* pHeight, UINT_32* pDepth, UINT_32 bpp, UINT_32 numSamples, AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { ADDR_ASSERT(IsThin(resourceType, swizzleMode)); // GFX9/GFX10 use different dimension amplifying logic: say for 128KB block + 1xAA + 1BPE, the dimension of thin // swizzle mode will be [256W * 512H] on GFX9 ASICs and [512W * 256H] on GFX10 ASICs. 
Since GFX10 is newer HWL so we // make its implementation into base class (in order to save future change on new HWLs) const UINT_32 log2BlkSize = GetBlockSizeLog2(swizzleMode); const UINT_32 log2EleBytes = Log2(bpp >> 3); const UINT_32 log2Samples = Log2(Max(numSamples, 1u)); const UINT_32 log2NumEle = log2BlkSize - log2EleBytes - log2Samples; // For "1xAA/4xAA cases" or "2xAA/8xAA + odd log2BlkSize cases", width == height or width == 2 * height; // For other cases, height == width or height == 2 * width const BOOL_32 widthPrecedent = ((log2Samples & 1) == 0) || ((log2BlkSize & 1) != 0); const UINT_32 log2Width = (log2NumEle + (widthPrecedent ? 1 : 0)) / 2; *pWidth = 1u << log2Width; *pHeight = 1u << (log2NumEle - log2Width); *pDepth = 1; } /** ************************************************************************************************************************ * Lib::ComputeBlockDimension * * @brief * Internal function to get block width/height/depth in element without considering MSAA case * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeBlockDimension( UINT_32* pWidth, UINT_32* pHeight, UINT_32* pDepth, UINT_32 bpp, AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (IsThick(resourceType, swizzleMode)) { ComputeThickBlockDimension(pWidth, pHeight, pDepth, bpp, resourceType, swizzleMode); } else if (IsThin(resourceType, swizzleMode)) { ComputeThinBlockDimension(pWidth, pHeight, pDepth, bpp, 0, resourceType, swizzleMode); } else { ADDR_ASSERT_ALWAYS(); returnCode = ADDR_INVALIDPARAMS; } return returnCode; } /** ************************************************************************************************************************ * Lib::ComputeThickBlockDimension * * @brief * Internal function to get block width/height/depth in element for thick swizzle mode * * 
@return * N/A ************************************************************************************************************************ */ VOID Lib::ComputeThickBlockDimension( UINT_32* pWidth, UINT_32* pHeight, UINT_32* pDepth, UINT_32 bpp, AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { ADDR_ASSERT(IsThick(resourceType, swizzleMode)); const UINT_32 log2BlkSize = GetBlockSizeLog2(swizzleMode); const UINT_32 eleBytes = bpp >> 3; const UINT_32 microBlockSizeTableIndex = Log2(eleBytes); ADDR_ASSERT(microBlockSizeTableIndex < sizeof(Block1K_3d) / sizeof(Block1K_3d[0])); const UINT_32 log2blkSizeIn1KB = log2BlkSize - 10; const UINT_32 averageAmp = log2blkSizeIn1KB / 3; const UINT_32 restAmp = log2blkSizeIn1KB % 3; *pWidth = Block1K_3d[microBlockSizeTableIndex].w << averageAmp; *pHeight = Block1K_3d[microBlockSizeTableIndex].h << (averageAmp + (restAmp / 2)); *pDepth = Block1K_3d[microBlockSizeTableIndex].d << (averageAmp + ((restAmp != 0) ? 1 : 0)); } /** ************************************************************************************************************************ * Lib::GetMipTailDim * * @brief * Internal function to get out max dimension of first level in mip tail * * @return * Max Width/Height/Depth value of the first mip fitted in mip tail ************************************************************************************************************************ */ Dim3d Lib::GetMipTailDim( AddrResourceType resourceType, AddrSwizzleMode swizzleMode, UINT_32 blockWidth, UINT_32 blockHeight, UINT_32 blockDepth) const { Dim3d out = {blockWidth, blockHeight, blockDepth}; UINT_32 log2BlkSize = GetBlockSizeLog2(swizzleMode); if (IsThick(resourceType, swizzleMode)) { UINT_32 dim = log2BlkSize % 3; if (dim == 0) { out.h >>= 1; } else if (dim == 1) { out.w >>= 1; } else { out.d >>= 1; } } else { ADDR_ASSERT(IsThin(resourceType, swizzleMode)); // GFX9/GFX10 use different dimension shrinking logic for mipmap tail: say for 128KB block + 2BPE, the 
maximum // dimension of mipmap tail level will be [256W * 128H] on GFX9 ASICs and [128W * 256H] on GFX10 ASICs. Since // GFX10 is newer HWL so we make its implementation into base class, in order to save future change on new HWLs. // And assert log2BlkSize will always be an even value on GFX9, so we never need the logic wrapped by DEBUG... #if DEBUG if ((log2BlkSize & 1) && (m_chipFamily == ADDR_CHIP_FAMILY_AI)) { // Should never go here... ADDR_ASSERT_ALWAYS(); out.h >>= 1; } else #endif { out.w >>= 1; } } return out; } /** ************************************************************************************************************************ * Lib::ComputeSurface2DMicroBlockOffset * * @brief * Internal function to calculate micro block (256B) offset from coord for 2D resource * * @return * micro block (256B) offset for 2D resource ************************************************************************************************************************ */ UINT_32 Lib::ComputeSurface2DMicroBlockOffset( const _ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn) const { ADDR_ASSERT(IsThin(pIn->resourceType, pIn->swizzleMode)); UINT_32 log2ElementBytes = Log2(pIn->bpp >> 3); UINT_32 microBlockOffset = 0; if (IsStandardSwizzle(pIn->resourceType, pIn->swizzleMode)) { UINT_32 xBits = pIn->x << log2ElementBytes; microBlockOffset = (xBits & 0xf) | ((pIn->y & 0x3) << 4); if (log2ElementBytes < 3) { microBlockOffset |= (pIn->y & 0x4) << 4; if (log2ElementBytes == 0) { microBlockOffset |= (pIn->y & 0x8) << 4; } else { microBlockOffset |= (xBits & 0x10) << 3; } } else { microBlockOffset |= (xBits & 0x30) << 2; } } else if (IsDisplaySwizzle(pIn->resourceType, pIn->swizzleMode)) { if (log2ElementBytes == 4) { microBlockOffset = (GetBit(pIn->x, 0) << 4) | (GetBit(pIn->y, 0) << 5) | (GetBit(pIn->x, 1) << 6) | (GetBit(pIn->y, 1) << 7); } else { microBlockOffset = GetBits(pIn->x, 0, 3, log2ElementBytes) | GetBits(pIn->y, 1, 2, 3 + log2ElementBytes) | GetBits(pIn->x, 3, 1, 5 + 
log2ElementBytes) | GetBits(pIn->y, 3, 1, 6 + log2ElementBytes); microBlockOffset = GetBits(microBlockOffset, 0, 4, 0) | (GetBit(pIn->y, 0) << 4) | GetBits(microBlockOffset, 4, 3, 5); } } else if (IsRotateSwizzle(pIn->swizzleMode)) { microBlockOffset = GetBits(pIn->y, 0, 3, log2ElementBytes) | GetBits(pIn->x, 1, 2, 3 + log2ElementBytes) | GetBits(pIn->x, 3, 1, 5 + log2ElementBytes) | GetBits(pIn->y, 3, 1, 6 + log2ElementBytes); microBlockOffset = GetBits(microBlockOffset, 0, 4, 0) | (GetBit(pIn->x, 0) << 4) | GetBits(microBlockOffset, 4, 3, 5); if (log2ElementBytes == 3) { microBlockOffset = GetBits(microBlockOffset, 0, 6, 0) | GetBits(pIn->x, 1, 2, 6); } } return microBlockOffset; } /** ************************************************************************************************************************ * Lib::ComputeSurface3DMicroBlockOffset * * @brief * Internal function to calculate micro block (1KB) offset from coord for 3D resource * * @return * micro block (1KB) offset for 3D resource ************************************************************************************************************************ */ UINT_32 Lib::ComputeSurface3DMicroBlockOffset( const _ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn) const { ADDR_ASSERT(IsThick(pIn->resourceType, pIn->swizzleMode)); UINT_32 log2ElementBytes = Log2(pIn->bpp >> 3); UINT_32 microBlockOffset = 0; if (IsStandardSwizzle(pIn->resourceType, pIn->swizzleMode)) { if (log2ElementBytes == 0) { microBlockOffset = ((pIn->slice & 4) >> 2) | ((pIn->y & 4) >> 1); } else if (log2ElementBytes == 1) { microBlockOffset = ((pIn->slice & 4) >> 2) | ((pIn->y & 4) >> 1); } else if (log2ElementBytes == 2) { microBlockOffset = ((pIn->y & 4) >> 2) | ((pIn->x & 4) >> 1); } else if (log2ElementBytes == 3) { microBlockOffset = (pIn->x & 6) >> 1; } else { microBlockOffset = pIn->x & 3; } microBlockOffset <<= 8; UINT_32 xBits = pIn->x << log2ElementBytes; microBlockOffset |= (xBits & 0xf) | ((pIn->y & 0x3) << 4) | ((pIn->slice & 
0x3) << 6); } else if (IsZOrderSwizzle(pIn->swizzleMode)) { UINT_32 xh, yh, zh; if (log2ElementBytes == 0) { microBlockOffset = (pIn->x & 1) | ((pIn->y & 1) << 1) | ((pIn->x & 2) << 1) | ((pIn->y & 2) << 2); microBlockOffset = microBlockOffset | ((pIn->slice & 3) << 4) | ((pIn->x & 4) << 4); xh = pIn->x >> 3; yh = pIn->y >> 2; zh = pIn->slice >> 2; } else if (log2ElementBytes == 1) { microBlockOffset = (pIn->x & 1) | ((pIn->y & 1) << 1) | ((pIn->x & 2) << 1) | ((pIn->y & 2) << 2); microBlockOffset = (microBlockOffset << 1) | ((pIn->slice & 3) << 5); xh = pIn->x >> 2; yh = pIn->y >> 2; zh = pIn->slice >> 2; } else if (log2ElementBytes == 2) { microBlockOffset = (pIn->x & 1) | ((pIn->y & 1) << 1) | ((pIn->x & 2) << 1) | ((pIn->slice & 1) << 3); microBlockOffset = (microBlockOffset << 2) | ((pIn->y & 2) << 5); xh = pIn->x >> 2; yh = pIn->y >> 2; zh = pIn->slice >> 1; } else if (log2ElementBytes == 3) { microBlockOffset = (pIn->x & 1) | ((pIn->y & 1) << 1) | ((pIn->slice & 1) << 2) | ((pIn->x & 2) << 2); microBlockOffset <<= 3; xh = pIn->x >> 2; yh = pIn->y >> 1; zh = pIn->slice >> 1; } else { microBlockOffset = (((pIn->x & 1) | ((pIn->y & 1) << 1) | ((pIn->slice & 1) << 2)) << 4); xh = pIn->x >> 1; yh = pIn->y >> 1; zh = pIn->slice >> 1; } microBlockOffset |= ((MortonGen3d(xh, yh, zh, 1) << 7) & 0x380); } return microBlockOffset; } /** ************************************************************************************************************************ * Lib::GetPipeXorBits * * @brief * Internal function to get the number of bits for the pipe/SE xor operation * * @return * Number of pipe/SE xor bits ************************************************************************************************************************ */ UINT_32 Lib::GetPipeXorBits( UINT_32 macroBlockBits) const { ADDR_ASSERT(macroBlockBits >= m_pipeInterleaveLog2); // Total available xor bits UINT_32 xorBits = macroBlockBits - m_pipeInterleaveLog2; // Pipe/Se xor bits UINT_32 pipeBits = Min(xorBits, m_pipesLog2 +
m_seLog2); return pipeBits; } /** ************************************************************************************************************************ * Lib::Addr2GetPreferredSurfaceSetting * * @brief * Internal function to get suggested surface information for the client to use * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::Addr2GetPreferredSurfaceSetting( const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT* pIn, ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT* pOut) const { ADDR_E_RETURNCODE returnCode; if ((GetFillSizeFieldsFlags() == TRUE) && ((pIn->size != sizeof(ADDR2_GET_PREFERRED_SURF_SETTING_INPUT)) || (pOut->size != sizeof(ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT)))) { returnCode = ADDR_INVALIDPARAMS; } else { returnCode = HwlGetPreferredSurfaceSetting(pIn, pOut); } return returnCode; } /** ************************************************************************************************************************ * Lib::ComputeBlock256Equation * * @brief * Compute equation for block 256B * * @return * If equation computed successfully * ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeBlock256Equation( AddrResourceType rsrcType, AddrSwizzleMode swMode, UINT_32 elementBytesLog2, ADDR_EQUATION* pEquation) const { ADDR_E_RETURNCODE ret; if (IsBlock256b(swMode)) { ret = HwlComputeBlock256Equation(rsrcType, swMode, elementBytesLog2, pEquation); } else { ADDR_ASSERT_ALWAYS(); ret = ADDR_INVALIDPARAMS; } return ret; } /** ************************************************************************************************************************ * Lib::ComputeThinEquation * * @brief * Compute equation for a 2D/3D resource that uses THIN mode * * @return * If equation computed successfully *
************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeThinEquation( AddrResourceType rsrcType, AddrSwizzleMode swMode, UINT_32 elementBytesLog2, ADDR_EQUATION* pEquation) const { ADDR_E_RETURNCODE ret; if (IsThin(rsrcType, swMode)) { ret = HwlComputeThinEquation(rsrcType, swMode, elementBytesLog2, pEquation); } else { ADDR_ASSERT_ALWAYS(); ret = ADDR_INVALIDPARAMS; } return ret; } /** ************************************************************************************************************************ * Lib::ComputeThickEquation * * @brief * Compute equation for a 3D resource that uses THICK mode * * @return * If equation computed successfully * ************************************************************************************************************************ */ ADDR_E_RETURNCODE Lib::ComputeThickEquation( AddrResourceType rsrcType, AddrSwizzleMode swMode, UINT_32 elementBytesLog2, ADDR_EQUATION* pEquation) const { ADDR_E_RETURNCODE ret; if (IsThick(rsrcType, swMode)) { ret = HwlComputeThickEquation(rsrcType, swMode, elementBytesLog2, pEquation); } else { ADDR_ASSERT_ALWAYS(); ret = ADDR_INVALIDPARAMS; } return ret; } /** ************************************************************************************************************************ * Lib::ComputeQbStereoInfo * * @brief * Get quad buffer stereo information * @return * N/A ************************************************************************************************************************ */ VOID Lib::ComputeQbStereoInfo( ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [in,out] updated pOut+pStereoInfo ) const { ADDR_ASSERT(pOut->bpp >= 8); ADDR_ASSERT((pOut->surfSize % pOut->baseAlign) == 0); // Save original height pOut->pStereoInfo->eyeHeight = pOut->height; // Right offset pOut->pStereoInfo->rightOffset = static_cast<UINT_32>(pOut->surfSize); // Double height pOut->height <<= 1; ADDR_ASSERT(pOut->height <=
MaxSurfaceHeight); pOut->pixelHeight <<= 1; // Double size pOut->surfSize <<= 1; pOut->sliceSize <<= 1; } /** ************************************************************************************************************************ * Lib::FilterInvalidEqSwizzleMode * * @brief * Filter out swizzle modes that do not have a valid equation index * * @return * N/A ************************************************************************************************************************ */ VOID Lib::FilterInvalidEqSwizzleMode( ADDR2_SWMODE_SET& allowedSwModeSet, AddrResourceType resourceType, UINT_32 elemLog2 ) const { if (resourceType != ADDR_RSRC_TEX_1D) { UINT_32 allowedSwModeSetVal = allowedSwModeSet.value; const UINT_32 rsrcTypeIdx = static_cast<UINT_32>(resourceType) - 1; UINT_32 validSwModeSet = allowedSwModeSetVal; for (UINT_32 swModeIdx = 0; validSwModeSet != 0; swModeIdx++) { if (validSwModeSet & 1) { if (m_equationLookupTable[rsrcTypeIdx][swModeIdx][elemLog2] == ADDR_INVALID_EQUATION_INDEX) { allowedSwModeSetVal &= ~(1u << swModeIdx); } } validSwModeSet >>= 1; } // Only apply the filtering if at least one valid swizzle mode remains if (allowedSwModeSetVal != 0) { allowedSwModeSet.value = allowedSwModeSetVal; } } } } // V2 } // Addr } // rocr /* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved.
* * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** ************************************************************************************************************************ * @file addrlib2.h * @brief Contains the Addr::V2::Lib class definition. 
************************************************************************************************************************ */ #ifndef __ADDR2_LIB2_H__ #define __ADDR2_LIB2_H__ #include "addrlib.h" namespace rocr { namespace Addr { namespace V2 { /** ************************************************************************************************************************ * @brief Flags for SwizzleModeTable ************************************************************************************************************************ */ struct SwizzleModeFlags { // Swizzle mode UINT_32 isLinear : 1; // Linear // Block size UINT_32 is256b : 1; // Block size is 256B UINT_32 is4kb : 1; // Block size is 4KB UINT_32 is64kb : 1; // Block size is 64KB UINT_32 isVar : 1; // Block size is variable UINT_32 isZ : 1; // Z order swizzle mode UINT_32 isStd : 1; // Standard swizzle mode UINT_32 isDisp : 1; // Display swizzle mode UINT_32 isRot : 1; // Rotate swizzle mode // XOR mode UINT_32 isXor : 1; // XOR after swizzle if set UINT_32 isT : 1; // T mode UINT_32 isRtOpt : 1; // mode opt for render target UINT_32 reserved : 20; // Reserved bits }; struct Dim2d { UINT_32 w; UINT_32 h; }; struct Dim3d { UINT_32 w; UINT_32 h; UINT_32 d; }; // Macro define resource block type enum AddrBlockType { AddrBlockMicro = 0, // Resource uses 256B block AddrBlockThin4KB = 1, // Resource uses thin 4KB block AddrBlockThick4KB = 2, // Resource uses thick 4KB block AddrBlockThin64KB = 3, // Resource uses thin 64KB block AddrBlockThick64KB = 4, // Resource uses thick 64KB block AddrBlockVar = 5, // Resource uses var block, only valid for GFX9 AddrBlockLinear = 6, // Resource uses linear swizzle mode AddrBlockMaxTiledType = AddrBlockVar + 1, }; enum AddrSwSet { AddrSwSetZ = 1 << ADDR_SW_Z, AddrSwSetS = 1 << ADDR_SW_S, AddrSwSetD = 1 << ADDR_SW_D, AddrSwSetR = 1 << ADDR_SW_R, AddrSwSetAll = AddrSwSetZ | AddrSwSetS | AddrSwSetD | AddrSwSetR, }; const UINT_32 Size256 = 256u; const UINT_32 Size4K = 4096u; const 
UINT_32 Size64K = 65536u; const UINT_32 Log2Size256 = 8u; const UINT_32 Log2Size4K = 12u; const UINT_32 Log2Size64K = 16u; /** ************************************************************************************************************************ * @brief This class contains asic independent address lib functionalities ************************************************************************************************************************ */ class Lib : public Addr::Lib { public: virtual ~Lib(); static Lib* GetLib( ADDR_HANDLE hLib); // // Interface stubs // // For data surface ADDR_E_RETURNCODE ComputeSurfaceInfo( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeSurfaceAddrFromCoord( const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeSurfaceCoordFromAddr( const ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut) const; // For HTile ADDR_E_RETURNCODE ComputeHtileInfo( const ADDR2_COMPUTE_HTILE_INFO_INPUT* pIn, ADDR2_COMPUTE_HTILE_INFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeHtileAddrFromCoord( const ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT* pOut); ADDR_E_RETURNCODE ComputeHtileCoordFromAddr( const ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn, ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT* pOut); // For CMask ADDR_E_RETURNCODE ComputeCmaskInfo( const ADDR2_COMPUTE_CMASK_INFO_INPUT* pIn, ADDR2_COMPUTE_CMASK_INFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeCmaskAddrFromCoord( const ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT* pOut); ADDR_E_RETURNCODE ComputeCmaskCoordFromAddr( const ADDR2_COMPUTE_CMASK_COORDFROMADDR_INPUT* pIn, ADDR2_COMPUTE_CMASK_COORDFROMADDR_OUTPUT* pOut) const; // For FMask ADDR_E_RETURNCODE ComputeFmaskInfo( const ADDR2_COMPUTE_FMASK_INFO_INPUT* pIn, 
ADDR2_COMPUTE_FMASK_INFO_OUTPUT* pOut); ADDR_E_RETURNCODE ComputeFmaskAddrFromCoord( const ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeFmaskCoordFromAddr( const ADDR2_COMPUTE_FMASK_COORDFROMADDR_INPUT* pIn, ADDR2_COMPUTE_FMASK_COORDFROMADDR_OUTPUT* pOut) const; // For DCC key ADDR_E_RETURNCODE ComputeDccInfo( const ADDR2_COMPUTE_DCCINFO_INPUT* pIn, ADDR2_COMPUTE_DCCINFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeDccAddrFromCoord( const ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT* pOut); // Misc ADDR_E_RETURNCODE ComputePipeBankXor( const ADDR2_COMPUTE_PIPEBANKXOR_INPUT* pIn, ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT* pOut); ADDR_E_RETURNCODE ComputeSlicePipeBankXor( const ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT* pIn, ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT* pOut); ADDR_E_RETURNCODE ComputeSubResourceOffsetForSwizzlePattern( const ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT* pIn, ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT* pOut); ADDR_E_RETURNCODE Addr2GetPreferredSurfaceSetting( const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT* pIn, ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT* pOut) const; virtual BOOL_32 IsValidDisplaySwizzleMode( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const { ADDR_NOT_IMPLEMENTED(); return ADDR_NOTIMPLEMENTED; } protected: Lib(); // Constructor is protected Lib(const Client* pClient); static const UINT_32 MaxNumOfBpp = 5; static const UINT_32 MaxNumOfAA = 4; static const Dim2d Block256_2d[MaxNumOfBpp]; static const Dim3d Block1K_3d[MaxNumOfBpp]; static const UINT_32 PrtAlignment = 64 * 1024; static const UINT_32 MaxMacroBits = 20; static const UINT_32 MaxMipLevels = 16; BOOL_32 IsValidSwMode(AddrSwizzleMode swizzleMode) const { // Don't dereference a reinterpret_cast pointer so as not to break // strict-aliasing rules. 
UINT_32 mode; memcpy(&mode, &m_swizzleModeTable[swizzleMode], sizeof(UINT_32)); return mode != 0; } // Checking block size BOOL_32 IsBlock256b(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].is256b; } BOOL_32 IsBlock4kb(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].is4kb; } BOOL_32 IsBlock64kb(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].is64kb; } BOOL_32 IsBlockVariable(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isVar; } // Checking swizzle mode BOOL_32 IsLinear(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isLinear; } BOOL_32 IsRtOptSwizzle(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isRtOpt; } BOOL_32 IsZOrderSwizzle(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isZ; } BOOL_32 IsStandardSwizzle(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isStd; } BOOL_32 IsDisplaySwizzle(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isDisp; } BOOL_32 IsRotateSwizzle(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isRot; } BOOL_32 IsStandardSwizzle(AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { return HwlIsStandardSwizzle(resourceType, swizzleMode); } BOOL_32 IsDisplaySwizzle(AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { return HwlIsDisplaySwizzle(resourceType, swizzleMode); } BOOL_32 IsXor(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isXor; } BOOL_32 IsPrt(AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isT; } BOOL_32 IsNonPrtXor(AddrSwizzleMode swizzleMode) const { return (IsXor(swizzleMode) && (IsPrt(swizzleMode) == FALSE)); } // Checking resource type static BOOL_32 IsTex1d(AddrResourceType resourceType) { return (resourceType == ADDR_RSRC_TEX_1D); } static BOOL_32 
IsTex2d(AddrResourceType resourceType) { return (resourceType == ADDR_RSRC_TEX_2D); } static BOOL_32 IsTex3d(AddrResourceType resourceType) { return (resourceType == ADDR_RSRC_TEX_3D); } BOOL_32 IsThick(AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { return HwlIsThick(resourceType, swizzleMode); } BOOL_32 IsThin(AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { return HwlIsThin(resourceType, swizzleMode); } UINT_32 GetBlockSizeLog2(AddrSwizzleMode swizzleMode) const { UINT_32 blockSizeLog2 = 0; if (IsBlock256b(swizzleMode) || IsLinear(swizzleMode)) { blockSizeLog2 = 8; } else if (IsBlock4kb(swizzleMode)) { blockSizeLog2 = 12; } else if (IsBlock64kb(swizzleMode)) { blockSizeLog2 = 16; } else if (IsBlockVariable(swizzleMode) && (m_blockVarSizeLog2 != 0)) { blockSizeLog2 = m_blockVarSizeLog2; } else { ADDR_ASSERT_ALWAYS(); } return blockSizeLog2; } UINT_32 GetBlockSize(AddrSwizzleMode swizzleMode) const { return (1 << GetBlockSizeLog2(swizzleMode)); } static UINT_32 GetFmaskBpp(UINT_32 sample, UINT_32 frag) { sample = (sample == 0) ? 1 : sample; frag = (frag == 0) ? 
sample : frag; UINT_32 fmaskBpp = QLog2(frag); if (sample > frag) { fmaskBpp++; } if (fmaskBpp == 3) { fmaskBpp = 4; } fmaskBpp = Max(8u, fmaskBpp * sample); return fmaskBpp; } virtual BOOL_32 HwlIsStandardSwizzle( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { ADDR_NOT_IMPLEMENTED(); return FALSE; } virtual BOOL_32 HwlIsDisplaySwizzle( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { ADDR_NOT_IMPLEMENTED(); return FALSE; } virtual BOOL_32 HwlIsThin( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { ADDR_NOT_IMPLEMENTED(); return FALSE; } virtual BOOL_32 HwlIsThick( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { ADDR_NOT_IMPLEMENTED(); return FALSE; } virtual ADDR_E_RETURNCODE HwlComputeHtileInfo( const ADDR2_COMPUTE_HTILE_INFO_INPUT* pIn, ADDR2_COMPUTE_HTILE_INFO_OUTPUT* pOut) const { ADDR_NOT_IMPLEMENTED(); return ADDR_NOTSUPPORTED; } virtual ADDR_E_RETURNCODE HwlComputeCmaskInfo( const ADDR2_COMPUTE_CMASK_INFO_INPUT* pIn, ADDR2_COMPUTE_CMASK_INFO_OUTPUT* pOut) const { ADDR_NOT_IMPLEMENTED(); return ADDR_NOTSUPPORTED; } virtual ADDR_E_RETURNCODE HwlComputeDccInfo( const ADDR2_COMPUTE_DCCINFO_INPUT* pIn, ADDR2_COMPUTE_DCCINFO_OUTPUT* pOut) const { ADDR_NOT_IMPLEMENTED(); return ADDR_NOTSUPPORTED; } virtual ADDR_E_RETURNCODE HwlComputeDccAddrFromCoord( const ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT* pOut) { ADDR_NOT_IMPLEMENTED(); return ADDR_NOTSUPPORTED; } virtual ADDR_E_RETURNCODE HwlComputeCmaskAddrFromCoord( const ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT* pOut) { ADDR_NOT_IMPLEMENTED(); return ADDR_NOTSUPPORTED; } virtual ADDR_E_RETURNCODE HwlComputeHtileAddrFromCoord( const ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT* pOut) { ADDR_NOT_IMPLEMENTED(); return ADDR_NOTSUPPORTED; } virtual ADDR_E_RETURNCODE HwlComputeHtileCoordFromAddr( const 
        ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT*  pIn,
        ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT* pOut)
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTSUPPORTED;
    }

    virtual ADDR_E_RETURNCODE HwlComputeBlock256Equation(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2,
        ADDR_EQUATION*   pEquation) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTSUPPORTED;
    }

    virtual ADDR_E_RETURNCODE HwlComputeThinEquation(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2,
        ADDR_EQUATION*   pEquation) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTSUPPORTED;
    }

    virtual ADDR_E_RETURNCODE HwlComputeThickEquation(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2,
        ADDR_EQUATION*   pEquation) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTSUPPORTED;
    }

    virtual UINT_32 HwlGetEquationIndex(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_INVALID_EQUATION_INDEX;
    }

    UINT_32 GetEquationIndex(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) const
    {
        return HwlGetEquationIndex(pIn, pOut);
    }

    virtual ADDR_E_RETURNCODE HwlComputePipeBankXor(
        const ADDR2_COMPUTE_PIPEBANKXOR_INPUT* pIn,
        ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT*      pOut) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTSUPPORTED;
    }

    virtual ADDR_E_RETURNCODE HwlComputeSlicePipeBankXor(
        const ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT* pIn,
        ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT*      pOut) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTSUPPORTED;
    }

    virtual ADDR_E_RETURNCODE HwlComputeSubResourceOffsetForSwizzlePattern(
        const ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT* pIn,
        ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT*      pOut) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTSUPPORTED;
    }

    virtual ADDR_E_RETURNCODE HwlGetPreferredSurfaceSetting(
        const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT* pIn,
        ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT*      pOut) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTSUPPORTED;
    }

    virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfoSanityCheck(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTSUPPORTED;
    }

    virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfoTiled(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTIMPLEMENTED;
    }

    virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfoLinear(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTIMPLEMENTED;
    }

    virtual ADDR_E_RETURNCODE HwlComputeSurfaceAddrFromCoordTiled(
        const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT*      pOut) const
    {
        ADDR_NOT_IMPLEMENTED();
        return ADDR_NOTIMPLEMENTED;
    }

    ADDR_E_RETURNCODE ComputeBlock256Equation(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2,
        ADDR_EQUATION*   pEquation) const;

    ADDR_E_RETURNCODE ComputeThinEquation(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2,
        ADDR_EQUATION*   pEquation) const;

    ADDR_E_RETURNCODE ComputeThickEquation(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2,
        ADDR_EQUATION*   pEquation) const;

    ADDR_E_RETURNCODE ComputeSurfaceInfoSanityCheck(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const;

    ADDR_E_RETURNCODE ComputeSurfaceInfoLinear(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) const;

    ADDR_E_RETURNCODE ComputeSurfaceInfoTiled(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) const;

    ADDR_E_RETURNCODE ComputeSurfaceAddrFromCoordLinear(
        const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT*      pOut) const;

    ADDR_E_RETURNCODE ComputeSurfaceAddrFromCoordTiled(
        const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT*      pOut) const;

    ADDR_E_RETURNCODE ComputeSurfaceCoordFromAddrLinear(
        const ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT*      pOut) const;

    ADDR_E_RETURNCODE ComputeSurfaceCoordFromAddrTiled(
        const ADDR2_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT*      pOut) const;

    UINT_32 ComputeSurface2DMicroBlockOffset(
        const _ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn) const;

    UINT_32 ComputeSurface3DMicroBlockOffset(
        const _ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn) const;

    // Misc
    ADDR_E_RETURNCODE ComputeBlockDimensionForSurf(
        UINT_32*         pWidth,
        UINT_32*         pHeight,
        UINT_32*         pDepth,
        UINT_32          bpp,
        UINT_32          numSamples,
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode) const;

    ADDR_E_RETURNCODE ComputeBlockDimension(
        UINT_32*         pWidth,
        UINT_32*         pHeight,
        UINT_32*         pDepth,
        UINT_32          bpp,
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode) const;

    virtual VOID ComputeThinBlockDimension(
        UINT_32*         pWidth,
        UINT_32*         pHeight,
        UINT_32*         pDepth,
        UINT_32          bpp,
        UINT_32          numSamples,
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode) const;

    VOID ComputeThickBlockDimension(
        UINT_32*         pWidth,
        UINT_32*         pHeight,
        UINT_32*         pDepth,
        UINT_32          bpp,
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode) const;

    static UINT_64 ComputePadSize(
        const Dim3d*     pBlkDim,
        UINT_32          width,
        UINT_32          height,
        UINT_32          numSlices,
        Dim3d*           pPadDim)
    {
        pPadDim->w = PowTwoAlign(width,     pBlkDim->w);
        pPadDim->h = PowTwoAlign(height,    pBlkDim->h);
        pPadDim->d = PowTwoAlign(numSlices, pBlkDim->d);
        return static_cast<UINT_64>(pPadDim->w) * pPadDim->h * pPadDim->d;
    }

    static ADDR_E_RETURNCODE ExtractPipeBankXor(
        UINT_32  pipeBankXor,
        UINT_32  bankBits,
        UINT_32  pipeBits,
        UINT_32* pBankX,
        UINT_32* pPipeX);

    static BOOL_32 Valid3DMipSliceIdConstraint(
        UINT_32 numSlices,
        UINT_32 mipId,
        UINT_32 slice)
    {
        return (Max((numSlices >> mipId), 1u) > slice);
    }

    Dim3d GetMipTailDim(
        AddrResourceType  resourceType,
        AddrSwizzleMode   swizzleMode,
        UINT_32           blockWidth,
        UINT_32           blockHeight,
        UINT_32           blockDepth) const;

    static BOOL_32 IsLocalHeap(AddrResrouceLocation resourceType)
    {
        return ((resourceType == ADDR_RSRC_LOC_LOCAL) ||
                (resourceType == ADDR_RSRC_LOC_INVIS));
    }

    static BOOL_32 IsInvisibleHeap(AddrResrouceLocation resourceType)
    {
        return (resourceType == ADDR_RSRC_LOC_INVIS);
    }

    static BOOL_32 IsNonlocalHeap(AddrResrouceLocation resourceType)
    {
        return ((resourceType == ADDR_RSRC_LOC_USWC) ||
                (resourceType == ADDR_RSRC_LOC_CACHED));
    }

    UINT_32 GetPipeLog2ForMetaAddressing(BOOL_32 pipeAligned, AddrSwizzleMode swizzleMode) const
    {
        UINT_32 numPipeLog2 = pipeAligned ? Min(m_pipesLog2 + m_seLog2, 5u) : 0;

        if (IsXor(swizzleMode))
        {
            UINT_32 maxPipeLog2 = GetBlockSizeLog2(swizzleMode) - m_pipeInterleaveLog2;

            numPipeLog2 = Min(numPipeLog2, maxPipeLog2);
        }

        return numPipeLog2;
    }

    UINT_32 GetPipeNumForMetaAddressing(BOOL_32 pipeAligned, AddrSwizzleMode swizzleMode) const
    {
        return (1 << GetPipeLog2ForMetaAddressing(pipeAligned, swizzleMode));
    }

    VOID VerifyMipLevelInfo(const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const
    {
#if DEBUG
        if (pIn->numMipLevels > 1)
        {
            UINT_32 actualMipLevels = 1;
            switch (pIn->resourceType)
            {
                case ADDR_RSRC_TEX_3D:
                    // Fall through to share 2D case
                    actualMipLevels = Max(actualMipLevels, Log2NonPow2(pIn->numSlices) + 1);
                case ADDR_RSRC_TEX_2D:
                    // Fall through to share 1D case
                    actualMipLevels = Max(actualMipLevels, Log2NonPow2(pIn->height) + 1);
                case ADDR_RSRC_TEX_1D:
                    // Base 1D case
                    actualMipLevels = Max(actualMipLevels, Log2NonPow2(pIn->width) + 1);
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    break;
            }
            // If the client passes the wrong number of mip levels to addrlib, the result will be bad.
            // Not sure whether this call should fail instead of asserting here.
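            // Worked example (illustrative only; assumes Log2NonPow2() returns floor(log2(x))):
            // a 2D surface with width = 300 and height = 200 gives
            // actualMipLevels = Max(Log2NonPow2(200) + 1, Log2NonPow2(300) + 1) = Max(8, 9) = 9,
            // which matches the full mip chain 300, 150, 75, 37, 18, 9, 4, 2, 1.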
            ADDR_ASSERT(actualMipLevels >= pIn->numMipLevels);
        }
#endif
    }

    ADDR_E_RETURNCODE ApplyCustomerPipeBankXor(
        AddrSwizzleMode swizzleMode,
        UINT_32         pipeBankXor,
        UINT_32         bankBits,
        UINT_32         pipeBits,
        UINT_32*        pBlockOffset) const
    {
        ADDR_E_RETURNCODE returnCode = ADDR_OK;

        if (IsXor(swizzleMode))
        {
            // Apply the driver-set pipe/bank XOR value
            UINT_32 bankX = 0;
            UINT_32 pipeX = 0;
            returnCode = ExtractPipeBankXor(pipeBankXor, bankBits, pipeBits, &bankX, &pipeX);
            *pBlockOffset ^= (pipeX << m_pipeInterleaveLog2);
            *pBlockOffset ^= (bankX << (m_pipeInterleaveLog2 + pipeBits));
        }

        return returnCode;
    }

    UINT_32 GetPipeXorBits(UINT_32 macroBlockBits) const;

    ADDR_E_RETURNCODE ApplyCustomizedPitchHeight(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        UINT_32                                 elementBytes,
        UINT_32                                 pitchAlignInElement,
        UINT_32*                                pPitch,
        UINT_32*                                pHeight) const;

    VOID ComputeQbStereoInfo(ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    VOID FilterInvalidEqSwizzleMode(
        ADDR2_SWMODE_SET& allowedSwModeSet,
        AddrResourceType  resourceType,
        UINT_32           elemLog2) const;

    UINT_32 m_se;                  ///< Number of shader engines
    UINT_32 m_rbPerSe;             ///< Number of render backends per shader engine
    UINT_32 m_maxCompFrag;         ///< Maximum number of compressed fragments

    UINT_32 m_banksLog2;           ///< Log2 of the number of banks
    UINT_32 m_pipesLog2;           ///< Log2 of the number of pipes per shader engine
    UINT_32 m_seLog2;              ///< Log2 of the number of shader engines
    UINT_32 m_rbPerSeLog2;         ///< Log2 of the number of render backends per shader engine
    UINT_32 m_maxCompFragLog2;     ///< Log2 of the maximum number of compressed fragments

    UINT_32 m_pipeInterleaveLog2;  ///< Log2 of the pipe interleave size in bytes

    UINT_32 m_blockVarSizeLog2;    ///< Log2 of the variable block size

    SwizzleModeFlags m_swizzleModeTable[ADDR_SW_MAX_TYPE];  ///< Swizzle mode table

    // Max number of swizzle modes supported for equations
    static const UINT_32 MaxSwModeType = 32;
    // Max number of resource types (2D/3D) supported for equations
    static const UINT_32 MaxRsrcType = 2;
    // Max number of element sizes (8bpp/16bpp/32bpp/64bpp/128bpp)
    static const UINT_32 MaxElementBytesLog2 = 5;
    // Almost all swizzle mode + resource type combinations support equations
    static const UINT_32 EquationTableSize = MaxElementBytesLog2 * MaxSwModeType * MaxRsrcType;
    // Equation table
    ADDR_EQUATION m_equationTable[EquationTableSize];

    // Number of equation entries in the table
    UINT_32 m_numEquations;
    // Equation lookup table according to bpp and tile index
    UINT_32 m_equationLookupTable[MaxRsrcType][MaxSwModeType][MaxElementBytesLog2];

private:
    // Disallow the copy constructor
    Lib(const Lib& a);

    // Disallow the assignment operator
    Lib& operator=(const Lib& a);
};

} // V2
} // Addr
} // rocr

#endif
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrobject.cpp000066400000000000000000000160001420110115200244160ustar00rootroot00000000000000
/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
****************************************************************************************************
* @file  addrobject.cpp
* @brief Contains the Object base class implementation.
****************************************************************************************************
*/

#include "addrinterface.h"
#include "addrobject.h"

namespace rocr {
namespace Addr
{

/**
****************************************************************************************************
*   Object::Object
*
*   @brief
*       Constructor for the Object class.
****************************************************************************************************
*/
Object::Object()
{
    m_client.handle = NULL;
    m_client.callbacks.allocSysMem = NULL;
    m_client.callbacks.freeSysMem = NULL;
    m_client.callbacks.debugPrint = NULL;
}

/**
****************************************************************************************************
*   Object::Object
*
*   @brief
*       Constructor for the Object class.
****************************************************************************************************
*/
Object::Object(const Client* pClient)
{
    m_client = *pClient;
}

/**
****************************************************************************************************
*   Object::~Object
*
*   @brief
*       Destructor for the Object class.
****************************************************************************************************
*/
Object::~Object()
{
}

/**
****************************************************************************************************
*   Object::ClientAlloc
*
*   @brief
*       Calls the allocSysMem callback of the given Client
****************************************************************************************************
*/
VOID* Object::ClientAlloc(
    size_t         objSize,    ///< [in] Size to allocate
    const Client*  pClient)    ///< [in] Client pointer
{
    VOID* pObjMem = NULL;

    if (pClient->callbacks.allocSysMem != NULL)
    {
        ADDR_ALLOCSYSMEM_INPUT allocInput = {0};

        allocInput.size        = sizeof(ADDR_ALLOCSYSMEM_INPUT);
        allocInput.flags.value = 0;
        allocInput.sizeInBytes = static_cast<UINT_32>(objSize);
        allocInput.hClient     = pClient->handle;

        pObjMem = pClient->callbacks.allocSysMem(&allocInput);
    }

    return pObjMem;
}

/**
****************************************************************************************************
*   Object::Alloc
*
*   @brief
*       A wrapper of ClientAlloc that uses this object's client
****************************************************************************************************
*/
VOID* Object::Alloc(
    size_t objSize      ///< [in] Size to allocate
    ) const
{
    return ClientAlloc(objSize, &m_client);
}

/**
****************************************************************************************************
*   Object::ClientFree
*
*   @brief
*       Calls the freeSysMem callback of the given Client
****************************************************************************************************
*/
VOID Object::ClientFree(
    VOID*          pObjMem,    ///< [in] User virtual address to free.
    const Client*  pClient)    ///< [in] Client pointer
{
    if (pClient->callbacks.freeSysMem != NULL)
    {
        if (pObjMem != NULL)
        {
            ADDR_FREESYSMEM_INPUT freeInput = {0};

            freeInput.size      = sizeof(ADDR_FREESYSMEM_INPUT);
            freeInput.hClient   = pClient->handle;
            freeInput.pVirtAddr = pObjMem;

            pClient->callbacks.freeSysMem(&freeInput);
        }
    }
}

/**
****************************************************************************************************
*   Object::Free
*
*   @brief
*       A wrapper of ClientFree that uses this object's client
****************************************************************************************************
*/
VOID Object::Free(
    VOID* pObjMem       ///< [in] User virtual address to free.
    ) const
{
    ClientFree(pObjMem, &m_client);
}

/**
****************************************************************************************************
*   Object::operator new
*
*   @brief
*       Placement new operator (with a pre-allocated memory pointer).
*
*   @return
*       Returns the pre-allocated memory pointer.
****************************************************************************************************
*/
VOID* Object::operator new(
    size_t objSize,     ///< [in] Size to allocate
    VOID*  pMem)        ///< [in] Pre-allocated pointer
{
    return pMem;
}

/**
****************************************************************************************************
*   Object::operator delete
*
*   @brief
*       Frees the memory of an Object through its client's freeSysMem callback.
****************************************************************************************************
*/
VOID Object::operator delete(
    VOID* pObjMem)      ///< [in] User virtual address to free.
{
    Object* pObj = static_cast<Object*>(pObjMem);
    ClientFree(pObjMem, &pObj->m_client);
}

/**
****************************************************************************************************
*   Object::DebugPrint
*
*   @brief
*       Prints a debug message
*
*   @return
*       N/A
****************************************************************************************************
*/
VOID Object::DebugPrint(
    const CHAR* pDebugString,   ///< [in] Debug string
    ...
    ) const
{
#if DEBUG
    if (m_client.callbacks.debugPrint != NULL)
    {
        va_list ap;

        va_start(ap, pDebugString);

        ADDR_DEBUGPRINT_INPUT debugPrintInput = {0};

        debugPrintInput.size         = sizeof(ADDR_DEBUGPRINT_INPUT);
        debugPrintInput.pDebugString = const_cast<CHAR*>(pDebugString);
        debugPrintInput.hClient      = m_client.handle;
        va_copy(debugPrintInput.ap, ap);

        m_client.callbacks.debugPrint(&debugPrintInput);

        // Every va_copy needs a matching va_end, just like va_start
        va_end(debugPrintInput.ap);
        va_end(ap);
    }
#endif
}

} // Addr
} // rocr
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/addrobject.h000066400000000000000000000063541420110115200240760ustar00rootroot00000000000000
/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
****************************************************************************************************
* @file  addrobject.h
* @brief Contains the Object base class definition.
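*
* A minimal, standalone sketch of the allocation pattern that Object implements: memory comes from
* the client's allocSysMem callback, the object is constructed in place with placement new, and the
* memory goes back through the same client's freeSysMem callback. SketchCallbacks, SketchObject,
* CountingAlloc, and CountingFree are hypothetical illustration names; the real ADDR_CALLBACKS take
* ADDR_ALLOCSYSMEM_INPUT / ADDR_FREESYSMEM_INPUT structures rather than raw sizes and pointers.
*
```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the client callback table (not the real ADDR_CALLBACKS).
struct SketchCallbacks
{
    void* (*allocSysMem)(std::size_t sizeInBytes);
    void  (*freeSysMem)(void* pMem);
};

// Hypothetical object whose storage is owned by the client, mirroring Object.
class SketchObject
{
public:
    // Allocates raw memory through the client callback, then constructs in place.
    static SketchObject* Create(const SketchCallbacks& callbacks)
    {
        void* pMem = (callbacks.allocSysMem != nullptr) ?
                     callbacks.allocSysMem(sizeof(SketchObject)) : nullptr;
        return (pMem != nullptr) ? new (pMem) SketchObject(callbacks) : nullptr;
    }

    // Runs the destructor, then returns the memory to the client that provided it.
    static void Destroy(SketchObject* pObj)
    {
        if (pObj != nullptr)
        {
            // Copy the callbacks first: the object's lifetime ends with the destructor call.
            const SketchCallbacks callbacks = pObj->m_callbacks;
            pObj->~SketchObject();
            if (callbacks.freeSysMem != nullptr)
            {
                callbacks.freeSysMem(pObj);
            }
        }
    }

private:
    explicit SketchObject(const SketchCallbacks& callbacks) : m_callbacks(callbacks) {}
    ~SketchObject() {}

    SketchCallbacks m_callbacks;
};

// Example client callbacks that count calls (illustration only).
static int g_allocCount = 0;
static int g_freeCount  = 0;

static void* CountingAlloc(std::size_t sizeInBytes)
{
    ++g_allocCount;
    return std::malloc(sizeInBytes);
}

static void CountingFree(void* pMem)
{
    ++g_freeCount;
    std::free(pMem);
}
```
*
* Unlike Object::operator delete, which reads m_client after the destructor has already run, this
* sketch copies the callbacks before destroying the object.
*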
****************************************************************************************************
*/

#ifndef __ADDR_OBJECT_H__
#define __ADDR_OBJECT_H__

#include "addrtypes.h"
#include "addrcommon.h"

namespace rocr {
namespace Addr
{

/**
****************************************************************************************************
* @brief This structure contains client-specific data
****************************************************************************************************
*/
struct Client
{
    ADDR_CLIENT_HANDLE handle;
    ADDR_CALLBACKS     callbacks;
};

/**
****************************************************************************************************
* @brief This class is the base class for all ADDR class objects.
****************************************************************************************************
*/
class Object
{
public:
    Object();
    Object(const Client* pClient);
    virtual ~Object();

    VOID* operator new(size_t size, VOID* pMem);
    VOID  operator delete(VOID* pObj);

    /// The Microsoft compiler requires a matching placement delete, which is called if a constructor
    /// throws during placement new. C++ exceptions are not allowed here, so this dummy implementation
    /// exists only to eliminate the warning.
    VOID  operator delete(VOID* pObj, VOID* pMem) { ADDR_ASSERT_ALWAYS(); }

    VOID* Alloc(size_t size) const;
    VOID  Free(VOID* pObj) const;

    VOID DebugPrint(const CHAR* pDebugString, ...) const;

    const Client* GetClient() const { return &m_client; }

protected:
    Client m_client;

    static VOID* ClientAlloc(size_t size, const Client* pClient);
    static VOID  ClientFree(VOID* pObj, const Client* pClient);

private:
    // Disallow the copy constructor
    Object(const Object& a);

    // Disallow the assignment operator
    Object& operator=(const Object& a);
};

} // Addr
} // rocr

#endif
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/coord.cpp000066400000000000000000000312041420110115200234260ustar00rootroot00000000000000
/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

// Coordinate class implementation
#include "addrcommon.h"
#include "coord.h"

namespace rocr {
namespace Addr
{
namespace V2
{

Coordinate::Coordinate()
{
    dim = DIM_X;
    ord = 0;
}

Coordinate::Coordinate(enum Dim dim, INT_32 n)
{
    set(dim, n);
}

VOID Coordinate::set(enum Dim d, INT_32 n)
{
    dim = d;
    ord = static_cast<INT_8>(n);
}

UINT_32 Coordinate::ison(const UINT_32 *coords) const
{
    UINT_32 bit = static_cast<UINT_32>(1ull << static_cast<UINT_32>(ord));

    return (coords[dim] & bit) ?
        1 : 0;
}

enum Dim Coordinate::getdim()
{
    return dim;
}

INT_8 Coordinate::getord()
{
    return ord;
}

BOOL_32 Coordinate::operator==(const Coordinate& b)
{
    return (dim == b.dim) && (ord == b.ord);
}

BOOL_32 Coordinate::operator<(const Coordinate& b)
{
    BOOL_32 ret;

    if (dim == b.dim)
    {
        ret = ord < b.ord;
    }
    else
    {
        if (dim == DIM_S || b.dim == DIM_M)
        {
            ret = TRUE;
        }
        else if (b.dim == DIM_S || dim == DIM_M)
        {
            ret = FALSE;
        }
        else if (ord == b.ord)
        {
            ret = dim < b.dim;
        }
        else
        {
            ret = ord < b.ord;
        }
    }

    return ret;
}

BOOL_32 Coordinate::operator>(const Coordinate& b)
{
    BOOL_32 lt = *this < b;
    BOOL_32 eq = *this == b;
    return !lt && !eq;
}

BOOL_32 Coordinate::operator<=(const Coordinate& b)
{
    return (*this < b) || (*this == b);
}

BOOL_32 Coordinate::operator>=(const Coordinate& b)
{
    return !(*this < b);
}

BOOL_32 Coordinate::operator!=(const Coordinate& b)
{
    return !(*this == b);
}

Coordinate& Coordinate::operator++(INT_32)
{
    ord++;
    return *this;
}

// CoordTerm

CoordTerm::CoordTerm()
{
    num_coords = 0;
}

VOID CoordTerm::Clear()
{
    num_coords = 0;
}

VOID CoordTerm::add(Coordinate& co)
{
    // This function adds a coordinate into the list.
    // It prevents the same coordinate from appearing twice
    // and keeps the list ordered from smallest to largest.
    UINT_32 i;
    for (i = 0; i < num_coords; i++)
    {
        if (m_coord[i] == co)
        {
            break;
        }
        if (m_coord[i] > co)
        {
            for (UINT_32 j = num_coords; j > i; j--)
            {
                m_coord[j] = m_coord[j - 1];
            }
            m_coord[i] = co;
            num_coords++;
            break;
        }
    }

    if (i == num_coords)
    {
        m_coord[num_coords] = co;
        num_coords++;
    }
}

VOID CoordTerm::add(CoordTerm& cl)
{
    for (UINT_32 i = 0; i < cl.num_coords; i++)
    {
        add(cl.m_coord[i]);
    }
}

BOOL_32 CoordTerm::remove(Coordinate& co)
{
    BOOL_32 remove = FALSE;
    for (UINT_32 i = 0; i < num_coords; i++)
    {
        if (m_coord[i] == co)
        {
            remove = TRUE;
            num_coords--;
        }

        if (remove)
        {
            m_coord[i] = m_coord[i + 1];
        }
    }
    return remove;
}

BOOL_32 CoordTerm::Exists(Coordinate& co)
{
    BOOL_32 exists = FALSE;
    for (UINT_32 i = 0; i < num_coords; i++)
    {
        if (m_coord[i]
            == co)
        {
            exists = TRUE;
            break;
        }
    }
    return exists;
}

VOID CoordTerm::copyto(CoordTerm& cl)
{
    cl.num_coords = num_coords;
    for (UINT_32 i = 0; i < num_coords; i++)
    {
        cl.m_coord[i] = m_coord[i];
    }
}

UINT_32 CoordTerm::getsize()
{
    return num_coords;
}

UINT_32 CoordTerm::getxor(const UINT_32 *coords) const
{
    UINT_32 out = 0;
    for (UINT_32 i = 0; i < num_coords; i++)
    {
        out = out ^ m_coord[i].ison(coords);
    }
    return out;
}

VOID CoordTerm::getsmallest(Coordinate& co)
{
    co = m_coord[0];
}

UINT_32 CoordTerm::Filter(INT_8 f, Coordinate& co, UINT_32 start, enum Dim axis)
{
    for (UINT_32 i = start; i < num_coords;)
    {
        if (((f == '<' && m_coord[i] < co) ||
             (f == '>' && m_coord[i] > co) ||
             (f == '=' && m_coord[i] == co)) &&
            (axis == NUM_DIMS || axis == m_coord[i].getdim()))
        {
            for (UINT_32 j = i; j < num_coords - 1; j++)
            {
                m_coord[j] = m_coord[j + 1];
            }
            num_coords--;
        }
        else
        {
            i++;
        }
    }
    return num_coords;
}

Coordinate& CoordTerm::operator[](UINT_32 i)
{
    return m_coord[i];
}

BOOL_32 CoordTerm::operator==(const CoordTerm& b)
{
    BOOL_32 ret = TRUE;

    if (num_coords != b.num_coords)
    {
        ret = FALSE;
    }
    else
    {
        for (UINT_32 i = 0; i < num_coords; i++)
        {
            // Note: the lists are always kept in order, so they can be compared one element at a time
            if (m_coord[i] != b.m_coord[i])
            {
                ret = FALSE;
                break;
            }
        }
    }
    return ret;
}

BOOL_32 CoordTerm::operator!=(const CoordTerm& b)
{
    return !(*this == b);
}

BOOL_32 CoordTerm::exceedRange(const UINT_32 *ranges)
{
    BOOL_32 exceed = FALSE;
    for (UINT_32 i = 0; (i < num_coords) && (exceed == FALSE); i++)
    {
        exceed = ((1u << m_coord[i].getord()) <= ranges[m_coord[i].getdim()]);
    }

    return exceed;
}

// coordeq
CoordEq::CoordEq()
{
    m_numBits = 0;
}

VOID CoordEq::remove(Coordinate& co)
{
    for (UINT_32 i = 0; i < m_numBits; i++)
    {
        m_eq[i].remove(co);
    }
}

BOOL_32 CoordEq::Exists(Coordinate& co)
{
    BOOL_32 exists = FALSE;

    for (UINT_32 i = 0; i < m_numBits; i++)
    {
        if (m_eq[i].Exists(co))
        {
            exists = TRUE;
        }
    }
    return exists;
}

VOID CoordEq::resize(UINT_32 n)
{
    if (n > m_numBits)
    {
        for (UINT_32
             i = m_numBits; i < n; i++)
        {
            m_eq[i].Clear();
        }
    }
    m_numBits = n;
}

UINT_32 CoordEq::getsize()
{
    return m_numBits;
}

UINT_64 CoordEq::solve(const UINT_32 *coords) const
{
    UINT_64 out = 0;
    for (UINT_32 i = 0; i < m_numBits; i++)
    {
        out |= static_cast<UINT_64>(m_eq[i].getxor(coords)) << i;
    }
    return out;
}

VOID CoordEq::solveAddr(
    UINT_64  addr,
    UINT_32  sliceInM,
    UINT_32* coords) const
{
    UINT_32 BitsValid[NUM_DIMS] = {0};

    CoordEq temp = *this;

    memset(coords, 0, NUM_DIMS * sizeof(coords[0]));

    UINT_32 bitsLeft = 0;

    for (UINT_32 i = 0; i < temp.m_numBits; i++)
    {
        UINT_32 termSize = temp.m_eq[i].getsize();

        if (termSize == 1)
        {
            INT_8 bit = (addr >> i) & 1;
            enum Dim dim = temp.m_eq[i][0].getdim();
            INT_8 ord = temp.m_eq[i][0].getord();

            ADDR_ASSERT((ord < 32) || (bit == 0));

            BitsValid[dim] |= 1u << ord;
            coords[dim] |= bit << ord;

            temp.m_eq[i].Clear();
        }
        else if (termSize > 1)
        {
            bitsLeft++;
        }
    }

    if (bitsLeft > 0)
    {
        if (sliceInM != 0)
        {
            coords[DIM_Z] = coords[DIM_M] / sliceInM;
            BitsValid[DIM_Z] = 0xffffffff;
        }

        do
        {
            bitsLeft = 0;

            for (UINT_32 i = 0; i < temp.m_numBits; i++)
            {
                UINT_32 termSize = temp.m_eq[i].getsize();

                if (termSize == 1)
                {
                    INT_8 bit = (addr >> i) & 1;
                    enum Dim dim = temp.m_eq[i][0].getdim();
                    INT_8 ord = temp.m_eq[i][0].getord();

                    ADDR_ASSERT((ord < 32) || (bit == 0));
                    ADDR_ASSERT(dim < DIM_S);

                    BitsValid[dim] |= 1u << ord;
                    coords[dim] |= bit << ord;

                    temp.m_eq[i].Clear();
                }
                else if (termSize > 1)
                {
                    CoordTerm tmpTerm = temp.m_eq[i];

                    for (UINT_32 j = 0; j < termSize; j++)
                    {
                        enum Dim dim = temp.m_eq[i][j].getdim();
                        INT_8 ord = temp.m_eq[i][j].getord();

                        ADDR_ASSERT(dim < DIM_S);

                        if (BitsValid[dim] & (1u << ord))
                        {
                            // Widen before shifting: i can exceed 31 for equations of up to 64 bits
                            UINT_64 v = (static_cast<UINT_64>((coords[dim] >> ord) & 1) << i);
                            addr ^= v;
                            tmpTerm.remove(temp.m_eq[i][j]);
                        }
                    }

                    temp.m_eq[i] = tmpTerm;

                    bitsLeft++;
                }
            }
        } while (bitsLeft > 0);
    }
}

VOID CoordEq::copy(CoordEq& o, UINT_32 start, UINT_32 num)
{
    o.m_numBits = (num == 0xFFFFFFFF) ?
        m_numBits : num;

    for (UINT_32 i = 0; i < o.m_numBits; i++)
    {
        m_eq[start + i].copyto(o.m_eq[i]);
    }
}

VOID CoordEq::reverse(UINT_32 start, UINT_32 num)
{
    UINT_32 n = (num == 0xFFFFFFFF) ? m_numBits : num;

    for (UINT_32 i = 0; i < n / 2; i++)
    {
        CoordTerm temp;
        m_eq[start + i].copyto(temp);
        m_eq[start + n - 1 - i].copyto(m_eq[start + i]);
        temp.copyto(m_eq[start + n - 1 - i]);
    }
}

VOID CoordEq::xorin(CoordEq& x, UINT_32 start)
{
    UINT_32 n = ((m_numBits - start) < x.m_numBits) ? (m_numBits - start) : x.m_numBits;
    for (UINT_32 i = 0; i < n; i++)
    {
        m_eq[start + i].add(x.m_eq[i]);
    }
}

UINT_32 CoordEq::Filter(INT_8 f, Coordinate& co, UINT_32 start, enum Dim axis)
{
    for (UINT_32 i = start; i < m_numBits;)
    {
        UINT_32 m = m_eq[i].Filter(f, co, 0, axis);
        if (m == 0)
        {
            for (UINT_32 j = i; j < m_numBits - 1; j++)
            {
                m_eq[j] = m_eq[j + 1];
            }
            m_numBits--;
        }
        else
        {
            i++;
        }
    }
    return m_numBits;
}

VOID CoordEq::shift(INT_32 amount, INT_32 start)
{
    if (amount != 0)
    {
        INT_32 numBits = static_cast<INT_32>(m_numBits);
        amount = -amount;
        INT_32 inc = (amount < 0) ? -1 : 1;
        INT_32 i = (amount < 0) ? numBits - 1 : start;
        INT_32 end = (amount < 0) ? start - 1 : numBits;
        for (; (inc > 0) ? i < end : i > end; i += inc)
        {
            if ((i + amount < start) || (i + amount >= numBits))
            {
                m_eq[i].Clear();
            }
            else
            {
                m_eq[i + amount].copyto(m_eq[i]);
            }
        }
    }
}

CoordTerm& CoordEq::operator[](UINT_32 i)
{
    return m_eq[i];
}

VOID CoordEq::mort2d(Coordinate& c0, Coordinate& c1, UINT_32 start, UINT_32 end)
{
    if (end == 0)
    {
        ADDR_ASSERT(m_numBits > 0);
        end = m_numBits - 1;
    }
    for (UINT_32 i = start; i <= end; i++)
    {
        UINT_32 select = (i - start) % 2;
        Coordinate& c = (select == 0) ? c0 : c1;
        m_eq[i].add(c);
        c++;
    }
}

VOID CoordEq::mort3d(Coordinate& c0, Coordinate& c1, Coordinate& c2, UINT_32 start, UINT_32 end)
{
    if (end == 0)
    {
        ADDR_ASSERT(m_numBits > 0);
        end = m_numBits - 1;
    }
    for (UINT_32 i = start; i <= end; i++)
    {
        UINT_32 select = (i - start) % 3;
        Coordinate& c = (select == 0) ? c0 : ((select == 1) ?
            c1 : c2);
        m_eq[i].add(c);
        c++;
    }
}

BOOL_32 CoordEq::operator==(const CoordEq& b)
{
    BOOL_32 ret = TRUE;

    if (m_numBits != b.m_numBits)
    {
        ret = FALSE;
    }
    else
    {
        for (UINT_32 i = 0; i < m_numBits; i++)
        {
            if (m_eq[i] != b.m_eq[i])
            {
                ret = FALSE;
                break;
            }
        }
    }
    return ret;
}

BOOL_32 CoordEq::operator!=(const CoordEq& b)
{
    return !(*this == b);
}

} // V2
} // Addr
} // rocr
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/core/coord.h000066400000000000000000000073561420110115200231060ustar00rootroot00000000000000
/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

// Class used to define a coordinate bit

#ifndef __COORD_H
#define __COORD_H

namespace rocr {
namespace Addr
{
namespace V2
{

enum Dim
{
    DIM_X,
    DIM_Y,
    DIM_Z,
    DIM_S,
    DIM_M,
    NUM_DIMS
};

class Coordinate
{
public:
    Coordinate();
    Coordinate(enum Dim dim, INT_32 n);

    VOID set(enum Dim dim, INT_32 n);
    UINT_32 ison(const UINT_32 *coords) const;
    enum Dim getdim();
    INT_8 getord();

    BOOL_32 operator==(const Coordinate& b);
    BOOL_32 operator<(const Coordinate& b);
    BOOL_32 operator>(const Coordinate& b);
    BOOL_32 operator<=(const Coordinate& b);
    BOOL_32 operator>=(const Coordinate& b);
    BOOL_32 operator!=(const Coordinate& b);
    Coordinate& operator++(INT_32);

private:
    enum Dim dim;
    INT_8 ord;
};

class CoordTerm
{
public:
    CoordTerm();
    VOID Clear();
    VOID add(Coordinate& co);
    VOID add(CoordTerm& cl);
    BOOL_32 remove(Coordinate& co);
    BOOL_32 Exists(Coordinate& co);
    VOID copyto(CoordTerm& cl);
    UINT_32 getsize();
    UINT_32 getxor(const UINT_32 *coords) const;

    VOID getsmallest(Coordinate& co);
    UINT_32 Filter(INT_8 f, Coordinate& co, UINT_32 start = 0, enum Dim axis = NUM_DIMS);
    Coordinate& operator[](UINT_32 i);
    BOOL_32 operator==(const CoordTerm& b);
    BOOL_32 operator!=(const CoordTerm& b);
    BOOL_32 exceedRange(const UINT_32 *ranges);

private:
    static const UINT_32 MaxCoords = 8;
    UINT_32 num_coords;
    Coordinate m_coord[MaxCoords];
};

class CoordEq
{
public:
    CoordEq();
    VOID remove(Coordinate& co);
    BOOL_32 Exists(Coordinate& co);
    VOID resize(UINT_32 n);
    UINT_32 getsize();
    virtual UINT_64 solve(const UINT_32 *coords) const;
    virtual VOID solveAddr(UINT_64 addr, UINT_32 sliceInM,
                           UINT_32 *coords) const;

    VOID copy(CoordEq& o, UINT_32 start = 0, UINT_32 num = 0xFFFFFFFF);
    VOID reverse(UINT_32 start = 0, UINT_32 num = 0xFFFFFFFF);
    VOID xorin(CoordEq& x, UINT_32 start = 0);
    UINT_32 Filter(INT_8 f, Coordinate& co, UINT_32 start = 0, enum Dim axis = NUM_DIMS);
    VOID shift(INT_32 amount, INT_32 start = 0);
    virtual CoordTerm& operator[](UINT_32 i);
    VOID mort2d(Coordinate& c0, Coordinate& c1, UINT_32 start = 0, UINT_32 end = 0);
    VOID mort3d(Coordinate& c0, Coordinate& c1, Coordinate& c2, UINT_32 start = 0, UINT_32 end = 0);

    BOOL_32 operator==(const CoordEq& b);
    BOOL_32 operator!=(const CoordEq& b);

private:
    static const UINT_32 MaxEqBits = 64;
    UINT_32 m_numBits;

    CoordTerm m_eq[MaxEqBits];
};

} // V2
} // Addr
} // rocr

#endif
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/gfx10/000077500000000000000000000000001420110115200216115ustar00rootroot00000000000000
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/gfx10/gfx10SwizzlePattern.h000066400000000000000000017374041420110115200256550ustar00rootroot00000000000000
/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
************************************************************************************************************************
* @file  gfx10SwizzlePattern.h
* @brief swizzle pattern for gfx10.
************************************************************************************************************************
*/

#ifndef __GFX10_SWIZZLE_PATTERN_H__
#define __GFX10_SWIZZLE_PATTERN_H__

namespace rocr {
namespace Addr {
namespace V2 {

/**
************************************************************************************************************************
* @brief Bit setting for swizzle pattern
************************************************************************************************************************
*/
union ADDR_BIT_SETTING
{
    struct
    {
        UINT_16 x;
        UINT_16 y;
        UINT_16 z;
        UINT_16 s;
    };
    UINT_64 value;
};

/**
************************************************************************************************************************
* @brief Swizzle pattern information
************************************************************************************************************************
*/
struct ADDR_SW_PATINFO
{
    UINT_8  maxItemCount;
    UINT_8  nibble01Idx;
    UINT_16 nibble2Idx;
    UINT_16 nibble3Idx;
    UINT_8  nibble4Idx;
};

/**
************************************************************************************************************************
* InitBit
*
* @brief
* Initialize bit setting value via a return value
************************************************************************************************************************
*/
#define InitBit(c, index) (1ull << ((c << 4) + index))

const UINT_64 X0 = InitBit(0, 0);
const UINT_64 X1 = InitBit(0, 1);
const UINT_64 X2 = InitBit(0, 2);
const UINT_64 X3 = InitBit(0, 3);
const UINT_64 X4 = InitBit(0, 4);
const UINT_64 X5 = InitBit(0, 5);
const UINT_64 X6 = InitBit(0, 6);
const UINT_64 X7 = InitBit(0, 7);
const UINT_64 X8 = InitBit(0, 8);
const UINT_64 X9 = InitBit(0, 9);
const UINT_64 X10 = InitBit(0, 10);
const UINT_64 X11 = InitBit(0, 11);
const UINT_64 Y0 = InitBit(1, 0);
const UINT_64 Y1 = InitBit(1, 1);
const UINT_64 Y2 = InitBit(1, 2);
const UINT_64 Y3 = InitBit(1, 3);
const UINT_64 Y4 = InitBit(1, 4);
const UINT_64 Y5 = InitBit(1, 5);
const UINT_64 Y6 = InitBit(1, 6);
const UINT_64 Y7 = InitBit(1, 7);
const UINT_64 Y8 = InitBit(1, 8);
const UINT_64 Y9 = InitBit(1, 9);
const UINT_64 Y10 = InitBit(1, 10);
const UINT_64 Y11 = InitBit(1, 11);
const UINT_64 Z0 = InitBit(2, 0);
const UINT_64 Z1 = InitBit(2, 1);
const UINT_64 Z2 = InitBit(2, 2);
const UINT_64 Z3 = InitBit(2, 3);
const UINT_64 Z4 = InitBit(2, 4);
const UINT_64 Z5 = InitBit(2, 5);
const UINT_64 Z6 = InitBit(2, 6);
const UINT_64 Z7 = InitBit(2, 7);
const UINT_64 Z8 = InitBit(2, 8);
const UINT_64 S0 = InitBit(3, 0);
const UINT_64 S1 = InitBit(3, 1);
const UINT_64 S2 = InitBit(3, 2);

const ADDR_SW_PATINFO SW_256_S_PATINFO[] =
{
    { 1, 0, 0, 0, 0, } , // 1 pipes 1 bpe @ SW_256_S @ Navi1x
    { 1, 1, 0, 0, 0, } , // 1 pipes 2 bpe @ SW_256_S @ Navi1x
    { 1, 2, 0, 0, 0, } , // 1 pipes 4 bpe @ SW_256_S @ Navi1x
    { 1, 3, 0, 0, 0, } , // 1 pipes 8 bpe @ SW_256_S @ Navi1x
    { 1, 4, 0, 0, 0, } , // 1 pipes 16 bpe @ SW_256_S @ Navi1x
    { 1, 0, 0, 0, 0, } , // 2 pipes 1 bpe @ SW_256_S @ Navi1x
    { 1, 1, 0, 0, 0, } , // 2 pipes 2 bpe @ SW_256_S @ Navi1x
    { 1, 2, 0, 0, 0, } , // 2 pipes 4 bpe @ SW_256_S @ Navi1x
    { 1, 3, 0, 0, 0, } , // 2 pipes 8 bpe @ SW_256_S @ Navi1x
    { 1, 4, 0, 0, 0, } , // 2 pipes 16 bpe @ SW_256_S @ Navi1x
    { 1, 0, 0, 0, 0, } , // 4 pipes 1 bpe @ SW_256_S @ Navi1x
    { 1, 1, 0, 0, 0, } , // 4 pipes 2 bpe @ SW_256_S @ Navi1x
    { 1, 2, 0, 0, 0, } , // 4 pipes 4 bpe @ SW_256_S @ Navi1x
    { 1, 3, 0, 0, 0, } , // 4 pipes 8 bpe @ SW_256_S @ Navi1x
    { 1, 4, 0, 0, 0, } , // 4 pipes 16 bpe @ SW_256_S @ Navi1x
    { 1, 0, 0, 0, 0, } , // 8 pipes 1 bpe @ SW_256_S @ Navi1x
    { 1, 1, 0, 0, 0, } , // 8 pipes 2 bpe @ SW_256_S @ Navi1x
    { 1, 2, 0, 0, 0, } , // 8 pipes 4 bpe @ SW_256_S @ Navi1x
    { 1, 3, 0, 0, 0, } , // 8 pipes 8 bpe @ SW_256_S @ Navi1x
    { 1, 4, 0, 0, 0, } , // 8 pipes 16 bpe @ SW_256_S @ Navi1x
    { 1, 0, 0, 0, 0, } , // 16 pipes 1 bpe @ SW_256_S @ Navi1x
    { 1, 1, 0, 0, 0, } , // 16 pipes 2 bpe @ SW_256_S @ Navi1x
    { 1, 2, 0, 0, 0, } , // 16 pipes 4 bpe @ SW_256_S @ Navi1x
    { 1, 3, 0, 0, 0, } , // 16 pipes 8 bpe @ SW_256_S @ Navi1x
    { 1, 4, 0, 0, 0, } , // 16 pipes 16 bpe @ SW_256_S @ Navi1x
    { 1, 0, 0, 0, 0, } , // 32 pipes 1 bpe @ SW_256_S @ Navi1x
    { 1, 1, 0, 0, 0, } , // 32 pipes 2 bpe @ SW_256_S @ Navi1x
    { 1, 2, 0, 0, 0, } , // 32 pipes 4 bpe @ SW_256_S @ Navi1x
    { 1, 3, 0, 0, 0, } , // 32 pipes 8 bpe @ SW_256_S @ Navi1x
    { 1, 4, 0, 0, 0, } , // 32 pipes 16 bpe @ SW_256_S @ Navi1x
    { 1, 0, 0, 0, 0, } , // 64 pipes 1 bpe @ SW_256_S @ Navi1x
    { 1, 1, 0, 0, 0, } , // 64 pipes 2 bpe @ SW_256_S @ Navi1x
    { 1, 2, 0, 0, 0, } , // 64 pipes 4 bpe @ SW_256_S @ Navi1x
    { 1, 3, 0, 0, 0, } , // 64 pipes 8 bpe @ SW_256_S @ Navi1x
    { 1, 4, 0, 0, 0, } , // 64 pipes 16 bpe @ SW_256_S @ Navi1x
};

const ADDR_SW_PATINFO SW_256_D_PATINFO[] =
{
    { 1, 5, 0, 0, 0, } , // 1 pipes 1 bpe @ SW_256_D @ Navi1x
    { 1, 1, 0, 0, 0, } , // 1 pipes 2 bpe @ SW_256_D @ Navi1x
    { 1, 2, 0, 0, 0, } , // 1 pipes 4 bpe @ SW_256_D @ Navi1x
    { 1, 6, 0, 0, 0, } , // 1 pipes 8 bpe @ SW_256_D @ Navi1x
    { 1, 7, 0, 0, 0, } , // 1 pipes 16 bpe @ SW_256_D @ Navi1x
    { 1, 5, 0, 0, 0, } , // 2 pipes 1 bpe @ SW_256_D @ Navi1x
    { 1, 1, 0, 0, 0, } , // 2 pipes 2 bpe @ SW_256_D @ Navi1x
    { 1, 2, 0, 0, 0, } , // 2 pipes 4 bpe @ SW_256_D @ Navi1x
    { 1, 6, 0, 0, 0, } , // 2 pipes 8 bpe @ SW_256_D @ Navi1x
    { 1, 7, 0, 0, 0, } , // 2 pipes 16 bpe @ SW_256_D @ Navi1x
    { 1, 5, 0, 0, 0, } , // 4 pipes 1 bpe @ SW_256_D @ Navi1x
    { 1, 1, 0, 0, 0, } , // 4 pipes 2 bpe @ SW_256_D @ Navi1x
    { 1, 2, 0, 0, 0, } , // 4 pipes 4 bpe @ SW_256_D @ Navi1x
    { 1, 6, 0, 0, 0, } , // 4 pipes 8 bpe @ SW_256_D @ Navi1x
    { 1, 7, 0, 0, 0, } , // 4 pipes 16 bpe @ SW_256_D @ Navi1x
    { 1, 5, 0, 0, 0, } , // 8 pipes 1 bpe @ SW_256_D @ Navi1x
    { 1, 1, 0, 0, 0, } , // 8 pipes 2 bpe @ SW_256_D @ Navi1x
    { 1, 2, 0, 0, 0, } , // 8 pipes 4 bpe @ SW_256_D @ Navi1x
    { 1, 6, 0, 0, 0, } , // 8 pipes 8 bpe @ SW_256_D @ Navi1x
    { 1, 7, 0, 0, 0, } , // 8 pipes 16 bpe @ SW_256_D @ Navi1x
    { 1, 5, 0, 0, 0, } , // 16 pipes 1 bpe @ SW_256_D @ Navi1x
    { 1, 1, 0, 0, 0, } , // 16 pipes 2 bpe @ SW_256_D @ Navi1x
    { 1, 2, 0, 0, 0, } , // 16 pipes 4 bpe @ SW_256_D @ Navi1x
    { 1, 6, 0, 0, 0, } , // 16 pipes 8 bpe @ SW_256_D @ Navi1x
    { 1, 7, 0, 0, 0, } , // 16 pipes 16 bpe @ SW_256_D @ Navi1x
    { 1, 5, 0, 0, 0, } , // 32 pipes 1 bpe @ SW_256_D @ Navi1x
    { 1, 1, 0, 0, 0, } , // 32 pipes 2 bpe @ SW_256_D @ Navi1x
    { 1, 2, 0, 0, 0, } , // 32 pipes 4 bpe @ SW_256_D @ Navi1x
    { 1, 6, 0, 0, 0, } , // 32 pipes 8 bpe @ SW_256_D @ Navi1x
    { 1, 7, 0, 0, 0, } , // 32 pipes 16 bpe @ SW_256_D @ Navi1x
    { 1, 5, 0, 0, 0, } , // 64 pipes 1 bpe @ SW_256_D @ Navi1x
    { 1, 1, 0, 0, 0, } , // 64 pipes 2 bpe @ SW_256_D @ Navi1x
    { 1, 2, 0, 0, 0, } , // 64 pipes 4 bpe @ SW_256_D @ Navi1x
    { 1, 6, 0, 0, 0, } , // 64 pipes 8 bpe @ SW_256_D @ Navi1x
    { 1, 7, 0, 0, 0, } , // 64 pipes 16 bpe @ SW_256_D @ Navi1x
};

const ADDR_SW_PATINFO SW_4K_S_PATINFO[] =
{
    { 1, 0, 1, 0, 0, } , // 1 pipes 1 bpe @ SW_4K_S @ Navi1x
    { 1, 1, 2, 0, 0, } , // 1 pipes 2 bpe @ SW_4K_S @ Navi1x
    { 1, 2, 3, 0, 0, } , // 1 pipes 4 bpe @ SW_4K_S @ Navi1x
    { 1, 3, 4, 0, 0, } , // 1 pipes 8 bpe @ SW_4K_S @ Navi1x
    { 1, 4, 5, 0, 0, } , // 1 pipes 16 bpe @ SW_4K_S @ Navi1x
    { 1, 0, 1, 0, 0, } , // 2 pipes 1 bpe @ SW_4K_S @ Navi1x
    { 1, 1, 2, 0, 0, } , // 2 pipes 2 bpe @ SW_4K_S @ Navi1x
    { 1, 2, 3, 0, 0, } , // 2 pipes 4 bpe @ SW_4K_S @ Navi1x
    { 1, 3, 4, 0, 0, } , // 2 pipes 8 bpe @ SW_4K_S @ Navi1x
    { 1, 4, 5, 0, 0, } , // 2 pipes 16 bpe @ SW_4K_S @ Navi1x
    { 1, 0, 1, 0, 0, } , // 4 pipes 1 bpe @ SW_4K_S @ Navi1x
    { 1, 1, 2, 0, 0, } , // 4 pipes 2 bpe @ SW_4K_S @ Navi1x
    { 1, 2, 3, 0, 0, } , // 4 pipes 4 bpe @ SW_4K_S @ Navi1x
    { 1, 3, 4, 0, 0, } , // 4 pipes 8 bpe @ SW_4K_S @ Navi1x
    { 1, 4, 5, 0, 0, } , // 4 pipes 16 bpe @ SW_4K_S @ Navi1x
    { 1, 0, 1, 0, 0, } , // 8 pipes 1 bpe @ SW_4K_S @ Navi1x
    { 1, 1, 2, 0, 0, } , // 8 pipes 2 bpe @ SW_4K_S @ Navi1x
    { 1, 2, 3, 0, 0, } , // 8 pipes 4 bpe @ SW_4K_S @ Navi1x
    { 1, 3, 4, 0, 0, } , // 8 pipes 8 bpe @ SW_4K_S @ Navi1x
    { 1, 4, 5, 0, 0, } , // 8 pipes 16 bpe @ SW_4K_S @ Navi1x
    { 1, 0, 1, 0, 0, } , // 16 pipes 1 bpe @ SW_4K_S @ Navi1x
    { 1, 1, 2, 0, 0, } , // 16 pipes 2 bpe @ SW_4K_S @ Navi1x
    { 1, 2, 3, 0, 0, } , // 16 pipes 4 bpe @ SW_4K_S @ Navi1x
    { 1, 3, 4, 0, 0, } , // 16 pipes 8 bpe @ SW_4K_S @ Navi1x
    { 1, 4, 5, 0, 0, } , // 16 pipes 16 bpe @ SW_4K_S @ Navi1x
    { 1, 0, 1, 0, 0, } , // 32 pipes 1 bpe @ SW_4K_S @ Navi1x
    { 1, 1, 2, 0, 0, } , // 32 pipes 2 bpe @ SW_4K_S @ Navi1x
    { 1, 2, 3, 0, 0, } , // 32 pipes 4 bpe @ SW_4K_S @ Navi1x
    { 1, 3, 4, 0, 0, } , // 32 pipes 8 bpe @ SW_4K_S @ Navi1x
    { 1, 4, 5, 0, 0, } , // 32 pipes 16 bpe @ SW_4K_S @ Navi1x
    { 1, 0, 1, 0, 0, } , // 64 pipes 1 bpe @ SW_4K_S @ Navi1x
    { 1, 1, 2, 0, 0, } , // 64 pipes 2 bpe @ SW_4K_S @ Navi1x
    { 1, 2, 3, 0, 0, } , // 64 pipes 4 bpe @ SW_4K_S @ Navi1x
    { 1, 3, 4, 0, 0, } , // 64 pipes 8 bpe @ SW_4K_S @ Navi1x
    { 1, 4, 5, 0, 0, } , // 64 pipes 16 bpe @ SW_4K_S @ Navi1x
};

const ADDR_SW_PATINFO SW_4K_D_PATINFO[] =
{
    { 1, 5, 1, 0, 0, } , // 1 pipes 1 bpe @ SW_4K_D @ Navi1x
    { 1, 1, 2, 0, 0, } , // 1 pipes 2 bpe @ SW_4K_D @ Navi1x
    { 1, 2, 3, 0, 0, } , // 1 pipes 4 bpe @ SW_4K_D @ Navi1x
    { 1, 6, 4, 0, 0, } , // 1 pipes 8 bpe @ SW_4K_D @ Navi1x
    { 1, 7, 5, 0, 0, } , // 1 pipes 16 bpe @ SW_4K_D @ Navi1x
    { 1, 5, 1, 0, 0, } , // 2 pipes 1 bpe @ SW_4K_D @ Navi1x
    { 1, 1, 2, 0, 0, } , // 2 pipes 2 bpe @ SW_4K_D @ Navi1x
    { 1, 2, 3, 0, 0, } , // 2 pipes 4 bpe @ SW_4K_D @ Navi1x
    { 1, 6, 4, 0, 0, } , // 2 pipes 8 bpe @ SW_4K_D @ Navi1x
    { 1, 7, 5, 0, 0, } , // 2 pipes 16 bpe @ SW_4K_D @ Navi1x
    { 1, 5, 1, 0, 0, } , // 4 pipes 1 bpe @ SW_4K_D @ Navi1x
    { 1, 1, 2, 0, 0, } , // 4 pipes 2 bpe @ SW_4K_D @ Navi1x
    { 1, 2, 3, 0, 0, } , // 4 pipes 4 bpe @ SW_4K_D @ Navi1x
    { 1, 6, 4, 0, 0, } , // 4 pipes 8 bpe @ SW_4K_D @ Navi1x
    { 1, 7, 5, 0, 0, } , // 4 pipes 16 bpe @ SW_4K_D @ Navi1x
    { 1, 5, 1, 0, 0, } , // 8 pipes 1 bpe @ SW_4K_D @ Navi1x
    { 1, 1, 2, 0, 0, } , // 8 pipes 2 bpe @ SW_4K_D @ Navi1x
    { 1, 2, 3, 0, 0, } , // 8 pipes 4 bpe @ SW_4K_D @ Navi1x
    { 1, 6, 4, 0, 0, } , // 8 pipes 8 bpe @ SW_4K_D @ Navi1x
    { 1, 7, 5, 0, 0, } , // 8 pipes 16 bpe @ SW_4K_D @ Navi1x
    { 1, 5, 1, 0, 0, } , // 16 pipes 1 bpe @ SW_4K_D @ Navi1x
    { 1, 1, 2, 0, 0, } , // 16 pipes 2 bpe @ SW_4K_D @ Navi1x
    { 1, 2, 3, 0, 0, } , // 16 pipes 4 bpe @ SW_4K_D @ Navi1x
    { 1, 6, 4, 0, 0, } , // 16 pipes 8 bpe @ SW_4K_D @ Navi1x
    { 1, 7, 5, 0, 0, } , // 16 pipes 16 bpe @ SW_4K_D @ Navi1x
    { 1, 5, 1, 0, 0, } , // 32 pipes 1 bpe @ SW_4K_D @ Navi1x
    { 1, 1, 2, 0, 0, } , // 32 pipes 2 bpe @ SW_4K_D @ Navi1x
    { 1, 2, 3, 0, 0, } , // 32 pipes 4 bpe @ SW_4K_D @ Navi1x
    { 1, 6, 4, 0, 0, } , // 32 pipes 8 bpe @ SW_4K_D @ Navi1x
    { 1, 7, 5, 0, 0, } , // 32 pipes 16 bpe @ SW_4K_D @ Navi1x
    { 1, 5, 1, 0, 0, } , // 64 pipes 1 bpe @ SW_4K_D @ Navi1x
    { 1, 1, 2, 0, 0, } , // 64 pipes 2 bpe @ SW_4K_D @ Navi1x
    { 1, 2, 3, 0, 0, } , // 64 pipes 4 bpe @ SW_4K_D @ Navi1x
    { 1, 6, 4, 0, 0, } , // 64 pipes 8 bpe @ SW_4K_D @ Navi1x
    { 1, 7, 5, 0, 0, } , // 64 pipes 16 bpe @ SW_4K_D @ Navi1x
};

const ADDR_SW_PATINFO SW_4K_S_X_PATINFO[] =
{
    { 1, 0, 1, 0, 0, } , // 1 pipes 1 bpe @ SW_4K_S_X @ Navi1x
    { 1, 1, 2, 0, 0, } , // 1 pipes 2 bpe @ SW_4K_S_X @ Navi1x
    { 1, 2, 3, 0, 0, } , // 1 pipes 4 bpe @ SW_4K_S_X @ Navi1x
    { 1, 3, 4, 0, 0, } , // 1 pipes 8 bpe @ SW_4K_S_X @ Navi1x
    { 1, 4, 5, 0, 0, } , // 1 pipes 16 bpe @ SW_4K_S_X @ Navi1x
    { 3, 0, 6, 0, 0, } , // 2 pipes 1 bpe @ SW_4K_S_X @ Navi1x
    { 3, 1, 7, 0, 0, } , // 2 pipes 2 bpe @ SW_4K_S_X @ Navi1x
    { 3, 2, 8, 0, 0, } , // 2 pipes 4 bpe @ SW_4K_S_X @ Navi1x
    { 3, 3, 9, 0, 0, } , // 2 pipes 8 bpe @ SW_4K_S_X @ Navi1x
    { 3, 4, 10, 0, 0, } , // 2 pipes 16 bpe @ SW_4K_S_X @ Navi1x
    { 3, 0, 11, 0, 0, } , // 4 pipes 1 bpe @ SW_4K_S_X @ Navi1x
    { 3, 1, 12, 0, 0, } , // 4 pipes 2 bpe @ SW_4K_S_X @ Navi1x
    { 3, 2, 13, 0, 0, } , // 4 pipes 4 bpe @ SW_4K_S_X @ Navi1x
    { 3, 3, 14, 0, 0, } , // 4 pipes 8 bpe @ SW_4K_S_X @ Navi1x
    { 3, 4, 15, 0, 0, } , // 4 pipes 16 bpe @ SW_4K_S_X @ Navi1x
    { 3, 0, 16, 0, 0, } , // 8 pipes 1 bpe @ SW_4K_S_X @ Navi1x
    { 3, 1, 17, 0, 0, } , // 8 pipes 2 bpe @ SW_4K_S_X @ Navi1x
    { 3, 2, 18, 0, 0, } , // 8 pipes 4 bpe @ SW_4K_S_X @ Navi1x
    { 3, 3, 19, 0, 0, } , // 8 pipes 8 bpe @ SW_4K_S_X @ Navi1x
    { 3, 4, 20, 0, 0, } , // 8 pipes 16 bpe @ SW_4K_S_X @ Navi1x
    { 3, 0, 21, 0, 0, } , // 16 pipes 1 bpe @ SW_4K_S_X @ Navi1x
    { 3, 1, 22, 0, 0, } , // 16 pipes 2 bpe @ SW_4K_S_X @ Navi1x
    { 3, 2, 23, 0, 0, } , // 16 pipes 4 bpe @ SW_4K_S_X @ Navi1x
    { 3, 3, 24, 0, 0, } , // 16 pipes 8 bpe @ SW_4K_S_X @ Navi1x
    { 3, 4, 25, 0, 0, } , // 16 pipes 16 bpe @ SW_4K_S_X @ Navi1x
    { 3, 0, 21, 0, 0, } , // 32 pipes 1 bpe @ SW_4K_S_X @ Navi1x
    { 3, 1, 22, 0, 0, } , // 32 pipes 2 bpe @ SW_4K_S_X @ Navi1x
    { 3, 2, 23, 0, 0, } , // 32 pipes 4 bpe @ SW_4K_S_X @ Navi1x
    { 3, 3, 24, 0, 0, } , // 32 pipes 8 bpe @ SW_4K_S_X @ Navi1x
    { 3, 4, 25, 0, 0, } , // 32 pipes 16 bpe @ SW_4K_S_X @ Navi1x
    { 3, 0, 21, 0, 0, } , // 64 pipes 1 bpe @ SW_4K_S_X @ Navi1x
    { 3, 1, 22, 0, 0, } , // 64 pipes 2 bpe @ SW_4K_S_X @ Navi1x
    { 3, 2, 23, 0, 0, } , // 64 pipes 4 bpe @ SW_4K_S_X @ Navi1x
    { 3, 3, 24, 0, 0, } , // 64 pipes 8 bpe @ SW_4K_S_X @ Navi1x
    { 3, 4, 25, 0, 0, } , // 64 pipes 16 bpe @ SW_4K_S_X @ Navi1x
};

const ADDR_SW_PATINFO SW_4K_D_X_PATINFO[] =
{
    { 1, 5, 1, 0, 0, } , // 1 pipes 1 bpe @ SW_4K_D_X @ Navi1x
    { 1, 1, 2, 0, 0, } , // 1 pipes 2 bpe @ SW_4K_D_X @ Navi1x
    { 1, 2, 3, 0, 0, } , // 1 pipes 4 bpe @ SW_4K_D_X @ Navi1x
    { 1, 6, 4, 0, 0, } , // 1 pipes 8 bpe @ SW_4K_D_X @ Navi1x
    { 1, 7, 5, 0, 0, } , // 1 pipes 16 bpe @ SW_4K_D_X @ Navi1x
    { 3, 5, 6, 0, 0, } , // 2 pipes 1 bpe @ SW_4K_D_X @ Navi1x
    { 3, 1, 7, 0, 0, } , // 2 pipes 2 bpe @ SW_4K_D_X @ Navi1x
    { 3, 2, 8, 0, 0, } , // 2 pipes 4 bpe @ SW_4K_D_X @ Navi1x
    { 3, 6, 9, 0, 0, } , // 2 pipes 8 bpe @ SW_4K_D_X @ Navi1x
    { 3, 7, 10, 0, 0, } , // 2 pipes 16 bpe @ SW_4K_D_X @ Navi1x
    { 3, 5, 11, 0, 0, } , // 4 pipes 1 bpe @ SW_4K_D_X @ Navi1x
    { 3, 1, 12, 0, 0, } , // 4 pipes 2 bpe @ SW_4K_D_X @ Navi1x
    { 3, 2, 13, 0, 0, } , // 4 pipes 4 bpe @ SW_4K_D_X @ Navi1x
    { 3, 6, 14, 0, 0, } , // 4 pipes 8 bpe @ SW_4K_D_X @ Navi1x
    { 3, 7, 15, 0, 0, } , // 4 pipes 16 bpe @ SW_4K_D_X @ Navi1x
    { 3, 5, 16, 0, 0, } , // 8 pipes 1 bpe @ SW_4K_D_X @ Navi1x
    { 3, 1, 17, 0, 0, } , // 8 pipes 2 bpe @ SW_4K_D_X @ Navi1x
    { 3, 2, 18, 0, 0, } , // 8 pipes 4 bpe @ SW_4K_D_X @ Navi1x
    { 3, 6, 19, 0, 0, } , // 8 pipes 8 bpe @ SW_4K_D_X @ Navi1x
    { 3, 7, 20, 0, 0, } , // 8 pipes 16 bpe @ SW_4K_D_X @ Navi1x
    { 3, 5, 21, 0, 0, } , // 16 pipes 1 bpe @ SW_4K_D_X @ Navi1x
    { 3, 1, 22, 0, 0, } , // 16 pipes 2 bpe @ SW_4K_D_X @ Navi1x
    { 3, 2, 23, 0, 0, } , // 16 pipes 4 bpe @ SW_4K_D_X @ Navi1x
    { 3, 6, 24, 0, 0, } , // 16 pipes 8 bpe @ SW_4K_D_X @ Navi1x
    { 3, 7, 25, 0, 0, } , // 16 pipes 16 bpe @ SW_4K_D_X @ Navi1x
    { 3, 5, 21, 0, 0, } , // 32 pipes 1 bpe @ SW_4K_D_X @ Navi1x
    { 3, 1, 22, 0, 0, } , // 32 pipes 2 bpe @ SW_4K_D_X @ Navi1x
    { 3, 2, 23, 0, 0, } , // 32 pipes 4 bpe @ SW_4K_D_X @ Navi1x
    { 3, 6, 24, 0, 0, } , // 32 pipes 8 bpe @ SW_4K_D_X @ Navi1x
    { 3, 7, 25, 0, 0, } , // 32 pipes 16 bpe @ SW_4K_D_X @ Navi1x
    { 3, 5, 21, 0, 0, } , // 64 pipes 1 bpe @ SW_4K_D_X @ Navi1x
    { 3, 1, 22, 0, 0, } , // 64 pipes 2 bpe @ SW_4K_D_X @ Navi1x
    { 3, 2, 23, 0, 0, } , // 64 pipes 4 bpe @ SW_4K_D_X @ Navi1x
    { 3, 6, 24, 0, 0, } , // 64 pipes 8 bpe @ SW_4K_D_X @ Navi1x
    { 3, 7, 25, 0, 0, } , // 64 pipes 16 bpe @ SW_4K_D_X @ Navi1x
};

const ADDR_SW_PATINFO SW_4K_S3_PATINFO[] =
{
    { 1, 29, 131, 0, 0, } , // 1 pipes 1 bpe @ SW_4K_S3 @ Navi1x
    { 1, 30, 132, 0, 0, } , // 1 pipes 2 bpe @ SW_4K_S3 @ Navi1x
    { 1, 31, 133, 0, 0, } , // 1 pipes 4 bpe @ SW_4K_S3 @ Navi1x
    { 1, 32, 134, 0, 0, } , // 1 pipes 8 bpe @ SW_4K_S3 @ Navi1x
    { 1, 33, 135, 0, 0, } , // 1 pipes 16 bpe @ SW_4K_S3 @ Navi1x
    { 1, 29, 131, 0, 0, } , // 2 pipes 1 bpe @ SW_4K_S3 @ Navi1x
    { 1, 30, 132, 0, 0, } , // 2 pipes 2 bpe @ SW_4K_S3 @ Navi1x
    { 1, 31, 133, 0, 0, } , // 2 pipes 4 bpe @ SW_4K_S3 @ Navi1x
    { 1, 32, 134, 0, 0, } , // 2 pipes 8 bpe @ SW_4K_S3 @ Navi1x
    { 1, 33, 135, 0, 0, } , // 2 pipes 16 bpe @ SW_4K_S3 @ Navi1x
    { 1, 29, 131, 0, 0,
      } , // 4 pipes 1 bpe @ SW_4K_S3 @ Navi1x
    { 1, 30, 132, 0, 0, } , // 4 pipes 2 bpe @ SW_4K_S3 @ Navi1x
    { 1, 31, 133, 0, 0, } , // 4 pipes 4 bpe @ SW_4K_S3 @ Navi1x
    { 1, 32, 134, 0, 0, } , // 4 pipes 8 bpe @ SW_4K_S3 @ Navi1x
    { 1, 33, 135, 0, 0, } , // 4 pipes 16 bpe @ SW_4K_S3 @ Navi1x
    { 1, 29, 131, 0, 0, } , // 8 pipes 1 bpe @ SW_4K_S3 @ Navi1x
    { 1, 30, 132, 0, 0, } , // 8 pipes 2 bpe @ SW_4K_S3 @ Navi1x
    { 1, 31, 133, 0, 0, } , // 8 pipes 4 bpe @ SW_4K_S3 @ Navi1x
    { 1, 32, 134, 0, 0, } , // 8 pipes 8 bpe @ SW_4K_S3 @ Navi1x
    { 1, 33, 135, 0, 0, } , // 8 pipes 16 bpe @ SW_4K_S3 @ Navi1x
    { 1, 29, 131, 0, 0, } , // 16 pipes 1 bpe @ SW_4K_S3 @ Navi1x
    { 1, 30, 132, 0, 0, } , // 16 pipes 2 bpe @ SW_4K_S3 @ Navi1x
    { 1, 31, 133, 0, 0, } , // 16 pipes 4 bpe @ SW_4K_S3 @ Navi1x
    { 1, 32, 134, 0, 0, } , // 16 pipes 8 bpe @ SW_4K_S3 @ Navi1x
    { 1, 33, 135, 0, 0, } , // 16 pipes 16 bpe @ SW_4K_S3 @ Navi1x
    { 1, 29, 131, 0, 0, } , // 32 pipes 1 bpe @ SW_4K_S3 @ Navi1x
    { 1, 30, 132, 0, 0, } , // 32 pipes 2 bpe @ SW_4K_S3 @ Navi1x
    { 1, 31, 133, 0, 0, } , // 32 pipes 4 bpe @ SW_4K_S3 @ Navi1x
    { 1, 32, 134, 0, 0, } , // 32 pipes 8 bpe @ SW_4K_S3 @ Navi1x
    { 1, 33, 135, 0, 0, } , // 32 pipes 16 bpe @ SW_4K_S3 @ Navi1x
    { 1, 29, 131, 0, 0, } , // 64 pipes 1 bpe @ SW_4K_S3 @ Navi1x
    { 1, 30, 132, 0, 0, } , // 64 pipes 2 bpe @ SW_4K_S3 @ Navi1x
    { 1, 31, 133, 0, 0, } , // 64 pipes 4 bpe @ SW_4K_S3 @ Navi1x
    { 1, 32, 134, 0, 0, } , // 64 pipes 8 bpe @ SW_4K_S3 @ Navi1x
    { 1, 33, 135, 0, 0, } , // 64 pipes 16 bpe @ SW_4K_S3 @ Navi1x
};

const ADDR_SW_PATINFO SW_4K_S3_X_PATINFO[] =
{
    { 1, 29, 131, 0, 0, } , // 1 pipes 1 bpe @ SW_4K_S3_X @ Navi1x
    { 1, 30, 132, 0, 0, } , // 1 pipes 2 bpe @ SW_4K_S3_X @ Navi1x
    { 1, 31, 133, 0, 0, } , // 1 pipes 4 bpe @ SW_4K_S3_X @ Navi1x
    { 1, 32, 134, 0, 0, } , // 1 pipes 8 bpe @ SW_4K_S3_X @ Navi1x
    { 1, 33, 135, 0, 0, } , // 1 pipes 16 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 29, 136, 0, 0, } , // 2 pipes 1 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 30, 137, 0, 0, } , // 2 pipes 2 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 31, 138, 0, 0, } , // 2 pipes 4 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 32, 139, 0, 0, } , // 2 pipes 8 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 33, 140, 0, 0, } , // 2 pipes 16 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 29, 141, 0, 0, } , // 4 pipes 1 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 30, 142, 0, 0, } , // 4 pipes 2 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 31, 143, 0, 0, } , // 4 pipes 4 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 32, 144, 0, 0, } , // 4 pipes 8 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 33, 145, 0, 0, } , // 4 pipes 16 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 29, 146, 0, 0, } , // 8 pipes 1 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 30, 147, 0, 0, } , // 8 pipes 2 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 31, 148, 0, 0, } , // 8 pipes 4 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 32, 149, 0, 0, } , // 8 pipes 8 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 33, 150, 0, 0, } , // 8 pipes 16 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 29, 151, 0, 0, } , // 16 pipes 1 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 30, 152, 0, 0, } , // 16 pipes 2 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 31, 153, 0, 0, } , // 16 pipes 4 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 32, 154, 0, 0, } , // 16 pipes 8 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 33, 155, 0, 0, } , // 16 pipes 16 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 29, 151, 0, 0, } , // 32 pipes 1 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 30, 152, 0, 0, } , // 32 pipes 2 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 31, 153, 0, 0, } , // 32 pipes 4 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 32, 154, 0, 0, } , // 32 pipes 8 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 33, 155, 0, 0, } , // 32 pipes 16 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 29, 151, 0, 0, } , // 64 pipes 1 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 30, 152, 0, 0, } , // 64 pipes 2 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 31, 153, 0, 0, } , // 64 pipes 4 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 32, 154, 0, 0, } , // 64 pipes 8 bpe @ SW_4K_S3_X @ Navi1x
    { 3, 33, 155, 0, 0, } , // 64 pipes 16 bpe @ SW_4K_S3_X @ Navi1x
};

const ADDR_SW_PATINFO SW_64K_S_PATINFO[] =
{
    { 1, 0, 1, 1, 0, } , // 1 pipes 1 bpe @ SW_64K_S @ Navi1x
    { 1, 1, 2, 2, 0, } , // 1 pipes 2 bpe @ SW_64K_S @ Navi1x
    { 1, 2, 3, 3, 0, } , // 1 pipes 4 bpe @ SW_64K_S @ Navi1x
    { 1, 3, 4, 4, 0, } , // 1 pipes 8 bpe @ SW_64K_S @ Navi1x
    { 1, 4, 5, 5, 0, } , // 1 pipes 16 bpe @ SW_64K_S @ Navi1x
    { 1, 0, 1, 1, 0, } , // 2 pipes 1 bpe @ SW_64K_S @ Navi1x
    { 1, 1, 2, 2, 0, } , // 2 pipes 2 bpe @ SW_64K_S @ Navi1x
    { 1, 2, 3, 3, 0, } , // 2 pipes 4 bpe @ SW_64K_S @ Navi1x
    { 1, 3, 4, 4, 0, } , // 2 pipes 8 bpe @ SW_64K_S @ Navi1x
    { 1, 4, 5, 5, 0, } , // 2 pipes 16 bpe @ SW_64K_S @ Navi1x
    { 1, 0, 1, 1, 0, } , // 4 pipes 1 bpe @ SW_64K_S @ Navi1x
    { 1, 1, 2, 2, 0, } , // 4 pipes 2 bpe @ SW_64K_S @ Navi1x
    { 1, 2, 3, 3, 0, } , // 4 pipes 4 bpe @ SW_64K_S @ Navi1x
    { 1, 3, 4, 4, 0, } , // 4 pipes 8 bpe @ SW_64K_S @ Navi1x
    { 1, 4, 5, 5, 0, } , // 4 pipes 16 bpe @ SW_64K_S @ Navi1x
    { 1, 0, 1, 1, 0, } , // 8 pipes 1 bpe @ SW_64K_S @ Navi1x
    { 1, 1, 2, 2, 0, } , // 8 pipes 2 bpe @ SW_64K_S @ Navi1x
    { 1, 2, 3, 3, 0, } , // 8 pipes 4 bpe @ SW_64K_S @ Navi1x
    { 1, 3, 4, 4, 0, } , // 8 pipes 8 bpe @ SW_64K_S @ Navi1x
    { 1, 4, 5, 5, 0, } , // 8 pipes 16 bpe @ SW_64K_S @ Navi1x
    { 1, 0, 1, 1, 0, } , // 16 pipes 1 bpe @ SW_64K_S @ Navi1x
    { 1, 1, 2, 2, 0, } , // 16 pipes 2 bpe @ SW_64K_S @ Navi1x
    { 1, 2, 3, 3, 0, } , // 16 pipes 4 bpe @ SW_64K_S @ Navi1x
    { 1, 3, 4, 4, 0, } , // 16 pipes 8 bpe @ SW_64K_S @ Navi1x
    { 1, 4, 5, 5, 0, } , // 16 pipes 16 bpe @ SW_64K_S @ Navi1x
    { 1, 0, 1, 1, 0, } , // 32 pipes 1 bpe @ SW_64K_S @ Navi1x
    { 1, 1, 2, 2, 0, } , // 32 pipes 2 bpe @ SW_64K_S @ Navi1x
    { 1, 2, 3, 3, 0, } , // 32 pipes 4 bpe @ SW_64K_S @ Navi1x
    { 1, 3, 4, 4, 0, } , // 32 pipes 8 bpe @ SW_64K_S @ Navi1x
    { 1, 4, 5, 5, 0, } , // 32 pipes 16 bpe @ SW_64K_S @ Navi1x
    { 1, 0, 1, 1, 0, } , // 64 pipes 1 bpe @ SW_64K_S @ Navi1x
    { 1, 1, 2, 2, 0, } , // 64 pipes 2 bpe @ SW_64K_S @ Navi1x
    { 1, 2, 3, 3, 0, } , // 64 pipes 4 bpe @ SW_64K_S @ Navi1x
    { 1, 3, 4, 4, 0, } , // 64 pipes 8 bpe @ SW_64K_S @ Navi1x
    { 1, 4, 5, 5, 0, } , // 64 pipes 16 bpe @ SW_64K_S @ Navi1x
};

const ADDR_SW_PATINFO SW_64K_D_PATINFO[] =
{
    { 1, 5, 1, 1, 0, } , // 1 pipes 1 bpe @ SW_64K_D @ Navi1x
    { 1, 1, 2, 2, 0, } , // 1 pipes 2 bpe @ SW_64K_D @ Navi1x
    { 1, 2, 3, 3, 0, } , // 1 pipes 4 bpe @ SW_64K_D @ Navi1x
    { 1, 6, 4, 4, 0, } , // 1 pipes 8 bpe @ SW_64K_D @ Navi1x
    { 1, 7, 5, 5, 0, } , // 1 pipes 16 bpe @ SW_64K_D @ Navi1x
    { 1, 5, 1, 1, 0, } , // 2 pipes 1 bpe @ SW_64K_D @ Navi1x
    { 1, 1, 2, 2, 0, } , // 2 pipes 2 bpe @ SW_64K_D @ Navi1x
    { 1, 2, 3, 3, 0, } , // 2 pipes 4 bpe @ SW_64K_D @ Navi1x
    { 1, 6, 4, 4, 0, } , // 2 pipes 8 bpe @ SW_64K_D @ Navi1x
    { 1, 7, 5, 5, 0, } , // 2 pipes 16 bpe @ SW_64K_D @ Navi1x
    { 1, 5, 1, 1, 0, } , // 4 pipes 1 bpe @ SW_64K_D @ Navi1x
    { 1, 1, 2, 2, 0, } , // 4 pipes 2 bpe @ SW_64K_D @ Navi1x
    { 1, 2, 3, 3, 0, } , // 4 pipes 4 bpe @ SW_64K_D @ Navi1x
    { 1, 6, 4, 4, 0, } , // 4 pipes 8 bpe @ SW_64K_D @ Navi1x
    { 1, 7, 5, 5, 0, } , // 4 pipes 16 bpe @ SW_64K_D @ Navi1x
    { 1, 5, 1, 1, 0, } , // 8 pipes 1 bpe @ SW_64K_D @ Navi1x
    { 1, 1, 2, 2, 0, } , // 8 pipes 2 bpe @ SW_64K_D @ Navi1x
    { 1, 2, 3, 3, 0, } , // 8 pipes 4 bpe @ SW_64K_D @ Navi1x
    { 1, 6, 4, 4, 0, } , // 8 pipes 8 bpe @ SW_64K_D @ Navi1x
    { 1, 7, 5, 5, 0, } , // 8 pipes 16 bpe @ SW_64K_D @ Navi1x
    { 1, 5, 1, 1, 0, } , // 16 pipes 1 bpe @ SW_64K_D @ Navi1x
    { 1, 1, 2, 2, 0, } , // 16 pipes 2 bpe @ SW_64K_D @ Navi1x
    { 1, 2, 3, 3, 0, } , // 16 pipes 4 bpe @ SW_64K_D @ Navi1x
    { 1, 6, 4, 4, 0, } , // 16 pipes 8 bpe @ SW_64K_D @ Navi1x
    { 1, 7, 5, 5, 0, } , // 16 pipes 16 bpe @ SW_64K_D @ Navi1x
    { 1, 5, 1, 1, 0, } , // 32 pipes 1 bpe @ SW_64K_D @ Navi1x
    { 1, 1, 2, 2, 0, } , // 32 pipes 2 bpe @ SW_64K_D @ Navi1x
    { 1, 2, 3, 3, 0, } , // 32 pipes 4 bpe @ SW_64K_D @ Navi1x
    { 1, 6, 4, 4, 0, } , // 32 pipes 8 bpe @ SW_64K_D @ Navi1x
    { 1, 7, 5, 5, 0, } , // 32 pipes 16 bpe @ SW_64K_D @ Navi1x
    { 1, 5, 1, 1, 0, } , // 64 pipes 1 bpe @ SW_64K_D @ Navi1x
    { 1, 1, 2, 2, 0, } , // 64 pipes 2 bpe @ SW_64K_D @ Navi1x
    { 1, 2, 3, 3, 0, } , // 64 pipes 4 bpe @ SW_64K_D @ Navi1x
    { 1, 6, 4, 4, 0, } , // 64 pipes 8 bpe @ SW_64K_D @ Navi1x
    { 1, 7, 5, 5, 0, } , // 64 pipes 16 bpe @ SW_64K_D @ Navi1x
};

const ADDR_SW_PATINFO SW_64K_S_T_PATINFO[] =
{
    { 1, 0, 1, 1, 0, } , // 1 pipes 1 bpe @ SW_64K_S_T @ Navi1x
    { 1, 1, 2, 2, 0, } , // 1 pipes 2 bpe @ SW_64K_S_T @ Navi1x
    { 1, 2, 3, 3, 0, } , // 1 pipes 4 bpe @ SW_64K_S_T @ Navi1x
    { 1, 3, 4, 4, 0, } , // 1 pipes 8 bpe @ SW_64K_S_T @ Navi1x
    { 1, 4, 5, 5, 0, } , // 1 pipes 16 bpe @ SW_64K_S_T @ Navi1x
    { 2, 0, 36, 1, 0, } , // 2 pipes 1 bpe @ SW_64K_S_T @ Navi1x
    { 2, 1, 37, 2, 0, } , // 2 pipes 2 bpe @ SW_64K_S_T @ Navi1x
    { 2, 2, 38, 3, 0, } , // 2 pipes 4 bpe @ SW_64K_S_T @ Navi1x
    { 2, 3, 39, 4, 0, } , // 2 pipes 8 bpe @ SW_64K_S_T @ Navi1x
    { 2, 4, 40, 5, 0, } , // 2 pipes 16 bpe @ SW_64K_S_T @ Navi1x
    { 2, 0, 41, 1, 0, } , // 4 pipes 1 bpe @ SW_64K_S_T @ Navi1x
    { 2, 1, 42, 2, 0, } , // 4 pipes 2 bpe @ SW_64K_S_T @ Navi1x
    { 2, 2, 43, 3, 0, } , // 4 pipes 4 bpe @ SW_64K_S_T @ Navi1x
    { 2, 3, 44, 4, 0, } , // 4 pipes 8 bpe @ SW_64K_S_T @ Navi1x
    { 2, 4, 45, 5, 0, } , // 4 pipes 16 bpe @ SW_64K_S_T @ Navi1x
    { 2, 0, 46, 1, 0, } , // 8 pipes 1 bpe @ SW_64K_S_T @ Navi1x
    { 2, 1, 47, 2, 0, } , // 8 pipes 2 bpe @ SW_64K_S_T @ Navi1x
    { 2, 2, 48, 3, 0, } , // 8 pipes 4 bpe @ SW_64K_S_T @ Navi1x
    { 2, 3, 49, 4, 0, } , // 8 pipes 8 bpe @ SW_64K_S_T @ Navi1x
    { 2, 4, 50, 5, 0, } , // 8 pipes 16 bpe @ SW_64K_S_T @ Navi1x
    { 2, 0, 51, 1, 0, } , // 16 pipes 1 bpe @ SW_64K_S_T @ Navi1x
    { 2, 1, 52, 2, 0, } , // 16 pipes 2 bpe @ SW_64K_S_T @ Navi1x
    { 2, 2, 53, 3, 0, } , // 16 pipes 4 bpe @ SW_64K_S_T @ Navi1x
    { 2, 3, 54, 4, 0, } , // 16 pipes 8 bpe @ SW_64K_S_T @ Navi1x
    { 2, 4, 55, 5, 0, } , // 16 pipes 16 bpe @ SW_64K_S_T @ Navi1x
    { 2, 0, 56, 16, 0, } , // 32 pipes 1 bpe @ SW_64K_S_T @ Navi1x
    { 2, 1, 57, 17, 0, } , // 32 pipes 2 bpe @ SW_64K_S_T @ Navi1x
    { 2, 2, 58, 18, 0, } , // 32 pipes 4 bpe @ SW_64K_S_T @ Navi1x
    { 2, 3, 59, 19, 0, } , // 32 pipes 8 bpe @ SW_64K_S_T @ Navi1x
    { 2, 4, 60, 20, 0, } , // 32 pipes 16 bpe @ SW_64K_S_T @ Navi1x
    { 2, 0, 1, 21, 0, } , // 64 pipes 1 bpe @ SW_64K_S_T @ Navi1x
    { 2, 1, 2, 22, 0, } , // 64 pipes 2 bpe @ SW_64K_S_T @ Navi1x
    { 2, 2, 3, 23, 0, } , // 64 pipes 4 bpe @ SW_64K_S_T @ Navi1x
    { 2, 3, 4, 24, 0, } , // 64 pipes 8 bpe @ SW_64K_S_T @ Navi1x
    { 2, 4, 5, 25, 0, } , // 64 pipes 16 bpe @ SW_64K_S_T @ Navi1x
};

const ADDR_SW_PATINFO SW_64K_D_T_PATINFO[] =
{
    { 1, 5, 1, 1, 0, } , // 1 pipes 1 bpe @ SW_64K_D_T @ Navi1x
    { 1, 1, 2, 2, 0, } , // 1 pipes 2 bpe @ SW_64K_D_T @ Navi1x
    { 1, 2, 3, 3, 0, } , // 1 pipes 4 bpe @ SW_64K_D_T @ Navi1x
    { 1, 6, 4, 4, 0, } , // 1 pipes 8 bpe @ SW_64K_D_T @ Navi1x
    { 1, 7, 5, 5, 0, } , // 1 pipes 16 bpe @ SW_64K_D_T @ Navi1x
    { 2, 5, 36, 1, 0, } , // 2 pipes 1 bpe @ SW_64K_D_T @ Navi1x
    { 2, 1, 37, 2, 0, } , // 2 pipes 2 bpe @ SW_64K_D_T @ Navi1x
    { 2, 2, 38, 3, 0, } , // 2 pipes 4 bpe @ SW_64K_D_T @ Navi1x
    { 2, 6, 39, 4, 0, } , // 2 pipes 8 bpe @ SW_64K_D_T @ Navi1x
    { 2, 7, 40, 5, 0, } , // 2 pipes 16 bpe @ SW_64K_D_T @ Navi1x
    { 2, 5, 41, 1, 0, } , // 4 pipes 1 bpe @ SW_64K_D_T @ Navi1x
    { 2, 1, 42, 2, 0, } , // 4 pipes 2 bpe @ SW_64K_D_T @ Navi1x
    { 2, 2, 43, 3, 0, } , // 4 pipes 4 bpe @ SW_64K_D_T @ Navi1x
    { 2, 6, 44, 4, 0, } , // 4 pipes 8 bpe @ SW_64K_D_T @ Navi1x
    { 2, 7, 45, 5, 0, } , // 4 pipes 16 bpe @ SW_64K_D_T @ Navi1x
    { 2, 5, 46, 1, 0, } , // 8 pipes 1 bpe @ SW_64K_D_T @ Navi1x
    { 2, 1, 47, 2, 0, } , // 8 pipes 2 bpe @ SW_64K_D_T @ Navi1x
    { 2, 2, 48, 3, 0, } , // 8 pipes 4 bpe @ SW_64K_D_T @ Navi1x
    { 2, 6, 49, 4, 0, } , // 8 pipes 8 bpe @ SW_64K_D_T @ Navi1x
    { 2, 7, 50, 5, 0, } , // 8 pipes 16 bpe @ SW_64K_D_T @ Navi1x
    { 2, 5, 51, 1, 0, } , // 16 pipes 1 bpe @ SW_64K_D_T @ Navi1x
    { 2, 1, 52, 2, 0, } , // 16 pipes 2 bpe @ SW_64K_D_T @ Navi1x
    { 2, 2, 53, 3, 0, } , // 16 pipes 4 bpe @ SW_64K_D_T @ Navi1x
    { 2, 6, 54, 4, 0, } , // 16 pipes 8 bpe @ SW_64K_D_T @ Navi1x
    { 2, 7, 55, 5, 0, } , // 16 pipes 16 bpe @ SW_64K_D_T @ Navi1x
    { 2, 5, 56, 16, 0, } , // 32 pipes 1 bpe @ SW_64K_D_T @ Navi1x
    { 2, 1, 57, 17, 0, } , // 32 pipes 2 bpe @ SW_64K_D_T @ Navi1x
    { 2, 2, 58, 18, 0, } , // 32 pipes 4 bpe @ SW_64K_D_T @ Navi1x
    { 2, 6, 59, 19, 0, } , // 32 pipes 8 bpe @ SW_64K_D_T @ Navi1x
    { 2, 7, 60, 20, 0, } , // 32 pipes 16 bpe @ SW_64K_D_T @ Navi1x
    { 2, 5, 1, 21, 0, } , // 64 pipes 1 bpe @ SW_64K_D_T @ Navi1x
    { 2, 1, 2, 22, 0, } , // 64 pipes 2 bpe @ SW_64K_D_T @ Navi1x
    { 2, 2, 3, 23, 0, } , // 64 pipes 4 bpe @ SW_64K_D_T @ Navi1x
    { 2, 6, 4, 24, 0, } , // 64 pipes 8 bpe @ SW_64K_D_T @ Navi1x
    { 2, 7, 5, 25, 0, } , // 64 pipes 16 bpe @ SW_64K_D_T @ Navi1x
};

const ADDR_SW_PATINFO SW_64K_S_X_PATINFO[] =
{
    { 1, 0, 1, 1, 0, } , // 1 pipes 1 bpe @ SW_64K_S_X @ Navi1x
    { 1, 1, 2, 2, 0, } , // 1 pipes 2 bpe @ SW_64K_S_X @ Navi1x
    { 1, 2, 3, 3, 0, } , // 1 pipes 4 bpe @ SW_64K_S_X @ Navi1x
    { 1, 3, 4, 4, 0, } , // 1 pipes 8 bpe @ SW_64K_S_X @ Navi1x
    { 1, 4, 5, 5, 0, } , // 1 pipes 16 bpe @ SW_64K_S_X @ Navi1x
    { 3, 0, 6, 1, 0, } , // 2 pipes 1 bpe @ SW_64K_S_X @ Navi1x
    { 3, 1, 7, 2, 0, } , // 2 pipes 2 bpe @ SW_64K_S_X @ Navi1x
    { 3, 2, 8, 3, 0, } , // 2 pipes 4 bpe @ SW_64K_S_X @ Navi1x
    { 3, 3, 9, 4, 0, } , // 2 pipes 8 bpe @ SW_64K_S_X @ Navi1x
    { 3, 4, 10, 5, 0, } , // 2 pipes 16 bpe @ SW_64K_S_X @ Navi1x
    { 3, 0, 11, 1, 0, } , // 4 pipes 1 bpe @ SW_64K_S_X @ Navi1x
    { 3, 1, 12, 2, 0, } , // 4 pipes 2 bpe @ SW_64K_S_X @ Navi1x
    { 3, 2, 13, 3, 0, } , // 4 pipes 4 bpe @ SW_64K_S_X @ Navi1x
    { 3, 3, 14, 4, 0, } , // 4 pipes 8 bpe @ SW_64K_S_X @ Navi1x
    { 3, 4, 15, 5, 0, } , // 4 pipes 16 bpe @ SW_64K_S_X @ Navi1x
    { 3, 0, 16, 1, 0, } , // 8 pipes 1 bpe @ SW_64K_S_X @ Navi1x
    { 3, 1, 17, 2, 0, } , // 8 pipes 2 bpe @ SW_64K_S_X @ Navi1x
    { 3, 2, 18, 3, 0, } , // 8 pipes 4 bpe @ SW_64K_S_X @ Navi1x
    { 3, 3, 19, 4, 0, } , // 8 pipes 8 bpe @ SW_64K_S_X @ Navi1x
    { 3, 4, 20, 5, 0, } , // 8 pipes 16 bpe @ SW_64K_S_X @ Navi1x
    { 3, 0, 21, 1, 0, } , // 16 pipes 1 bpe @ SW_64K_S_X @ Navi1x
    { 3, 1, 22, 2, 0, } , // 16 pipes 2 bpe @ SW_64K_S_X @ Navi1x
    { 3, 2, 23, 3, 0, } , // 16 pipes 4 bpe @ SW_64K_S_X @ Navi1x
    { 3, 3, 24, 4, 0, } , // 16 pipes 8 bpe @ SW_64K_S_X @ Navi1x
    { 3, 4, 25, 5, 0, } , // 16 pipes 16 bpe @ SW_64K_S_X @ Navi1x
    { 3, 0, 26, 6, 0, } , // 32 pipes 1 bpe @ SW_64K_S_X @ Navi1x
    { 3, 1, 27, 7, 0, } , // 32 pipes 2 bpe @ SW_64K_S_X @ Navi1x
    { 3, 2, 28, 8, 0, } , // 32 pipes 4 bpe @ SW_64K_S_X @ Navi1x
    { 3, 3, 29, 9, 0, } , // 32 pipes 8 bpe @ SW_64K_S_X @ Navi1x
    { 3, 4, 30, 10, 0, } , // 32 pipes 16 bpe @ SW_64K_S_X @ Navi1x
    { 3, 0, 31, 11, 0, } , // 64 pipes 1 bpe @ SW_64K_S_X @ Navi1x
    { 3, 1, 32, 12, 0, } , // 64 pipes 2 bpe @ SW_64K_S_X @ Navi1x
    { 3, 2, 33, 13, 0, } , // 64 pipes 4 bpe @ SW_64K_S_X @ Navi1x
    { 3, 3, 34, 14, 0, } , // 64 pipes 8 bpe @ SW_64K_S_X @ Navi1x
    { 3, 4, 35, 15, 0, } , // 64 pipes 16 bpe @ SW_64K_S_X @ Navi1x
};

const ADDR_SW_PATINFO SW_64K_D_X_PATINFO[] =
{
    { 1, 5, 1, 1, 0, } , // 1 pipes 1 bpe @ SW_64K_D_X @ Navi1x
    { 1, 1, 2, 2, 0, } , // 1 pipes 2 bpe @ SW_64K_D_X @ Navi1x
    { 1, 2, 3, 3, 0, } , // 1 pipes 4 bpe @ SW_64K_D_X @ Navi1x
    { 1, 6, 4, 4, 0, } , // 1 pipes 8 bpe @ SW_64K_D_X @ Navi1x
    { 1, 7, 5, 5, 0, } , // 1 pipes 16 bpe @ SW_64K_D_X @ Navi1x
    { 3, 5, 6, 1, 0, } , // 2 pipes 1 bpe @ SW_64K_D_X @ Navi1x
    { 3, 1, 7, 2, 0, } , // 2 pipes 2 bpe @ SW_64K_D_X @ Navi1x
    { 3, 2, 8, 3, 0, } , // 2 pipes 4 bpe @ SW_64K_D_X @ Navi1x
    { 3, 6, 9, 4, 0, } , // 2 pipes 8 bpe @ SW_64K_D_X @ Navi1x
    { 3, 7, 10, 5, 0, } , // 2 pipes 16 bpe @ SW_64K_D_X @ Navi1x
    { 3, 5, 11, 1, 0, } , // 4 pipes 1 bpe @ SW_64K_D_X @ Navi1x
    { 3, 1, 12, 2, 0, } , // 4 pipes 2 bpe @ SW_64K_D_X @ Navi1x
    { 3, 2, 13, 3, 0, } , // 4 pipes 4 bpe @ SW_64K_D_X @ Navi1x
    { 3, 6, 14, 4, 0, } , // 4 pipes 8 bpe @ SW_64K_D_X @ Navi1x
    { 3, 7, 15, 5, 0, } , // 4 pipes 16 bpe @ SW_64K_D_X @ Navi1x
    { 3, 5, 16, 1, 0, } , // 8 pipes 1 bpe @ SW_64K_D_X @ Navi1x
    { 3, 1, 17, 2, 0, } , // 8 pipes 2 bpe @ SW_64K_D_X @ Navi1x
    { 3, 2, 18, 3, 0, } , // 8 pipes 4 bpe @ SW_64K_D_X @ Navi1x
    { 3, 6, 19, 4, 0, } , // 8 pipes 8 bpe @ SW_64K_D_X @ Navi1x
    { 3, 7, 20, 5, 0, } , // 8 pipes 16 bpe @ SW_64K_D_X @ Navi1x
    { 3, 5, 21, 1, 0, } , // 16 pipes 1 bpe @ SW_64K_D_X @ Navi1x
    { 3, 1, 22, 2, 0, } , // 16 pipes 2 bpe @ SW_64K_D_X @ Navi1x
    { 3, 2, 23, 3, 0, } , // 16 pipes 4 bpe @ SW_64K_D_X @ Navi1x
    { 3, 6, 24, 4, 0, } , // 16 pipes 8 bpe @ SW_64K_D_X @ Navi1x
    { 3, 7, 25, 5, 0, } , // 16 pipes 16 bpe @ SW_64K_D_X @ Navi1x
    { 3, 5, 26, 6, 0, } , // 32 pipes 1 bpe @ SW_64K_D_X @ Navi1x
    { 3, 1, 27, 7, 0, } , // 32 pipes 2 bpe @ SW_64K_D_X @ Navi1x
    { 3, 2, 28, 8, 0, } , // 32 pipes 4 bpe @ SW_64K_D_X @ Navi1x
    { 3, 6, 29, 9, 0, } , // 32 pipes 8 bpe @ SW_64K_D_X @ Navi1x
    { 3, 7, 30, 10, 0, } , // 32 pipes 16 bpe @ SW_64K_D_X @ Navi1x
    { 3, 5, 31, 11, 0, } , // 64 pipes 1 bpe @ SW_64K_D_X @ Navi1x
    { 3, 1, 32, 12, 0, } , // 64 pipes 2 bpe @ SW_64K_D_X @ Navi1x
    { 3, 2, 33, 13, 0, } , // 64 pipes 4 bpe @ SW_64K_D_X @ Navi1x
    { 3, 6, 34, 14, 0, } , // 64 pipes 8 bpe @ SW_64K_D_X @ Navi1x
    { 3, 7, 35, 15, 0, } , // 64 pipes 16 bpe @ SW_64K_D_X @ Navi1x
};

const ADDR_SW_PATINFO SW_64K_R_X_1xaa_PATINFO[] =
{
    { 1, 5, 1, 1, 0, } , // 1 pipes 1 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 1, 1, 2, 2, 0, } , // 1 pipes 2 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 1, 2, 3, 3, 0, } , // 1 pipes 4 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 1, 6, 4, 4, 0, } , // 1 pipes 8 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 1, 7, 5, 5, 0, } , // 1 pipes 16 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 28, 61, 1, 0, } , // 2 pipes 1 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 1, 62, 2, 0, } , // 2 pipes 2 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 2, 8, 3, 0, } , // 2 pipes 4 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 6, 63, 4, 0, } , // 2 pipes 8 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 7, 64, 5, 0, } , // 2 pipes 16 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 28, 65, 1, 0, } , // 4 pipes 1 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 1, 66, 2, 0, } , // 4 pipes 2 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 2, 67, 3, 0, } , // 4 pipes 4 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 6, 68, 4, 0, } , // 4 pipes 8 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 7, 69, 26, 0, } , // 4 pipes 16 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 28, 70, 1, 0, } , // 8 pipes 1 bpe @ SW_64K_R_X 1xaa @ Navi1x
    { 3, 1, 71, 2, 0, } , // 8 pipes 2
bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 2, 72, 27, 0, } , // 8 pipes 4 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 6, 72, 28, 0, } , // 8 pipes 8 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 7, 73, 29, 0, } , // 8 pipes 16 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 28, 74, 1, 0, } , // 16 pipes 1 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 1, 74, 30, 0, } , // 16 pipes 2 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 2, 74, 31, 0, } , // 16 pipes 4 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 6, 74, 32, 0, } , // 16 pipes 8 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 7, 74, 33, 0, } , // 16 pipes 16 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 28, 75, 6, 0, } , // 32 pipes 1 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 1, 75, 34, 0, } , // 32 pipes 2 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 2, 75, 35, 0, } , // 32 pipes 4 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 6, 75, 36, 0, } , // 32 pipes 8 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 7, 76, 37, 0, } , // 32 pipes 16 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 28, 77, 11, 0, } , // 64 pipes 1 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 1, 77, 38, 0, } , // 64 pipes 2 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 2, 77, 39, 0, } , // 64 pipes 4 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 6, 78, 40, 0, } , // 64 pipes 8 bpe @ SW_64K_R_X 1xaa @ Navi1x { 3, 7, 79, 41, 0, } , // 64 pipes 16 bpe @ SW_64K_R_X 1xaa @ Navi1x }; const ADDR_SW_PATINFO SW_64K_R_X_2xaa_PATINFO[] = { { 2, 5, 1, 99, 0, } , // 1 pipes 1 bpe @ SW_64K_R_X 2xaa @ Navi1x { 2, 1, 2, 100, 0, } , // 1 pipes 2 bpe @ SW_64K_R_X 2xaa @ Navi1x { 2, 2, 3, 101, 0, } , // 1 pipes 4 bpe @ SW_64K_R_X 2xaa @ Navi1x { 2, 6, 4, 102, 0, } , // 1 pipes 8 bpe @ SW_64K_R_X 2xaa @ Navi1x { 2, 7, 5, 103, 0, } , // 1 pipes 16 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 28, 61, 99, 0, } , // 2 pipes 1 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 1, 62, 100, 0, } , // 2 pipes 2 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 2, 8, 101, 0, } , // 2 pipes 4 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 6, 63, 102, 0, } , // 2 pipes 8 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 7, 64, 103, 0, } , // 2 pipes 16 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 28, 65, 
99, 0, } , // 4 pipes 1 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 1, 66, 100, 0, } , // 4 pipes 2 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 2, 67, 101, 0, } , // 4 pipes 4 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 6, 68, 102, 0, } , // 4 pipes 8 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 7, 69, 104, 0, } , // 4 pipes 16 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 28, 70, 99, 0, } , // 8 pipes 1 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 1, 71, 100, 0, } , // 8 pipes 2 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 2, 72, 105, 0, } , // 8 pipes 4 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 6, 72, 106, 0, } , // 8 pipes 8 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 7, 73, 107, 0, } , // 8 pipes 16 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 28, 74, 99, 0, } , // 16 pipes 1 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 1, 74, 108, 0, } , // 16 pipes 2 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 2, 74, 109, 0, } , // 16 pipes 4 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 6, 74, 107, 0, } , // 16 pipes 8 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 7, 113, 33, 0, } , // 16 pipes 16 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 28, 75, 110, 0, } , // 32 pipes 1 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 1, 75, 111, 0, } , // 32 pipes 2 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 2, 75, 112, 0, } , // 32 pipes 4 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 6, 76, 113, 0, } , // 32 pipes 8 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 7, 114, 37, 0, } , // 32 pipes 16 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 28, 78, 114, 0, } , // 64 pipes 1 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 1, 78, 115, 0, } , // 64 pipes 2 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 2, 78, 116, 0, } , // 64 pipes 4 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 6, 79, 117, 0, } , // 64 pipes 8 bpe @ SW_64K_R_X 2xaa @ Navi1x { 3, 7, 115, 41, 0, } , // 64 pipes 16 bpe @ SW_64K_R_X 2xaa @ Navi1x }; const ADDR_SW_PATINFO SW_64K_R_X_4xaa_PATINFO[] = { { 2, 5, 1, 118, 0, } , // 1 pipes 1 bpe @ SW_64K_R_X 4xaa @ Navi1x { 2, 1, 2, 119, 0, } , // 1 pipes 2 bpe @ SW_64K_R_X 4xaa @ Navi1x { 2, 2, 3, 120, 0, } , // 1 pipes 4 bpe @ SW_64K_R_X 4xaa @ Navi1x { 2, 6, 4, 121, 0, } , // 1 pipes 
8 bpe @ SW_64K_R_X 4xaa @ Navi1x { 2, 7, 5, 122, 0, } , // 1 pipes 16 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 28, 61, 118, 0, } , // 2 pipes 1 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 1, 62, 119, 0, } , // 2 pipes 2 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 2, 8, 120, 0, } , // 2 pipes 4 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 6, 63, 121, 0, } , // 2 pipes 8 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 7, 64, 122, 0, } , // 2 pipes 16 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 28, 65, 118, 0, } , // 4 pipes 1 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 1, 66, 119, 0, } , // 4 pipes 2 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 2, 67, 120, 0, } , // 4 pipes 4 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 6, 68, 121, 0, } , // 4 pipes 8 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 7, 69, 123, 0, } , // 4 pipes 16 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 28, 70, 118, 0, } , // 8 pipes 1 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 1, 71, 119, 0, } , // 8 pipes 2 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 2, 72, 124, 0, } , // 8 pipes 4 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 6, 93, 125, 0, } , // 8 pipes 8 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 7, 116, 107, 0, } , // 8 pipes 16 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 28, 74, 118, 0, } , // 16 pipes 1 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 1, 74, 126, 0, } , // 16 pipes 2 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 2, 74, 127, 0, } , // 16 pipes 4 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 6, 117, 107, 0, } , // 16 pipes 8 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 7, 118, 33, 0, } , // 16 pipes 16 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 28, 76, 128, 0, } , // 32 pipes 1 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 1, 76, 129, 0, } , // 32 pipes 2 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 2, 76, 130, 0, } , // 32 pipes 4 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 6, 119, 113, 0, } , // 32 pipes 8 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 7, 120, 37, 0, } , // 32 pipes 16 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 28, 79, 131, 0, } , // 64 pipes 1 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 1, 79, 132, 0, } , // 64 pipes 2 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 2, 79, 133, 0, } , // 64 pipes 4 
bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 6, 121, 117, 0, } , // 64 pipes 8 bpe @ SW_64K_R_X 4xaa @ Navi1x { 3, 7, 122, 41, 0, } , // 64 pipes 16 bpe @ SW_64K_R_X 4xaa @ Navi1x }; const ADDR_SW_PATINFO SW_64K_R_X_8xaa_PATINFO[] = { { 2, 5, 1, 134, 0, } , // 1 pipes 1 bpe @ SW_64K_R_X 8xaa @ Navi1x { 2, 1, 2, 135, 0, } , // 1 pipes 2 bpe @ SW_64K_R_X 8xaa @ Navi1x { 2, 2, 3, 135, 0, } , // 1 pipes 4 bpe @ SW_64K_R_X 8xaa @ Navi1x { 2, 6, 4, 136, 0, } , // 1 pipes 8 bpe @ SW_64K_R_X 8xaa @ Navi1x { 2, 7, 5, 136, 0, } , // 1 pipes 16 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 28, 61, 134, 0, } , // 2 pipes 1 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 1, 62, 135, 0, } , // 2 pipes 2 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 2, 8, 135, 0, } , // 2 pipes 4 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 6, 63, 136, 0, } , // 2 pipes 8 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 7, 64, 136, 0, } , // 2 pipes 16 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 28, 65, 134, 0, } , // 4 pipes 1 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 1, 66, 135, 0, } , // 4 pipes 2 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 2, 67, 135, 0, } , // 4 pipes 4 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 6, 68, 136, 0, } , // 4 pipes 8 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 7, 102, 137, 0, } , // 4 pipes 16 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 28, 70, 134, 0, } , // 8 pipes 1 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 1, 71, 135, 0, } , // 8 pipes 2 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 2, 72, 138, 0, } , // 8 pipes 4 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 6, 123, 139, 0, } , // 8 pipes 8 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 7, 124, 140, 0, } , // 8 pipes 16 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 28, 105, 134, 0, } , // 16 pipes 1 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 1, 105, 138, 0, } , // 16 pipes 2 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 2, 125, 127, 0, } , // 16 pipes 4 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 6, 126, 107, 0, } , // 16 pipes 8 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 7, 126, 141, 0, } , // 16 pipes 16 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 28, 107, 142, 0, } , // 32 pipes 1 bpe @ SW_64K_R_X 
8xaa @ Navi1x { 3, 1, 108, 143, 0, } , // 32 pipes 2 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 2, 127, 130, 0, } , // 32 pipes 4 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 6, 128, 113, 0, } , // 32 pipes 8 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 7, 128, 144, 0, } , // 32 pipes 16 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 28, 110, 145, 0, } , // 64 pipes 1 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 1, 111, 146, 0, } , // 64 pipes 2 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 2, 129, 133, 0, } , // 64 pipes 4 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 6, 130, 117, 0, } , // 64 pipes 8 bpe @ SW_64K_R_X 8xaa @ Navi1x { 3, 7, 130, 147, 0, } , // 64 pipes 16 bpe @ SW_64K_R_X 8xaa @ Navi1x }; const ADDR_SW_PATINFO SW_64K_Z_X_1xaa_PATINFO[] = { { 1, 8, 1, 1, 0, } , // 1 pipes 1 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 1, 9, 2, 2, 0, } , // 1 pipes 2 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 1, 10, 3, 3, 0, } , // 1 pipes 4 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 1, 11, 4, 4, 0, } , // 1 pipes 8 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 1, 7, 5, 5, 0, } , // 1 pipes 16 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 12, 61, 1, 0, } , // 2 pipes 1 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 9, 62, 2, 0, } , // 2 pipes 2 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 10, 8, 3, 0, } , // 2 pipes 4 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 11, 63, 4, 0, } , // 2 pipes 8 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 7, 64, 5, 0, } , // 2 pipes 16 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 12, 65, 1, 0, } , // 4 pipes 1 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 9, 66, 2, 0, } , // 4 pipes 2 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 10, 67, 3, 0, } , // 4 pipes 4 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 11, 68, 4, 0, } , // 4 pipes 8 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 7, 69, 26, 0, } , // 4 pipes 16 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 12, 70, 1, 0, } , // 8 pipes 1 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 9, 71, 2, 0, } , // 8 pipes 2 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 10, 72, 27, 0, } , // 8 pipes 4 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 11, 72, 28, 0, } , // 8 pipes 8 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 7, 73, 29, 0, } , // 8 
pipes 16 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 12, 74, 1, 0, } , // 16 pipes 1 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 9, 74, 30, 0, } , // 16 pipes 2 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 10, 74, 31, 0, } , // 16 pipes 4 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 11, 74, 32, 0, } , // 16 pipes 8 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 7, 74, 33, 0, } , // 16 pipes 16 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 12, 75, 6, 0, } , // 32 pipes 1 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 9, 75, 34, 0, } , // 32 pipes 2 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 10, 75, 35, 0, } , // 32 pipes 4 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 11, 75, 36, 0, } , // 32 pipes 8 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 7, 76, 37, 0, } , // 32 pipes 16 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 12, 77, 11, 0, } , // 64 pipes 1 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 9, 77, 38, 0, } , // 64 pipes 2 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 10, 77, 39, 0, } , // 64 pipes 4 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 11, 78, 40, 0, } , // 64 pipes 8 bpe @ SW_64K_Z_X 1xaa @ Navi1x { 3, 7, 79, 41, 0, } , // 64 pipes 16 bpe @ SW_64K_Z_X 1xaa @ Navi1x }; const ADDR_SW_PATINFO SW_64K_Z_X_2xaa_PATINFO[] = { { 1, 13, 80, 42, 0, } , // 1 pipes 1 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 1, 14, 3, 3, 0, } , // 1 pipes 2 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 2, 15, 3, 43, 0, } , // 1 pipes 4 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 2, 16, 81, 44, 0, } , // 1 pipes 8 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 2, 17, 5, 45, 0, } , // 1 pipes 16 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 13, 82, 42, 0, } , // 2 pipes 1 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 14, 8, 3, 0, } , // 2 pipes 2 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 15, 8, 43, 0, } , // 2 pipes 4 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 16, 83, 44, 0, } , // 2 pipes 8 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 17, 64, 45, 0, } , // 2 pipes 16 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 13, 84, 42, 0, } , // 4 pipes 1 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 14, 67, 3, 0, } , // 4 pipes 2 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 15, 67, 43, 0, } , // 4 pipes 4 bpe @ SW_64K_Z_X 2xaa @ 
Navi1x { 3, 16, 85, 44, 0, } , // 4 pipes 8 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 17, 69, 46, 0, } , // 4 pipes 16 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 13, 86, 42, 0, } , // 8 pipes 1 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 14, 72, 27, 0, } , // 8 pipes 2 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 15, 72, 47, 0, } , // 8 pipes 4 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 16, 73, 48, 0, } , // 8 pipes 8 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 17, 73, 49, 0, } , // 8 pipes 16 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 13, 74, 50, 0, } , // 16 pipes 1 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 14, 74, 31, 0, } , // 16 pipes 2 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 15, 74, 51, 0, } , // 16 pipes 4 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 16, 74, 52, 0, } , // 16 pipes 8 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 17, 87, 53, 0, } , // 16 pipes 16 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 13, 75, 54, 0, } , // 32 pipes 1 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 14, 75, 35, 0, } , // 32 pipes 2 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 15, 75, 55, 0, } , // 32 pipes 4 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 16, 76, 56, 0, } , // 32 pipes 8 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 17, 88, 57, 0, } , // 32 pipes 16 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 13, 78, 58, 0, } , // 64 pipes 1 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 14, 78, 59, 0, } , // 64 pipes 2 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 15, 78, 60, 0, } , // 64 pipes 4 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 16, 79, 41, 0, } , // 64 pipes 8 bpe @ SW_64K_Z_X 2xaa @ Navi1x { 3, 17, 89, 61, 0, } , // 64 pipes 16 bpe @ SW_64K_Z_X 2xaa @ Navi1x }; const ADDR_SW_PATINFO SW_64K_Z_X_4xaa_PATINFO[] = { { 1, 18, 3, 3, 0, } , // 1 pipes 1 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 2, 19, 90, 62, 0, } , // 1 pipes 2 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 2, 20, 3, 63, 0, } , // 1 pipes 4 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 2, 21, 4, 64, 0, } , // 1 pipes 8 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 2, 22, 5, 65, 0, } , // 1 pipes 16 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 18, 8, 3, 0, } , // 2 pipes 1 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 19, 91, 62, 0, 
} , // 2 pipes 2 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 20, 8, 66, 0, } , // 2 pipes 4 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 21, 63, 67, 0, } , // 2 pipes 8 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 22, 64, 68, 0, } , // 2 pipes 16 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 18, 67, 3, 0, } , // 4 pipes 1 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 19, 92, 62, 0, } , // 4 pipes 2 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 20, 67, 63, 0, } , // 4 pipes 4 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 21, 68, 64, 0, } , // 4 pipes 8 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 22, 69, 69, 0, } , // 4 pipes 16 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 18, 72, 27, 0, } , // 8 pipes 1 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 19, 72, 70, 0, } , // 8 pipes 2 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 20, 72, 71, 0, } , // 8 pipes 4 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 21, 93, 72, 0, } , // 8 pipes 8 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 22, 94, 73, 0, } , // 8 pipes 16 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 18, 74, 31, 0, } , // 16 pipes 1 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 19, 74, 74, 0, } , // 16 pipes 2 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 20, 74, 75, 0, } , // 16 pipes 4 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 21, 95, 76, 0, } , // 16 pipes 8 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 22, 96, 76, 0, } , // 16 pipes 16 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 18, 76, 77, 0, } , // 32 pipes 1 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 19, 76, 78, 0, } , // 32 pipes 2 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 20, 76, 56, 0, } , // 32 pipes 4 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 21, 97, 79, 0, } , // 32 pipes 8 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 22, 98, 79, 0, } , // 32 pipes 16 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 18, 79, 80, 0, } , // 64 pipes 1 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 19, 79, 81, 0, } , // 64 pipes 2 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 20, 79, 41, 0, } , // 64 pipes 4 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 21, 99, 82, 0, } , // 64 pipes 8 bpe @ SW_64K_Z_X 4xaa @ Navi1x { 3, 22, 100, 82, 0, } , // 64 pipes 16 bpe @ SW_64K_Z_X 4xaa @ Navi1x }; const ADDR_SW_PATINFO 
SW_64K_Z_X_8xaa_PATINFO[] = { { 2, 23, 3, 43, 0, } , // 1 pipes 1 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 2, 24, 3, 63, 0, } , // 1 pipes 2 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 2, 25, 3, 83, 0, } , // 1 pipes 4 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 2, 26, 81, 84, 0, } , // 1 pipes 8 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 2, 27, 5, 85, 0, } , // 1 pipes 16 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 23, 8, 43, 0, } , // 2 pipes 1 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 24, 8, 66, 0, } , // 2 pipes 2 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 25, 8, 86, 0, } , // 2 pipes 4 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 26, 101, 87, 0, } , // 2 pipes 8 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 27, 64, 88, 0, } , // 2 pipes 16 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 23, 67, 43, 0, } , // 4 pipes 1 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 24, 67, 63, 0, } , // 4 pipes 2 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 25, 67, 83, 0, } , // 4 pipes 4 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 26, 85, 84, 0, } , // 4 pipes 8 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 27, 102, 89, 0, } , // 4 pipes 16 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 23, 72, 47, 0, } , // 8 pipes 1 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 24, 72, 71, 0, } , // 8 pipes 2 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 25, 72, 90, 0, } , // 8 pipes 4 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 26, 103, 91, 0, } , // 8 pipes 8 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 27, 104, 92, 0, } , // 8 pipes 16 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 23, 105, 51, 0, } , // 16 pipes 1 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 24, 105, 75, 0, } , // 16 pipes 2 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 25, 87, 93, 0, } , // 16 pipes 4 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 26, 96, 76, 0, } , // 16 pipes 8 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 27, 106, 94, 0, } , // 16 pipes 16 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 23, 107, 95, 0, } , // 32 pipes 1 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 24, 108, 56, 0, } , // 32 pipes 2 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 25, 88, 57, 0, } , // 32 pipes 4 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 26, 98, 79, 0, } , // 32 pipes 8 bpe @ 
SW_64K_Z_X 8xaa @ Navi1x { 3, 27, 109, 96, 0, } , // 32 pipes 16 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 23, 110, 97, 0, } , // 64 pipes 1 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 24, 111, 41, 0, } , // 64 pipes 2 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 25, 89, 61, 0, } , // 64 pipes 4 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 26, 100, 82, 0, } , // 64 pipes 8 bpe @ SW_64K_Z_X 8xaa @ Navi1x { 3, 27, 112, 98, 0, } , // 64 pipes 16 bpe @ SW_64K_Z_X 8xaa @ Navi1x }; const ADDR_SW_PATINFO SW_64K_S3_PATINFO[] = { { 1, 29, 131, 148, 0, } , // 1 pipes 1 bpe @ SW_64K_S3 @ Navi1x { 1, 30, 132, 149, 0, } , // 1 pipes 2 bpe @ SW_64K_S3 @ Navi1x { 1, 31, 133, 150, 0, } , // 1 pipes 4 bpe @ SW_64K_S3 @ Navi1x { 1, 32, 134, 151, 0, } , // 1 pipes 8 bpe @ SW_64K_S3 @ Navi1x { 1, 33, 135, 152, 0, } , // 1 pipes 16 bpe @ SW_64K_S3 @ Navi1x { 1, 29, 131, 148, 0, } , // 2 pipes 1 bpe @ SW_64K_S3 @ Navi1x { 1, 30, 132, 149, 0, } , // 2 pipes 2 bpe @ SW_64K_S3 @ Navi1x { 1, 31, 133, 150, 0, } , // 2 pipes 4 bpe @ SW_64K_S3 @ Navi1x { 1, 32, 134, 151, 0, } , // 2 pipes 8 bpe @ SW_64K_S3 @ Navi1x { 1, 33, 135, 152, 0, } , // 2 pipes 16 bpe @ SW_64K_S3 @ Navi1x { 1, 29, 131, 148, 0, } , // 4 pipes 1 bpe @ SW_64K_S3 @ Navi1x { 1, 30, 132, 149, 0, } , // 4 pipes 2 bpe @ SW_64K_S3 @ Navi1x { 1, 31, 133, 150, 0, } , // 4 pipes 4 bpe @ SW_64K_S3 @ Navi1x { 1, 32, 134, 151, 0, } , // 4 pipes 8 bpe @ SW_64K_S3 @ Navi1x { 1, 33, 135, 152, 0, } , // 4 pipes 16 bpe @ SW_64K_S3 @ Navi1x { 1, 29, 131, 148, 0, } , // 8 pipes 1 bpe @ SW_64K_S3 @ Navi1x { 1, 30, 132, 149, 0, } , // 8 pipes 2 bpe @ SW_64K_S3 @ Navi1x { 1, 31, 133, 150, 0, } , // 8 pipes 4 bpe @ SW_64K_S3 @ Navi1x { 1, 32, 134, 151, 0, } , // 8 pipes 8 bpe @ SW_64K_S3 @ Navi1x { 1, 33, 135, 152, 0, } , // 8 pipes 16 bpe @ SW_64K_S3 @ Navi1x { 1, 29, 131, 148, 0, } , // 16 pipes 1 bpe @ SW_64K_S3 @ Navi1x { 1, 30, 132, 149, 0, } , // 16 pipes 2 bpe @ SW_64K_S3 @ Navi1x { 1, 31, 133, 150, 0, } , // 16 pipes 4 bpe @ SW_64K_S3 @ Navi1x { 1, 32, 134, 151, 0, } , 
// 16 pipes 8 bpe @ SW_64K_S3 @ Navi1x { 1, 33, 135, 152, 0, } , // 16 pipes 16 bpe @ SW_64K_S3 @ Navi1x { 1, 29, 131, 148, 0, } , // 32 pipes 1 bpe @ SW_64K_S3 @ Navi1x { 1, 30, 132, 149, 0, } , // 32 pipes 2 bpe @ SW_64K_S3 @ Navi1x { 1, 31, 133, 150, 0, } , // 32 pipes 4 bpe @ SW_64K_S3 @ Navi1x { 1, 32, 134, 151, 0, } , // 32 pipes 8 bpe @ SW_64K_S3 @ Navi1x { 1, 33, 135, 152, 0, } , // 32 pipes 16 bpe @ SW_64K_S3 @ Navi1x { 1, 29, 131, 148, 0, } , // 64 pipes 1 bpe @ SW_64K_S3 @ Navi1x { 1, 30, 132, 149, 0, } , // 64 pipes 2 bpe @ SW_64K_S3 @ Navi1x { 1, 31, 133, 150, 0, } , // 64 pipes 4 bpe @ SW_64K_S3 @ Navi1x { 1, 32, 134, 151, 0, } , // 64 pipes 8 bpe @ SW_64K_S3 @ Navi1x { 1, 33, 135, 152, 0, } , // 64 pipes 16 bpe @ SW_64K_S3 @ Navi1x }; const ADDR_SW_PATINFO SW_64K_S3_X_PATINFO[] = { { 1, 29, 131, 148, 0, } , // 1 pipes 1 bpe @ SW_64K_S3_X @ Navi1x { 1, 30, 132, 149, 0, } , // 1 pipes 2 bpe @ SW_64K_S3_X @ Navi1x { 1, 31, 133, 150, 0, } , // 1 pipes 4 bpe @ SW_64K_S3_X @ Navi1x { 1, 32, 134, 151, 0, } , // 1 pipes 8 bpe @ SW_64K_S3_X @ Navi1x { 1, 33, 135, 152, 0, } , // 1 pipes 16 bpe @ SW_64K_S3_X @ Navi1x { 3, 29, 136, 148, 0, } , // 2 pipes 1 bpe @ SW_64K_S3_X @ Navi1x { 3, 30, 137, 149, 0, } , // 2 pipes 2 bpe @ SW_64K_S3_X @ Navi1x { 3, 31, 138, 150, 0, } , // 2 pipes 4 bpe @ SW_64K_S3_X @ Navi1x { 3, 32, 139, 151, 0, } , // 2 pipes 8 bpe @ SW_64K_S3_X @ Navi1x { 3, 33, 140, 152, 0, } , // 2 pipes 16 bpe @ SW_64K_S3_X @ Navi1x { 3, 29, 141, 148, 0, } , // 4 pipes 1 bpe @ SW_64K_S3_X @ Navi1x { 3, 30, 142, 149, 0, } , // 4 pipes 2 bpe @ SW_64K_S3_X @ Navi1x { 3, 31, 143, 150, 0, } , // 4 pipes 4 bpe @ SW_64K_S3_X @ Navi1x { 3, 32, 144, 151, 0, } , // 4 pipes 8 bpe @ SW_64K_S3_X @ Navi1x { 3, 33, 145, 152, 0, } , // 4 pipes 16 bpe @ SW_64K_S3_X @ Navi1x { 3, 29, 146, 148, 0, } , // 8 pipes 1 bpe @ SW_64K_S3_X @ Navi1x { 3, 30, 147, 149, 0, } , // 8 pipes 2 bpe @ SW_64K_S3_X @ Navi1x { 3, 31, 148, 150, 0, } , // 8 pipes 4 bpe @ SW_64K_S3_X @ Navi1x 
{ 3, 32, 149, 151, 0, } , // 8 pipes 8 bpe @ SW_64K_S3_X @ Navi1x { 3, 33, 150, 152, 0, } , // 8 pipes 16 bpe @ SW_64K_S3_X @ Navi1x { 3, 29, 151, 148, 0, } , // 16 pipes 1 bpe @ SW_64K_S3_X @ Navi1x { 3, 30, 152, 149, 0, } , // 16 pipes 2 bpe @ SW_64K_S3_X @ Navi1x { 3, 31, 153, 150, 0, } , // 16 pipes 4 bpe @ SW_64K_S3_X @ Navi1x { 3, 32, 154, 151, 0, } , // 16 pipes 8 bpe @ SW_64K_S3_X @ Navi1x { 3, 33, 155, 152, 0, } , // 16 pipes 16 bpe @ SW_64K_S3_X @ Navi1x { 3, 29, 156, 153, 0, } , // 32 pipes 1 bpe @ SW_64K_S3_X @ Navi1x { 3, 30, 157, 154, 0, } , // 32 pipes 2 bpe @ SW_64K_S3_X @ Navi1x { 3, 31, 158, 155, 0, } , // 32 pipes 4 bpe @ SW_64K_S3_X @ Navi1x { 3, 32, 159, 156, 0, } , // 32 pipes 8 bpe @ SW_64K_S3_X @ Navi1x { 3, 33, 160, 157, 0, } , // 32 pipes 16 bpe @ SW_64K_S3_X @ Navi1x { 3, 29, 161, 158, 0, } , // 64 pipes 1 bpe @ SW_64K_S3_X @ Navi1x { 3, 30, 162, 159, 0, } , // 64 pipes 2 bpe @ SW_64K_S3_X @ Navi1x { 3, 31, 163, 160, 0, } , // 64 pipes 4 bpe @ SW_64K_S3_X @ Navi1x { 3, 32, 164, 161, 0, } , // 64 pipes 8 bpe @ SW_64K_S3_X @ Navi1x { 3, 33, 165, 162, 0, } , // 64 pipes 16 bpe @ SW_64K_S3_X @ Navi1x }; const ADDR_SW_PATINFO SW_64K_S3_T_PATINFO[] = { { 1, 29, 131, 148, 0, } , // 1 pipes 1 bpe @ SW_64K_S3_T @ Navi1x { 1, 30, 132, 149, 0, } , // 1 pipes 2 bpe @ SW_64K_S3_T @ Navi1x { 1, 31, 133, 150, 0, } , // 1 pipes 4 bpe @ SW_64K_S3_T @ Navi1x { 1, 32, 134, 151, 0, } , // 1 pipes 8 bpe @ SW_64K_S3_T @ Navi1x { 1, 33, 135, 152, 0, } , // 1 pipes 16 bpe @ SW_64K_S3_T @ Navi1x { 3, 29, 136, 148, 0, } , // 2 pipes 1 bpe @ SW_64K_S3_T @ Navi1x { 3, 30, 137, 149, 0, } , // 2 pipes 2 bpe @ SW_64K_S3_T @ Navi1x { 3, 31, 138, 150, 0, } , // 2 pipes 4 bpe @ SW_64K_S3_T @ Navi1x { 3, 32, 139, 151, 0, } , // 2 pipes 8 bpe @ SW_64K_S3_T @ Navi1x { 3, 33, 140, 152, 0, } , // 2 pipes 16 bpe @ SW_64K_S3_T @ Navi1x { 3, 29, 141, 148, 0, } , // 4 pipes 1 bpe @ SW_64K_S3_T @ Navi1x { 3, 30, 142, 149, 0, } , // 4 pipes 2 bpe @ SW_64K_S3_T @ Navi1x { 3, 31, 143, 
150, 0, } , // 4 pipes 4 bpe @ SW_64K_S3_T @ Navi1x { 3, 32, 144, 151, 0, } , // 4 pipes 8 bpe @ SW_64K_S3_T @ Navi1x { 3, 33, 145, 152, 0, } , // 4 pipes 16 bpe @ SW_64K_S3_T @ Navi1x { 3, 29, 166, 148, 0, } , // 8 pipes 1 bpe @ SW_64K_S3_T @ Navi1x { 3, 30, 167, 149, 0, } , // 8 pipes 2 bpe @ SW_64K_S3_T @ Navi1x { 3, 31, 168, 150, 0, } , // 8 pipes 4 bpe @ SW_64K_S3_T @ Navi1x { 3, 32, 169, 151, 0, } , // 8 pipes 8 bpe @ SW_64K_S3_T @ Navi1x { 3, 33, 170, 152, 0, } , // 8 pipes 16 bpe @ SW_64K_S3_T @ Navi1x { 3, 29, 171, 148, 0, } , // 16 pipes 1 bpe @ SW_64K_S3_T @ Navi1x { 3, 30, 172, 149, 0, } , // 16 pipes 2 bpe @ SW_64K_S3_T @ Navi1x { 3, 31, 173, 150, 0, } , // 16 pipes 4 bpe @ SW_64K_S3_T @ Navi1x { 3, 32, 174, 151, 0, } , // 16 pipes 8 bpe @ SW_64K_S3_T @ Navi1x { 3, 33, 175, 152, 0, } , // 16 pipes 16 bpe @ SW_64K_S3_T @ Navi1x { 3, 29, 176, 153, 0, } , // 32 pipes 1 bpe @ SW_64K_S3_T @ Navi1x { 3, 30, 177, 154, 0, } , // 32 pipes 2 bpe @ SW_64K_S3_T @ Navi1x { 3, 31, 178, 155, 0, } , // 32 pipes 4 bpe @ SW_64K_S3_T @ Navi1x { 3, 32, 179, 156, 0, } , // 32 pipes 8 bpe @ SW_64K_S3_T @ Navi1x { 3, 33, 180, 157, 0, } , // 32 pipes 16 bpe @ SW_64K_S3_T @ Navi1x { 3, 29, 131, 163, 0, } , // 64 pipes 1 bpe @ SW_64K_S3_T @ Navi1x { 3, 30, 132, 164, 0, } , // 64 pipes 2 bpe @ SW_64K_S3_T @ Navi1x { 3, 31, 133, 165, 0, } , // 64 pipes 4 bpe @ SW_64K_S3_T @ Navi1x { 3, 32, 134, 166, 0, } , // 64 pipes 8 bpe @ SW_64K_S3_T @ Navi1x { 3, 33, 135, 167, 0, } , // 64 pipes 16 bpe @ SW_64K_S3_T @ Navi1x }; const ADDR_SW_PATINFO SW_64K_D3_X_PATINFO[] = { { 1, 34, 131, 148, 0, } , // 1 pipes 1 bpe @ SW_64K_D3_X @ Navi1x { 1, 35, 132, 149, 0, } , // 1 pipes 2 bpe @ SW_64K_D3_X @ Navi1x { 1, 36, 133, 150, 0, } , // 1 pipes 4 bpe @ SW_64K_D3_X @ Navi1x { 1, 37, 134, 151, 0, } , // 1 pipes 8 bpe @ SW_64K_D3_X @ Navi1x { 1, 38, 135, 152, 0, } , // 1 pipes 16 bpe @ SW_64K_D3_X @ Navi1x { 2, 34, 181, 148, 0, } , // 2 pipes 1 bpe @ SW_64K_D3_X @ Navi1x { 2, 35, 182, 149, 0, } , 
// 2 pipes 2 bpe @ SW_64K_D3_X @ Navi1x { 2, 36, 183, 150, 0, } , // 2 pipes 4 bpe @ SW_64K_D3_X @ Navi1x { 2, 37, 184, 168, 0, } , // 2 pipes 8 bpe @ SW_64K_D3_X @ Navi1x { 2, 38, 185, 169, 0, } , // 2 pipes 16 bpe @ SW_64K_D3_X @ Navi1x { 2, 34, 186, 170, 0, } , // 4 pipes 1 bpe @ SW_64K_D3_X @ Navi1x { 2, 35, 186, 171, 0, } , // 4 pipes 2 bpe @ SW_64K_D3_X @ Navi1x { 2, 36, 187, 172, 0, } , // 4 pipes 4 bpe @ SW_64K_D3_X @ Navi1x { 2, 37, 188, 169, 0, } , // 4 pipes 8 bpe @ SW_64K_D3_X @ Navi1x { 3, 38, 189, 169, 0, } , // 4 pipes 16 bpe @ SW_64K_D3_X @ Navi1x { 2, 34, 190, 173, 0, } , // 8 pipes 1 bpe @ SW_64K_D3_X @ Navi1x { 3, 35, 191, 171, 0, } , // 8 pipes 2 bpe @ SW_64K_D3_X @ Navi1x { 3, 36, 192, 172, 0, } , // 8 pipes 4 bpe @ SW_64K_D3_X @ Navi1x { 3, 37, 193, 169, 0, } , // 8 pipes 8 bpe @ SW_64K_D3_X @ Navi1x { 3, 38, 194, 169, 0, } , // 8 pipes 16 bpe @ SW_64K_D3_X @ Navi1x { 3, 34, 195, 174, 0, } , // 16 pipes 1 bpe @ SW_64K_D3_X @ Navi1x { 3, 35, 196, 171, 0, } , // 16 pipes 2 bpe @ SW_64K_D3_X @ Navi1x { 3, 36, 197, 172, 0, } , // 16 pipes 4 bpe @ SW_64K_D3_X @ Navi1x { 3, 37, 198, 169, 0, } , // 16 pipes 8 bpe @ SW_64K_D3_X @ Navi1x { 3, 38, 199, 169, 0, } , // 16 pipes 16 bpe @ SW_64K_D3_X @ Navi1x { 3, 34, 200, 175, 0, } , // 32 pipes 1 bpe @ SW_64K_D3_X @ Navi1x { 3, 35, 201, 176, 0, } , // 32 pipes 2 bpe @ SW_64K_D3_X @ Navi1x { 3, 36, 202, 177, 0, } , // 32 pipes 4 bpe @ SW_64K_D3_X @ Navi1x { 3, 37, 203, 178, 0, } , // 32 pipes 8 bpe @ SW_64K_D3_X @ Navi1x { 3, 38, 204, 178, 0, } , // 32 pipes 16 bpe @ SW_64K_D3_X @ Navi1x { 3, 34, 205, 179, 0, } , // 64 pipes 1 bpe @ SW_64K_D3_X @ Navi1x { 3, 35, 206, 180, 0, } , // 64 pipes 2 bpe @ SW_64K_D3_X @ Navi1x { 3, 36, 207, 181, 0, } , // 64 pipes 4 bpe @ SW_64K_D3_X @ Navi1x { 3, 37, 208, 182, 0, } , // 64 pipes 8 bpe @ SW_64K_D3_X @ Navi1x { 3, 38, 209, 182, 0, } , // 64 pipes 16 bpe @ SW_64K_D3_X @ Navi1x }; const ADDR_SW_PATINFO SW_256_S_RBPLUS_PATINFO[] = { { 1, 0, 0, 0, 0, } , // 1 pipes (1 
PKRs) 1 bpe @ SW_256_S @ RbPlus { 1, 1, 0, 0, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_256_S @ RbPlus { 1, 2, 0, 0, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_256_S @ RbPlus { 1, 3, 0, 0, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_256_S @ RbPlus { 1, 4, 0, 0, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_256_S @ RbPlus { 1, 0, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_256_S @ RbPlus { 1, 1, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_256_S @ RbPlus { 1, 2, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_256_S @ RbPlus { 1, 3, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_256_S @ RbPlus { 1, 4, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_256_S @ RbPlus { 1, 0, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_256_S @ RbPlus { 1, 1, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_256_S @ RbPlus { 1, 2, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_256_S @ RbPlus { 1, 3, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_256_S @ RbPlus { 1, 4, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_256_S @ RbPlus { 1, 0, 0, 0, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_256_S @ RbPlus { 1, 1, 0, 0, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_256_S @ RbPlus { 1, 2, 0, 0, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_256_S @ RbPlus { 1, 3, 0, 0, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_256_S @ RbPlus { 1, 4, 0, 0, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_256_S @ RbPlus { 1, 0, 0, 0, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_256_S @ RbPlus { 1, 1, 0, 0, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_256_S @ RbPlus { 1, 2, 0, 0, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_256_S @ RbPlus { 1, 3, 0, 0, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_256_S @ RbPlus { 1, 4, 0, 0, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_256_S @ RbPlus { 1, 0, 0, 0, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_256_S @ RbPlus { 1, 1, 0, 0, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_256_S @ RbPlus { 1, 2, 0, 0, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_256_S @ RbPlus { 1, 3, 0, 0, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_256_S @ RbPlus { 1, 4, 0, 0, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_256_S @ 
RbPlus
    { 1, 0, 0, 0, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_256_S @ RbPlus
    { 1, 1, 0, 0, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_256_S @ RbPlus
    { 1, 2, 0, 0, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_256_S @ RbPlus
    { 1, 3, 0, 0, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_256_S @ RbPlus
    { 1, 4, 0, 0, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_256_S @ RbPlus
    { 1, 0, 0, 0, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_256_S @ RbPlus
    { 1, 1, 0, 0, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_256_S @ RbPlus
    { 1, 2, 0, 0, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_256_S @ RbPlus
    { 1, 3, 0, 0, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_256_S @ RbPlus
    { 1, 4, 0, 0, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_256_S @ RbPlus
    { 1, 0, 0, 0, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_256_S @ RbPlus
    { 1, 1, 0, 0, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_256_S @ RbPlus
    { 1, 2, 0, 0, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_256_S @ RbPlus
    { 1, 3, 0, 0, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_256_S @ RbPlus
    { 1, 4, 0, 0, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_256_S @ RbPlus
    { 1, 0, 0, 0, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_256_S @ RbPlus
    { 1, 1, 0, 0, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_256_S @ RbPlus
    { 1, 2, 0, 0, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_256_S @ RbPlus
    { 1, 3, 0, 0, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_256_S @ RbPlus
    { 1, 4, 0, 0, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_256_S @ RbPlus
    { 1, 0, 0, 0, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_256_S @ RbPlus
    { 1, 1, 0, 0, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_256_S @ RbPlus
    { 1, 2, 0, 0, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_256_S @ RbPlus
    { 1, 3, 0, 0, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_256_S @ RbPlus
    { 1, 4, 0, 0, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_256_S @ RbPlus
    { 1, 0, 0, 0, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_256_S @ RbPlus
    { 1, 1, 0, 0, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_256_S @ RbPlus
    { 1, 2, 0, 0, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_256_S @ RbPlus
    { 1, 3, 0, 0, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_256_S @ RbPlus
    { 1, 4, 0, 0, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_256_S @ RbPlus
    { 1, 0, 0, 0, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_256_S @ RbPlus
    { 1, 1, 0, 0, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_256_S @ RbPlus
    { 1, 2, 0, 0, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_256_S @ RbPlus
    { 1, 3, 0, 0, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_256_S @ RbPlus
    { 1, 4, 0, 0, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_256_S @ RbPlus
    { 1, 0, 0, 0, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_256_S @ RbPlus
    { 1, 1, 0, 0, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_256_S @ RbPlus
    { 1, 2, 0, 0, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_256_S @ RbPlus
    { 1, 3, 0, 0, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_256_S @ RbPlus
    { 1, 4, 0, 0, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_256_S @ RbPlus
    { 1, 0, 0, 0, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_256_S @ RbPlus
    { 1, 1, 0, 0, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_256_S @ RbPlus
    { 1, 2, 0, 0, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_256_S @ RbPlus
    { 1, 3, 0, 0, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_256_S @ RbPlus
    { 1, 4, 0, 0, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_256_S @ RbPlus
};

const ADDR_SW_PATINFO SW_256_D_RBPLUS_PATINFO[] =
{
    { 1, 5, 0, 0, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_256_D @ RbPlus
    { 1, 5, 0, 0, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_256_D @ RbPlus
    { 1, 1, 0, 0, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_256_D @ RbPlus
    { 1, 39, 0, 0, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_256_D @ RbPlus
    { 1, 6, 0, 0, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_256_D @ RbPlus
    { 1, 7, 0, 0, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_256_D @ RbPlus
};

const ADDR_SW_PATINFO SW_4K_S_RBPLUS_PATINFO[] =
{
    { 1, 0, 1, 0, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_4K_S @ RbPlus
    { 1, 0, 1, 0, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_4K_S @ RbPlus
    { 1, 1, 2, 0, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_4K_S @ RbPlus
    { 1, 2, 3, 0, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_4K_S @ RbPlus
    { 1, 3, 4, 0, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_4K_S @ RbPlus
    { 1, 4, 5, 0, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_4K_S @ RbPlus
};

const ADDR_SW_PATINFO SW_4K_D_RBPLUS_PATINFO[] =
{
    { 1, 5, 1, 0, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_4K_D @ RbPlus
    { 1, 5, 1, 0, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_4K_D @ RbPlus
    { 1, 1, 2, 0, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_4K_D @ RbPlus
    { 1, 39, 3, 0, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_4K_D @ RbPlus
    { 1, 6, 4, 0, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_4K_D @ RbPlus
    { 1, 7, 5, 0, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_4K_D @ RbPlus
};

const ADDR_SW_PATINFO SW_4K_S_X_RBPLUS_PATINFO[] =
{
    { 1, 0, 1, 0, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 1, 1, 2, 0, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 1, 2, 3, 0, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 1, 3, 4, 0, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 1, 4, 5, 0, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 6, 0, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 7, 0, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 8, 0, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 9, 0, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 10, 0, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 210, 0, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 211, 0, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 212, 0, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 213, 0, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 214, 0, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 215, 0, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 216, 0, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 217, 0, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 218, 0, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 219, 0, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 11, 0, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 12, 0, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 13, 0, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 14, 0, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 15, 0, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 220, 0, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 221, 0, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 222, 0, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 223, 0, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 224, 0, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 225, 0, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 226, 0, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 227, 0, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 228, 0, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 229, 0, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 16, 0, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 17, 0, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 18, 0, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 19, 0, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 20, 0, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 230, 0, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 231, 0, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 232, 0, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 233, 0, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 234, 0, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 235, 0, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 236, 0, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 237, 0, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 238, 0, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 239, 0, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 21, 0, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 22, 0, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 23, 0, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 24, 0, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 25, 0, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 240, 0, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 241, 0, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 242, 0, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 243, 0, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 244, 0, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 245, 0, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 246, 0, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 247, 0, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 248, 0, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 249, 0, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 21, 0, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 22, 0, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 23, 0, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 24, 0, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 25, 0, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
    { 3, 0, 240, 0, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_4K_S_X @ RbPlus
    { 3, 1, 241, 0, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_4K_S_X @ RbPlus
    { 3, 2, 242, 0, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_4K_S_X @ RbPlus
    { 3, 3, 243, 0, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_4K_S_X @ RbPlus
    { 3, 4, 244, 0, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_4K_S_X @ RbPlus
};

const ADDR_SW_PATINFO SW_4K_D_X_RBPLUS_PATINFO[] =
{
    { 1, 5, 1, 0, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 1, 1, 2, 0, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 1, 39, 3, 0, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 1, 6, 4, 0, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 1, 7, 5, 0, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 6, 0, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 7, 0, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 8, 0, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 9, 0, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 10, 0, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 210, 0, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 211, 0, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 212, 0, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 213, 0, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 214, 0, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 215, 0, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 216, 0, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 217, 0, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 218, 0, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 219, 0, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 11, 0, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 12, 0, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 13, 0, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 14, 0, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 15, 0, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 220, 0, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 221, 0, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 222, 0, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 223, 0, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 224, 0, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 225, 0, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 226, 0, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 227, 0, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 228, 0, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 229, 0, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 16, 0, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 17, 0, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 18, 0, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 19, 0, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 20, 0, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 230, 0, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 231, 0, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 232, 0, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 233, 0, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 234, 0, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 235, 0, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 236, 0, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 237, 0, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 238, 0, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 239, 0, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 21, 0, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 22, 0, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 23, 0, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 24, 0, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 25, 0, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 240, 0, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 241, 0, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 242, 0, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 243, 0, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 244, 0, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 245, 0, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 246, 0, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 247, 0, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 248, 0, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 249, 0, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 21, 0, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 22, 0, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 23, 0, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 24, 0, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 25, 0, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
    { 3, 5, 240, 0, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_4K_D_X @ RbPlus
    { 3, 1, 241, 0, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_4K_D_X @ RbPlus
    { 3, 39, 242, 0, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_4K_D_X @ RbPlus
    { 3, 6, 243, 0, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_4K_D_X @ RbPlus
    { 3, 7, 244, 0, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_4K_D_X @ RbPlus
};

const ADDR_SW_PATINFO SW_4K_S3_RBPLUS_PATINFO[] =
{
    { 1, 29, 131, 0, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
    { 1, 29, 131, 0, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_4K_S3 @ RbPlus
    { 1, 30, 132, 0, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_4K_S3 @ RbPlus
    { 1, 31, 133, 0, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_4K_S3 @ RbPlus
    { 1, 32, 134, 0, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_4K_S3 @ RbPlus
    { 1, 33, 135, 0, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_4K_S3 @ RbPlus
};

const ADDR_SW_PATINFO SW_4K_S3_X_RBPLUS_PATINFO[] =
{
    { 1, 29, 131, 0, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 1, 30, 132, 0, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 1, 31, 133, 0, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 1, 32, 134, 0, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 1, 33, 135, 0, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 136, 0, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 137, 0, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 138, 0, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 139, 0, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 140, 0, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 141, 0, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 142, 0, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 143, 0, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 144, 0, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 145, 0, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 146, 0, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 147, 0, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 148, 0, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 149, 0, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 150, 0, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 141, 0, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 142, 0, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 143, 0, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 144, 0, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 145, 0, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 146, 0, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 147, 0, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 148, 0, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 149, 0, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 150, 0, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 151, 0, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 152, 0, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 153, 0, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 154, 0, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 155, 0, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 146, 0, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 147, 0, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 148, 0, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 149, 0, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 150, 0, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 151, 0, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 152, 0, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 153, 0, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 154, 0, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 155, 0, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 151, 0, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 152, 0, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 153, 0, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 154, 0, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 155, 0, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 151, 0, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 152, 0, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 153, 0, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 154, 0, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 155, 0, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 151, 0, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 152, 0, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 153, 0, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 154, 0, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 155, 0, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 151, 0, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 152, 0, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 153, 0, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 154, 0, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 155, 0, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 151, 0, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 152, 0, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 31, 153, 0, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 32, 154, 0, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 33, 155, 0, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 29, 151, 0, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_4K_S3_X @ RbPlus
    { 3, 30, 152, 0, 0, } , // 64
pipes (32 PKRs) 2 bpe @ SW_4K_S3_X @ RbPlus { 3, 31, 153, 0, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_4K_S3_X @ RbPlus { 3, 32, 154, 0, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_4K_S3_X @ RbPlus { 3, 33, 155, 0, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_4K_S3_X @ RbPlus }; const ADDR_SW_PATINFO SW_64K_S_RBPLUS_PATINFO[] = { { 1, 0, 1, 1, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 4 pipes (4 PKRs) 16 
bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 
4, 5, 5, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_S @ RbPlus { 1, 0, 1, 1, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_S @ RbPlus { 1, 1, 2, 2, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_S @ RbPlus { 1, 2, 3, 3, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_S @ RbPlus { 1, 3, 4, 4, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_S @ RbPlus { 1, 4, 5, 5, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_S @ RbPlus }; const ADDR_SW_PATINFO SW_64K_D_RBPLUS_PATINFO[] = { { 1, 5, 1, 1, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } 
, // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 8 pipes (8 PKRs) 2 
bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 32 pipes (32 PKRs) 1 
bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_D @ RbPlus { 1, 5, 1, 1, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_D @ RbPlus { 1, 1, 2, 2, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_D @ RbPlus { 1, 39, 3, 3, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_D @ RbPlus { 1, 6, 4, 4, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_D @ RbPlus { 1, 7, 5, 5, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_D @ RbPlus }; const ADDR_SW_PATINFO SW_64K_S_T_RBPLUS_PATINFO[] = { { 1, 0, 1, 1, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 1, 1, 2, 2, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 1, 2, 3, 3, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 1, 3, 4, 4, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 1, 4, 5, 5, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 36, 1, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 37, 2, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 38, 3, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 39, 4, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 40, 5, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 41, 1, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 42, 2, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 43, 3, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 44, 4, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 45, 5, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 46, 1, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 47, 2, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 48, 3, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 49, 4, 
0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 50, 5, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 41, 1, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 42, 2, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 43, 3, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 44, 4, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 45, 5, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 46, 1, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 47, 2, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 48, 3, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 49, 4, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 50, 5, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 51, 1, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 52, 2, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 53, 3, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 54, 4, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 55, 5, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 46, 1, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 47, 2, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 48, 3, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 49, 4, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 50, 5, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 51, 1, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 52, 2, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 53, 3, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 54, 4, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 55, 5, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 56, 16, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 57, 17, 0, } , // 32 pipes 
(8 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 58, 18, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 59, 19, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 60, 20, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 51, 1, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 52, 2, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 53, 3, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 54, 4, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 55, 5, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 56, 16, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 57, 17, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 58, 18, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 59, 19, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 60, 20, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 1, 21, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 2, 22, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 3, 23, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 4, 24, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 5, 25, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 56, 16, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 57, 17, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 58, 18, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 59, 19, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 2, 4, 60, 20, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus { 2, 0, 1, 21, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_S_T @ RbPlus { 2, 1, 2, 22, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_S_T @ RbPlus { 2, 2, 3, 23, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_S_T @ RbPlus { 2, 3, 4, 24, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_S_T @ RbPlus { 
2, 4, 5, 25, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_S_T @ RbPlus }; const ADDR_SW_PATINFO SW_64K_D_T_RBPLUS_PATINFO[] = { { 1, 5, 1, 1, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 1, 1, 2, 2, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 1, 39, 3, 3, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 1, 6, 4, 4, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 1, 7, 5, 5, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 36, 1, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 37, 2, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 38, 3, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 39, 4, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 40, 5, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 41, 1, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 42, 2, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 43, 3, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 44, 4, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 45, 5, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 46, 1, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 47, 2, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 48, 3, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 49, 4, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 50, 5, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 41, 1, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 42, 2, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 43, 3, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 44, 4, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 45, 5, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 46, 1, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 47, 2, 0, } , // 8 
pipes (4 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 48, 3, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 49, 4, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 50, 5, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 51, 1, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 52, 2, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 53, 3, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 54, 4, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 55, 5, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 46, 1, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 47, 2, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 48, 3, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 49, 4, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 50, 5, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 51, 1, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 52, 2, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 53, 3, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 54, 4, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 55, 5, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 56, 16, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 57, 17, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 58, 18, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 59, 19, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 60, 20, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 51, 1, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 52, 2, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 53, 3, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 54, 4, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 55, 5, 0, } , // 16 
pipes (16 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 56, 16, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 57, 17, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 58, 18, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 59, 19, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 60, 20, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 1, 21, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 2, 22, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 3, 23, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 4, 24, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 5, 25, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 56, 16, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 57, 17, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 58, 18, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 59, 19, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 60, 20, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus { 2, 5, 1, 21, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_D_T @ RbPlus { 2, 1, 2, 22, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_D_T @ RbPlus { 2, 39, 3, 23, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_D_T @ RbPlus { 2, 6, 4, 24, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_D_T @ RbPlus { 2, 7, 5, 25, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_D_T @ RbPlus }; const ADDR_SW_PATINFO SW_64K_S_X_RBPLUS_PATINFO[] = { { 1, 0, 1, 1, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 1, 1, 2, 2, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 1, 2, 3, 3, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 1, 3, 4, 4, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 1, 4, 5, 5, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 6, 1, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 7, 2, 0, } , 
// 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 8, 3, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 9, 4, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 10, 5, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 210, 1, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 211, 2, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 212, 3, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 213, 4, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 214, 5, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 215, 1, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 216, 2, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 217, 3, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 218, 4, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 219, 5, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 11, 1, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 12, 2, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 13, 3, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 14, 4, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 15, 5, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 220, 1, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 221, 2, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 222, 3, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 223, 4, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 224, 5, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 225, 1, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 226, 2, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 227, 3, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 228, 4, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 229, 5, 
0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 16, 1, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 17, 2, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 18, 3, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 19, 4, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 20, 5, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 230, 1, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 231, 2, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 232, 3, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 233, 4, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 234, 5, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 250, 6, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 251, 7, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 252, 8, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 253, 9, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 254, 10, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 21, 1, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 22, 2, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 23, 3, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 24, 4, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 25, 5, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 255, 6, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 256, 7, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus { 3, 2, 257, 8, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus { 3, 3, 258, 9, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus { 3, 4, 259, 10, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus { 3, 0, 260, 11, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus { 3, 1, 261, 12, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_S_X @ 
RbPlus
    { 3, 2, 262, 13, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus
    { 3, 3, 263, 14, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus
    { 3, 4, 264, 15, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus
    { 3, 0, 26, 6, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus
    { 3, 1, 27, 7, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus
    { 3, 2, 28, 8, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus
    { 3, 3, 29, 9, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus
    { 3, 4, 30, 10, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus
    { 3, 0, 265, 11, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_S_X @ RbPlus
    { 3, 1, 266, 12, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_S_X @ RbPlus
    { 3, 2, 267, 13, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_S_X @ RbPlus
    { 3, 3, 268, 14, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_S_X @ RbPlus
    { 3, 4, 269, 15, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_S_X @ RbPlus
};

const ADDR_SW_PATINFO SW_64K_D_X_RBPLUS_PATINFO[] =
{
    { 1, 5, 1, 1, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 1, 1, 2, 2, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 1, 39, 3, 3, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 1, 6, 4, 4, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 1, 7, 5, 5, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 6, 1, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 7, 2, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 8, 3, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 9, 4, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 10, 5, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 210, 1, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 211, 2, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 212, 3, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 213, 4, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 214, 5, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 215, 1, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 216, 2, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 217, 3, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 218, 4, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 219, 5, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 11, 1, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 12, 2, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 13, 3, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 14, 4, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 15, 5, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 220, 1, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 221, 2, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 222, 3, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 223, 4, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 224, 5, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 225, 1, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 226, 2, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 227, 3, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 228, 4, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 229, 5, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 16, 1, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 17, 2, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 18, 3, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 19, 4, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 20, 5, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 230, 1, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 231, 2, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 232, 3, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 233, 4, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 234, 5, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 250, 6, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 251, 7, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 252, 8, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 253, 9, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 254, 10, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 21, 1, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 22, 2, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 23, 3, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 24, 4, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 25, 5, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 255, 6, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 256, 7, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 257, 8, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 258, 9, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 259, 10, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 260, 11, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 261, 12, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 262, 13, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 263, 14, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 264, 15, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 26, 6, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 27, 7, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 28, 8, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 29, 9, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 30, 10, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
    { 3, 5, 265, 11, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_D_X @ RbPlus
    { 3, 1, 266, 12, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_D_X @ RbPlus
    { 3, 39, 267, 13, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_D_X @ RbPlus
    { 3, 6, 268, 14, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_D_X @ RbPlus
    { 3, 7, 269, 15, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_D_X @ RbPlus
};

const ADDR_SW_PATINFO SW_64K_R_X_1xaa_RBPLUS_PATINFO[] =
{
    { 2, 0, 347, 193, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 2, 1, 348, 366, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 2, 39, 349, 195, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 2, 6, 350, 367, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 2, 7, 351, 368, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 352, 193, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 353, 194, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 354, 195, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 355, 369, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 356, 370, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 280, 193, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 281, 194, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 282, 195, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 283, 196, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 284, 197, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 394, 219, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 395, 371, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 396, 372, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 397, 373, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 398, 374, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 290, 203, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 291, 204, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 292, 205, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 293, 206, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 294, 207, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 295, 219, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 296, 375, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 297, 376, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 298, 377, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 299, 378, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 399, 379, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 399, 380, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 399, 381, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 399, 382, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 399, 383, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 400, 669, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 401, 670, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 402, 671, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 304, 387, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 305, 388, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 307, 379, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 307, 389, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 307, 381, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 307, 382, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 307, 390, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 307, 672, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 307, 673, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 307, 674, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 307, 675, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 307, 676, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 309, 677, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 309, 678, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 309, 679, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 309, 399, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 323, 400, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 309, 680, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 309, 681, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 309, 682, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 309, 404, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 323, 405, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 309, 505, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 309, 506, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 309, 507, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 309, 683, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 323, 684, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 311, 685, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 311, 686, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 311, 687, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 318, 411, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 324, 412, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 0, 311, 513, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 1, 311, 514, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 39, 311, 515, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 6, 318, 413, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_R_X 1xaa @ RbPlus
    { 3, 7, 324, 414, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_R_X 1xaa @ RbPlus
};

const ADDR_SW_PATINFO SW_64K_R_X_2xaa_RBPLUS_PATINFO[] =
{
    { 3, 0, 424, 526, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 348, 527, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 358, 528, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 350, 688, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 359, 689, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 352, 526, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 353, 527, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 354, 528, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 355, 688, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 356, 690, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 280, 526, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 281, 527, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 282, 528, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 283, 529, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 284, 530, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 394, 691, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 395, 692, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 396, 693, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 397, 694, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 425, 695, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 290, 534, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 291, 535, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 292, 536, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 293, 537, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 294, 538, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 295, 691, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 296, 696, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 297, 697, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 298, 698, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 299, 699, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 399, 700, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 399, 701, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 399, 702, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 399, 703, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 426, 429, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 400, 704, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 401, 705, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 402, 706, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 304, 707, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 364, 708, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 307, 700, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 307, 701, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 307, 702, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 307, 703, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 427, 390, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 307, 709, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 307, 710, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 307, 711, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 307, 712, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 427, 676, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 309, 713, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 309, 714, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 309, 715, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 323, 716, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 428, 400, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 309, 717, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 309, 718, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 309, 719, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 323, 720, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 428, 405, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 309, 721, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 309, 722, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 309, 723, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 323, 724, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 428, 684, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 318, 725, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 318, 726, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 318, 727, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 324, 728, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 429, 412, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 0, 318, 729, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 1, 318, 730, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 39, 318, 731, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 6, 324, 732, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_R_X 2xaa @ RbPlus
    { 3, 7, 429, 414, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_R_X 2xaa @ RbPlus
};

const ADDR_SW_PATINFO SW_64K_R_X_4xaa_RBPLUS_PATINFO[] =
{
    { 3, 0, 347, 566, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 348, 733, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 349, 568, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 350, 734, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 351, 735, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 352, 566, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 353, 567, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 354, 568, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 355, 736, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 356, 737, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 280, 566, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 281, 567, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 282, 568, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 283, 569, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 284, 570, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 394, 587, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 395, 738, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 396, 739, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 397, 740, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 430, 741, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 290, 576, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 291, 577, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 292, 578, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 293, 579, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 405, 580, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 295, 587, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 296, 742, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 297, 743, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 298, 740, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 431, 699, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 399, 744, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 399, 745, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 399, 746, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 432, 747, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 433, 429, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 400, 748, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 401, 749, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 402, 750, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 434, 707, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 435, 708, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 307, 744, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 307, 751, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 307, 746, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 436, 703, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 437, 390, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 307, 752, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 307, 753, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 307, 754, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 436, 712, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 437, 676, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 323, 755, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 323, 756, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 323, 757, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 438, 716, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 439, 400, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 323, 758, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 323, 759, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 323, 760, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 438, 720, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 439, 405, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 323, 761, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 323, 762, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 323, 763, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 438, 724, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 439, 684, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 324, 764, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 324, 765, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 324, 766, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 440, 728, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 441, 412, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 0, 324, 767, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 1, 324, 768, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 39, 324, 769, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 6, 440, 732, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_R_X 4xaa @ RbPlus
    { 3, 7, 441, 414, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_R_X 4xaa @ RbPlus
};

const ADDR_SW_PATINFO SW_64K_R_X_8xaa_RBPLUS_PATINFO[] =
{
    { 3, 0, 424, 619, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 348, 620, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 358, 621, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 350, 770, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 359, 771, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 352, 619, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 353, 620, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 354, 621, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 355, 770, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 378, 772, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 280, 619, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 281, 620, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 282, 621, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 283, 622, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 413, 623, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 394, 773, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 395, 774, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 442, 775, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 443, 776, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 444, 777, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 415, 629, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 291, 630, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 292, 631, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 416, 632, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 417, 580, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 295, 773, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 296, 778, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 297, 779, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 445, 780, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 446, 699, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 399, 781, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 399, 782, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 447, 783, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 448, 784, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 449, 429, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 450, 785, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 302, 786, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 303, 787, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 420, 788, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 451, 708, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 339, 781, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 339, 782, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 422, 746, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 452, 703, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 453, 390, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 339, 789, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 339, 790, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 422, 754, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 452, 712, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 453, 676, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 343, 791, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 341, 792, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 423, 757, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 454, 716, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 455, 400, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 343, 793, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 341, 794, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 423, 760, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 454, 720, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 455, 405, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 343, 795, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 341, 796, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 423, 763, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 454, 724, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 455, 684, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 344, 797, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 345, 798, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 456, 766, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 457, 728, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 458, 412, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 0, 344, 799, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 1, 345, 800, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 39, 456, 769, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 6, 457, 732, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_R_X 8xaa @ RbPlus
    { 3, 7, 458, 414, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_R_X 8xaa @ RbPlus
};

const ADDR_SW_PATINFO SW_64K_Z_X_1xaa_RBPLUS_PATINFO[] =
{
    { 2, 8, 347, 193, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 2, 9, 348, 366, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 2, 10, 349, 195, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 2, 11, 350, 367, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 2, 7, 351, 368, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 352, 193, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 353, 194, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 354, 195, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 355, 369, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 356, 370, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 280, 193, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 281, 194, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 282, 195, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 283, 196, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 284, 197, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 285, 219, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 286, 371, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 287, 372, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 288, 373, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 289, 374, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 290, 203, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 291, 204, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 292, 205, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 293, 206, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 294, 207, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 295, 219, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 296, 375, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 297, 376, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 298, 377, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 299, 378, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 300, 379, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 300, 380, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 300, 381, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 300, 382, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 300, 383, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 301, 384, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 302, 385, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 303, 386, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 304, 387, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 305, 388, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 306, 379, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 306, 389, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 306, 381, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 307, 382, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 307, 390, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 306, 391, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 306, 392, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 306, 393, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 307, 394, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 307, 395, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 308, 396, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 308, 397, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 308, 398, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 309, 399, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 323, 400, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 308, 401, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 308, 402, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 308, 403, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 309, 404, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 323, 405, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 308, 240, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 308, 241, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 308, 242, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 309, 406, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 323, 407, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 310, 408, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 310, 409, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 310, 410, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 318, 411, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 324, 412, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 8, 310, 250, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 9, 310, 251, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 10, 310, 252, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 11, 318, 413, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_Z_X 1xaa @ RbPlus
    { 3, 7, 324, 414, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_Z_X 1xaa @ RbPlus
};

const ADDR_SW_PATINFO SW_64K_Z_X_2xaa_RBPLUS_PATINFO[] =
{
    { 2, 13, 357, 415, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 2, 14, 349, 195, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 358, 263, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 350, 416, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 359, 417, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 360, 415, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 354, 195, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 354, 263, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 361, 418, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 356, 419, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 281, 262, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 282, 195, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 282, 263, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 317, 264, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 284, 265, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 286, 420, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 287, 376, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 287, 421, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 289, 422, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 289, 423, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 291, 268, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 292, 205, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 292, 269, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 293, 270, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 294, 271, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 296, 420, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 297, 376, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 297, 421, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 298, 424, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 299, 423, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 300, 425, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 300, 426, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 300, 427, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 362, 428, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 363, 429, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 302, 430, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 303, 386, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 303, 431, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 305, 432, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 364, 433, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 306, 380, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 306, 381, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 306, 434, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 307, 435, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 365, 435, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 306, 402, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 306, 403, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 306, 436, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 307, 405, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 365, 405, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 308, 397, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 308, 398, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 308, 437, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 323, 438, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 366, 438, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 308, 402, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 308, 403, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 308, 436, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 323, 439, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 366, 439, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 308, 440, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 308, 242, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 15, 308, 441, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 16, 323, 442, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 17, 366, 442, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 13, 310, 443, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus
    { 3, 14, 310, 410,
0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus { 3, 15, 310, 444, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus { 3, 16, 324, 412, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus { 3, 17, 367, 412, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus { 3, 13, 310, 445, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_Z_X 2xaa @ RbPlus { 3, 14, 310, 252, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_Z_X 2xaa @ RbPlus { 3, 15, 310, 446, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_Z_X 2xaa @ RbPlus { 3, 16, 324, 414, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_Z_X 2xaa @ RbPlus { 3, 17, 367, 414, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_Z_X 2xaa @ RbPlus }; const ADDR_SW_PATINFO SW_64K_Z_X_4xaa_RBPLUS_PATINFO[] = { { 2, 18, 349, 195, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 349, 447, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 349, 448, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 350, 449, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 351, 450, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 354, 195, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 368, 451, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 354, 299, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 355, 452, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 356, 453, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 282, 195, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 282, 298, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 282, 299, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 283, 300, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 284, 301, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 287, 
372, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 287, 454, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 287, 455, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 288, 456, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 331, 457, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 292, 205, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 292, 306, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 292, 307, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 320, 308, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 321, 309, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 297, 376, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 297, 458, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 297, 459, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 299, 460, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 369, 461, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 300, 381, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 300, 462, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 300, 463, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 363, 464, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 370, 465, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 303, 386, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 303, 466, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 303, 467, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 371, 468, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 337, 469, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 306, 381, 0, } , // 
16 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 306, 462, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 306, 470, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 372, 470, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 373, 470, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 306, 393, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 306, 471, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 306, 472, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 372, 472, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 373, 472, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 308, 398, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 308, 473, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 308, 438, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 374, 438, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 375, 438, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 308, 403, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 308, 471, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 308, 439, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 374, 439, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 375, 439, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 308, 242, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 308, 441, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 308, 442, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 374, 442, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 375, 442, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 
310, 410, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 310, 474, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 310, 412, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 376, 412, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 377, 412, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 18, 310, 252, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 19, 310, 475, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 20, 310, 414, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 21, 376, 414, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_Z_X 4xaa @ RbPlus { 3, 22, 377, 414, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_Z_X 4xaa @ RbPlus }; const ADDR_SW_PATINFO SW_64K_Z_X_8xaa_RBPLUS_PATINFO[] = { { 3, 23, 358, 263, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 349, 448, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 358, 332, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 350, 476, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 359, 477, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 354, 263, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 354, 299, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 354, 332, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 361, 478, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 378, 479, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 282, 263, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 282, 299, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 282, 332, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 317, 333, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 
329, 334, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 287, 421, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 287, 480, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 287, 481, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 379, 482, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 380, 483, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 292, 269, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 292, 307, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 292, 339, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 332, 340, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 333, 341, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 297, 421, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 297, 459, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 297, 481, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 381, 484, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 382, 485, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 300, 434, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 300, 463, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 383, 486, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 384, 487, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 385, 488, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 303, 431, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 303, 467, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 303, 489, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 337, 469, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 386, 469, 0, 
} , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 306, 434, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 306, 470, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 387, 490, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 373, 470, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 388, 470, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 306, 436, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 306, 472, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 387, 491, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 373, 472, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 388, 492, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 308, 437, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 308, 438, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 389, 493, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 375, 438, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 390, 438, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 308, 436, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 308, 439, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 391, 494, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 375, 439, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 390, 439, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 308, 441, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 308, 442, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 391, 495, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 375, 442, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 
3, 27, 390, 442, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 310, 444, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 310, 412, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 392, 496, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 377, 412, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 393, 412, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 23, 310, 446, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 24, 310, 414, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 25, 367, 414, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 26, 377, 414, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_Z_X 8xaa @ RbPlus { 3, 27, 393, 414, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_Z_X 8xaa @ RbPlus }; const ADDR_SW_PATINFO SW_64K_S3_RBPLUS_PATINFO[] = { { 1, 29, 131, 148, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 
135, 152, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } 
, // 16 pipes (8 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 
1, 32, 134, 151, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus { 1, 29, 131, 148, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_S3 @ RbPlus { 1, 30, 132, 149, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_S3 @ RbPlus { 1, 31, 133, 150, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_S3 @ RbPlus { 1, 32, 134, 151, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_S3 @ RbPlus { 1, 33, 135, 152, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_S3 @ RbPlus }; const ADDR_SW_PATINFO SW_64K_S3_X_RBPLUS_PATINFO[] = { { 1, 29, 131, 148, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 1, 30, 132, 149, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 1, 31, 133, 150, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 1, 32, 134, 151, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 1, 33, 135, 152, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 136, 148, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 137, 149, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 138, 150, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 139, 151, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 140, 152, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 141, 148, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 142, 149, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 143, 150, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 144, 151, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 145, 152, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 146, 148, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 147, 149, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 148, 150, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 149, 151, 0, } , // 8 pipes (2 
PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 150, 152, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 141, 148, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 142, 149, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 143, 150, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 144, 151, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 145, 152, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 146, 148, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 147, 149, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 148, 150, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 149, 151, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 150, 152, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 151, 148, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 152, 149, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 153, 150, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 154, 151, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 155, 152, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 146, 148, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 147, 149, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 148, 150, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 149, 151, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 150, 152, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 151, 148, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 152, 149, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 153, 150, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 154, 151, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 155, 152, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_S3_X @ 
RbPlus { 3, 29, 156, 153, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 157, 154, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 158, 155, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 159, 156, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 160, 157, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 151, 148, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 152, 149, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 153, 150, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 154, 151, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 155, 152, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 156, 153, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 157, 154, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 158, 155, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 159, 156, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 160, 157, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 161, 158, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 162, 159, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 163, 160, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 164, 161, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 165, 162, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 156, 153, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_S3_X @ RbPlus { 3, 30, 157, 154, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 158, 155, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 159, 156, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 160, 157, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus { 3, 29, 161, 158, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_S3_X 
@ RbPlus { 3, 30, 162, 159, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_S3_X @ RbPlus { 3, 31, 163, 160, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_S3_X @ RbPlus { 3, 32, 164, 161, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_S3_X @ RbPlus { 3, 33, 165, 162, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_S3_X @ RbPlus }; const ADDR_SW_PATINFO SW_64K_S3_T_RBPLUS_PATINFO[] = { { 1, 29, 131, 148, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 1, 30, 132, 149, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 1, 31, 133, 150, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 1, 32, 134, 151, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 1, 33, 135, 152, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 136, 148, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 137, 149, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 138, 150, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 139, 151, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 140, 152, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 141, 148, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 142, 149, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 143, 150, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 144, 151, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 145, 152, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 166, 148, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 167, 149, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 168, 150, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 169, 151, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 170, 152, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 141, 148, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 142, 149, 0, } 
, // 4 pipes (4 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 143, 150, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 144, 151, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 145, 152, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 166, 148, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 167, 149, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 168, 150, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 169, 151, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 170, 152, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 171, 148, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 172, 149, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 173, 150, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 174, 151, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 175, 152, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 166, 148, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 167, 149, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 168, 150, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 169, 151, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 170, 152, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 171, 148, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 172, 149, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 173, 150, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 174, 151, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 175, 152, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 176, 153, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 177, 154, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 178, 155, 0, } , // 32 pipes (8 PKRs) 4 bpe @ 
SW_64K_S3_T @ RbPlus { 3, 32, 179, 156, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 180, 157, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 171, 148, 0, } , // 16 pipes (16 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 172, 149, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 173, 150, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 174, 151, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 175, 152, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 176, 153, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 177, 154, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 178, 155, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 179, 156, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 180, 157, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 131, 163, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 132, 164, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 133, 165, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 134, 166, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 135, 167, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 176, 153, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 177, 154, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 178, 155, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 179, 156, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 180, 157, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus { 3, 29, 131, 163, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_S3_T @ RbPlus { 3, 30, 132, 164, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_S3_T @ RbPlus { 3, 31, 133, 165, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_S3_T @ RbPlus { 3, 32, 134, 166, 0, } , // 64 pipes (32 PKRs) 8 
bpe @ SW_64K_S3_T @ RbPlus { 3, 33, 135, 167, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_S3_T @ RbPlus }; const ADDR_SW_PATINFO SW_64K_D3_X_RBPLUS_PATINFO[] = { { 1, 34, 131, 148, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 1, 35, 132, 149, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 1, 36, 133, 150, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 1, 37, 134, 151, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 1, 38, 135, 152, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 2, 34, 459, 170, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 2, 35, 459, 801, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 2, 36, 460, 802, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 2, 37, 461, 152, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 3, 38, 462, 152, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 463, 803, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 463, 804, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 464, 805, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 465, 806, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 466, 806, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 467, 803, 0, } , // 8 pipes (2 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 467, 804, 0, } , // 8 pipes (2 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 468, 805, 0, } , // 8 pipes (2 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 469, 806, 0, } , // 8 pipes (2 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 470, 806, 0, } , // 8 pipes (2 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 471, 807, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 472, 808, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 473, 809, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 474, 810, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 
475, 811, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 476, 812, 0, } , // 8 pipes (4 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 477, 804, 0, } , // 8 pipes (4 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 478, 805, 0, } , // 8 pipes (4 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 479, 806, 0, } , // 8 pipes (4 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 480, 806, 0, } , // 8 pipes (4 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 481, 813, 0, } , // 16 pipes (4 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 482, 804, 0, } , // 16 pipes (4 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 483, 805, 0, } , // 16 pipes (4 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 484, 806, 0, } , // 16 pipes (4 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 485, 806, 0, } , // 16 pipes (4 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 486, 814, 0, } , // 8 pipes (8 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 486, 815, 0, } , // 8 pipes (8 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 486, 816, 0, } , // 8 pipes (8 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 487, 817, 0, } , // 8 pipes (8 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 488, 817, 0, } , // 8 pipes (8 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 489, 812, 0, } , // 16 pipes (8 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 490, 804, 0, } , // 16 pipes (8 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 491, 805, 0, } , // 16 pipes (8 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 492, 806, 0, } , // 16 pipes (8 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 493, 806, 0, } , // 16 pipes (8 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 489, 818, 0, } , // 32 pipes (8 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 494, 819, 0, } , // 32 pipes (8 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 494, 820, 0, } , // 32 pipes (8 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 495, 821, 0, } , // 32 pipes (8 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 496, 821, 0, } , // 32 pipes (8 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 497, 822, 0, } , // 16 pipes 
(16 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 498, 823, 0, } , // 16 pipes (16 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 499, 824, 0, } , // 16 pipes (16 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 500, 825, 0, } , // 16 pipes (16 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 501, 825, 0, } , // 16 pipes (16 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 497, 826, 0, } , // 32 pipes (16 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 498, 827, 0, } , // 32 pipes (16 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 499, 828, 0, } , // 32 pipes (16 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 500, 829, 0, } , // 32 pipes (16 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 501, 829, 0, } , // 32 pipes (16 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 497, 830, 0, } , // 64 pipes (16 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 502, 831, 0, } , // 64 pipes (16 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 502, 832, 0, } , // 64 pipes (16 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 503, 833, 0, } , // 64 pipes (16 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 504, 833, 0, } , // 64 pipes (16 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 505, 834, 0, } , // 32 pipes (32 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 506, 835, 0, } , // 32 pipes (32 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 507, 836, 0, } , // 32 pipes (32 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 508, 837, 0, } , // 32 pipes (32 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 509, 837, 0, } , // 32 pipes (32 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus { 3, 34, 505, 838, 0, } , // 64 pipes (32 PKRs) 1 bpe @ SW_64K_D3_X @ RbPlus { 3, 35, 506, 839, 0, } , // 64 pipes (32 PKRs) 2 bpe @ SW_64K_D3_X @ RbPlus { 3, 36, 507, 840, 0, } , // 64 pipes (32 PKRs) 4 bpe @ SW_64K_D3_X @ RbPlus { 4, 37, 508, 841, 0, } , // 64 pipes (32 PKRs) 8 bpe @ SW_64K_D3_X @ RbPlus { 4, 38, 509, 841, 0, } , // 64 pipes (32 PKRs) 16 bpe @ SW_64K_D3_X @ RbPlus }; const ADDR_SW_PATINFO SW_VAR_R_X_1xaa_RBPLUS_PATINFO[] = { { 2, 0, 270, 183, 0, } , // 1 pipes (1 PKRs) 
1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 2, 1, 271, 184, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 2, 39, 272, 185, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 2, 6, 273, 186, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 2, 7, 274, 187, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 275, 188, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 276, 189, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 277, 190, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 278, 191, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 279, 192, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 280, 193, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 281, 194, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 282, 195, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 283, 196, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 284, 197, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 394, 198, 1, } , // 8 pipes (2 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 395, 199, 2, } , // 8 pipes (2 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 396, 200, 3, } , // 8 pipes (2 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 397, 201, 4, } , // 8 pipes (2 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 398, 202, 5, } , // 8 pipes (2 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 290, 203, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 291, 204, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 292, 205, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 293, 206, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 294, 207, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 295, 208, 6, } , // 8 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 
1xaa @ RbPlus { 3, 1, 296, 209, 2, } , // 8 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 297, 210, 7, } , // 8 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 298, 211, 4, } , // 8 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 299, 212, 8, } , // 8 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 399, 213, 9, } , // 16 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 399, 214, 10, } , // 16 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 399, 215, 11, } , // 16 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 399, 216, 12, } , // 16 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 399, 217, 13, } , // 16 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 400, 218, 15, } , // 8 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 401, 219, 15, } , // 8 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 402, 220, 15, } , // 8 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 304, 221, 15, } , // 8 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 305, 222, 15, } , // 8 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 307, 213, 9, } , // 16 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 307, 223, 16, } , // 16 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 307, 215, 11, } , // 16 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 307, 216, 17, } , // 16 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 307, 224, 13, } , // 16 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 307, 497, 18, } , // 32 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 307, 498, 19, } , // 32 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 307, 499, 20, } , // 32 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 307, 500, 21, } , // 32 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 307, 501, 22, } , // 32 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 309, 230, 125, } , // 16 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 
1xaa @ RbPlus { 3, 1, 309, 231, 126, } , // 16 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 309, 232, 127, } , // 16 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 309, 233, 26, } , // 16 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 309, 234, 27, } , // 16 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 309, 502, 28, } , // 32 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 309, 503, 19, } , // 32 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 309, 504, 29, } , // 32 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 309, 238, 30, } , // 32 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 309, 239, 31, } , // 32 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 309, 505, 32, } , // 64 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 309, 506, 33, } , // 64 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 309, 507, 34, } , // 64 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 309, 508, 35, } , // 64 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 309, 509, 36, } , // 64 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 311, 510, 128, } , // 32 pipes (32 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 311, 511, 129, } , // 32 pipes (32 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 311, 512, 130, } , // 32 pipes (32 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 311, 248, 40, } , // 32 pipes (32 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 311, 249, 41, } , // 32 pipes (32 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 0, 311, 513, 32, } , // 64 pipes (32 PKRs) 1 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 1, 311, 514, 42, } , // 64 pipes (32 PKRs) 2 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 39, 311, 515, 34, } , // 64 pipes (32 PKRs) 4 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 6, 311, 253, 43, } , // 64 pipes (32 PKRs) 8 bpe @ SW_VAR_R_X 1xaa @ RbPlus { 3, 7, 311, 254, 44, } , // 64 pipes (32 PKRs) 16 bpe @ SW_VAR_R_X 1xaa @ RbPlus }; const ADDR_SW_PATINFO 
SW_VAR_R_X_2xaa_RBPLUS_PATINFO[] = { { 3, 0, 403, 516, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 271, 517, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 313, 518, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 273, 519, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 314, 520, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 404, 521, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 276, 522, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 315, 523, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 278, 524, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 316, 525, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 280, 526, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 281, 527, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 282, 528, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 283, 529, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 284, 530, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 394, 208, 131, } , // 8 pipes (2 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 395, 531, 132, } , // 8 pipes (2 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 396, 302, 133, } , // 8 pipes (2 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 397, 532, 134, } , // 8 pipes (2 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 398, 533, 135, } , // 8 pipes (2 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 290, 534, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 291, 535, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 292, 536, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 293, 537, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 294, 538, 0, } , // 4 pipes (4 PKRs) 16 bpe 
@ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 295, 208, 131, } , // 8 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 296, 209, 132, } , // 8 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 297, 210, 133, } , // 8 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 298, 211, 134, } , // 8 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 299, 212, 135, } , // 8 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 399, 539, 136, } , // 16 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 399, 214, 137, } , // 16 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 399, 280, 138, } , // 16 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 399, 216, 139, } , // 16 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 399, 224, 140, } , // 16 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 400, 540, 15, } , // 8 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 401, 541, 15, } , // 8 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 402, 542, 15, } , // 8 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 304, 543, 15, } , // 8 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 305, 544, 15, } , // 8 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 307, 539, 136, } , // 16 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 307, 214, 137, } , // 16 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 307, 280, 138, } , // 16 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 307, 216, 139, } , // 16 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 307, 224, 140, } , // 16 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 307, 545, 141, } , // 32 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 307, 498, 142, } , // 32 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 307, 546, 143, } , // 32 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 307, 500, 144, } , // 32 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 307, 547, 145, } , // 32 
pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 309, 548, 146, } , // 16 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 309, 231, 147, } , // 16 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 309, 285, 148, } , // 16 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 309, 233, 149, } , // 16 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 309, 286, 150, } , // 16 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 309, 502, 141, } , // 32 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 309, 503, 151, } , // 32 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 309, 504, 143, } , // 32 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 309, 238, 152, } , // 32 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 309, 239, 153, } , // 32 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 309, 505, 154, } , // 64 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 309, 506, 155, } , // 64 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 309, 507, 156, } , // 64 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 309, 508, 157, } , // 64 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 309, 509, 158, } , // 64 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 318, 549, 159, } , // 32 pipes (32 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 318, 550, 160, } , // 32 pipes (32 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 318, 551, 161, } , // 32 pipes (32 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 318, 287, 162, } , // 32 pipes (32 PKRs) 8 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 318, 288, 163, } , // 32 pipes (32 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 0, 318, 552, 154, } , // 64 pipes (32 PKRs) 1 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 1, 318, 553, 155, } , // 64 pipes (32 PKRs) 2 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 39, 318, 554, 156, } , // 64 pipes (32 PKRs) 4 bpe @ SW_VAR_R_X 2xaa @ RbPlus { 3, 6, 318, 555, 157, } , // 64 pipes (32 PKRs) 8 bpe 
@ SW_VAR_R_X 2xaa @ RbPlus { 3, 7, 318, 290, 158, } , // 64 pipes (32 PKRs) 16 bpe @ SW_VAR_R_X 2xaa @ RbPlus }; const ADDR_SW_PATINFO SW_VAR_R_X_4xaa_RBPLUS_PATINFO[] = { { 3, 0, 270, 556, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 271, 557, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 272, 558, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 273, 559, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 274, 560, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 275, 561, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 276, 562, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 277, 563, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 278, 564, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 279, 565, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 280, 566, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 281, 567, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 282, 568, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 283, 569, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 284, 570, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 394, 571, 164, } , // 8 pipes (2 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 395, 572, 165, } , // 8 pipes (2 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 396, 573, 166, } , // 8 pipes (2 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 397, 574, 167, } , // 8 pipes (2 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 398, 575, 168, } , // 8 pipes (2 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 290, 576, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 291, 577, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 292, 578, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ 
RbPlus { 3, 6, 293, 579, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 405, 580, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 295, 581, 169, } , // 8 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 296, 582, 165, } , // 8 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 297, 583, 170, } , // 8 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 298, 584, 167, } , // 8 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 299, 585, 168, } , // 8 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 399, 213, 171, } , // 16 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 399, 214, 172, } , // 16 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 399, 215, 173, } , // 16 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 399, 216, 174, } , // 16 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 399, 217, 175, } , // 16 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 400, 586, 15, } , // 8 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 401, 587, 15, } , // 8 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 402, 588, 15, } , // 8 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 304, 589, 15, } , // 8 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 406, 544, 15, } , // 8 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 307, 213, 171, } , // 16 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 307, 223, 176, } , // 16 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 307, 215, 173, } , // 16 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 307, 216, 177, } , // 16 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 307, 224, 175, } , // 16 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 307, 497, 178, } , // 32 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 307, 498, 179, } , // 32 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 307, 499, 180, } , // 32 pipes (8 PKRs) 4 bpe @ 
SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 307, 500, 181, } , // 32 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 307, 501, 182, } , // 32 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 323, 590, 183, } , // 16 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 323, 591, 184, } , // 16 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 323, 592, 185, } , // 16 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 323, 593, 186, } , // 16 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 323, 286, 187, } , // 16 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 323, 594, 188, } , // 32 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 323, 595, 179, } , // 32 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 323, 596, 189, } , // 32 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 323, 321, 190, } , // 32 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 323, 322, 191, } , // 32 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 323, 597, 192, } , // 64 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 323, 598, 193, } , // 64 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 323, 599, 194, } , // 64 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 323, 600, 195, } , // 64 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 323, 601, 196, } , // 64 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 324, 602, 197, } , // 32 pipes (32 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 324, 603, 198, } , // 32 pipes (32 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 39, 324, 604, 199, } , // 32 pipes (32 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 324, 605, 200, } , // 32 pipes (32 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 324, 606, 201, } , // 32 pipes (32 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 0, 324, 607, 192, } , // 64 pipes (32 PKRs) 1 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 1, 324, 608, 202, } , // 64 pipes (32 PKRs) 2 bpe @ SW_VAR_R_X 4xaa @ RbPlus 
{ 3, 39, 324, 609, 194, } , // 64 pipes (32 PKRs) 4 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 6, 324, 327, 203, } , // 64 pipes (32 PKRs) 8 bpe @ SW_VAR_R_X 4xaa @ RbPlus { 3, 7, 324, 328, 204, } , // 64 pipes (32 PKRs) 16 bpe @ SW_VAR_R_X 4xaa @ RbPlus }; const ADDR_SW_PATINFO SW_VAR_R_X_8xaa_RBPLUS_PATINFO[] = { { 3, 0, 407, 610, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 408, 611, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 409, 612, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 410, 613, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 411, 614, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 404, 615, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 276, 616, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 315, 617, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 278, 618, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 412, 565, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 280, 619, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 281, 620, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 282, 621, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 283, 622, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 413, 623, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 394, 624, 205, } , // 8 pipes (2 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 395, 625, 206, } , // 8 pipes (2 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 396, 626, 207, } , // 8 pipes (2 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 397, 627, 208, } , // 8 pipes (2 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 414, 628, 209, } , // 8 pipes (2 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 415, 629, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 291, 
630, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 292, 631, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 416, 632, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 417, 580, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 295, 624, 205, } , // 8 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 296, 633, 206, } , // 8 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 297, 634, 207, } , // 8 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 298, 627, 208, } , // 8 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 418, 635, 210, } , // 8 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 399, 636, 211, } , // 16 pipes (4 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 399, 637, 212, } , // 16 pipes (4 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 399, 638, 213, } , // 16 pipes (4 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 399, 639, 214, } , // 16 pipes (4 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 419, 640, 215, } , // 16 pipes (4 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 301, 641, 216, } , // 8 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 302, 642, 216, } , // 8 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 303, 643, 216, } , // 8 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 420, 589, 105, } , // 8 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 421, 544, 217, } , // 8 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 339, 636, 211, } , // 16 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 339, 637, 212, } , // 16 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 339, 638, 213, } , // 16 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 339, 639, 214, } , // 16 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 422, 224, 175, } , // 16 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 339, 545, 218, } , // 32 pipes (8 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus 
{ 3, 1, 339, 498, 219, } , // 32 pipes (8 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 339, 546, 220, } , // 32 pipes (8 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 339, 500, 221, } , // 32 pipes (8 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 339, 644, 222, } , // 32 pipes (8 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 343, 645, 223, } , // 16 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 343, 646, 224, } , // 16 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 343, 647, 225, } , // 16 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 341, 648, 226, } , // 16 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 423, 286, 187, } , // 16 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 343, 649, 218, } , // 32 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 343, 650, 227, } , // 32 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 343, 651, 220, } , // 32 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 343, 652, 221, } , // 32 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 341, 653, 228, } , // 32 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 343, 654, 229, } , // 64 pipes (16 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 343, 655, 230, } , // 64 pipes (16 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 343, 656, 231, } , // 64 pipes (16 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 343, 657, 232, } , // 64 pipes (16 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 343, 658, 233, } , // 64 pipes (16 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 346, 659, 234, } , // 32 pipes (32 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 346, 660, 235, } , // 32 pipes (32 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 346, 661, 236, } , // 32 pipes (32 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 344, 662, 237, } , // 32 pipes (32 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 345, 663, 238, } , // 32 pipes (32 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 0, 346, 664, 229, } 
, // 64 pipes (32 PKRs) 1 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 1, 346, 665, 230, } , // 64 pipes (32 PKRs) 2 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 39, 346, 666, 231, } , // 64 pipes (32 PKRs) 4 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 6, 346, 667, 232, } , // 64 pipes (32 PKRs) 8 bpe @ SW_VAR_R_X 8xaa @ RbPlus { 3, 7, 344, 668, 204, } , // 64 pipes (32 PKRs) 16 bpe @ SW_VAR_R_X 8xaa @ RbPlus }; const ADDR_SW_PATINFO SW_VAR_Z_X_1xaa_RBPLUS_PATINFO[] = { { 2, 8, 270, 183, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 2, 9, 271, 184, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 2, 10, 272, 185, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 2, 11, 273, 186, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 2, 7, 274, 187, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 8, 275, 188, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 9, 276, 189, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 10, 277, 190, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 11, 278, 191, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 7, 279, 192, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 8, 280, 193, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 9, 281, 194, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 10, 282, 195, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 11, 283, 196, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 7, 284, 197, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 8, 285, 198, 1, } , // 8 pipes (2 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 9, 286, 199, 2, } , // 8 pipes (2 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 10, 287, 200, 3, } , // 8 pipes (2 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 11, 288, 201, 4, } , // 8 pipes (2 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus { 3, 7, 289, 202, 5, } , // 8 pipes 
(2 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 290, 203, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 291, 204, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 292, 205, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 293, 206, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 294, 207, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 295, 208, 6, } , // 8 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 296, 209, 2, } , // 8 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 297, 210, 7, } , // 8 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 298, 211, 4, } , // 8 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 299, 212, 8, } , // 8 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 300, 213, 9, } , // 16 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 300, 214, 10, } , // 16 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 300, 215, 11, } , // 16 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 300, 216, 12, } , // 16 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 300, 217, 13, } , // 16 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 301, 218, 14, } , // 8 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 302, 219, 14, } , // 8 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 303, 220, 14, } , // 8 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 304, 221, 15, } , // 8 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 305, 222, 15, } , // 8 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 306, 213, 9, } , // 16 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 306, 223, 16, } , // 16 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 306, 215, 11, } , // 16 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 307, 216, 17, } , // 16 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 307, 224, 13, } , // 16 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 306, 225, 18, } , // 32 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 306, 226, 19, } , // 32 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 306, 227, 20, } , // 32 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 307, 228, 21, } , // 32 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 307, 229, 22, } , // 32 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 308, 230, 23, } , // 16 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 308, 231, 24, } , // 16 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 308, 232, 25, } , // 16 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 309, 233, 26, } , // 16 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 309, 234, 27, } , // 16 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 308, 235, 28, } , // 32 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 308, 236, 19, } , // 32 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 308, 237, 29, } , // 32 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 309, 238, 30, } , // 32 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 309, 239, 31, } , // 32 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 308, 240, 32, } , // 64 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 308, 241, 33, } , // 64 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 308, 242, 34, } , // 64 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 309, 243, 35, } , // 64 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 309, 244, 36, } , // 64 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 310, 245, 37, } , // 32 pipes (32 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 310, 246, 38, } , // 32 pipes (32 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 310, 247, 39, } , // 32 pipes (32 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 311, 248, 40, } , // 32 pipes (32 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 311, 249, 41, } , // 32 pipes (32 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 8, 310, 250, 32, } , // 64 pipes (32 PKRs) 1 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 9, 310, 251, 42, } , // 64 pipes (32 PKRs) 2 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 10, 310, 252, 34, } , // 64 pipes (32 PKRs) 4 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 11, 311, 253, 43, } , // 64 pipes (32 PKRs) 8 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
    { 3, 7, 311, 254, 44, } , // 64 pipes (32 PKRs) 16 bpe @ SW_VAR_Z_X 1xaa @ RbPlus
};

const ADDR_SW_PATINFO SW_VAR_Z_X_2xaa_RBPLUS_PATINFO[] = {
    { 2, 13, 312, 255, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 2, 14, 272, 185, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 313, 256, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 273, 257, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 314, 258, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 276, 189, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 277, 190, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 315, 259, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 278, 260, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 316, 261, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 281, 262, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 282, 195, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 282, 263, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 317, 264, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 284, 265, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 286, 209, 2, } , // 8 pipes (2 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 287, 266, 3, } , // 8 pipes (2 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 287, 210, 45, } , // 8 pipes (2 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 288, 211, 46, } , // 8 pipes (2 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 289, 267, 47, } , // 8 pipes (2 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 291, 268, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 292, 205, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 292, 269, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 293, 270, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 294, 271, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 296, 209, 2, } , // 8 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 297, 210, 7, } , // 8 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 297, 210, 45, } , // 8 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 298, 211, 46, } , // 8 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 299, 212, 47, } , // 8 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 300, 272, 48, } , // 16 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 300, 273, 11, } , // 16 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 300, 273, 49, } , // 16 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 300, 274, 50, } , // 16 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 300, 275, 51, } , // 16 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 302, 219, 14, } , // 8 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 303, 220, 14, } , // 8 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 303, 276, 14, } , // 8 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 304, 277, 15, } , // 8 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 305, 278, 15, } , // 8 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 306, 279, 48, } , // 16 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 306, 215, 11, } , // 16 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 306, 280, 49, } , // 16 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 307, 281, 52, } , // 16 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 307, 224, 53, } , // 16 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 306, 236, 19, } , // 32 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 306, 237, 54, } , // 32 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 306, 237, 55, } , // 32 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 307, 282, 56, } , // 32 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 307, 283, 57, } , // 32 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 308, 284, 24, } , // 16 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 308, 232, 25, } , // 16 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 308, 285, 58, } , // 16 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 309, 233, 59, } , // 16 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 309, 286, 60, } , // 16 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 308, 236, 19, } , // 32 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 308, 237, 29, } , // 32 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 308, 237, 55, } , // 32 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 309, 238, 56, } , // 32 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 309, 239, 61, } , // 32 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 308, 241, 62, } , // 64 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 308, 242, 34, } , // 64 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 308, 242, 63, } , // 64 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 309, 243, 64, } , // 64 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 309, 244, 65, } , // 64 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 310, 246, 38, } , // 32 pipes (32 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 310, 247, 39, } , // 32 pipes (32 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 310, 247, 66, } , // 32 pipes (32 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 318, 287, 67, } , // 32 pipes (32 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 318, 288, 68, } , // 32 pipes (32 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 13, 310, 251, 62, } , // 64 pipes (32 PKRs) 1 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 14, 310, 252, 34, } , // 64 pipes (32 PKRs) 2 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 15, 310, 252, 63, } , // 64 pipes (32 PKRs) 4 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 16, 318, 289, 69, } , // 64 pipes (32 PKRs) 8 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
    { 3, 17, 318, 290, 65, } , // 64 pipes (32 PKRs) 16 bpe @ SW_VAR_Z_X 2xaa @ RbPlus
};

const ADDR_SW_PATINFO SW_VAR_Z_X_4xaa_RBPLUS_PATINFO[] = {
    { 2, 18, 272, 185, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 272, 291, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 272, 292, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 273, 293, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 274, 294, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 277, 190, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 315, 259, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 277, 295, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 319, 296, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 279, 297, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 282, 195, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 282, 298, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 282, 299, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 283, 300, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 284, 301, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 287, 200, 3, } , // 8 pipes (2 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 287, 302, 45, } , // 8 pipes (2 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 287, 303, 70, } , // 8 pipes (2 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 289, 304, 71, } , // 8 pipes (2 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 289, 305, 72, } , // 8 pipes (2 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 292, 205, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 292, 306, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 292, 307, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 320, 308, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 321, 309, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 297, 210, 7, } , // 8 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 297, 210, 45, } , // 8 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 297, 310, 45, } , // 8 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 298, 311, 71, } , // 8 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 299, 312, 47, } , // 8 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 300, 215, 11, } , // 16 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 300, 215, 73, } , // 16 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 300, 215, 74, } , // 16 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 300, 216, 75, } , // 16 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 300, 217, 76, } , // 16 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 303, 220, 14, } , // 8 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 303, 276, 14, } , // 8 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 303, 313, 14, } , // 8 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 305, 314, 15, } , // 8 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 322, 315, 15, } , // 8 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 306, 215, 11, } , // 16 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 306, 232, 77, } , // 16 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 306, 215, 78, } , // 16 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 307, 216, 79, } , // 16 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 307, 224, 80, } , // 16 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 306, 227, 20, } , // 32 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 306, 316, 55, } , // 32 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 306, 227, 81, } , // 32 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 307, 317, 82, } , // 32 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 307, 229, 83, } , // 32 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 308, 232, 25, } , // 16 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 308, 232, 84, } , // 16 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 308, 318, 84, } , // 16 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 323, 319, 85, } , // 16 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 323, 320, 86, } , // 16 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 308, 237, 29, } , // 32 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 308, 237, 55, } , // 32 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 308, 237, 87, } , // 32 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 323, 321, 88, } , // 32 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 323, 322, 89, } , // 32 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 308, 242, 34, } , // 64 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 308, 242, 90, } , // 64 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 308, 242, 91, } , // 64 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 323, 323, 92, } , // 64 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 323, 324, 93, } , // 64 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 310, 247, 39, } , // 32 pipes (32 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 310, 247, 66, } , // 32 pipes (32 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 310, 247, 94, } , // 32 pipes (32 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 324, 325, 95, } , // 32 pipes (32 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 324, 326, 96, } , // 32 pipes (32 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 18, 310, 252, 34, } , // 64 pipes (32 PKRs) 1 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 19, 310, 252, 97, } , // 64 pipes (32 PKRs) 2 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 20, 310, 252, 98, } , // 64 pipes (32 PKRs) 4 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 21, 324, 327, 99, } , // 64 pipes (32 PKRs) 8 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
    { 3, 22, 324, 328, 100, } , // 64 pipes (32 PKRs) 16 bpe @ SW_VAR_Z_X 4xaa @ RbPlus
};

const ADDR_SW_PATINFO SW_VAR_Z_X_8xaa_RBPLUS_PATINFO[] = {
    { 3, 23, 313, 256, 0, } , // 1 pipes (1 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 272, 292, 0, } , // 1 pipes (1 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 325, 292, 0, } , // 1 pipes (1 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 326, 329, 0, } , // 1 pipes (1 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 327, 294, 0, } , // 1 pipes (1 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 315, 259, 0, } , // 2 pipes (1-2 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 277, 295, 0, } , // 2 pipes (1-2 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 315, 330, 0, } , // 2 pipes (1-2 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 278, 331, 0, } , // 2 pipes (1-2 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 328, 331, 0, } , // 2 pipes (1-2 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 282, 263, 0, } , // 4 pipes (1-2 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 282, 299, 0, } , // 4 pipes (1-2 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 282, 332, 0, } , // 4 pipes (1-2 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 317, 333, 0, } , // 4 pipes (1-2 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 329, 334, 0, } , // 4 pipes (1-2 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 287, 210, 45, } , // 8 pipes (2 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 287, 335, 70, } , // 8 pipes (2 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 287, 336, 70, } , // 8 pipes (2 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 330, 337, 72, } , // 8 pipes (2 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 331, 338, 101, } , // 8 pipes (2 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 292, 269, 0, } , // 4 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 292, 307, 0, } , // 4 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 292, 339, 0, } , // 4 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 332, 340, 0, } , // 4 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 333, 341, 0, } , // 4 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 297, 210, 45, } , // 8 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 297, 310, 45, } , // 8 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 297, 342, 45, } , // 8 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 299, 343, 102, } , // 8 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 334, 344, 103, } , // 8 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 300, 273, 49, } , // 16 pipes (4 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 300, 273, 74, } , // 16 pipes (4 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 300, 345, 74, } , // 16 pipes (4 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 335, 346, 76, } , // 16 pipes (4 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 336, 286, 104, } , // 16 pipes (4 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 303, 276, 14, } , // 8 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 303, 313, 14, } , // 8 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 303, 347, 14, } , // 8 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 337, 348, 105, } , // 8 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 338, 349, 106, } , // 8 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 306, 280, 49, } , // 16 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 306, 215, 78, } , // 16 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 306, 350, 74, } , // 16 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 339, 351, 107, } , // 16 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 340, 351, 108, } , // 16 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 306, 237, 55, } , // 32 pipes (8 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 306, 237, 109, } , // 32 pipes (8 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 306, 237, 110, } , // 32 pipes (8 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 339, 352, 111, } , // 32 pipes (8 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 339, 353, 112, } , // 32 pipes (8 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 308, 285, 58, } , // 16 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 308, 318, 84, } , // 16 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 308, 354, 84, } , // 16 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 341, 355, 113, } , // 16 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 342, 356, 114, } , // 16 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 308, 237, 55, } , // 32 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 308, 237, 87, } , // 32 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 308, 237, 115, } , // 32 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 343, 357, 116, } , // 32 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 341, 358, 117, } , // 32 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 308, 242, 63, } , // 64 pipes (16 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 308, 242, 91, } , // 64 pipes (16 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 308, 242, 118, } , // 64 pipes (16 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 343, 359, 119, } , // 64 pipes (16 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 343, 360, 120, } , // 64 pipes (16 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 310, 247, 66, } , // 32 pipes (32 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 310, 247, 94, } , // 32 pipes (32 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 310, 361, 94, } , // 32 pipes (32 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 344, 362, 121, } , // 32 pipes (32 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 345, 363, 122, } , // 32 pipes (32 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 23, 310, 252, 63, } , // 64 pipes (32 PKRs) 1 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 24, 310, 252, 98, } , // 64 pipes (32 PKRs) 2 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 25, 310, 252, 118, } , // 64 pipes (32 PKRs) 4 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 26, 346, 364, 123, } , // 64 pipes (32 PKRs) 8 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
    { 3, 27, 344, 365, 124, } , // 64 pipes (32 PKRs) 16 bpe @ SW_VAR_Z_X 8xaa @ RbPlus
};

const UINT_64 GFX10_SW_PATTERN_NIBBLE01[][8] = {
    {X0, X1, X2, X3, Y0, Y1, Y2, Y3, }, // 0
    {0, X0, X1, X2, Y0, Y1, Y2, X3, }, // 1
    {0, 0, X0, X1, Y0, Y1, Y2, X2, }, // 2
    {0, 0, 0, X0, Y0, Y1, X1, X2, }, // 3
    {0, 0, 0, 0, Y0, Y1, X0, X1, }, // 4
    {X0, X1, X2, Y1, Y0, Y2, X3, Y3, }, // 5
    {0, 0, 0, X0, Y0, X1, X2, Y1, }, // 6
    {0, 0, 0, 0, X0, Y0, X1, Y1, }, // 7
    {X0, Y0, X1, Y1, X2, Y2, X3, Y3, }, // 8
    {0, X0, Y0, X1, Y1, X2, Y2, X3, }, // 9
    {0, 0, X0, Y0, X1, Y1, X2, Y2, }, // 10
    {0, 0, 0, X0, Y0, X1, Y1, X2, }, // 11
    {X0, Y0, X1, Y1, X2, Y2, X3, Y4, }, // 12
    {S0, X0, Y0, X1, Y1, X2, Y2, X3, }, // 13
    {0, S0, X0, Y0, X1, Y1, X2, Y2, }, // 14
    {0, 0, S0, X0, Y0, X1, Y1, X2, }, // 15
    {0, 0, 0, S0, X0, Y0, X1, Y1, }, // 16
    {0, 0, 0, 0, S0, X0, Y0, X1, }, // 17
    {S0, S1, X0, Y0, X1, Y1, X2, Y2, }, // 18
    {0, S0, S1, X0, Y0, X1, Y1, X2, }, // 19
    {0, 0, S0, S1, X0, Y0, X1, Y1, }, // 20
    {0, 0, 0, S0, S1, X0, Y0, X1, }, // 21
    {0, 0, 0, 0, S0, S1, X0, Y0, }, // 22
    {S0, S1, S2, X0, Y0, X1, Y1, X2, }, // 23
    {0, S0, S1, S2, X0, Y0, X1, Y1, }, // 24
    {0, 0, S0, S1, S2, X0, Y0, X1, }, // 25
    {0, 0, 0, S0, S1, S2, X0, Y0, }, // 26
    {0, 0, 0, 0, S0, S1, S2, X0, }, // 27
    {X0, X1, X2, Y1, Y0, Y2, X3, Y4, }, // 28
    {X0, X1, Z0, Y0, Z1, Y1, X2, Z2, }, // 29
    {0, X0, Z0, Y0, Z1, Y1, X1, Z2, }, // 30
    {0, 0, X0, Y0, Z0, Y1, X1, Z1, }, // 31
    {0, 0, 0, X0, Z0, Y0, X1, Z1, }, // 32
    {0, 0, 0, 0, Z0, Y0, X0, Z1, }, // 33
    {X0, X1, Z0, Y0, Y1, Z1, X2, Z2, }, // 34
    {0, X0, Z0, Y0, X1, Z1, Y1, Z2, }, // 35
    {0, 0, X0, Y0, X1, Z0, Y1, Z1, }, // 36
    {0, 0, 0, X0, Y0, Z0, X1, Z1, }, // 37
    {0, 0, 0, 0, X0, Z0, Y0, Z1, }, // 38
    {0, 0, X0, X1, Y0, Y1, X2, Y2, }, // 39
};

const UINT_64 GFX10_SW_PATTERN_NIBBLE2[][4] = {
    {0, 0, 0, 0, }, // 0
    {Y4, X4, Y5, X5, }, // 1
    {Y3, X4, Y4, X5, }, // 2
    {Y3, X3, Y4, X4, }, // 3
    {Y2, X3, Y3, X4, }, // 4
    {Y2, X2, Y3, X3, }, // 5
    {Z0^X4^Y4, X4, Y5, X5, }, // 6
    {Z0^Y3^X4, X4, Y4, X5, }, // 7
    {Z0^X3^Y3, X3, Y4, X4, }, // 8
    {Z0^Y2^X3, X3, Y3, X4, }, // 9
    {Z0^X2^Y2, X2, Y3, X3, }, // 10
    {Z1^Y4^X5, Z0^X4^Y5, Y5, X5, }, // 11
    {Z1^Y3^X5, Z0^X4^Y4, Y4, X5, }, // 12
    {Z1^Y3^X4, Z0^X3^Y4, Y4, X4, }, // 13
    {Z1^Y2^X4, Z0^X3^Y3, Y3, X4, }, // 14
    {Z1^Y2^X3, Z0^X2^Y3, Y3, X3, }, // 15
    {Z2^Y4^X6, Z1^X4^Y6, Z0^X5^Y5, X5, }, // 16
    {Z2^Y3^X6, Z1^X4^Y5, Z0^Y4^X5, X5, }, // 17
    {Z2^Y3^X5, Z1^X3^Y5, Z0^X4^Y4, X4, }, // 18
    {Y2^Z2^X5, Z1^X3^Y4, Z0^Y3^X4, X4, }, // 19
    {Y2^Z2^X4, Z1^X2^Y4, Z0^X3^Y3, X3, }, // 20
    {Z3^Y4^X7, Z2^X4^Y7, Z1^Y5^X6, Z0^X5^Y6, }, // 21
    {Y3^Z3^X7, Z2^X4^Y6, Z1^Y4^X6, Z0^X5^Y5, }, // 22
    {Y3^Z3^X6, Z2^X3^Y6, Z1^Y4^X5, Z0^X4^Y5, }, // 23
    {Y2^Z3^X6, Z2^X3^Y5, Z1^Y3^X5, Z0^X4^Y4, }, // 24
    {Y2^Z3^X5, X2^Z2^Y5, Z1^Y3^X4, Z0^X3^Y4, }, // 25
    {Y4^Z4^X8, Z3^X4^Y8, Z2^Y5^X7, Z1^X5^Y7, }, // 26
    {Y3^Z4^X8, Z3^X4^Y7, Z2^Y4^X7, Z1^X5^Y6, }, // 27
    {Y3^Z4^X7, X3^Z3^Y7, Z2^Y4^X6, Z1^X4^Y6, }, // 28
    {Y2^Z4^X7, X3^Z3^Y6, Z2^Y3^X6, Z1^X4^Y5, }, // 29
    {Y2^Z4^X6, X2^Z3^Y6, Z2^Y3^X5, Z1^X3^Y5, }, // 30
    {Y4^Z5^X9, X4^Z4^Y9, Z3^Y5^X8, Z2^X5^Y8, }, // 31
    {Y3^Z5^X9, X4^Z4^Y8, Z3^Y4^X8, Z2^X5^Y7, }, // 32
    {Y3^Z5^X8, X3^Z4^Y8, Z3^Y4^X7, Z2^X4^Y7, }, // 33
    {Y2^Z5^X8, X3^Z4^Y7, Y3^Z3^X7, Z2^X4^Y6, }, // 34
    {Y2^Z5^X7, X2^Z4^Y7, Y3^Z3^X6, Z2^X3^Y6, }, // 35
    {X4^Y4, X4, Y5, X5, }, // 36
    {Y3^X4, X4, Y4, X5, }, // 37
    {X3^Y3, X3, Y4, X4, }, // 38
    {Y2^X3, X3, Y3, X4, }, // 39
    {X2^Y2, X2, Y3, X3, }, // 40
    {Y4^X5, X4^Y5, Y5, X5, }, // 41
    {Y3^X5, X4^Y4, Y4, X5, }, // 42
    {Y3^X4, X3^Y4, Y4, X4, }, // 43
    {Y2^X4, X3^Y3, Y3, X4, }, // 44
    {Y2^X3, X2^Y3, Y3, X3, }, // 45
    {Y4^X6, X4^Y6, X5^Y5, X5, }, // 46
    {Y3^X6, X4^Y5, Y4^X5, X5, }, // 47
    {Y3^X5, X3^Y5, X4^Y4, X4, }, // 48
    {Y2^X5, X3^Y4, Y3^X4, X4, }, // 49
    {Y2^X4, X2^Y4, X3^Y3, X3, }, // 50
    {Y4^X7, X4^Y7, Y5^X6, X5^Y6, }, // 51
    {Y3^X7, X4^Y6, Y4^X6, X5^Y5, }, // 52
    {Y3^X6, X3^Y6, Y4^X5, X4^Y5, }, // 53
    {Y2^X6, X3^Y5, Y3^X5, X4^Y4, }, // 54
    {Y2^X5, X2^Y5, Y3^X4, X3^Y4, }, // 55
    {Y4, X4, Y5^X7, X5^Y7, }, // 56
    {Y3, X4, Y4^X7, X5^Y6, }, // 57
    {Y3, X3, Y4^X6, X4^Y6, }, // 58
    {Y2, X3, Y3^X6, X4^Y5, }, // 59
    {Y2, X2, Y3^X5, X3^Y5, }, // 60
    {Z0^X3^Y3, X4, Y5, X5, }, // 61
    {Z0^X3^Y3, X4, Y4, X5, }, // 62
    {Z0^X3^Y3, X3, Y2, X4, }, // 63
    {Z0^X3^Y3, X2, Y2, X3, }, // 64
    {Z1^X3^Y3, Z0^X4^Y4, Y5, X5, }, // 65
    {Z1^X3^Y3, Z0^X4^Y4, Y4, X5, }, // 66
    {Z1^X3^Y3, Z0^X4^Y4, Y3, X4, }, // 67
    {Z1^X3^Y3, Z0^X4^Y4, Y2, X3, }, // 68
    {Z1^X3^Y3, Z0^X4^Y4, Y2, X2, }, // 69
    {Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, X5, }, // 70
    {Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, X4, }, // 71
    {Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, X3, }, // 72
    {Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, X2, }, // 73
    {X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, }, // 74
    {X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, }, // 75
    {X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X7, Z0^X5^Y7, }, // 76
    {X3^Y3^Z5, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, }, // 77
    {X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X8, Z1^X5^Y8, }, // 78
    {X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X8, Z0^X5^Y8, }, // 79
    {Y3, Y4, X4, Y5, }, // 80
    {X2, Y3, X3, Y4, }, // 81
    {Z0^X3^Y3, Y4, X4, Y5, }, // 82
    {Z0^X3^Y3, X2, X3, Y4, }, // 83
    {Z1^X3^Y3, Z0^X4^Y4, Y4, Y5, }, // 84
    {Z1^X3^Y3, Z0^X4^Y4, X2, Y3, }, // 85
    {Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, Y4, }, // 86
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X6, Y2^X5^Y6, }, // 87
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X7, Y2^X5^Y7, }, // 88
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X8, Y2^X5^Y8, }, // 89
    {X3, Y3, X4, Y4, }, // 90
    {Z0^X3^Y3, X3, X4, Y4, }, // 91
    {Z1^X3^Y3, Z0^X4^Y4, X3, Y4, }, // 92
    {Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, Y2, }, // 93
    {Z1^X3^Y3, Z0^X4^Y4, Y2^X5^Y5, X2, }, // 94
    {Z2^X3^Y3, Z1^X4^Y4, Y2^Y5^X6, Z0^X5^Y6, }, // 95
    {Z1^X3^Y3, Z0^X4^Y4, Y2^Y5^X6, X1^X5^Y6, }, // 96
    {Z2^X3^Y3, Z1^X4^Y4, Y2^Y5^X7, Z0^X5^Y7, }, // 97
    {Z1^X3^Y3, Z0^X4^Y4, Y2^Y5^X7, X1^X5^Y7, }, // 98
    {Z2^X3^Y3, Z1^X4^Y4, Y2^Y5^X8, Z0^X5^Y8, }, // 99
    {Z1^X3^Y3, Z0^X4^Y4, Y2^Y5^X8, X1^X5^Y8, }, // 100
    {Z0^X3^Y3, Y2, X3, Y4, }, // 101
    {Z1^X3^Y3, Z0^X4^Y4, X2, Y2, }, // 102
    {Z1^X3^Y3, Z0^X4^Y4, Y2^X5^Y5, Y3, }, // 103
    {Z1^X3^Y3, Z0^X4^Y4, Y0^X5^Y5, Y2, }, // 104
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X6, Z3^X5^Y6, }, // 105
    {Z1^X3^Y3, Z0^X4^Y4, Y0^Y5^X6, X1^X5^Y6, }, // 106
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X7, Z4^X5^Y7, }, // 107
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X7, Z3^X5^Y7, }, // 108
    {Z1^X3^Y3, Z0^X4^Y4, Y0^Y5^X7, X1^X5^Y7, }, // 109
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X8, Z4^X5^Y8, }, // 110
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X8, Z3^X5^Y8, }, // 111
    {Z1^X3^Y3, Z0^X4^Y4, Y0^Y5^X8, X1^X5^Y8, }, // 112
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X6, S0^X5^Y6, }, // 113
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X7, S0^X5^Y7, }, // 114
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X8, S0^X5^Y8, }, // 115
    {Z1^X3^Y3, Z0^X4^Y4, S1^X5^Y5, X2, }, // 116
    {Z2^X3^Y3, Z1^X4^Y4, S1^Y5^X6, Z0^X5^Y6, }, // 117
    {Z1^X3^Y3, Z0^X4^Y4, S1^Y5^X6, S0^X5^Y6, }, // 118
    {Z2^X3^Y3, Z1^X4^Y4, S1^Y5^X7, Z0^X5^Y7, }, // 119
    {Z1^X3^Y3, Z0^X4^Y4, S1^Y5^X7, S0^X5^Y7, }, // 120
    {Z2^X3^Y3, Z1^X4^Y4, S1^Y5^X8, Z0^X5^Y8, }, // 121
    {Z1^X3^Y3, Z0^X4^Y4, S1^Y5^X8, S0^X5^Y8, }, // 122
    {Z1^X3^Y3, Z0^X4^Y4, S2^X5^Y5, Y2, }, // 123
    {Z1^X3^Y3, Z0^X4^Y4, S2^X5^Y5, X2, }, // 124
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X6, S2^X5^Y6, }, // 125
    {Z1^X3^Y3, Z0^X4^Y4, S2^Y5^X6, S1^X5^Y6, }, // 126
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X7, S2^X5^Y7, }, // 127
    {Z1^X3^Y3, Z0^X4^Y4, S2^Y5^X7, S1^X5^Y7, }, // 128
    {Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X8, S2^X5^Y8, }, // 129
    {Z1^X3^Y3, Z0^X4^Y4, S2^Y5^X8, S1^X5^Y8, }, // 130
    {Y2, X3, Z3, Y3, }, // 131
    {Y2, X2, Z3, Y3, }, // 132
    {Y2, X2, Z2, Y3, }, // 133
    {Y1, X2, Z2, Y2, }, // 134
    {Y1, X1, Z2, Y2, }, // 135
    {Y2^X3^Z3, X3, Z3, Y3, }, // 136
    {X2^Y2^Z3, X2, Z3, Y3, }, // 137
    {X2^Y2^Z2, X2, Z2, Y3, }, // 138
    {Y1^X2^Z2, X2, Z2, Y2, }, // 139
    {X1^Y1^Z2, X1, Z2, Y2, }, // 140
    {Y2^X4^Z4, X3^Y3^Z3, Z3, Y3, }, // 141
    {Y2^X3^Z4, X2^Y3^Z3, Z3, Y3, }, // 142
    {Y2^X3^Z3, X2^Z2^Y3, Z2, Y3, }, // 143
    {Y1^X3^Z3, X2^Y2^Z2, Z2, Y2, }, // 144
    {Y1^X2^Z3, X1^Y2^Z2, Z2, Y2, }, // 145
    {Y2^X5^Z5, X3^Y4^Z4, Y3^Z3^X4, Y3, }, // 146
    {Y2^X4^Z5, X2^Y4^Z4, X3^Y3^Z3, Y3, }, // 147
    {Y2^X4^Z4, X2^Z3^Y4, Z2^X3^Y3, Y3, }, // 148
    {Y1^X4^Z4, X2^Y3^Z3, Y2^Z2^X3, Y2, }, // 149
    {Y1^X3^Z4, X1^Y3^Z3, X2^Y2^Z2, Y2, }, // 150
    {Y2^X6^Z6, X3^Y5^Z5, Z3^Y4^X5, Y3^X4^Z4, }, // 151
    {Y2^X5^Z6, X2^Y5^Z5, Z3^X4^Y4, X3^Y3^Z4, }, // 152
    {Y2^X5^Z5, X2^Z4^Y5, Z2^X4^Y4, X3^Y3^Z3, }, // 153
    {Y1^X5^Z5, X2^Y4^Z4, Z2^Y3^X4, Y2^X3^Z3, }, // 154
    {Y1^X4^Z5, X1^Y4^Z4, Z2^X3^Y3, X2^Y2^Z3, }, // 155
    {Y2^X7^Z7, X3^Y6^Z6, Z3^Y5^X6, Y3^X5^Z5, }, // 156
    {Y2^X6^Z7, X2^Y6^Z6, Z3^X5^Y5, Y3^X4^Z5, }, // 157
    {Y2^X6^Z6, X2^Z5^Y6, Z2^X5^Y5, Y3^X4^Z4, }, // 158
    {Y1^X6^Z6, X2^Y5^Z5, Z2^Y4^X5, Y2^X4^Z4, }, // 159
    {Y1^X5^Z6, X1^Y5^Z5, Z2^X4^Y4, Y2^X3^Z4, }, // 160
    {Y2^X8^Z8, X3^Y7^Z7, Z3^Y6^X7, Y3^X6^Z6, }, // 161
    {Y2^X7^Z8, X2^Y7^Z7, Z3^X6^Y6, Y3^X5^Z6, }, // 162
    {Y2^X7^Z7, X2^Z6^Y7, Z2^X6^Y6, Y3^X5^Z5, }, // 163
    {Y1^X7^Z7, X2^Y6^Z6, Z2^Y5^X6, Y2^X5^Z5, }, // 164
    {Y1^X6^Z7, X1^Y6^Z6, Z2^X5^Y5, Y2^X4^Z5, }, // 165
    {Y2^X5, X3^Y4^Z4, Y3^Z3^X4, Y3, }, // 166
    {Y2^X4, X2^Y4^Z4, X3^Y3^Z3, Y3, }, // 167
    {Y2^X4, X2^Z3^Y4, Z2^X3^Y3, Y3, }, // 168
    {Y1^X4, X2^Y3^Z3, Y2^Z2^X3, Y2, }, // 169
    {Y1^X3, X1^Y3^Z3, X2^Y2^Z2, Y2, }, // 170
    {Y2, X3, Z3^Y4^X5, Y3^X4^Z4, }, // 171
    {Y2, X2, Z3^X4^Y4, X3^Y3^Z4, }, // 172
    {Y2, X2, Z2^X4^Y4, X3^Y3^Z3, }, // 173
    {Y1, X2, Z2^Y3^X4, Y2^X3^Z3, }, // 174
    {Y1, X1, Z2^X3^Y3, X2^Y2^Z3, }, // 175
    {Y2, X3, Z3, Y3^X5, }, // 176
    {Y2, X2, Z3, Y3^X4, }, // 177
    {Y2, X2, Z2, Y3^X4, }, // 178
    {Y1, X2, Z2, Y2^X4, }, // 179
    {Y1, X1, Z2, Y2^X3, }, // 180
    {X3^Y3, X3, Z3, Y2, }, // 181
    {X3^Y3, X2, Z3, Y2, }, // 182
    {X3^Y3, X2, Z2, Y2, }, // 183
    {X3^Y3, X2, Z2, Y1, }, // 184
    {X3^Y3, X1, Z2, Y1, }, // 185
    {X3^Y3, X4^Y4, Z3, Y2, }, // 186
    {X3^Y3, X4^Y4, Z2, Y2, }, // 187
    {X3^Y3, X4^Y4, Z2, Y1, }, // 188
    {X3^Y3, X1^X4^Y4, Z2, Y1, }, // 189
    {X3^Y3, X4^Y4, X5^Y5, Z3, }, // 190
    {X3^Y3, X4^Y4, Z3^X5^Y5, Y2, }, // 191
    {X3^Y3, X4^Y4, Z2^X5^Y5, Y2, }, // 192
    {X3^Y3, X4^Y4, Z2^X5^Y5, Y1, }, // 193
    {X3^Y3, X1^X4^Y4, Z2^X5^Y5, Y1, }, // 194
    {X3^Y3, X4^Y4, Y2^Y5^X6, X5^Y6, }, // 195
    {X3^Y3, X4^Y4, Z3^Y5^X6, Y2^X5^Y6, }, // 196
    {X3^Y3, X4^Y4, Z2^Y5^X6, Y2^X5^Y6, }, // 197
    {X3^Y3, X4^Y4, Z2^Y5^X6, Y1^X5^Y6, }, // 198
    {X3^Y3, X1^X4^Y4, Z2^Y5^X6, Y1^X5^Y6, }, // 199
    {X3^Y3, X4^Y4, Y2^Y5^X7, X5^Y7, }, // 200
    {X3^Y3, X4^Y4, Z3^Y5^X7, Y2^X5^Y7, }, // 201
    {X3^Y3, X4^Y4, Z2^Y5^X7, Y2^X5^Y7, }, // 202
    {X3^Y3, X4^Y4, Z2^Y5^X7, Y1^X5^Y7, }, // 203
    {X3^Y3, X1^X4^Y4, Z2^Y5^X7, Y1^X5^Y7, }, // 204
    {X3^Y3, X4^Y4, Y2^Y5^X8, X5^Y8, }, // 205
    {X3^Y3, X4^Y4, Z3^Y5^X8, Y2^X5^Y8, }, // 206
    {X3^Y3, X4^Y4, Z2^Y5^X8, Y2^X5^Y8, }, // 207
    {X3^Y3, X4^Y4, Z2^Y5^X8, Y1^X5^Y8, }, // 208
    {X3^Y3, X1^X4^Y4, Z2^Y5^X8, Y1^X5^Y8, }, // 209
    {Y4^X5, Z0^X4^Y5, Y5, X5, }, // 210
    {Y3^X5, Z0^X4^Y4, Y4, X5, }, // 211
    {Y3^X4, Z0^X3^Y4, Y4, X4, }, // 212
    {Y2^X4, Z0^X3^Y3, Y3, X4, }, // 213
    {Y2^X3, Z0^X2^Y3, Y3, X3, }, // 214
    {Y4^X6, X4^Y6, Z0^X5^Y5, X5, }, // 215
    {Y3^X6, X4^Y5, Z0^Y4^X5, X5, }, // 216
    {Y3^X5, X3^Y5, Z0^X4^Y4, X4, }, // 217
    {Y2^X5, X3^Y4, Z0^Y3^X4, X4, }, // 218
    {Y2^X4, X2^Y4, Z0^X3^Y3, X3, }, // 219
    {Y4^X6, Z1^X4^Y6, Z0^X5^Y5, X5, }, // 220
    {Y3^X6, Z1^X4^Y5, Z0^Y4^X5, X5, }, // 221
    {Y3^X5, Z1^X3^Y5, Z0^X4^Y4, X4, }, // 222
    {Y2^X5, Z1^X3^Y4, Z0^Y3^X4, X4, }, // 223
    {Y2^X4, Z1^X2^Y4, Z0^X3^Y3, X3, }, // 224
    {Y4^X7, X4^Y7, Z1^Y5^X6, Z0^X5^Y6, }, // 225
    {Y3^X7, X4^Y6, Z1^Y4^X6, Z0^X5^Y5, }, // 226
    {Y3^X6, X3^Y6, Z1^Y4^X5, Z0^X4^Y5, }, // 227
    {Y2^X6, X3^Y5, Z1^Y3^X5, Z0^X4^Y4, }, // 228
    {Y2^X5, X2^Y5, Z1^Y3^X4, Z0^X3^Y4, }, // 229
    {Y4^X7, Z2^X4^Y7, Z1^Y5^X6, Z0^X5^Y6, }, // 230
    {Y3^X7, Z2^X4^Y6, Z1^Y4^X6, Z0^X5^Y5, }, // 231
    {Y3^X6, Z2^X3^Y6, Z1^Y4^X5, Z0^X4^Y5, }, // 232
    {Y2^X6, Z2^X3^Y5, Z1^Y3^X5, Z0^X4^Y4, }, // 233
    {Y2^X5, X2^Z2^Y5, Z1^Y3^X4, Z0^X3^Y4, }, // 234
    {Y4^X7, X4^Y7, Z2^Y5^X6, Z1^X5^Y6, }, // 235
    {Y3^X7, X4^Y6, Z2^Y4^X6, Z1^X5^Y5, }, // 236
    {Y3^X6, X3^Y6, Z2^Y4^X5, Z1^X4^Y5, }, // 237
    {Y2^X6, X3^Y5, Z2^Y3^X5, Z1^X4^Y4, }, // 238
    {Y2^X5, X2^Y5, Z2^Y3^X4, Z1^X3^Y4, }, // 239
    {Y4^X7, Z3^X4^Y7, Z2^Y5^X6, Z1^X5^Y6, }, // 240
    {Y3^X7, Z3^X4^Y6, Z2^Y4^X6, Z1^X5^Y5, }, // 241
    {Y3^X6, X3^Z3^Y6, Z2^Y4^X5, Z1^X4^Y5, }, // 242
    {Y2^X6, X3^Z3^Y5, Z2^Y3^X5, Z1^X4^Y4, }, // 243
    {Y2^X5, X2^Z3^Y5, Z2^Y3^X4, Z1^X3^Y4, }, // 244
    {Y4^X7, X4^Y7, Z3^Y5^X6, Z2^X5^Y6, }, // 245
    {Y3^X7, X4^Y6, Z3^Y4^X6, Z2^X5^Y5, }, // 246
    {Y3^X6, X3^Y6, Z3^Y4^X5, Z2^X4^Y5, }, // 247
    {Y2^X6, X3^Y5, Y3^Z3^X5, Z2^X4^Y4, }, // 248
    {Y2^X5, X2^Y5, Y3^Z3^X4, Z2^X3^Y4, }, // 249
    {Y4^X8, X4^Y8, Z2^Y5^X7, Z1^X5^Y7, }, // 250
    {Y3^X8, X4^Y7, Z2^Y4^X7, Z1^X5^Y6, }, // 251
    {Y3^X7, X3^Y7, Z2^Y4^X6, Z1^X4^Y6, }, // 252
    {Y2^X7, X3^Y6, Z2^Y3^X6, Z1^X4^Y5, }, // 253
    {Y2^X6, X2^Y6, Z2^Y3^X5, Z1^X3^Y5, }, // 254
    {Y4^X8, Z3^X4^Y8, Z2^Y5^X7, Z1^X5^Y7, }, // 255
    {Y3^X8, Z3^X4^Y7, Z2^Y4^X7, Z1^X5^Y6, }, // 256
    {Y3^X7, X3^Z3^Y7, Z2^Y4^X6, Z1^X4^Y6, }, // 257
    {Y2^X7, X3^Z3^Y6, Z2^Y3^X6, Z1^X4^Y5, }, // 258
    {Y2^X6, X2^Z3^Y6, Z2^Y3^X5, Z1^X3^Y5, }, // 259
    {Y4^X9, X4^Y9, Z3^Y5^X8, Z2^X5^Y8, }, // 260
    {Y3^X9, X4^Y8, Z3^Y4^X8, Z2^X5^Y7, }, // 261
    {Y3^X8, X3^Y8, Z3^Y4^X7, Z2^X4^Y7, }, // 262
    {Y2^X8, X3^Y7, Y3^Z3^X7, Z2^X4^Y6, }, // 263
    {Y2^X7, X2^Y7, Y3^Z3^X6, Z2^X3^Y6, }, // 264
    {Y4^X9, X4^Z4^Y9, Z3^Y5^X8, Z2^X5^Y8, }, // 265
    {Y3^X9, X4^Z4^Y8, Z3^Y4^X8, Z2^X5^Y7, }, // 266
    {Y3^X8, X3^Z4^Y8, Z3^Y4^X7, Z2^X4^Y7, }, // 267
    {Y2^X8, X3^Z4^Y7, Y3^Z3^X7, Z2^X4^Y6, }, // 268
    {Y2^X7, X2^Z4^Y7, Y3^Z3^X6, Z2^X3^Y6, }, // 269
    {X4, Y4, X5^Y8, Y5^X8, }, // 270
    {Y3, X4, Y4^X8, X5^Y7, }, // 271
    {X3, Y3, X4^Y7, Y4^X7, }, // 272
    {Y2, X3, Y3^X7, X4^Y6, }, // 273
    {X2, Y2, X3^Y6, Y3^X6, }, // 274
    {Z0^X4^Y4, Y4, X5, X6^Y8, }, // 275
    {Z0^X4^Y4, Y3, Y4, X5^Y8, }, // 276
    {Z0^X4^Y4, X3, Y3, X5^Y7, }, // 277
    {Z0^X4^Y4, Y2, X3, Y3^X8, }, // 278
    {Z0^X4^Y4, X2, Y2, X3^Y6, }, // 279
    {Y4^X5^Y5, Z0^X4^Y4, X5, Y5, }, // 280
    {Y4^X5^Y5, Z0^X4^Y4, Y3, X5, }, // 281
    {Y4^X5^Y5, Z0^X4^Y4, X3, Y3, }, // 282
    {Y4^X5^Y5, Z0^X4^Y4, Y2, X3, }, // 283
    {Y4^X5^Y5, Z0^X4^Y4, X2, Y2, }, // 284
    {Y4^X5^Y5, Z0^X4^Y4, X5^Y5, Y5, }, // 285
    {Y4^X5^Y5, Z0^X4^Y4, X5^Y5, Y3, }, // 286
    {Y4^X5^Y5, Z0^X4^Y4, X5^Y5, X3, }, // 287
    {Y4^X5^Y5, Z0^X4^Y4, X5^Y5, Y2, }, // 288
    {Y4^X5^Y5, Z0^X4^Y4, X5^Y5, X2, }, // 289
    {Y4^X6^Y6, Z1^X4^Y4, X5, X6, }, // 290
    {Y4^X6^Y6, Z1^X4^Y4, Y3, X5, }, // 291
    {Y4^X6^Y6, Z1^X4^Y4, X3, Y3, }, // 292
    {Y4^X6^Y6, Z1^X4^Y4, Y2, X3, }, // 293
    {Y4^X6^Y6, Z1^X4^Y4, X2, Y2, }, // 294
    {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5, }, // 295
    {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, Y3, }, // 296
    {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X3, }, // 297
    {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, Y2, }, // 298
    {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X2, }, // 299
    {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5^Y6, }, // 300
    {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, X6, }, // 301
    {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, Y3, }, // 302
    {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, X3, }, // 303
    {Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Y2, }, // 304
    {Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, X2, }, // 305
    {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, X5^Y6, }, // 306
    {Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, }, // 307
    {Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, X5^Y7, }, // 308
    {Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, }, // 309
    {Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, X5^Y8, }, // 310
    {Y4^X9^Y9, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, }, // 311
    {Y3, X4, Y4^X8, Y5^X7, }, // 312
    {X3, Y3, Y4^X7, X4^Y7, }, // 313
    {X2, Y2, Y3^X6, X3^Y6, }, // 314
    {Z0^X4^Y4, X3, Y3, Y4^X8, }, // 315
    {Z0^X4^Y4, X2, Y2, Y3^X7, }, // 316
    {Y4^X5^Y5, Z0^X4^Y4, X2, X3, }, // 317
    {Y4^X9^Y9, Z3^X4^Y4, Z2^Y5^X8, Z1^X5^Y8, }, // 318
    {Z0^X4^Y4, X2, X3, Y3^X8, }, // 319
    {Y4^X6^Y6,
Z1^X4^Y4, X2, X3, }, // 320 {Y4^X6^Y6, Z0^X4^Y4, X2, X3, }, // 321 {Y4^X7^Y7, Z1^X4^Y4, Y1^Y5^X6, X2, }, // 322 {Y4^X8^Y8, Z2^X4^Y4, Z1^Y5^X7, Z0^X5^Y7, }, // 323 {Y4^X9^Y9, Z2^X4^Y4, Z1^Y5^X8, Z0^X5^Y8, }, // 324 {X3, Y3, Y4^X7, Y1^X4^Y7, }, // 325 {Y2, X3, Y3^X7, X1^X4^Y6, }, // 326 {X2, Y2, Y3^X6, Y0^X3^Y6, }, // 327 {Y0^X4^Y4, Y2, X3, Y3^X8, }, // 328 {Y4^X5^Y5, Y0^X4^Y4, X2, X3, }, // 329 {Y4^X5^Y5, Z0^X4^Y4, X2^X5^Y5, Y2, }, // 330 {Y4^X5^Y5, Z0^X4^Y4, Y1^X5^Y5, X2, }, // 331 {Y4^X6^Y6, Z0^X4^Y4, X3, Y3, }, // 332 {Y4^X6^Y6, Y0^X4^Y4, X3, Y3, }, // 333 {Y4^X6^Y6, Z0^X4^Y4, Y0^X5^Y5, X2, }, // 334 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X2^X5^Y5, }, // 335 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, Y1^X5^Y5, }, // 336 {Y4^X7^Y7, Z0^X4^Y4, Y1^Y5^X6, X3, }, // 337 {Y4^X7^Y7, Z0^X4^Y4, Y0^Y5^X6, X3, }, // 338 {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, Z2^X5^Y6, }, // 339 {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, Y0^X5^Y6, }, // 340 {Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, Z2^X5^Y7, }, // 341 {Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, Y0^X5^Y7, }, // 342 {Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, Z3^X5^Y7, }, // 343 {Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, Z3^X5^Y8, }, // 344 {Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, Z2^X5^Y8, }, // 345 {Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, Z4^X5^Y8, }, // 346 {X4, Y4, X5^Y10, Y5^X10, }, // 347 {Y3, X4, Y4^X10, X5^Y9, }, // 348 {X3, Y3, X4^Y9, Y4^X9, }, // 349 {Y2, X3, Y3^X9, X4^Y8, }, // 350 {X2, Y2, X3^Y8, Y3^X8, }, // 351 {Z0^X4^Y4, Y4, X5, Y5^X10, }, // 352 {Z0^X4^Y4, Y3, Y4, X5^Y9, }, // 353 {Z0^X4^Y4, X3, Y3, Y4^X9, }, // 354 {Z0^X4^Y4, Y2, X3, Y3^X9, }, // 355 {Z0^X4^Y4, X2, Y2, Y3^X8, }, // 356 {Y3, X4, Y4^X10, Y5^X9, }, // 357 {X3, Y3, Y4^X9, X4^Y9, }, // 358 {X2, Y2, Y3^X8, X3^Y8, }, // 359 {Z0^X4^Y4, Y3, Y4, Y5^X9, }, // 360 {Z0^X4^Y4, X2, X3, Y3^X9, }, // 361 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X2^X5^Y6, }, // 362 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, Y1^X5^Y6, }, // 363 {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, X2, }, // 364 {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, Y1^X5^Y6, }, // 365 {Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, Y1^X5^Y7, }, // 366 {Y4^X9^Y9, 
Z1^X4^Y4, Z0^Y5^X8, Y1^X5^Y8, }, // 367 {Z0^X4^Y4, X3, Y3, X5^Y8, }, // 368 {Y4^X6^Y6, Z0^X4^Y4, Y1^X5^Y5, X2, }, // 369 {Y4^X6^Y6, Z0^X4^Y4, Y1^X5^Y5, X1^X5^Y6, }, // 370 {Y4^X7^Y7, Z1^X4^Y4, Y1^Y5^X6, X3, }, // 371 {Y4^X7^Y7, Z1^X4^Y4, Y1^Y5^X6, Z0^X5^Y6, }, // 372 {Y4^X7^Y7, Z0^X4^Y4, Y1^Y5^X6, X1^X5^Y6, }, // 373 {Y4^X8^Y8, Z1^X4^Y4, Y1^Y5^X7, Z0^X5^Y7, }, // 374 {Y4^X8^Y8, Z0^X4^Y4, Y1^Y5^X7, X1^X5^Y7, }, // 375 {Y4^X9^Y9, Z1^X4^Y4, Y1^Y5^X8, Z0^X5^Y8, }, // 376 {Y4^X9^Y9, Z0^X4^Y4, Y1^Y5^X8, X1^X5^Y8, }, // 377 {Z0^X4^Y4, X2, Y2, X3^Y7, }, // 378 {Y4^X5^Y5, Z0^X4^Y4, Y2^X5^Y5, X2, }, // 379 {Y4^X5^Y5, Y0^X4^Y4, X1^X5^Y5, X2, }, // 380 {Y4^X6^Y6, Z0^X4^Y4, Y1^X5^Y5, X3, }, // 381 {Y4^X6^Y6, Y0^X4^Y4, Y1^X5^Y5, X3, }, // 382 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, Y2^X5^Y6, }, // 383 {Y4^X6^Y6, Z0^X4^Y4, Y1^X5^Y5, X2^X5^Y6, }, // 384 {Y4^X6^Y6, Y0^X4^Y4, Y1^X5^Y5, Y2^X5^Y6, }, // 385 {Y4^X7^Y7, Y0^X4^Y4, Y1^Y5^X6, X3, }, // 386 {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, Y2^X5^Y6, }, // 387 {Y4^X7^Y7, Y0^X4^Y4, Y1^Y5^X6, X1^X5^Y6, }, // 388 {Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, Y2^X5^Y7, }, // 389 {Y4^X8^Y8, Y0^X4^Y4, Y1^Y5^X7, X1^X5^Y7, }, // 390 {Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, X2^X5^Y7, }, // 391 {Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, X2^X5^Y8, }, // 392 {Y4^X9^Y9, Y0^X4^Y4, Y1^Y5^X8, X1^X5^Y8, }, // 393 {Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, Y5, }, // 394 {Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, Y3, }, // 395 {Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, X3, }, // 396 {Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, Y2, }, // 397 {Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, X2, }, // 398 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5^X7^Y7, }, // 399 {Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, X6, }, // 400 {Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Y3, }, // 401 {Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, X3, }, // 402 {X4, Y4, Y5^X8, X5^Y8, }, // 403 {Z0^X4^Y4, Y4, X5, Y5^X9, }, // 404 {Y4^X6^Y6, Z0^X4^Y4, X2, Y2, }, // 405 {Y4^X7^Y7, Z1^X4^Y4, S1^Y5^X6, X2, }, // 406 {X4, Y4, Y5^X8, S0^X5^Y8, }, // 407 {Y3, X4, Y4^X8, S0^X5^Y7, }, // 408 {X3, Y3, Y4^X7, S0^X4^Y7, }, // 409 {Y2, X3, Y3^X7, S0^X4^Y6, }, // 
410 {X2, Y2, Y3^X6, S0^X3^Y6, }, // 411 {S2^X4^Y4, X2, Y2, X3^Y6, }, // 412 {Y4^X5^Y5, S2^X4^Y4, X2, Y2, }, // 413 {Y4^X5^Y5, Z0^X4^Y4, X3^X6^Y6, X2, }, // 414 {Y4^X6^Y6, Z1^X4^Y4, X5, Y6, }, // 415 {Y4^X6^Y6, Z0^X4^Y4, Y2, X3, }, // 416 {Y4^X6^Y6, S2^X4^Y4, X2, Y2, }, // 417 {Y4^X6^Y6, Z0^X4^Y4, S2^X5^Y5, X2, }, // 418 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X3^X7^Y7, }, // 419 {Y4^X7^Y7, Z0^X4^Y4, S2^Y5^X6, Y2, }, // 420 {Y4^X7^Y7, Z0^X4^Y4, S2^Y5^X6, X2, }, // 421 {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, S2^X5^Y6, }, // 422 {Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, S2^X5^Y7, }, // 423 {X4, Y4, Y5^X10, X5^Y10, }, // 424 {Y4^X5^Y5, Z0^X4^Y4, S0^X6^Y6, X2, }, // 425 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, S0^X7^Y7, }, // 426 {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, S0^X5^Y6, }, // 427 {Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, S0^X5^Y7, }, // 428 {Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, S0^X5^Y8, }, // 429 {Y4^X5^Y5, Z0^X4^Y4, S1^X6^Y6, X2, }, // 430 {Y4^X6^Y6, Z0^X4^Y4, S1^X5^Y5, X2, }, // 431 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, S1^X7^Y7, }, // 432 {Y4^X6^Y6, Z0^X4^Y4, S1^X5^Y5, S0^X7^Y7, }, // 433 {Y4^X7^Y7, Z1^X4^Y4, S1^Y5^X6, Y2, }, // 434 {Y4^X7^Y7, Z0^X4^Y4, S1^Y5^X6, X2, }, // 435 {Y4^X7^Y7, Z1^X4^Y4, S1^Y5^X6, Z0^X5^Y6, }, // 436 {Y4^X7^Y7, Z0^X4^Y4, S1^Y5^X6, S0^X5^Y6, }, // 437 {Y4^X8^Y8, Z1^X4^Y4, S1^Y5^X7, Z0^X5^Y7, }, // 438 {Y4^X8^Y8, Z0^X4^Y4, S1^Y5^X7, S0^X5^Y7, }, // 439 {Y4^X9^Y9, Z1^X4^Y4, S1^Y5^X8, Z0^X5^Y8, }, // 440 {Y4^X9^Y9, Z0^X4^Y4, S1^Y5^X8, S0^X5^Y8, }, // 441 {Y4^X5^Y5, Z0^X4^Y4, S2^X6^Y6, X3, }, // 442 {Y4^X5^Y5, Z0^X4^Y4, S2^X6^Y6, Y2, }, // 443 {Y4^X5^Y5, S2^X4^Y4, S1^X6^Y6, X2, }, // 444 {Y4^X6^Y6, Z0^X4^Y4, S2^X5^Y5, Y2, }, // 445 {Y4^X6^Y6, S2^X4^Y4, S1^X5^Y5, X2, }, // 446 {Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, S2^X7^Y7, }, // 447 {Y4^X6^Y6, Z0^X4^Y4, S2^X5^Y5, S1^X7^Y7, }, // 448 {Y4^X6^Y6, S2^X4^Y4, S1^X5^Y5, S0^X7^Y7, }, // 449 {Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, Y6, }, // 450 {Y4^X7^Y7, S2^X4^Y4, S1^Y5^X6, X2, }, // 451 {Y4^X7^Y7, Z0^X4^Y4, S2^Y5^X6, S1^X5^Y6, }, // 452 {Y4^X7^Y7, S2^X4^Y4, S1^Y5^X6, S0^X5^Y6, }, 
// 453 {Y4^X8^Y8, Z0^X4^Y4, S2^Y5^X7, S1^X5^Y7, }, // 454 {Y4^X8^Y8, S2^X4^Y4, S1^Y5^X7, S0^X5^Y7, }, // 455 {Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, S2^X5^Y8, }, // 456 {Y4^X9^Y9, Z0^X4^Y4, S2^Y5^X8, S1^X5^Y8, }, // 457 {Y4^X9^Y9, S2^X4^Y4, S1^Y5^X8, S0^X5^Y8, }, // 458 {X4^Y4, Y2, Z3, Y3, }, // 459 {X4^Y4, Y2, Z2, Y3, }, // 460 {X4^Y4, Y1, Z2, Y2, }, // 461 {Y1^X4^Y4, X1, Z2, Y2, }, // 462 {Y4^X5^Y5, X4^Y4, Y2, Z3, }, // 463 {Y4^X5^Y5, X4^Y4, Y2, Z2, }, // 464 {Z3^Y4^X5^Y5, X4^Y4, Y1, Z2, }, // 465 {Z3^Y4^X5^Y5, Y1^X4^Y4, X1, Z2, }, // 466 {Y4^X5^Y5, X4^Y4, Z3^X5, Y2, }, // 467 {Y4^X5^Y5, X4^Y4, Z2^X5, Y2, }, // 468 {Z3^Y4^X5^Y5, X4^Y4, Z2^X5, Y1, }, // 469 {Z3^Y4^X5^Y5, Y1^X4^Y4, Z2^X5, X1, }, // 470 {Y4^X6^Y6, X4^Y4, Y2, Y3, }, // 471 {Y4^X6^Y6, X4^Y4, Z3, Y3, }, // 472 {Y4^X6^Y6, X4^Y4, Z2, Y3, }, // 473 {Z3^Y4^X6^Y6, X4^Y4, Z2, Y2, }, // 474 {Z3^Y4^X6^Y6, Y1^X4^Y4, Z2, Y2, }, // 475 {Y4^X6^Y6, X4^Y4, X5^Y5, Y2, }, // 476 {Y4^X6^Y6, X4^Y4, Y2^X5^Y5, Z3, }, // 477 {Y4^X6^Y6, X4^Y4, Y2^X5^Y5, Z2, }, // 478 {Z3^Y4^X6^Y6, X4^Y4, Y1^X5^Y5, Z2, }, // 479 {Z3^Y4^X6^Y6, Y1^X4^Y4, X1^X5^Y5, Z2, }, // 480 {Y4^X6^Y6, X4^Y4, X5^Y5, Z3^X6, }, // 481 {Y4^X6^Y6, X4^Y4, Y2^X5^Y5, Z3^X6, }, // 482 {Y4^X6^Y6, X4^Y4, Y2^X5^Y5, Z2^X6, }, // 483 {Z3^Y4^X6^Y6, X4^Y4, Y1^X5^Y5, Z2^X6, }, // 484 {Z3^Y4^X6^Y6, Y1^X4^Y4, X1^X5^Y5, Z2^X6, }, // 485 {Y4^X7^Y7, X4^Y4, Y2^Y5^X6, Y3, }, // 486 {Z3^Y4^X7^Y7, X4^Y4, Y1^Y5^X6, Y2, }, // 487 {Z3^Y4^X7^Y7, Y1^X4^Y4, X1^Y5^X6, Y2, }, // 488 {Y4^X7^Y7, X4^Y4, Y2^Y5^X6, X5^Y6, }, // 489 {Y4^X7^Y7, X4^Y4, Y2^Y5^X6, Z3^X5^Y6, }, // 490 {Y4^X7^Y7, X4^Y4, Y2^Y5^X6, Z2^X5^Y6, }, // 491 {Z3^Y4^X7^Y7, X4^Y4, Y1^Y5^X6, Z2^X5^Y6, }, // 492 {Z3^Y4^X7^Y7, Y1^X4^Y4, X1^Y5^X6, Z2^X5^Y6, }, // 493 {Y4^X7^Y7, X4^Y4, Y2^Y5^X6, Y3^X5^Y6, }, // 494 {Z3^Y4^X7^Y7, X4^Y4, Y1^Y5^X6, Y2^X5^Y6, }, // 495 {Z3^Y4^X7^Y7, Y1^X4^Y4, X1^Y5^X6, Y2^X5^Y6, }, // 496 {Y4^X8^Y8, X4^Y4, Y2^Y5^X7, X5^Y7, }, // 497 {Y4^X8^Y8, X4^Y4, Y2^Y5^X7, Z3^X5^Y7, }, // 498 {Y4^X8^Y8, X4^Y4, Y2^Y5^X7, 
Z2^X5^Y7, }, // 499 {Z3^Y4^X8^Y8, X4^Y4, Y1^Y5^X7, Z2^X5^Y7, }, // 500 {Z3^Y4^X8^Y8, Y1^X4^Y4, X1^Y5^X7, Z2^X5^Y7, }, // 501 {Y4^X8^Y8, X4^Y4, Y2^Y5^X7, Y3^X5^Y7, }, // 502 {Z3^Y4^X8^Y8, X4^Y4, Y1^Y5^X7, Y2^X5^Y7, }, // 503 {Z3^Y4^X8^Y8, Y1^X4^Y4, X1^Y5^X7, Y2^X5^Y7, }, // 504 {Y4^X9^Y9, X4^Y4, Y2^Y5^X8, X5^Y8, }, // 505 {Y4^X9^Y9, X4^Y4, Y2^Y5^X8, Z3^X5^Y8, }, // 506 {Y4^X9^Y9, X4^Y4, Y2^Y5^X8, Z2^X5^Y8, }, // 507 {Z3^Y4^X9^Y9, X4^Y4, Y1^Y5^X8, Z2^X5^Y8, }, // 508 {Z3^Y4^X9^Y9, Y1^X4^Y4, X1^Y5^X8, Z2^X5^Y8, }, // 509 }; const UINT_64 GFX10_SW_PATTERN_NIBBLE3[][4] = { {0, 0, 0, 0, }, // 0 {Y6, X6, Y7, X7, }, // 1 {Y5, X6, Y6, X7, }, // 2 {Y5, X5, Y6, X6, }, // 3 {Y4, X5, Y5, X6, }, // 4 {Y4, X4, Y5, X5, }, // 5 {Z0^X6^Y6, X6, Y7, X7, }, // 6 {Z0^Y5^X6, X6, Y6, X7, }, // 7 {Z0^X5^Y5, X5, Y6, X6, }, // 8 {Z0^Y4^X5, X5, Y5, X6, }, // 9 {Z0^X4^Y4, X4, Y5, X5, }, // 10 {Z1^Y6^X7, Z0^X6^Y7, Y7, X7, }, // 11 {Z1^Y5^X7, Z0^X6^Y6, Y6, X7, }, // 12 {Z1^Y5^X6, Z0^X5^Y6, Y6, X6, }, // 13 {Z1^Y4^X6, Z0^X5^Y5, Y5, X6, }, // 14 {Z1^Y4^X5, Z0^X4^Y5, Y5, X5, }, // 15 {X6^Y6, X6, Y7, X7, }, // 16 {Y5^X6, X6, Y6, X7, }, // 17 {X5^Y5, X5, Y6, X6, }, // 18 {Y4^X5, X5, Y5, X6, }, // 19 {X4^Y4, X4, Y5, X5, }, // 20 {Y6^X7, X6^Y7, Y7, X7, }, // 21 {Y5^X7, X6^Y6, Y6, X7, }, // 22 {Y5^X6, X5^Y6, Y6, X6, }, // 23 {Y4^X6, X5^Y5, Y5, X6, }, // 24 {Y4^X5, X4^Y5, Y5, X5, }, // 25 {Y3, X4, Y5, X5, }, // 26 {Y4, X5, Y6, X6, }, // 27 {Y2, X4, Y5, X6, }, // 28 {Y2, X3, Y4, X5, }, // 29 {Y4, X6, Y6, X7, }, // 30 {Y3, X4, Y6, X6, }, // 31 {Y2, X3, Y4, X6, }, // 32 {Y2, X2, Y3, X4, }, // 33 {Z0^X6^Y6, X4, Y6, X7, }, // 34 {Z0^X6^Y6, X3, Y4, X6, }, // 35 {Z0^X6^Y6, Y2, X3, Y4, }, // 36 {Y2^X6^Y6, X2, Y3, X4, }, // 37 {Z1^Y6^X7, Z0^X6^Y7, Y4, X7, }, // 38 {Z1^Y6^X7, Z0^X6^Y7, Y3, X4, }, // 39 {Y2^Y6^X7, Z0^X6^Y7, Y3, X4, }, // 40 {Y2^Y6^X7, X2^X6^Y7, Y3, X4, }, // 41 {X5, Y6, X6, Y7, }, // 42 {Y5, X5, Y6, Y2^Y7, }, // 43 {X4, Y5, X5, Y2^Y6, }, // 44 {Y4, X4, Y5, Y1^Y6, }, // 45 {Y3, X4, Y5, Y1^Y6, }, // 
46 {Y4, X5, Y6, Y2^Y7, }, // 47 {X3, Y4, X5, Y2^Y6, }, // 48 {Y2, X3, Y4, Y1^Y6, }, // 49 {Y4, Y6, X6, Y7, }, // 50 {Y3, X4, Y6, Y2^Y7, }, // 51 {X2, Y3, X4, Y2^Y6, }, // 52 {Y1, X3, Y4, X2^Y6, }, // 53 {Z0^X6^Y6, Y4, X6, Y7, }, // 54 {Z0^X6^Y6, X3, Y4, Y2^Y7, }, // 55 {Y2^X6^Y6, Y3, X4, X2^Y7, }, // 56 {X2^X6^Y6, X3, Y4, Y1^Y7, }, // 57 {Z0^Y6^X7, Z5^X6^Y7, Y4, Y7, }, // 58 {Z0^Y6^X7, Z5^X6^Y7, Y3, X4, }, // 59 {Z0^Y6^X7, Y2^X6^Y7, X3, Y4, }, // 60 {X2^Y6^X7, Y1^X6^Y7, X3, Y4, }, // 61 {X5, Y5, X6, Y2^Y6, }, // 62 {Y5, X5, Y2^Y6, X2^Y7, }, // 63 {Y4, X5, Y1^Y5, X2^Y6, }, // 64 {Y4, X4, Y1^Y5, X1^Y6, }, // 65 {Y5, X5, X2^Y6, Y2^Y7, }, // 66 {Y4, X5, X2^Y5, Y1^Y6, }, // 67 {Y4, X4, X1^Y5, Y1^Y6, }, // 68 {Y3, X4, Y1^Y5, X1^Y6, }, // 69 {X4, Y5, X6, Y2^Y6, }, // 70 {Y4, X5, X2^Y6, Y2^Y7, }, // 71 {X3, Y4, Y1^Y5, X2^Y6, }, // 72 {Y3, X4, X1^Y6, Y1^Y7, }, // 73 {X3, Y4, X6, Y2^Y6, }, // 74 {Y3, X4, Y2^Y6, X2^Y7, }, // 75 {Y3, X4, Y1^Y6, X2^Y7, }, // 76 {Z4^X6^Y6, X3, Y4, X6, }, // 77 {Z4^X6^Y6, X3, Y4, Y2^Y6, }, // 78 {Y1^X6^Y6, Y3, X4, X2^Y7, }, // 79 {Z5^Y6^X7, Z4^X6^Y7, Y3, X4, }, // 80 {Y2^Y6^X7, Z4^X6^Y7, Y3, X4, }, // 81 {Y1^Y6^X7, X2^X6^Y7, Y3, X4, }, // 82 {Y5, Y1^Y6, Y2^Y7, X2^Y8, }, // 83 {X4, Y1^Y5, X1^Y6, Y2^Y7, }, // 84 {Y4, Y0^Y5, Y1^Y6, X1^Y7, }, // 85 {Y5, Y1^Y6, X2^Y7, Y2^Y8, }, // 86 {X4, X1^Y5, Y1^Y6, X2^Y7, }, // 87 {Y4, Y0^Y5, X1^Y6, Y1^Y7, }, // 88 {X3, Y0^Y5, X1^Y6, Y1^Y7, }, // 89 {Y4, Y1^Y6, X2^Y7, Y2^Y8, }, // 90 {X4, X1^Y6, Y1^Y7, X2^Y8, }, // 91 {X3, X1^Y6, Y1^Y7, X2^Y8, }, // 92 {X3, Y4, X2^Y6, Y1^Y7, }, // 93 {X3, Y1^Y6, X2^Y7, Y2^Y8, }, // 94 {Z3^X6^Y6, X3, Y4, Y2^Y7, }, // 95 {Y2^X6^Y6, X3, X2^Y7, Y1^Y8, }, // 96 {Z3^Y6^X7, Y2^X6^Y7, X3, Y4, }, // 97 {Y2^Y6^X7, X2^X6^Y7, X3, Y1^Y7, }, // 98 {Y6, X6, Y7, S0^Y8, }, // 99 {Y5, X6, Y6, S0^Y7, }, // 100 {Y5, X5, Y6, S0^Y7, }, // 101 {Y4, X5, Y5, S0^Y6, }, // 102 {Y4, X4, Y5, S0^Y6, }, // 103 {Y3, X4, Y5, S0^Y6, }, // 104 {Y4, X5, Y6, S0^Y7, }, // 105 {Y2, X4, Y5, S0^Y6, }, // 106 {Y2, X3, Y4, 
S0^Y6, }, // 107 {Y4, X6, Y6, S0^Y7, }, // 108 {Y3, X4, Y6, S0^Y7, }, // 109 {Z0^X6^Y6, X6, Y7, S0^Y8, }, // 110 {Z0^X6^Y6, X4, Y6, S0^Y7, }, // 111 {Z0^X6^Y6, X3, Y4, S0^Y7, }, // 112 {S0^X6^Y6, Y2, X3, Y4, }, // 113 {Z0^Y6^X7, Z5^X6^Y7, Y7, S0^Y8, }, // 114 {Z0^Y6^X7, Z5^X6^Y7, Y4, S0^Y7, }, // 115 {Z0^Y6^X7, S0^X6^Y7, Y3, X4, }, // 116 {S0^Y6^X7, Y2^X6^Y7, X3, Y4, }, // 117 {Y6, X6, S0^Y7, S1^Y8, }, // 118 {Y5, X6, S0^Y6, S1^Y7, }, // 119 {Y5, X5, S0^Y6, S1^Y7, }, // 120 {Y4, X5, S0^Y5, S1^Y6, }, // 121 {Y4, X4, S0^Y5, S1^Y6, }, // 122 {Y3, X4, S0^Y5, S1^Y6, }, // 123 {Y4, X5, S0^Y6, S1^Y7, }, // 124 {X3, Y4, S0^Y5, S1^Y6, }, // 125 {Y4, X6, S0^Y6, S1^Y7, }, // 126 {Y3, X4, S0^Y6, S1^Y7, }, // 127 {Z4^X6^Y6, X6, S0^Y7, S1^Y8, }, // 128 {Z4^X6^Y6, Y4, S0^Y6, S1^Y7, }, // 129 {S1^X6^Y6, X3, Y4, S0^Y7, }, // 130 {Z5^Y6^X7, Z4^X6^Y7, S0^Y7, S1^Y8, }, // 131 {S1^Y6^X7, Z4^X6^Y7, Y4, S0^Y7, }, // 132 {S1^Y6^X7, S0^X6^Y7, Y3, X4, }, // 133 {Y6, S0^Y7, S1^Y8, S2^Y9, }, // 134 {Y5, S0^Y6, S1^Y7, S2^Y8, }, // 135 {Y4, S0^Y5, S1^Y6, S2^Y7, }, // 136 {X3, S0^Y5, S1^Y6, S2^Y7, }, // 137 {Y4, S0^Y6, S1^Y7, S2^Y8, }, // 138 {X3, Y4, S0^Y6, S1^Y7, }, // 139 {Y2, X3, S0^Y6, S1^Y7, }, // 140 {X2, Y2, X3, S0^Y6, }, // 141 {Z3^X6^Y6, S0^Y7, S1^Y8, S2^Y9, }, // 142 {S2^X6^Y6, Y4, S0^Y7, S1^Y8, }, // 143 {S0^X6^Y6, X2, Y2, X3, }, // 144 {Z3^Y6^X7, S2^X6^Y7, S0^Y7, S1^Y8, }, // 145 {S2^Y6^X7, S1^X6^Y7, Y4, S0^Y7, }, // 146 {S0^Y6^X7, X2^X6^Y7, Y2, X3, }, // 147 {X4, Z4, Y4, X5, }, // 148 {X3, Z4, Y4, X4, }, // 149 {X3, Z3, Y4, X4, }, // 150 {X3, Z3, Y3, X4, }, // 151 {X2, Z3, Y3, X3, }, // 152 {X4^Y4^Z4, Z4, Y4, X5, }, // 153 {X3^Y4^Z4, Z4, Y4, X4, }, // 154 {X3^Z3^Y4, Z3, Y4, X4, }, // 155 {X3^Y3^Z3, Z3, Y3, X4, }, // 156 {X2^Y3^Z3, Z3, Y3, X3, }, // 157 {X4^Y5^Z5, Y4^Z4^X5, Y4, X5, }, // 158 {X3^Y5^Z5, X4^Y4^Z4, Y4, X4, }, // 159 {X3^Z4^Y5, Z3^X4^Y4, Y4, X4, }, // 160 {X3^Y4^Z4, Y3^Z3^X4, Y3, X4, }, // 161 {X2^Y4^Z4, X3^Y3^Z3, Y3, X3, }, // 162 {X4, Y4^Z4^X5, Y4, X5, }, // 163 {X3, 
X4^Y4^Z4, Y4, X4, }, // 164 {X3, Z3^X4^Y4, Y4, X4, }, // 165 {X3, Y3^Z3^X4, Y3, X4, }, // 166 {X2, X3^Y3^Z3, Y3, X3, }, // 167 {X3, Z3, Y2, X4, }, // 168 {X2, Z3, Y2, X3, }, // 169 {X3, Z4, Y4, X5, }, // 170 {X2, Z4, Y3, X4, }, // 171 {X2, Z3, Y3, X4, }, // 172 {Y2, X3, Z4, Y4, }, // 173 {Z3, Y3, X4, Z4, }, // 174 {Z3^X6^Y6, Y3, X4, Z4, }, // 175 {X2^X6^Y6, Z4, Y3, X4, }, // 176 {X2^X6^Y6, Z3, Y3, X4, }, // 177 {X2^X6^Y6, Z3, Y2, X3, }, // 178 {Z3^Y6^X7, Z4^X6^Y7, Y3, X4, }, // 179 {X2^Y6^X7, Z4^X6^Y7, Y3, X4, }, // 180 {X2^Y6^X7, Z3^X6^Y7, Y3, X4, }, // 181 {X2^Y6^X7, Z3^X6^Y7, Y2, X3, }, // 182 {X6^Y7, Y6^X7, 0, 0, }, // 183 {Y5^X7, X6^Y6, 0, 0, }, // 184 {X5^Y6, Y5^X6, 0, 0, }, // 185 {Y4^X6, X5^Y5, 0, 0, }, // 186 {X4^Y5, Y4^X5, 0, 0, }, // 187 {Y5^X9, X7^Y7, Y6^X8, 0, }, // 188 {Y5^X8, X6^Y7, Y6^X7, 0, }, // 189 {Y4^X8, X6^Y6, Y5^X7, 0, }, // 190 {Y4^X7, X5^Y6, Y5^X6, 0, }, // 191 {Y3^X7, X5^Y5, Y4^X6, 0, }, // 192 {X6^Y9, Y6^X9, X7^Y8, Y7^X8, }, // 193 {X6^Y8, Y5^X9, X7^Y7, Y6^X8, }, // 194 {X5^Y8, Y5^X8, X6^Y7, Y6^X7, }, // 195 {Y3^X8, X5^Y7, X6^Y6, Y5^X7, }, // 196 {Y3^X7, X3^Y7, X5^Y6, Y5^X6, }, // 197 {X6, X7^Y9, Y6^X10, X8^Y8, }, // 198 {Y5, X6^Y9, Y6^X9, X7^Y8, }, // 199 {Y3, X6^Y8, Y5^X9, X7^Y7, }, // 200 {X3, Y3^X9, Y5^X8, X6^Y7, }, // 201 {Y2, X3^Y7, Y3^X8, X6^Y6, }, // 202 {Y6^X9, X7^Y8, Y7^X8, Z0^X5^Y5, }, // 203 {X6^Y8, Y6^X8, X7^Y7, Z0^X5^Y5, }, // 204 {X5^Y8, X6^Y7, Y6^X7, Z0^X5^Y5, }, // 205 {Y3^X7, X5^Y7, X6^Y6, Z0^X5^Y5, }, // 206 {X3^Y7, Y3^X6, X5^Y6, Z0^X5^Y5, }, // 207 {X6, Y6^X10, X7^Y9, Y7^X9, }, // 208 {X5, X6^Y9, Y6^X9, X7^Y8, }, // 209 {Y3, X5^Y9, X6^Y8, Y6^X8, }, // 210 {X3, Y3^X8, X5^Y8, X6^Y7, }, // 211 {Y2, X3^Y8, Y3^X7, X5^Y7, }, // 212 {X6, Y6, X7^Y10, Y7^X10, }, // 213 {Y3, X6, Y6^X10, X7^Y9, }, // 214 {X3, Y3, X6^Y9, Y6^X9, }, // 215 {Y2, X3, Y3^X9, X6^Y8, }, // 216 {X2, Y2, X3^Y8, Y3^X8, }, // 217 {Y6, X7^Y9, X8^Y8, Y7^X9, }, // 218 {X6, Y6^X9, X7^Y8, Y7^X8, }, // 219 {Y3, X6^Y8, X7^Y7, Y6^X8, }, // 220 {X3, Y3^X8, X6^Y7, 
Y6^X7, }, // 221 {Y2, X3^Y7, Y3^X7, X6^Y6, }, // 222 {Y3, X6, X7^Y9, Y6^X10, }, // 223 {X2, Y2, Y3^X8, X3^Y8, }, // 224 {X6^Y6, Y6, X7, X8^Y10, }, // 225 {X6^Y6, Y3, Y6, X7^Y10, }, // 226 {X6^Y6, X3, Y3, X7^Y9, }, // 227 {X6^Y6, Y2, X3, Y3^X10, }, // 228 {X6^Y6, X2, Y2, X3^Y8, }, // 229 {X6, X7, Y7^X10, X8^Y9, }, // 230 {Y3, X6, X7^Y9, Y7^X9, }, // 231 {X3, Y3, X6^Y9, X7^Y8, }, // 232 {Y2, X3, Y3^X8, X6^Y8, }, // 233 {X2, Y2, X3^Y8, Y3^X7, }, // 234 {X6^Y6, X6, X7, Y7^X11, }, // 235 {X6^Y6, Y3, X6, X7^Y10, }, // 236 {X6^Y6, X3, Y3, X6^Y10, }, // 237 {Z0^X6^Y6, Y2, X3, Y3^X9, }, // 238 {Z0^X6^Y6, X2, Y2, X3^Y9, }, // 239 {X6^Y6, X6^Y8, X7, Y7, }, // 240 {X6^Y6, X6^Y8, Y3, X7, }, // 241 {X6^Y6, X6^Y8, X3, Y3, }, // 242 {Z0^X6^Y6, X6^Y8, Y2, X3, }, // 243 {Z0^X6^Y6, X6^Y8, X2, Y2, }, // 244 {Y6^X7, X7, Y7, X8^Y10, }, // 245 {Y6^X7, Y3, X7, Y7^X10, }, // 246 {Y6^X7, X3, Y3, X7^Y9, }, // 247 {Z1^Y6^X7, Y2, X3, Y3^X9, }, // 248 {Z1^Y6^X7, X2, Y2, X3^Y8, }, // 249 {Y6^X7, X6^Y7, X7, Y7, }, // 250 {Y6^X7, X6^Y7, Y3, X7, }, // 251 {Y6^X7, X6^Y7, X3, Y3, }, // 252 {Z1^Y6^X7, Z0^X6^Y7, Y2, X3, }, // 253 {Z1^Y6^X7, Z0^X6^Y7, X2, Y2, }, // 254 {X5^Y7, X6^Y6, 0, 0, }, // 255 {Y5^X6, Y2^X5^Y6, 0, 0, }, // 256 {Y4^X6, X2^X5^Y5, 0, 0, }, // 257 {Y4^X5, Y1^X4^Y5, 0, 0, }, // 258 {X5^Y7, Y5^X7, Y2^X6^Y6, 0, }, // 259 {X5^Y6, Y4^X7, X2^Y5^X6, 0, }, // 260 {X3^Y6, Y4^X6, Y1^X5^Y5, 0, }, // 261 {Y5^X9, Y6^X8, X6^Y8, X7^Y7, }, // 262 {Y5^X8, X5^Y8, Y6^X7, Y2^X6^Y7, }, // 263 {Y3^X8, X5^Y7, Y5^X7, Y2^X6^Y6, }, // 264 {Y3^X7, X3^Y7, Y5^X6, Y1^X5^Y6, }, // 265 {Y3, X5^Y9, X6^Y8, X7^Y7, }, // 266 {Y2, Y3^X7, X3^Y8, X5^Y7, }, // 267 {Y6^X8, X6^Y8, X7^Y7, Z0^X5^Y5, }, // 268 {X5^Y8, Y6^X7, Y2^X6^Y7, Z0^X5^Y5, }, // 269 {Y3^X7, X5^Y7, X2^X6^Y6, Z0^X5^Y5, }, // 270 {Y3^X6, X3^Y7, Y1^X5^Y6, Z0^X5^Y5, }, // 271 {Y3, X5, X6^Y10, Y7^X9, }, // 272 {X3, Y3, X5^Y10, X6^Y9, }, // 273 {Y2, X3, Y3^X8, X5^Y9, }, // 274 {X2, Y2, Y3^X7, X3^Y9, }, // 275 {Y3, X6^Y8, Y6^X8, Y2^X7^Y7, }, // 276 {X3, Y3^X8, 
X6^Y7, X2^Y6^X7, }, // 277 {Y2, Y3^X7, X3^Y7, Y1^X6^Y6, }, // 278 {Y3, X6, Y6^X10, Y7^X9, }, // 279 {X3, Y3, Y6^X9, X6^Y9, }, // 280 {X2, X3, Y3^X9, X6^Y8, }, // 281 {X6^Y6, Y2, X3, Y3^X9, }, // 282 {X6^Y6, X2, Y2, Y3^X8, }, // 283 {Y3, X6, Y7^X9, X7^Y9, }, // 284 {X3, Y3, X6^Y9, Y7^X8, }, // 285 {X2, Y2, Y3^X7, X3^Y8, }, // 286 {Z0^Y6^X7, Y2, X3, Y3^X9, }, // 287 {Z0^Y6^X7, X2, Y2, Y3^X8, }, // 288 {Z0^Y6^X7, Z4^X6^Y7, X2, X3, }, // 289 {Z0^Y6^X7, Z4^X6^Y7, X2, Y2, }, // 290 {X5^Y6, Y2^Y5^X6, 0, 0, }, // 291 {X2^X5^Y6, Y2^Y5^X6, 0, 0, }, // 292 {X2^X5^Y5, Y1^Y4^X6, 0, 0, }, // 293 {X1^X4^Y5, Y1^Y4^X5, 0, 0, }, // 294 {Y4^X8, X2^X6^Y6, Y2^Y5^X7, 0, }, // 295 {Y4^X7, Y2^Y5^X6, Y1^X5^Y6, 0, }, // 296 {Y3^X7, X1^X5^Y5, Y1^Y4^X6, 0, }, // 297 {X5^Y8, X6^Y7, Y5^X8, Y2^Y6^X7, }, // 298 {X5^Y8, Y5^X8, X2^Y6^X7, Y2^X6^Y7, }, // 299 {Y3^X8, X5^Y7, X2^Y5^X7, Y1^X6^Y6, }, // 300 {Y3^X7, X3^Y7, X1^Y5^X6, Y1^X5^Y6, }, // 301 {Y3, Y5^X9, X6^Y8, Y6^X8, }, // 302 {Y3, X6^Y8, Y5^X9, X2^X7^Y7, }, // 303 {X3, Y3^X9, Y5^X8, Y2^Y6^X7, }, // 304 {Y2, X3^Y7, Y3^X8, X1^X6^Y6, }, // 305 {X5^Y8, X6^Y7, Y2^Y6^X7, Z0^X5^Y5, }, // 306 {X5^Y8, X2^X6^Y7, Y2^Y6^X7, Z0^X5^Y5, }, // 307 {Y3^X8, Y2^Y5^X7, Y1^X6^Y6, Z0^X5^Y5, }, // 308 {Y3^X7, Y2^X6^Y6, X1^X5^Y7, Y1^X5^Y5, }, // 309 {Y3, X5^Y9, X6^Y8, X2^Y6^X8, }, // 310 {X3, Y3^X8, X5^Y8, X2^Y6^X7, }, // 311 {Y2, Y3^X8, X3^Y7, X1^Y5^X7, }, // 312 {Y3, X6^Y8, X2^X7^Y7, Y2^Y6^X8, }, // 313 {X3, Y3^X8, Y2^Y6^X7, Y1^X6^Y7, }, // 314 {X3, Y3^X8, Y2^Y6^X7, X1^X6^Y7, }, // 315 {X6^Y6, X3, Y3, Y6^X10, }, // 316 {X6^Y6, X2, X3, Y3^X10, }, // 317 {X3, Y3, X6^Y9, X2^X7^Y8, }, // 318 {X2, X3, Y3^X9, Y2^Y6^X8, }, // 319 {X2, X3, Y3^X8, Y2^X7^Y7, }, // 320 {Z3^X6^Y6, Y2, X3, Y3^X9, }, // 321 {Z3^X6^Y6, X2, Y2, Y3^X9, }, // 322 {Z3^X6^Y6, X6^Y8, Y2, X3, }, // 323 {Z3^X6^Y6, X6^Y8, X2, Y2, }, // 324 {Z4^Y6^X7, X2, X3, Y3^X9, }, // 325 {Y1^Y6^X7, X2, X3, Y3^X9, }, // 326 {Z4^Y6^X7, Z3^X6^Y7, Y2, X3, }, // 327 {Z4^Y6^X7, Z3^X6^Y7, X2, Y2, }, // 328 {Y1^Y4^X6, 
X2^X5^Y5, 0, 0, }, // 329 {Y1^X5^Y7, X2^X6^Y6, Y2^Y5^X7, 0, }, // 330 {X1^X5^Y6, Y1^Y4^X7, X2^Y5^X6, 0, }, // 331 {Y5^X8, Y1^X5^Y8, X2^X6^Y7, Y2^Y6^X7, }, // 332 {Y3^X8, Y1^X5^Y7, X1^Y5^X7, Y2^X6^Y6, }, // 333 {Y3^X7, Y1^X4^Y7, Y2^X5^Y6, X1^Y5^X6, }, // 334 {Y3, X5^Y9, X6^Y8, X2^X7^Y7, }, // 335 {Y3, X5^Y9, Y1^X6^Y8, X2^X7^Y7, }, // 336 {X3, Y3^X8, X5^Y7, X1^X6^Y6, }, // 337 {Y2, Y3^X7, X3^Y7, Y0^X5^Y6, }, // 338 {Y1^X5^Y8, X2^X6^Y7, Y2^Y6^X7, Z0^X5^Y5, }, // 339 {X1^X5^Y8, Y2^Y6^X7, X2^X6^Y7, Y1^X5^Y5, }, // 340 {X1^X5^Y8, X2^X6^Y7, Y2^Y6^X7, Y1^X5^Y5, }, // 341 {Y3, X5^Y9, Y1^X6^Y8, X2^Y6^X8, }, // 342 {X3, Y3^X9, Y1^X6^Y7, X1^Y5^X8, }, // 343 {X3, Y3^X8, Y1^X5^Y8, Y2^X6^Y7, }, // 344 {X3, Y3, X5^Y10, Y1^X6^Y9, }, // 345 {Y2, X3, Y3^X8, X5^Y8, }, // 346 {Y3, Y1^X6^Y8, X2^X7^Y7, Y2^Y6^X8, }, // 347 {Y3, X1^X6^Y8, Y2^Y6^X8, X2^X7^Y7, }, // 348 {Y3, X1^X6^Y8, X2^X7^Y7, Y2^Y6^X8, }, // 349 {X3, Y3, Y6^X9, Y1^X6^Y9, }, // 350 {X2, X3, Y3^X9, Y1^X6^Y8, }, // 351 {X2^X6^Y6, Y2, X3, Y3^X9, }, // 352 {Y1^X6^Y6, X2, Y2, Y3^X8, }, // 353 {X3, Y3, Y1^X6^Y9, X2^X7^Y8, }, // 354 {X3, Y3, X1^X6^Y9, Y2^Y7^X8, }, // 355 {X3, Y3, X1^X6^Y9, X2^X7^Y8, }, // 356 {Z2^X6^Y6, X2, X3, Y3^X10, }, // 357 {Y0^X6^Y6, X2, X3, Y3^X9, }, // 358 {Z2^X6^Y6, X6^Y8, Y2, X3, }, // 359 {Z2^X6^Y6, Y1^X6^Y8, X2, Y2, }, // 360 {Y6^X7, X3, Y3, Y1^X7^Y9, }, // 361 {Y1^Y6^X7, X3, Y3, X1^X7^Y9, }, // 362 {Y0^Y6^X7, X3, Y3, X1^X7^Y9, }, // 363 {Z3^Y6^X7, Z2^X6^Y7, X2, X3, }, // 364 {Z2^Y6^X7, Y0^X6^Y7, X2, X3, }, // 365 {Y5^X9, X6^Y8, Y6^X8, X7^Y7, }, // 366 {Y4^X8, X5^Y7, Y5^X7, X6^Y6, }, // 367 {X4^Y7, Y4^X7, X5^Y6, Y5^X6, }, // 368 {X5^Y7, Y4^X8, X6^Y6, Y5^X7, }, // 369 {X3^Y7, Y4^X7, X5^Y6, Y5^X6, }, // 370 {Y5, X6^Y8, X7^Y7, Y6^X8, }, // 371 {Y3, Y5^X8, X6^Y7, Y6^X7, }, // 372 {X3, Y3^X8, X6^Y6, Y5^X7, }, // 373 {Y2, Y3^X7, X3^Y6, Y5^X6, }, // 374 {X5, X6^Y8, Y6^X8, X7^Y7, }, // 375 {Y3, X5^Y8, X6^Y7, Y6^X7, }, // 376 {X3, Y3^X7, X5^Y7, X6^Y6, }, // 377 {Y2, X3^Y7, Y3^X6, X5^Y6, }, // 378 {X6, Y6, 
X7^Y8, Y7^X8, }, // 379 {Y3, X6, Y6^X8, X7^Y7, }, // 380 {X3, Y3, X6^Y7, Y6^X7, }, // 381 {Y2, X3, Y3^X7, X6^Y6, }, // 382 {X2, Y2, X3^Y6, Y3^X6, }, // 383 {Y6, X7^Y8, Y7^X8, X5^Y6, }, // 384 {X6, X7^Y7, Y6^X8, X5^Y6, }, // 385 {Y3, X6^Y7, Y6^X7, X5^Y6, }, // 386 {X3, Y3^X7, X6^Y6, Z0^X5^Y6, }, // 387 {Y2, Y3^X6, X3^Y6, Z0^X5^Y6, }, // 388 {Y3, X6, X7^Y7, Y6^X8, }, // 389 {X2, Y2, Y3^X6, X3^Y6, }, // 390 {X6^Y6, Y6, X7, Y7^X8, }, // 391 {X6^Y6, Y3, Y6, X7^Y7, }, // 392 {X6^Y6, X3, Y3, Y6^X7, }, // 393 {X6^Y6, Y2, X3, Y3^X7, }, // 394 {X3^Y6, X2, Y2, Y3^X6, }, // 395 {X6, X7, Y7^X8, X6^Y6, }, // 396 {Y3, X6, X7^Y7, X6^Y6, }, // 397 {X3, Y3, X6^Y7, X6^Y6, }, // 398 {Y2, X3, Y3^X7, Z0^X6^Y6, }, // 399 {X2, X3, Y3^X6, Y2^X6^Y6, }, // 400 {X6^Y6, X6, X7, Y7^X8, }, // 401 {X6^Y6, Y3, X6, X7^Y7, }, // 402 {X6^Y6, X3, Y3, X6^Y7, }, // 403 {Z0^X6^Y6, Y2, X3, Y3^X7, }, // 404 {Y2^X6^Y6, X2, X3, Y3^X6, }, // 405 {Z0^X6^Y6, X3^Y8, Y2, Y3, }, // 406 {Y2^X6^Y6, X3^Y8, X2, Y3, }, // 407 {Y6^X7, X7, Y7, X6^Y7, }, // 408 {Y6^X7, Y3, X7, X6^Y7, }, // 409 {Y6^X7, X3, Y3, X6^Y7, }, // 410 {Y2^Y6^X7, X3, Y3, Z0^X6^Y7, }, // 411 {Y2^Y6^X7, X3, Y3, X2^X6^Y7, }, // 412 {Y2^Y6^X7, Z0^X6^Y7, X3, Y3, }, // 413 {Y2^Y6^X7, X2^X6^Y7, X3, Y3, }, // 414 {X5^Y9, Y6^X8, X6^Y8, X7^Y7, }, // 415 {Y4^X8, X5^Y7, Y5^X7, X2^X6^Y6, }, // 416 {Y4^X7, X4^Y7, Y5^X6, Y1^X5^Y6, }, // 417 {Y4^X8, X5^Y7, Y5^X7, Y2^X6^Y6, }, // 418 {Y4^X7, X3^Y7, Y5^X6, Y1^X5^Y6, }, // 419 {X5, Y6^X8, X6^Y8, X7^Y7, }, // 420 {Y3, X5^Y8, Y6^X7, Y2^X6^Y7, }, // 421 {X3, Y3^X7, X5^Y7, Y2^X6^Y6, }, // 422 {Y2, Y3^X6, X3^Y7, Y1^X5^Y6, }, // 423 {X3, Y3^X7, X5^Y7, X2^X6^Y6, }, // 424 {Y3, X5, X6^Y8, X7^Y7, }, // 425 {X3, Y3, X5^Y8, X6^Y7, }, // 426 {X3, Y3, X5^Y8, Y2^X6^Y7, }, // 427 {Y2, X3, Y3^X6, X5^Y6, }, // 428 {X2, Y2, Y3^X5, X3^Y6, }, // 429 {X6, Y6^X8, X7^Y7, X5^Y6, }, // 430 {Y3, Y6^X7, Y2^X6^Y7, X5^Y6, }, // 431 {X3, Y3^X7, Y2^X6^Y6, Z0^X5^Y6, }, // 432 {X3, Y3^X7, Y2^X6^Y6, Y1^X5^Y6, }, // 433 {X3, Y3, Y6^X7, Y2^X6^Y7, }, // 
434 {X2, X3, Y3^X7, Y2^X6^Y6, }, // 435 {X6^Y6, X3, Y3, Y2^X6^Y7, }, // 436 {X3, Y3, Y2^X6^Y7, X6^Y6, }, // 437 {X3, Y3, X2^X6^Y7, Y2^X6^Y6, }, // 438 {Y2^X6^Y6, X3, Y3, X2^X6^Y7, }, // 439 {X6^Y6, X6^Y8, Y3, Y7, }, // 440 {X6^Y6, Y2^X6^Y8, X3, Y3, }, // 441 {Y2^X6^Y6, X2^X6^Y8, X3, Y3, }, // 442 {Y6^X7, Y3, Y7, X6^Y7, }, // 443 {Y6^X7, X3, Y3, Y2^X6^Y7, }, // 444 {Y6^X7, X6^Y7, Y3, Y7, }, // 445 {Y6^X7, Y2^X6^Y7, X3, Y3, }, // 446 {X5^Y8, Y5^X8, X6^Y7, Y2^Y6^X7, }, // 447 {X5^Y8, Y5^X8, X2^X6^Y7, Y2^Y6^X7, }, // 448 {Y4^X8, X5^Y7, X2^X6^Y6, Y1^Y5^X7, }, // 449 {X4^Y7, Y4^X7, X1^X5^Y6, Y1^Y5^X6, }, // 450 {Y4^X9, X6^Y7, Y5^X8, Y2^Y6^X7, }, // 451 {X5^Y7, Y4^X8, X2^Y5^X7, Y1^X6^Y6, }, // 452 {X3^Y7, Y4^X7, X1^Y5^X6, Y1^X5^Y6, }, // 453 {Y3, X6^Y7, Y5^X8, Y2^Y6^X7, }, // 454 {Y3, Y5^X8, X2^Y6^X7, Y2^X6^Y7, }, // 455 {X3, Y3^X8, X2^Y5^X7, Y1^X6^Y6, }, // 456 {Y2, Y3^X6, X3^Y6, X1^X5^Y5, }, // 457 {Y3, X5^Y8, X6^Y7, Y2^Y6^X7, }, // 458 {Y3, X5^Y8, X2^X6^Y7, Y2^Y6^X7, }, // 459 {X3, Y3^X8, Y2^Y5^X7, Y1^X6^Y6, }, // 460 {X3, Y3^X7, Y2^X6^Y6, X1^X5^Y7, }, // 461 {X3, Y3, X6^Y7, Y2^Y6^X7, }, // 462 {X3, Y3, X2^X6^Y7, Y2^Y6^X7, }, // 463 {X2, X3, Y3^X7, Y2^Y5^X6, }, // 464 {X2, X3, Y3^X6, Y2^X5^Y6, }, // 465 {Y3, X6^Y7, Y2^Y6^X7, X5^Y6, }, // 466 {Y3, X2^Y6^X7, Y2^X6^Y7, X5^Y6, }, // 467 {Y3, X2^Y6^X7, Y2^X6^Y7, Z0^X5^Y6, }, // 468 {Y3, X2^Y6^X7, Y2^X6^Y7, X1^X5^Y6, }, // 469 {X3, Y3, X2^Y6^X7, Y2^X6^Y7, }, // 470 {X6^Y6, X3, Y3, Y2^Y6^X7, }, // 471 {Y2^X6^Y6, X3, Y3, X2^X6^Y6, }, // 472 {X3, Y3, Y2^Y6^X7, X6^Y6, }, // 473 {Y2^Y6^X7, X3, Y3, X6^Y7, }, // 474 {Y2^Y6^X7, X6^Y7, X3, Y3, }, // 475 {Y4^X8, X1^X5^Y7, Y1^Y5^X7, X2^X6^Y6, }, // 476 {Y4^X7, Y0^X4^Y7, X1^X5^Y6, Y1^Y5^X6, }, // 477 {Y4^X8, Y1^X5^Y7, X1^Y5^X7, Y2^X6^Y6, }, // 478 {Y3^X7, Y0^X4^Y6, X1^Y4^X6, Y1^X5^Y5, }, // 479 {Y3, X5^Y8, X2^Y6^X7, Y2^X6^Y7, }, // 480 {Y3, Y1^X5^Y8, X2^X6^Y7, Y2^Y6^X7, }, // 481 {X3, Y3^X7, Y1^X5^Y6, X1^Y5^X6, }, // 482 {X3, Y3^X6, Y1^X4^Y6, Y2^X5^Y5, }, // 483 {Y3, X1^X5^Y8, Y2^Y6^X7, 
X2^X6^Y7, }, // 484 {Y3, X1^X5^Y8, X2^X6^Y7, Y2^Y6^X7, }, // 485 {X3, Y3, Y1^X5^Y7, X2^X6^Y6, }, // 486 {X3, Y3, X1^X5^Y7, Y2^X6^Y6, }, // 487 {X3, Y3, X1^X5^Y7, X2^X6^Y6, }, // 488 {Y3, X2^Y6^X7, Y1^X6^Y7, Y2^X5^Y6, }, // 489 {X3, Y3, X2^Y6^X7, Y1^X6^Y7, }, // 490 {X2^X6^Y6, X3, Y3, Y1^X6^Y6, }, // 491 {X2^X6^Y6, X3, Y3, Y2^X6^Y6, }, // 492 {X3, Y3, Y1^X6^Y7, X2^X6^Y6, }, // 493 {Y2^X6^Y6, X3, Y3, Y1^X6^Y7, }, // 494 {Y2^X6^Y6, Y1^X6^Y8, X3, Y3, }, // 495 {Y2^Y6^X7, X3, Y3, Y1^X6^Y7, }, // 496 {X6^X8^Y8, Y6, X7, X8^Y10, }, // 497 {X6^X8^Y8, Y3, Y6, X7^Y10, }, // 498 {X6^X8^Y8, X3, Y3, X7^Y9, }, // 499 {X6^X8^Y8, Y2, X3, Y3^X10, }, // 500 {X6^X8^Y8, X2, Y2, X3^Y8, }, // 501 {Z0^X6^Y6, X6, X7, Y7^X11, }, // 502 {Z0^X6^Y6, Y3, X6, X7^Y10, }, // 503 {Z0^X6^Y6, X3, Y3, X6^Y10, }, // 504 {Z0^X6^Y6, X6^X9^Y9, X7, Y7, }, // 505 {Z0^X6^Y6, X6^X9^Y9, Y3, X7, }, // 506 {Z0^X6^Y6, X6^X9^Y9, X3, Y3, }, // 507 {Z0^X6^Y6, X6^X9^Y9, Y2, X3, }, // 508 {Z0^X6^Y6, X6^X9^Y9, X2, Y2, }, // 509 {Z1^Y6^X7, X7, Y7, X8^Y10, }, // 510 {Z1^Y6^X7, Y3, X7, Y7^X10, }, // 511 {Z1^Y6^X7, X3, Y3, X7^Y9, }, // 512 {Z1^Y6^X7, Z0^X6^Y7, X7, Y7, }, // 513 {Z1^Y6^X7, Z0^X6^Y7, Y3, X7, }, // 514 {Z1^Y6^X7, Z0^X6^Y7, X3, Y3, }, // 515 {Y6^X7, S0^X6^Y7, 0, 0, }, // 516 {Y5^X7, S0^X6^Y6, 0, 0, }, // 517 {Y5^X6, S0^X5^Y6, 0, 0, }, // 518 {Y4^X6, S0^X5^Y5, 0, 0, }, // 519 {Y4^X5, S0^X4^Y5, 0, 0, }, // 520 {X6^Y8, Y6^X8, S0^X7^Y7, 0, }, // 521 {X6^Y7, Y5^X8, S0^Y6^X7, 0, }, // 522 {X5^Y7, Y5^X7, S0^X6^Y6, 0, }, // 523 {X5^Y6, Y4^X7, S0^Y5^X6, 0, }, // 524 {X3^Y6, Y4^X6, S0^X5^Y5, 0, }, // 525 {Y6^X9, X6^Y9, Y7^X8, S0^X7^Y8, }, // 526 {Y5^X9, X6^Y8, Y6^X8, S0^X7^Y7, }, // 527 {Y5^X8, X5^Y8, Y6^X7, S0^X6^Y7, }, // 528 {Y3^X8, X5^Y7, Y5^X7, S0^X6^Y6, }, // 529 {Y3^X7, X3^Y7, Y5^X6, S0^X5^Y6, }, // 530 {Y5, X6^Y9, X7^Y8, Y6^X9, }, // 531 {X3, Y3^X9, X6^Y7, Y5^X8, }, // 532 {Y2, Y3^X8, X3^Y7, Y5^X7, }, // 533 {Y6^X9, Y7^X8, S0^X7^Y8, Z0^X5^Y5, }, // 534 {X6^Y8, Y6^X8, S0^X7^Y7, Z0^X5^Y5, }, // 535 {X5^Y8, Y6^X7, 
S0^X6^Y7, Z0^X5^Y5, }, // 536 {Y3^X7, X5^Y7, S0^X6^Y6, Z0^X5^Y5, }, // 537 {Y3^X6, X3^Y7, S0^X5^Y6, Z0^X5^Y5, }, // 538 {X6, Y6, Y7^X10, X7^Y10, }, // 539 {Y6, X7^Y9, Y7^X9, S0^X8^Y8, }, // 540 {X6, X7^Y8, Y6^X9, S0^Y7^X8, }, // 541 {Y3, X6^Y8, Y6^X8, S0^X7^Y7, }, // 542 {X3, Y3^X8, X6^Y7, S0^Y6^X7, }, // 543 {Y2, Y3^X7, X3^Y7, S0^X6^Y6, }, // 544 {X6^X8^Y8, Y6, X7, Y7^X11, }, // 545 {X6^X8^Y8, X3, Y3, Y6^X10, }, // 546 {X6^X8^Y8, X2, Y2, Y3^X9, }, // 547 {X6, X7, Y7^X10, Y8^X9, }, // 548 {Z0^Y6^X7, X7, Y7, X8^Y10, }, // 549 {Z0^Y6^X7, Y3, X7, X8^Y9, }, // 550 {Z0^Y6^X7, X3, Y3, X7^Y9, }, // 551 {Z0^Y6^X7, Z4^X6^Y7, X7, Y7, }, // 552 {Z0^Y6^X7, Z4^X6^Y7, Y3, X7, }, // 553 {Z0^Y6^X7, Z4^X6^Y7, X3, Y3, }, // 554 {Z0^Y6^X7, Z4^X6^Y7, Y2, X3, }, // 555 {S0^X6^Y7, S1^Y6^X7, 0, 0, }, // 556 {S0^Y5^X7, S1^X6^Y6, 0, 0, }, // 557 {S0^X5^Y6, S1^Y5^X6, 0, 0, }, // 558 {S0^Y4^X6, S1^X5^Y5, 0, 0, }, // 559 {S0^X4^Y5, S1^Y4^X5, 0, 0, }, // 560 {Y5^X9, S0^X7^Y7, S1^Y6^X8, 0, }, // 561 {Y5^X8, S0^X6^Y7, S1^Y6^X7, 0, }, // 562 {Y4^X8, S0^X6^Y6, S1^Y5^X7, 0, }, // 563 {Y4^X7, S0^X5^Y6, S1^Y5^X6, 0, }, // 564 {Y3^X7, S0^X5^Y5, S1^Y4^X6, 0, }, // 565 {X6^Y9, Y6^X9, S0^X7^Y8, S1^Y7^X8, }, // 566 {X6^Y8, Y5^X9, S0^X7^Y7, S1^Y6^X8, }, // 567 {X5^Y8, Y5^X8, S0^X6^Y7, S1^Y6^X7, }, // 568 {Y3^X8, X5^Y7, S0^X6^Y6, S1^Y5^X7, }, // 569 {Y3^X7, X3^Y7, S0^X5^Y6, S1^Y5^X6, }, // 570 {X6, X7^Y9, Y6^X10, S0^X8^Y8, }, // 571 {Y5, X6^Y9, Y6^X9, S0^X7^Y8, }, // 572 {Y3, X6^Y8, Y5^X9, S0^X7^Y7, }, // 573 {X3, Y3^X9, Y5^X8, S0^X6^Y7, }, // 574 {Y2, X3^Y7, Y3^X8, S0^X6^Y6, }, // 575 {Y6^X9, S0^X7^Y8, S1^Y7^X8, Z0^X5^Y5, }, // 576 {X6^Y8, S0^Y6^X8, S1^X7^Y7, Z0^X5^Y5, }, // 577 {X5^Y8, S0^X6^Y7, S1^Y6^X7, Z0^X5^Y5, }, // 578 {Y3^X8, S0^X6^Y6, S1^Y5^X7, Z0^X5^Y5, }, // 579 {Y3^X6, X3^Y7, S0^X5^Y6, S1^X5^Y5, }, // 580 {X6, Y6^X10, X7^Y9, S0^Y7^X9, }, // 581 {X5, X6^Y9, Y6^X9, S0^X7^Y8, }, // 582 {Y3, X5^Y9, X6^Y8, S0^Y6^X8, }, // 583 {X3, Y3^X8, X5^Y8, S0^X6^Y7, }, // 584 {Y2, Y3^X8, X3^Y7, S0^X6^Y6, }, // 
585 {Y6, X7^Y9, S0^X8^Y8, S1^Y7^X9, }, // 586 {X6, Y6^X9, S0^X7^Y8, S1^Y7^X8, }, // 587 {Y3, X6^Y8, S0^X7^Y7, S1^Y6^X8, }, // 588 {X3, Y3^X8, S0^X6^Y7, S1^Y6^X7, }, // 589 {X6, X7, Y7^X10, S0^X8^Y9, }, // 590 {Y3, X6, X7^Y9, S0^Y7^X9, }, // 591 {X3, Y3, X6^Y9, S0^X7^Y8, }, // 592 {Y2, X3, Y3^X9, S0^X7^Y7, }, // 593 {Z3^X6^Y6, X6, X7, Y7^X11, }, // 594 {Z3^X6^Y6, Y3, X6, X7^Y10, }, // 595 {Z3^X6^Y6, X3, Y3, X6^Y10, }, // 596 {Z3^X6^Y6, X6^X9^Y9, X7, Y7, }, // 597 {Z3^X6^Y6, X6^X9^Y9, Y3, X7, }, // 598 {Z3^X6^Y6, X6^X9^Y9, X3, Y3, }, // 599 {Z3^X6^Y6, X6^X9^Y9, Y2, X3, }, // 600 {Z3^X6^Y6, X6^X9^Y9, X2, Y2, }, // 601 {Z4^Y6^X7, X7, Y7, X8^Y10, }, // 602 {Z4^Y6^X7, Y3, X7, Y7^X10, }, // 603 {Z4^Y6^X7, X3, Y3, X7^Y9, }, // 604 {Z4^Y6^X7, Y2, X3, Y3^X9, }, // 605 {S1^Y6^X7, X2, Y2, Y3^X8, }, // 606 {Z4^Y6^X7, Z3^X6^Y7, X7, Y7, }, // 607 {Z4^Y6^X7, Z3^X6^Y7, Y3, X7, }, // 608 {Z4^Y6^X7, Z3^X6^Y7, X3, Y3, }, // 609 {S1^Y6^X7, S2^X6^Y7, 0, 0, }, // 610 {S1^Y5^X7, S2^X6^Y6, 0, 0, }, // 611 {S1^Y5^X6, S2^X5^Y6, 0, 0, }, // 612 {S1^Y4^X6, S2^X5^Y5, 0, 0, }, // 613 {S1^Y4^X5, S2^X4^Y5, 0, 0, }, // 614 {S0^X6^Y8, S1^Y6^X8, S2^X7^Y7, 0, }, // 615 {S0^X6^Y7, S1^Y5^X8, S2^Y6^X7, 0, }, // 616 {S0^X5^Y7, S1^Y5^X7, S2^X6^Y6, 0, }, // 617 {S0^X5^Y6, S1^Y4^X7, S2^Y5^X6, 0, }, // 618 {Y6^X9, S0^X6^Y9, S1^Y7^X8, S2^X7^Y8, }, // 619 {Y5^X9, S0^X6^Y8, S1^Y6^X8, S2^X7^Y7, }, // 620 {Y5^X8, S0^X5^Y8, S1^Y6^X7, S2^X6^Y7, }, // 621 {Y3^X8, S0^X5^Y7, S1^Y5^X7, S2^X6^Y6, }, // 622 {Y3^X6, X3^Y7, S0^X4^Y6, S1^X5^Y5, }, // 623 {X6, Y6^X10, S0^X7^Y9, S1^Y7^X9, }, // 624 {Y5, X6^Y9, S0^X7^Y8, S1^Y6^X9, }, // 625 {Y3, Y5^X9, S0^X6^Y8, S1^Y6^X8, }, // 626 {X3, Y3^X9, S0^X6^Y7, S1^Y5^X8, }, // 627 {Y2, Y3^X8, S0^X5^Y7, S1^Y5^X7, }, // 628 {S0^X6^Y9, S1^Y7^X8, S2^X7^Y8, Z0^X5^Y5, }, // 629 {S0^X6^Y8, S1^Y6^X8, S2^X7^Y7, Z0^X5^Y5, }, // 630 {S0^X5^Y8, S1^Y6^X7, S2^X6^Y7, Z0^X5^Y5, }, // 631 {Y3^X7, S0^X5^Y7, S1^X6^Y6, S2^X5^Y5, }, // 632 {X5, X6^Y9, S0^Y6^X9, S1^X7^Y8, }, // 633 {Y3, X5^Y9, S0^X6^Y8, 
S1^Y6^X8, }, // 634 {Y2, Y3^X7, X3^Y8, S0^X5^Y7, }, // 635 {X6, Y6, Y7^X10, S0^X7^Y10, }, // 636 {Y3, X6, Y6^X10, S0^X7^Y9, }, // 637 {X3, Y3, Y6^X9, S0^X6^Y9, }, // 638 {Y2, X3, Y3^X9, S0^X6^Y8, }, // 639 {X2, Y2, Y3^X8, S0^X5^Y8, }, // 640 {Y6, S0^X7^Y9, S1^Y7^X9, S2^X8^Y8, }, // 641 {X6, S0^X7^Y8, S1^Y6^X9, S2^Y7^X8, }, // 642 {Y3, S0^X6^Y8, S1^Y6^X8, S2^X7^Y7, }, // 643 {X3^X8^Y8, X2, Y2, Y3^X9, }, // 644 {X6, Y7, S0^X7^Y10, S1^Y8^X9, }, // 645 {Y3, X6, S0^X7^Y9, S1^Y7^X9, }, // 646 {X3, Y3, S0^X6^Y9, S1^Y7^X8, }, // 647 {Y2, X3, Y3^X8, S0^X6^Y8, }, // 648 {Z2^X6^Y6, X6, X7, Y7^X11, }, // 649 {Z2^X6^Y6, Y3, X6, X7^Y10, }, // 650 {Z2^X6^Y6, X3, Y3, X6^Y10, }, // 651 {Z2^X6^Y6, Y2, X3, Y3^X10, }, // 652 {S2^X6^Y6, X2, Y2, Y3^X8, }, // 653 {Z2^X6^Y6, X6^X9^Y9, X7, Y7, }, // 654 {Z2^X6^Y6, X6^X9^Y9, Y3, X7, }, // 655 {Z2^X6^Y6, X6^X9^Y9, X3, Y3, }, // 656 {Z2^X6^Y6, X6^X9^Y9, Y2, X3, }, // 657 {Z2^X6^Y6, X3^X9^Y9, X2, Y2, }, // 658 {Z3^Y6^X7, X7, Y7, S0^X8^Y10, }, // 659 {Z3^Y6^X7, Y3, X7, S0^X8^Y9, }, // 660 {Z3^Y6^X7, X3, Y3, S0^X7^Y9, }, // 661 {S2^Y6^X7, Y2, X3, Y3^X9, }, // 662 {S2^Y6^X7, X2, Y2, Y3^X8, }, // 663 {Z3^Y6^X7, Z2^X6^Y7, X7, Y7, }, // 664 {Z3^Y6^X7, Z2^X6^Y7, Y3, X7, }, // 665 {Z3^Y6^X7, Z2^X6^Y7, X3, Y3, }, // 666 {Z3^Y6^X7, Z2^X6^Y7, Y2, X3, }, // 667 {Z2^Y6^X7, S2^X6^Y7, X2, Y2, }, // 668 {Y6, X7^Y8, Y7^X8, Z0^X5^Y6, }, // 669 {X6, X7^Y7, Y6^X8, Z0^X5^Y6, }, // 670 {Y3, X6^Y7, Y6^X7, Z0^X5^Y6, }, // 671 {X6^X8^Y8, Y6, X7, Y7^X8, }, // 672 {X6^X8^Y8, Y3, Y6, X7^Y7, }, // 673 {X6^X8^Y8, X3, Y3, Y6^X7, }, // 674 {X6^X8^Y8, Y2, X3, Y3^X7, }, // 675 {X3^X8^Y8, X2, Y2, Y3^X6, }, // 676 {X6, X7, Y7^X8, Z0^X6^Y6, }, // 677 {Y3, X6, X7^Y7, Z0^X6^Y6, }, // 678 {X3, Y3, X6^Y7, Z0^X6^Y6, }, // 679 {Z0^X6^Y6, X6, X7, Y7^X8, }, // 680 {Z0^X6^Y6, Y3, X6, X7^Y7, }, // 681 {Z0^X6^Y6, X3, Y3, X6^Y7, }, // 682 {Z0^X6^Y6, X3^X9^Y9, Y2, Y3, }, // 683 {Y2^X6^Y6, X3^X9^Y9, X2, Y3, }, // 684 {Z1^Y6^X7, X7, Y7, Z0^X6^Y7, }, // 685 {Z1^Y6^X7, Y3, X7, Z0^X6^Y7, }, // 686 
{Z1^Y6^X7, X3, Y3, Z0^X6^Y7, }, // 687 {Y4^X8, X5^Y7, Y5^X7, S0^X6^Y6, }, // 688 {Y4^X7, X4^Y7, Y5^X6, S0^X5^Y6, }, // 689 {Y4^X7, X3^Y7, Y5^X6, S0^X5^Y6, }, // 690 {X6, Y6^X9, Y7^X8, S0^X7^Y8, }, // 691 {Y5, X6^Y8, Y6^X8, S0^X7^Y7, }, // 692 {Y3, Y5^X8, Y6^X7, S0^X6^Y7, }, // 693 {X3, Y3^X8, Y5^X7, S0^X6^Y6, }, // 694 {Y2, Y3^X6, X3^Y6, X5^Y5, }, // 695 {X5, X6^Y8, Y6^X8, S0^X7^Y7, }, // 696 {Y3, X5^Y8, Y6^X7, S0^X6^Y7, }, // 697 {X3, Y3^X7, X5^Y7, S0^X6^Y6, }, // 698 {Y2, Y3^X6, X3^Y7, S0^X5^Y6, }, // 699 {X6, Y6, Y7^X8, S0^X7^Y8, }, // 700 {Y3, X6, Y6^X8, S0^X7^Y7, }, // 701 {X3, Y3, Y6^X7, S0^X6^Y7, }, // 702 {Y2, X3, Y3^X7, S0^X6^Y6, }, // 703 {Y6, Y7^X8, S0^X7^Y8, Z0^X5^Y6, }, // 704 {X6, Y6^X8, S0^X7^Y7, Z0^X5^Y6, }, // 705 {Y3, Y6^X7, S0^X6^Y7, Z0^X5^Y6, }, // 706 {X3, Y3^X7, S0^X6^Y6, Z0^X5^Y6, }, // 707 {Y2, Y3^X6, X3^Y6, S0^X5^Y6, }, // 708 {X6^X8^Y8, Y6, Y7, S0^X7^Y8, }, // 709 {X6^X8^Y8, Y3, Y6, S0^X7^Y7, }, // 710 {S0^X8^Y8, X3, Y3, X6^Y6, }, // 711 {S0^X8^Y8, Y2, X3, Y3^X6, }, // 712 {X6, Y7, S0^X7^Y8, Z0^X6^Y6, }, // 713 {Y3, X6, S0^X7^Y7, Z0^X6^Y6, }, // 714 {X3, Y3, S0^X6^Y7, Z0^X6^Y6, }, // 715 {Y2, X3, Y3^X6, S0^X6^Y6, }, // 716 {Z0^X6^Y6, X6, Y7, S0^X7^Y8, }, // 717 {Z0^X6^Y6, Y3, X6, S0^X7^Y7, }, // 718 {Z0^X6^Y6, X3, Y3, S0^X6^Y7, }, // 719 {S0^X6^Y6, Y2, X3, Y3^X6, }, // 720 {Z0^X6^Y6, X6^X9^Y9, Y7, S0^X7, }, // 721 {Z0^X6^Y6, X6^X9^Y9, Y3, S0^X7, }, // 722 {Z0^X6^Y6, S0^X9^Y9, X3, Y3, }, // 723 {S0^X6^Y6, X3^X9^Y9, Y2, Y3, }, // 724 {Z0^Y6^X7, Y7, S0^X7, Z4^X6^Y7, }, // 725 {Z0^Y6^X7, Y3, S0^X7, Z4^X6^Y7, }, // 726 {Z0^Y6^X7, X3, Y3, S0^X6^Y7, }, // 727 {S0^Y6^X7, X3, Y3, Y2^X6^Y7, }, // 728 {Z0^Y6^X7, Z4^X6^Y7, Y7, S0^X7, }, // 729 {Z0^Y6^X7, Z4^X6^Y7, Y3, S0^X7, }, // 730 {Z0^Y6^X7, S0^X6^Y7, X3, Y3, }, // 731 {S0^Y6^X7, Y2^X6^Y7, X3, Y3, }, // 732 {Y5^X9, X6^Y8, S0^Y6^X8, S1^X7^Y7, }, // 733 {Y4^X8, X5^Y7, S0^Y5^X7, S1^X6^Y6, }, // 734 {X4^Y7, Y4^X7, S0^X5^Y6, S1^Y5^X6, }, // 735 {X5^Y7, Y4^X8, S0^X6^Y6, S1^Y5^X7, }, // 736 {X3^Y7, 
Y4^X7, S0^X5^Y6, S1^Y5^X6, }, // 737 {Y5, X6^Y8, S0^X7^Y7, S1^Y6^X8, }, // 738 {Y3, Y5^X8, S0^X6^Y7, S1^Y6^X7, }, // 739 {X3, Y3^X8, S0^X6^Y6, S1^Y5^X7, }, // 740 {Y2, Y3^X6, X3^Y6, S0^X5^Y5, }, // 741 {X5, X6^Y8, S0^Y6^X8, S1^X7^Y7, }, // 742 {Y3, X5^Y8, S0^X6^Y7, S1^Y6^X7, }, // 743 {X6, Y6, S0^X7^Y8, S1^Y7^X8, }, // 744 {Y3, X6, S0^Y6^X8, S1^X7^Y7, }, // 745 {X3, Y3, S0^X6^Y7, S1^Y6^X7, }, // 746 {Y2, X3, Y3^X7, S0^Y5^X6, }, // 747 {Y6, S0^X7^Y8, S1^Y7^X8, Z0^X5^Y6, }, // 748 {X6, S0^X7^Y7, S1^Y6^X8, Z0^X5^Y6, }, // 749 {Y3, S0^X6^Y7, S1^Y6^X7, Z0^X5^Y6, }, // 750 {Y3, X6, S0^X7^Y7, S1^Y6^X8, }, // 751 {X6^X8^Y8, Y6, S0^X7, S1^Y7^X8, }, // 752 {X6^X8^Y8, Y3, S0^X7, S1^Y6^X8, }, // 753 {S1^X8^Y8, X3, Y3, S0^X6^Y6, }, // 754 {X6, S0^X7, S1^Y7^X8, Z3^X6^Y6, }, // 755 {Y3, S0^X7, S1^Y6^X8, Z3^X6^Y6, }, // 756 {X3, Y3, S0^X6^Y7, S1^X6^Y6, }, // 757 {Z3^X6^Y6, X6, S0^X7, S1^Y7^X8, }, // 758 {Z3^X6^Y6, Y3, S0^X7, S1^Y6^X8, }, // 759 {S1^X6^Y6, X3, Y3, S0^X6^Y7, }, // 760 {Z3^X6^Y6, X6^X9^Y9, S0^X7, S1^Y7, }, // 761 {Z3^X6^Y6, S1^X9^Y9, Y3, S0^X7, }, // 762 {S1^X6^Y6, S0^X9^Y9, X3, Y3, }, // 763 {Z4^Y6^X7, S0^X7, S1^Y7, Z3^X6^Y7, }, // 764 {S1^Y6^X7, Y3, S0^X7, Z3^X6^Y7, }, // 765 {S1^Y6^X7, X3, Y3, S0^X6^Y7, }, // 766 {Z4^Y6^X7, Z3^X6^Y7, S0^X7, S1^Y7, }, // 767 {S1^Y6^X7, Z3^X6^Y7, Y3, S0^X7, }, // 768 {S1^Y6^X7, S0^X6^Y7, X3, Y3, }, // 769 {Y4^X8, S0^X5^Y7, S1^Y5^X7, S2^X6^Y6, }, // 770 {Y4^X7, S0^X4^Y7, S1^Y5^X6, S2^X5^Y6, }, // 771 {Y3^X7, S0^X4^Y6, S1^Y4^X6, S2^X5^Y5, }, // 772 {Y6, S0^X6^Y9, S1^Y7^X8, S2^X7^Y8, }, // 773 {Y5, S0^X6^Y8, S1^Y6^X8, S2^X7^Y7, }, // 774 {Y3, Y5^X7, S0^X5^Y7, S1^X6^Y6, }, // 775 {X3, Y3^X7, S0^X5^Y6, S1^Y5^X6, }, // 776 {Y2, Y3^X5, X3^Y6, S0^X4^Y5, }, // 777 {X5, S0^X6^Y8, S1^Y6^X8, S2^X7^Y7, }, // 778 {Y3, S0^X5^Y8, S1^Y6^X7, S2^X6^Y7, }, // 779 {X3, Y3^X7, S0^X5^Y7, S1^X6^Y6, }, // 780 {Y6, S0^X6, S1^Y7^X8, S2^X7^Y8, }, // 781 {Y3, S0^X6, S1^Y6^X8, S2^X7^Y7, }, // 782 {X3, Y3, S0^X5^Y7, S1^X6^Y6, }, // 783 {Y2, X3, Y3^X6, S0^X5^Y6, 
}, // 784 {S0^X6, S1^Y7^X8, S2^X7^Y8, Z2^X5^Y6, }, // 785 {S0^X6, S1^Y6^X8, S2^X7^Y7, Z2^X5^Y6, }, // 786 {Y3, S0^X6^Y7, S1^Y6^X7, S2^X5^Y6, }, // 787 {X3, Y3^X7, S0^X6^Y6, S1^X5^Y6, }, // 788 {S2^X8^Y8, Y6, S0^X6, S1^X7^Y7, }, // 789 {S2^X8^Y8, Y3, S0^X6, S1^Y6^X7, }, // 790 {S0^X6, S1^Y7, S2^X7^Y8, Z2^X6^Y6, }, // 791 {Y3, S0^X6, S1^X7^Y7, S2^X6^Y6, }, // 792 {Z2^X6^Y6, S0^X6, S1^Y7, S2^X7^Y8, }, // 793 {S2^X6^Y6, Y3, S0^X6, S1^X7^Y7, }, // 794 {Z2^X6^Y6, S2^X9^Y9, S0^X6, S1^Y7, }, // 795 {S2^X6^Y6, S1^X9^Y9, Y3, S0^X6, }, // 796 {Z2^Y6^X7, S0^X7, S1^Y7, S2^X6^Y7, }, // 797 {S2^Y6^X7, Y3, S0^X7, S1^X6^Y7, }, // 798 {Z2^Y6^X7, S2^X6^Y7, S0^X7, S1^Y7, }, // 799 {S2^Y6^X7, S1^X6^Y7, Y3, S0^X7, }, // 800 {X2, Z4, Y4, X3, }, // 801 {X2, Z3, Y4, X3, }, // 802 {Y3, X3, Z4, X5, }, // 803 {Y3, X2, Z4, X3, }, // 804 {Y3, X2, Z3, X3, }, // 805 {Y2, X2, Y3, X3, }, // 806 {Z3, X3, Z4, X5^Y5, }, // 807 {X2, Z4, X3, Y2^X5^Y5, }, // 808 {X2, Z3, X3, Y2^X5^Y5, }, // 809 {X2, Y3, X3, Y1^X5^Y5, }, // 810 {X2, Y3, X3, X1^X5^Y5, }, // 811 {Y3, Z3, X3, Z4, }, // 812 {Y2, Y3, X3, Z4, }, // 813 {Z3, X3, Z4, X5^Y6, }, // 814 {X2, Z4, X3, Z3^X5^Y6, }, // 815 {X2, Z3, X3, Z2^X5^Y6, }, // 816 {X2, Y3, X3, Z2^X5^Y6, }, // 817 {Z3^X7, Y3, X3, Z4, }, // 818 {Z3^X7, X2, Z4, X3, }, // 819 {Z2^X7, X2, Z3, X3, }, // 820 {Z2^X7, X2, Y3, X3, }, // 821 {Z3, X3, Z4, Y3^X6^Y6, }, // 822 {X2, Z4, X3, Y3^X6^Y6, }, // 823 {X2, Z3, X3, Y3^X6^Y6, }, // 824 {X2, Y3, X3, Y2^X6^Y6, }, // 825 {Y3^X6^Y6, Z3, X3, Z4, }, // 826 {Y3^X6^Y6, X2, Z4, X3, }, // 827 {Y3^X6^Y6, X2, Z3, X3, }, // 828 {Y2^X6^Y6, X2, Y3, X3, }, // 829 {Y3^X6^Y6, Z3^X8, X3, Z4, }, // 830 {X2^X6^Y6, Z3^X8, Z4, X3, }, // 831 {X2^X6^Y6, Z2^X8, Z3, X3, }, // 832 {X2^X6^Y6, Z2^X8, Y3, X3, }, // 833 {Y3^Y6^X7, X3, Z4, Z3^X6^Y7, }, // 834 {Y3^Y6^X7, Z4, X3, X2^X6^Y7, }, // 835 {Y3^Y6^X7, Z3, X3, X2^X6^Y7, }, // 836 {Y2^Y6^X7, Y3, X3, X2^X6^Y7, }, // 837 {Y3^Y6^X7, Z3^X6^Y7, X3, Z4, }, // 838 {Y3^Y6^X7, X2^X6^Y7, Z4, X3, }, // 839 {Y3^Y6^X7, 
X2^X6^Y7, Z3, X3, }, // 840 {Y2^Y6^X7, X2^X6^Y7, Y3, X3, }, // 841 }; const UINT_64 GFX10_SW_PATTERN_NIBBLE4[][4] = { {0, 0, 0, 0, }, // 0 {Y7^X9, 0, 0, 0, }, // 1 {Y7^X8, 0, 0, 0, }, // 2 {Y6^X8, 0, 0, 0, }, // 3 {Y6^X7, 0, 0, 0, }, // 4 {Y5^X7, 0, 0, 0, }, // 5 {X8^Y8, 0, 0, 0, }, // 6 {X7^Y7, 0, 0, 0, }, // 7 {X6^Y6, 0, 0, 0, }, // 8 {X8^Y9, Y8^X9, 0, 0, }, // 9 {Y7^X9, X8^Y8, 0, 0, }, // 10 {X7^Y8, Y7^X8, 0, 0, }, // 11 {Y6^X8, X7^Y7, 0, 0, }, // 12 {X6^Y7, Y6^X7, 0, 0, }, // 13 {X5^Y6, 0, 0, 0, }, // 14 {Z0^X5^Y6, 0, 0, 0, }, // 15 {X8^Y8, Y7^X9, 0, 0, }, // 16 {X7^Y7, Y6^X8, 0, 0, }, // 17 {Y7^X11, X9^Y9, Y8^X10, 0, }, // 18 {Y7^X10, X8^Y9, Y8^X9, 0, }, // 19 {Y6^X10, X8^Y8, Y7^X9, 0, }, // 20 {Y6^X9, X7^Y8, Y7^X8, 0, }, // 21 {Y3^X9, X7^Y7, Y6^X8, 0, }, // 22 {Y8^X9, X6^Y6, 0, 0, }, // 23 {X8^Y8, X6^Y6, 0, 0, }, // 24 {Y7^X8, X6^Y6, 0, 0, }, // 25 {X7^Y7, Z0^X6^Y6, 0, 0, }, // 26 {X6^Y7, Z0^X6^Y6, 0, 0, }, // 27 {X8^Y10, Y8^X10, X9^Y9, 0, }, // 28 {X7^Y9, Y7^X9, X8^Y8, 0, }, // 29 {X6^Y9, X7^Y8, Y7^X8, 0, }, // 30 {Y3^X8, X6^Y8, X7^Y7, 0, }, // 31 {X8^Y11, Y8^X11, X9^Y10, Y9^X10, }, // 32 {Y7^X11, X8^Y10, Y8^X10, X9^Y9, }, // 33 {X7^Y10, Y7^X10, X8^Y9, Y8^X9, }, // 34 {Y3^X10, X7^Y9, Y7^X9, X8^Y8, }, // 35 {X3^Y9, Y3^X9, X7^Y8, Y7^X8, }, // 36 {X9^Y9, Y8^X10, X6^Y7, 0, }, // 37 {X8^Y9, Y8^X9, X6^Y7, 0, }, // 38 {X8^Y8, Y7^X9, X6^Y7, 0, }, // 39 {X7^Y8, Y7^X8, Z0^X6^Y7, 0, }, // 40 {Y3^X8, X7^Y7, Z0^X6^Y7, 0, }, // 41 {X8^Y10, Y7^X11, X9^Y9, Y8^X10, }, // 42 {Y3^X10, X7^Y9, X8^Y8, Y7^X9, }, // 43 {Y3^X9, X3^Y9, X7^Y8, Y7^X8, }, // 44 {Y2^X7^Y7, 0, 0, 0, }, // 45 {X2^Y6^X7, 0, 0, 0, }, // 46 {Y1^X6^Y6, 0, 0, 0, }, // 47 {X7^Y9, X8^Y8, 0, 0, }, // 48 {Y7^X8, Y2^X7^Y8, 0, 0, }, // 49 {X6^Y8, X2^X7^Y7, 0, 0, }, // 50 {X5^Y8, Y1^X6^Y7, 0, 0, }, // 51 {Y6^X8, Y2^X7^Y7, 0, 0, }, // 52 {Y6^X7, Y1^X6^Y7, 0, 0, }, // 53 {X7^Y9, X8^Y8, Y7^X9, 0, }, // 54 {X7^Y9, Y7^X9, Y2^X8^Y8, 0, }, // 55 {X6^Y9, X7^Y8, X2^Y7^X8, 0, }, // 56 {X3^Y9, X6^Y8, Y1^X7^Y7, 0, }, // 57 
{Y2^X7^Y8, X6^Y6, 0, 0, }, // 58 {X2^X7^Y7, Z0^X6^Y6, 0, 0, }, // 59 {Y1^X6^Y7, Z0^X6^Y6, 0, 0, }, // 60 {Y3^X8, X6^Y8, Y1^X7^Y7, 0, }, // 61 {Y7^X11, Y8^X10, X8^Y10, X9^Y9, }, // 62 {Y7^X10, X7^Y10, Y8^X9, Y2^X8^Y9, }, // 63 {Y3^X10, X7^Y9, Y7^X9, X2^X8^Y8, }, // 64 {Y3^X9, X3^Y9, Y7^X8, Y1^X7^Y8, }, // 65 {Y7^X9, Y2^X8^Y8, X6^Y7, 0, }, // 66 {X7^Y8, X2^Y7^X8, Z4^X6^Y7, 0, }, // 67 {X3^Y8, Y1^X7^Y7, Z4^X6^Y7, 0, }, // 68 {Y3^X10, X7^Y9, Y7^X9, Y2^X8^Y8, }, // 69 {Y2^Y6^X8, 0, 0, 0, }, // 70 {Y1^X6^Y7, 0, 0, 0, }, // 71 {Y1^Y5^X7, 0, 0, 0, }, // 72 {X7^Y8, Y2^Y7^X8, 0, 0, }, // 73 {X2^X7^Y8, Y2^Y7^X8, 0, 0, }, // 74 {X2^X7^Y7, Y1^Y6^X8, 0, 0, }, // 75 {X1^X6^Y7, Y1^Y6^X7, 0, 0, }, // 76 {Y6^X9, Y2^Y7^X8, 0, 0, }, // 77 {X2^Y7^X8, Y2^X7^Y8, 0, 0, }, // 78 {X2^Y6^X8, Y1^X7^Y7, 0, 0, }, // 79 {X1^Y6^X7, Y1^X6^Y7, 0, 0, }, // 80 {Y6^X10, X2^X8^Y8, Y2^Y7^X9, 0, }, // 81 {Y6^X9, Y2^Y7^X8, Y1^X7^Y8, 0, }, // 82 {Y3^X9, X1^X7^Y7, Y1^Y6^X8, 0, }, // 83 {Y2^Y7^X8, X6^Y6, 0, 0, }, // 84 {Y1^X7^Y7, Z3^X6^Y6, 0, 0, }, // 85 {X1^X6^Y8, Y1^X6^Y6, 0, 0, }, // 86 {X7^Y9, X2^Y7^X9, Y2^X8^Y8, 0, }, // 87 {X6^Y9, X2^Y7^X8, Y1^X7^Y8, 0, }, // 88 {X3^Y8, X1^Y6^X8, Y1^X7^Y7, 0, }, // 89 {X7^Y10, Y7^X10, X8^Y9, Y2^Y8^X9, }, // 90 {X7^Y10, Y7^X10, X2^X8^Y9, Y2^Y8^X9, }, // 91 {Y3^X10, X7^Y9, X2^X8^Y8, Y1^Y7^X9, }, // 92 {X3^Y9, Y3^X9, X1^X7^Y8, Y1^Y7^X8, }, // 93 {X2^X8^Y8, Y2^Y7^X9, X6^Y7, 0, }, // 94 {Y2^Y7^X8, Y1^X7^Y8, Z3^X6^Y7, 0, }, // 95 {Y2^Y7^X8, X1^X7^Y8, Z3^X6^Y7, 0, }, // 96 {X7^Y10, X8^Y9, Y7^X10, Y2^Y8^X9, }, // 97 {X7^Y10, Y7^X10, X2^Y8^X9, Y2^X8^Y9, }, // 98 {Y3^X10, X7^Y9, X2^Y7^X9, Y1^X8^Y8, }, // 99 {Y3^X9, X3^Y9, X1^Y7^X8, Y1^X7^Y8, }, // 100 {X1^Y5^X6, 0, 0, 0, }, // 101 {Y2^Y6^X7, 0, 0, 0, }, // 102 {X1^Y6^X7, 0, 0, 0, }, // 103 {Y0^X5^Y7, X1^X6^Y6, 0, 0, }, // 104 {Z1^X5^Y6, 0, 0, 0, }, // 105 {Y1^X5^Y6, 0, 0, 0, }, // 106 {X1^Y6^X8, Y2^X7^Y7, 0, 0, }, // 107 {Y2^X7^Y7, X1^Y6^X8, 0, 0, }, // 108 {X7^Y9, X2^X8^Y8, Y2^Y7^X9, 0, }, // 109 {Y1^X7^Y9, X2^X8^Y8, Y2^Y7^X9, 
0, }, // 110 {X6^Y8, X1^X7^Y7, Y1^Y6^X8, 0, }, // 111 {X3^Y8, Y0^X6^Y7, X1^Y6^X7, 0, }, // 112 {X2^X7^Y8, Y1^X6^Y6, 0, 0, }, // 113 {Y2^Y7^X8, Y1^X6^Y6, 0, 0, }, // 114 {Y1^X7^Y9, X2^Y7^X9, Y2^X8^Y8, 0, }, // 115 {Y1^X7^Y8, X1^Y6^X9, Y2^Y7^X8, 0, }, // 116 {Y1^X6^Y9, Y2^X7^Y8, X1^Y7^X8, 0, }, // 117 {Y7^X10, Y1^X7^Y10, X2^X8^Y9, Y2^Y8^X9, }, // 118 {Y3^X10, X1^X7^Y9, Y1^Y7^X9, X2^X8^Y8, }, // 119 {Y3^X8, X3^Y9, Y0^X6^Y8, X1^X7^Y7, }, // 120 {Y2^Y7^X9, X2^X8^Y8, Z2^X6^Y7, 0, }, // 121 {X2^X8^Y8, Y2^Y7^X9, Y1^X6^Y7, 0, }, // 122 {Y3^X10, Y1^X7^Y9, X1^Y7^X9, Y2^X8^Y8, }, // 123 {Y3^X10, Y1^X7^Y9, Y2^X8^Y8, X1^Y7^X9, }, // 124 {Y8^X9, Z0^X6^Y6, 0, 0, }, // 125 {X8^Y8, Z0^X6^Y6, 0, 0, }, // 126 {Y7^X8, Z0^X6^Y6, 0, 0, }, // 127 {X9^Y9, Y8^X10, Z0^X6^Y7, 0, }, // 128 {X8^Y9, Y8^X9, Z0^X6^Y7, 0, }, // 129 {X8^Y8, Y7^X9, Z0^X6^Y7, 0, }, // 130 {S0^X8^Y8, 0, 0, 0, }, // 131 {S0^Y7^X8, 0, 0, 0, }, // 132 {S0^X7^Y7, 0, 0, 0, }, // 133 {S0^Y6^X7, 0, 0, 0, }, // 134 {S0^X6^Y6, 0, 0, 0, }, // 135 {Y8^X9, S0^X8^Y9, 0, 0, }, // 136 {Y7^X9, S0^X8^Y8, 0, 0, }, // 137 {Y7^X8, S0^X7^Y8, 0, 0, }, // 138 {Y6^X8, S0^X7^Y7, 0, 0, }, // 139 {Y6^X7, S0^X6^Y7, 0, 0, }, // 140 {X8^Y10, Y8^X10, S0^X9^Y9, 0, }, // 141 {X8^Y9, Y7^X10, S0^Y8^X9, 0, }, // 142 {X7^Y9, Y7^X9, S0^X8^Y8, 0, }, // 143 {X7^Y8, Y6^X9, S0^Y7^X8, 0, }, // 144 {X3^Y8, Y6^X8, S0^X7^Y7, 0, }, // 145 {S0^X8^Y9, Z0^X6^Y6, 0, 0, }, // 146 {S0^X8^Y8, Z0^X6^Y6, 0, 0, }, // 147 {S0^X7^Y8, Z0^X6^Y6, 0, 0, }, // 148 {S0^X7^Y7, Z0^X6^Y6, 0, 0, }, // 149 {S0^X6^Y7, Z0^X6^Y6, 0, 0, }, // 150 {Y7^X10, X8^Y9, S0^Y8^X9, 0, }, // 151 {X6^Y9, X7^Y8, S0^Y7^X8, 0, }, // 152 {Y3^X8, X6^Y8, S0^X7^Y7, 0, }, // 153 {Y8^X11, X8^Y11, Y9^X10, S0^X9^Y10, }, // 154 {Y7^X11, X8^Y10, Y8^X10, S0^X9^Y9, }, // 155 {Y7^X10, X7^Y10, Y8^X9, S0^X8^Y9, }, // 156 {Y3^X10, X7^Y9, Y7^X9, S0^X8^Y8, }, // 157 {Y3^X9, X3^Y9, Y7^X8, S0^X7^Y8, }, // 158 {Y8^X10, S0^X9^Y9, Z4^X6^Y7, 0, }, // 159 {Y7^X10, S0^Y8^X9, Z4^X6^Y7, 0, }, // 160 {Y7^X9, S0^X8^Y8, Z4^X6^Y7, 0, }, 
// 161 {X7^Y8, S0^Y7^X8, Z4^X6^Y7, 0, }, // 162 {X3^Y8, S0^X7^Y7, Z4^X6^Y7, 0, }, // 163 {S1^Y7^X9, 0, 0, 0, }, // 164 {S1^Y7^X8, 0, 0, 0, }, // 165 {S1^Y6^X8, 0, 0, 0, }, // 166 {S1^Y6^X7, 0, 0, 0, }, // 167 {S1^Y5^X7, 0, 0, 0, }, // 168 {S1^X8^Y8, 0, 0, 0, }, // 169 {S1^X7^Y7, 0, 0, 0, }, // 170 {S0^X8^Y9, S1^Y8^X9, 0, 0, }, // 171 {S0^Y7^X9, S1^X8^Y8, 0, 0, }, // 172 {S0^X7^Y8, S1^Y7^X8, 0, 0, }, // 173 {S0^Y6^X8, S1^X7^Y7, 0, 0, }, // 174 {S0^X6^Y7, S1^Y6^X7, 0, 0, }, // 175 {S0^X8^Y8, S1^Y7^X9, 0, 0, }, // 176 {S0^X7^Y7, S1^Y6^X8, 0, 0, }, // 177 {Y7^X11, S0^X9^Y9, S1^Y8^X10, 0, }, // 178 {Y7^X10, S0^X8^Y9, S1^Y8^X9, 0, }, // 179 {Y6^X10, S0^X8^Y8, S1^Y7^X9, 0, }, // 180 {Y6^X9, S0^X7^Y8, S1^Y7^X8, 0, }, // 181 {Y3^X9, S0^X7^Y7, S1^Y6^X8, 0, }, // 182 {S1^Y8^X9, Z3^X6^Y6, 0, 0, }, // 183 {S1^X8^Y8, Z3^X6^Y6, 0, 0, }, // 184 {S1^Y7^X8, Z3^X6^Y6, 0, 0, }, // 185 {S1^Y6^X8, Z3^X6^Y6, 0, 0, }, // 186 {S0^X6^Y7, S1^X6^Y6, 0, 0, }, // 187 {X8^Y10, S0^Y8^X10, S1^X9^Y9, 0, }, // 188 {X7^Y9, S0^Y7^X9, S1^X8^Y8, 0, }, // 189 {X6^Y9, S0^X7^Y8, S1^Y7^X8, 0, }, // 190 {X3^Y8, S0^X7^Y7, S1^Y6^X8, 0, }, // 191 {X8^Y11, Y8^X11, S0^X9^Y10, S1^Y9^X10, }, // 192 {Y7^X11, X8^Y10, S0^Y8^X10, S1^X9^Y9, }, // 193 {X7^Y10, Y7^X10, S0^X8^Y9, S1^Y8^X9, }, // 194 {Y3^X10, X7^Y9, S0^Y7^X9, S1^X8^Y8, }, // 195 {X3^Y9, Y3^X9, S0^X7^Y8, S1^Y7^X8, }, // 196 {S0^X9^Y9, S1^Y8^X10, Z3^X6^Y7, 0, }, // 197 {S0^X8^Y9, S1^Y8^X9, Z3^X6^Y7, 0, }, // 198 {S0^X8^Y8, S1^Y7^X9, Z3^X6^Y7, 0, }, // 199 {S0^X7^Y8, S1^Y7^X8, Z3^X6^Y7, 0, }, // 200 {X3^Y8, S0^X7^Y7, Z3^X6^Y7, 0, }, // 201 {X8^Y10, Y7^X11, S0^X9^Y9, S1^Y8^X10, }, // 202 {Y3^X10, X7^Y9, S0^X8^Y8, S1^Y7^X9, }, // 203 {Y3^X9, X3^Y9, S0^X7^Y8, S1^Y7^X8, }, // 204 {S2^X8^Y8, 0, 0, 0, }, // 205 {S2^Y7^X8, 0, 0, 0, }, // 206 {S2^X7^Y7, 0, 0, 0, }, // 207 {S2^Y6^X7, 0, 0, 0, }, // 208 {S2^X6^Y6, 0, 0, 0, }, // 209 {S1^X6^Y6, 0, 0, 0, }, // 210 {S1^Y8^X9, S2^X8^Y9, 0, 0, }, // 211 {S1^Y7^X9, S2^X8^Y8, 0, 0, }, // 212 {S1^Y7^X8, S2^X7^Y8, 0, 0, }, // 
213 {S1^Y6^X8, S2^X7^Y7, 0, 0, }, // 214 {S1^Y6^X7, S2^X6^Y7, 0, 0, }, // 215 {Z2^X5^Y6, 0, 0, 0, }, // 216 {S1^X5^Y6, 0, 0, 0, }, // 217 {S0^X8^Y10, S1^Y8^X10, S2^X9^Y9, 0, }, // 218 {S0^X8^Y9, S1^Y7^X10, S2^Y8^X9, 0, }, // 219 {S0^X7^Y9, S1^Y7^X9, S2^X8^Y8, 0, }, // 220 {S0^X7^Y8, S1^Y6^X9, S2^Y7^X8, 0, }, // 221 {S0^X6^Y8, S1^Y6^X8, S2^X7^Y7, 0, }, // 222 {S2^X8^Y9, Z2^X6^Y6, 0, 0, }, // 223 {S2^X8^Y8, Z2^X6^Y6, 0, 0, }, // 224 {S2^X7^Y8, Z2^X6^Y6, 0, 0, }, // 225 {S1^X7^Y7, S2^X6^Y6, 0, 0, }, // 226 {S0^Y7^X10, S1^X8^Y9, S2^Y8^X9, 0, }, // 227 {X3^Y9, S0^X6^Y8, S1^X7^Y7, 0, }, // 228 {Y8^X11, S0^X8^Y11, S1^Y9^X10, S2^X9^Y10, }, // 229 {Y7^X11, S0^X8^Y10, S1^Y8^X10, S2^X9^Y9, }, // 230 {Y7^X10, S0^X7^Y10, S1^Y8^X9, S2^X8^Y9, }, // 231 {Y3^X10, S0^X7^Y9, S1^Y7^X9, S2^X8^Y8, }, // 232 {Y3^X9, S0^X6^Y9, S1^Y7^X8, S2^X7^Y8, }, // 233 {S1^Y8^X10, S2^X9^Y9, Z2^X6^Y7, 0, }, // 234 {S1^Y7^X10, S2^Y8^X9, Z2^X6^Y7, 0, }, // 235 {S1^Y7^X9, S2^X8^Y8, Z2^X6^Y7, 0, }, // 236 {S0^X7^Y8, S1^Y7^X8, Z2^X6^Y7, 0, }, // 237 {X3^Y8, S0^X7^Y7, S1^X6^Y7, 0, }, // 238 }; const UINT_8 DCC_64K_R_X_PATIDX[] = { 0, // 1 pipes 1 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 1, // 1 pipes 2 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 2, // 1 pipes 4 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 3, // 1 pipes 8 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 4, // 1 pipes 16 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 5, // 2 pipes 1 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 6, // 2 pipes 2 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 2, // 2 pipes 4 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 3, // 2 pipes 8 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 4, // 2 pipes 16 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 7, // 4+ pipes 1 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 6, // 4+ pipes 2 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 2, // 4+ pipes 4 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 3, // 4+ pipes 8 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 4, // 4+ pipes 16 bpe ua @ SW_64K_R_X 1xaa @ Navi1x 0, // 1 pipes 1 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 1, // 1 pipes 2 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 2, // 1 pipes 4 bpe pa @ SW_64K_R_X 
1xaa @ Navi1x 3, // 1 pipes 8 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 4, // 1 pipes 16 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 8, // 2 pipes 1 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 9, // 2 pipes 2 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 10, // 2 pipes 4 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 11, // 2 pipes 8 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 12, // 2 pipes 16 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 13, // 4 pipes 1 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 14, // 4 pipes 2 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 15, // 4 pipes 4 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 16, // 4 pipes 8 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 17, // 4 pipes 16 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 18, // 8 pipes 1 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 19, // 8 pipes 2 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 20, // 8 pipes 4 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 21, // 8 pipes 8 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 22, // 8 pipes 16 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 23, // 16 pipes 1 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 24, // 16 pipes 2 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 25, // 16 pipes 4 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 26, // 16 pipes 8 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 27, // 16 pipes 16 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 28, // 32 pipes 1 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 29, // 32 pipes 2 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 30, // 32 pipes 4 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 31, // 32 pipes 8 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 32, // 32 pipes 16 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 33, // 64 pipes 1 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 34, // 64 pipes 2 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 35, // 64 pipes 4 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 36, // 64 pipes 8 bpe pa @ SW_64K_R_X 1xaa @ Navi1x 37, // 64 pipes 16 bpe pa @ SW_64K_R_X 1xaa @ Navi1x }; const UINT_8 HTILE_PATIDX[] = { 0, // 1xaa ua @ HTILE_64K @ Navi1x 0, // 2xaa ua @ HTILE_64K @ Navi1x 0, // 4xaa ua @ HTILE_64K @ Navi1x 0, // 8xaa ua @ HTILE_64K @ Navi1x 0, // 1 pipes 1xaa pa @ HTILE_64K @ Navi1x 0, // 1 pipes 2xaa pa @ HTILE_64K @ Navi1x 0, // 1 pipes 4xaa pa @ HTILE_64K @ Navi1x 0, // 1 pipes 8xaa pa @ 
HTILE_64K @ Navi1x 1, // 2 pipes 1xaa pa @ HTILE_64K @ Navi1x 1, // 2 pipes 2xaa pa @ HTILE_64K @ Navi1x 1, // 2 pipes 4xaa pa @ HTILE_64K @ Navi1x 1, // 2 pipes 8xaa pa @ HTILE_64K @ Navi1x 2, // 4 pipes 1xaa pa @ HTILE_64K @ Navi1x 2, // 4 pipes 2xaa pa @ HTILE_64K @ Navi1x 2, // 4 pipes 4xaa pa @ HTILE_64K @ Navi1x 2, // 4 pipes 8xaa pa @ HTILE_64K @ Navi1x 3, // 8 pipes 1xaa pa @ HTILE_64K @ Navi1x 3, // 8 pipes 2xaa pa @ HTILE_64K @ Navi1x 3, // 8 pipes 4xaa pa @ HTILE_64K @ Navi1x 3, // 8 pipes 8xaa pa @ HTILE_64K @ Navi1x 4, // 16 pipes 1xaa pa @ HTILE_64K @ Navi1x 4, // 16 pipes 2xaa pa @ HTILE_64K @ Navi1x 4, // 16 pipes 4xaa pa @ HTILE_64K @ Navi1x 5, // 16 pipes 8xaa pa @ HTILE_64K @ Navi1x 6, // 32 pipes 1xaa pa @ HTILE_64K @ Navi1x 6, // 32 pipes 2xaa pa @ HTILE_64K @ Navi1x 7, // 32 pipes 4xaa pa @ HTILE_64K @ Navi1x 8, // 32 pipes 8xaa pa @ HTILE_64K @ Navi1x 9, // 64 pipes 1xaa pa @ HTILE_64K @ Navi1x 10, // 64 pipes 2xaa pa @ HTILE_64K @ Navi1x 11, // 64 pipes 4xaa pa @ HTILE_64K @ Navi1x 12, // 64 pipes 8xaa pa @ HTILE_64K @ Navi1x }; const UINT_8 CMASK_64K_PATIDX[] = { 0, // 1 bpe ua @ CMASK_64K @ Navi1x 0, // 2 bpe ua @ CMASK_64K @ Navi1x 0, // 4 bpe ua @ CMASK_64K @ Navi1x 0, // 8 bpe ua @ CMASK_64K @ Navi1x 0, // 1 pipes 1 bpe pa @ CMASK_64K @ Navi1x 0, // 1 pipes 2 bpe pa @ CMASK_64K @ Navi1x 0, // 1 pipes 4 bpe pa @ CMASK_64K @ Navi1x 0, // 1 pipes 8 bpe pa @ CMASK_64K @ Navi1x 1, // 2 pipes 1 bpe pa @ CMASK_64K @ Navi1x 1, // 2 pipes 2 bpe pa @ CMASK_64K @ Navi1x 1, // 2 pipes 4 bpe pa @ CMASK_64K @ Navi1x 1, // 2 pipes 8 bpe pa @ CMASK_64K @ Navi1x 2, // 4 pipes 1 bpe pa @ CMASK_64K @ Navi1x 2, // 4 pipes 2 bpe pa @ CMASK_64K @ Navi1x 2, // 4 pipes 4 bpe pa @ CMASK_64K @ Navi1x 2, // 4 pipes 8 bpe pa @ CMASK_64K @ Navi1x 3, // 8 pipes 1 bpe pa @ CMASK_64K @ Navi1x 3, // 8 pipes 2 bpe pa @ CMASK_64K @ Navi1x 3, // 8 pipes 4 bpe pa @ CMASK_64K @ Navi1x 3, // 8 pipes 8 bpe pa @ CMASK_64K @ Navi1x 4, // 16 pipes 1 bpe pa @ CMASK_64K @ Navi1x 
4, // 16 pipes 2 bpe pa @ CMASK_64K @ Navi1x 4, // 16 pipes 4 bpe pa @ CMASK_64K @ Navi1x 4, // 16 pipes 8 bpe pa @ CMASK_64K @ Navi1x 5, // 32 pipes 1 bpe pa @ CMASK_64K @ Navi1x 5, // 32 pipes 2 bpe pa @ CMASK_64K @ Navi1x 5, // 32 pipes 4 bpe pa @ CMASK_64K @ Navi1x 5, // 32 pipes 8 bpe pa @ CMASK_64K @ Navi1x 6, // 64 pipes 1 bpe pa @ CMASK_64K @ Navi1x 6, // 64 pipes 2 bpe pa @ CMASK_64K @ Navi1x 6, // 64 pipes 4 bpe pa @ CMASK_64K @ Navi1x 7, // 64 pipes 8 bpe pa @ CMASK_64K @ Navi1x }; const UINT_8 DCC_64K_R_X_RBPLUS_PATIDX[] = { 0, // 1 bpe ua @ SW_64K_R_X 1xaa @ RbPlus 1, // 2 bpe ua @ SW_64K_R_X 1xaa @ RbPlus 2, // 4 bpe ua @ SW_64K_R_X 1xaa @ RbPlus 3, // 8 bpe ua @ SW_64K_R_X 1xaa @ RbPlus 4, // 16 bpe ua @ SW_64K_R_X 1xaa @ RbPlus 0, // 1 pipes (1 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 1, // 1 pipes (1 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 2, // 1 pipes (1 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 3, // 1 pipes (1 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 4, // 1 pipes (1 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 38, // 2 pipes (1-2 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 39, // 2 pipes (1-2 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 40, // 2 pipes (1-2 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 41, // 2 pipes (1-2 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 42, // 2 pipes (1-2 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 43, // 4 pipes (1-2 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 44, // 4 pipes (1-2 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 45, // 4 pipes (1-2 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 46, // 4 pipes (1-2 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 47, // 4 pipes (1-2 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 48, // 8 pipes (2 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 49, // 8 pipes (2 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 50, // 8 pipes (2 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 51, // 8 pipes (2 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 52, // 8 pipes (2 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 53, // 4 pipes (4 PKRs) 1 
bpe pa @ SW_64K_R_X 1xaa @ RbPlus 54, // 4 pipes (4 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 55, // 4 pipes (4 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 56, // 4 pipes (4 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 57, // 4 pipes (4 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 58, // 8 pipes (4 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 59, // 8 pipes (4 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 60, // 8 pipes (4 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 61, // 8 pipes (4 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 62, // 8 pipes (4 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 63, // 16 pipes (4 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 64, // 16 pipes (4 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 65, // 16 pipes (4 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 66, // 16 pipes (4 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 67, // 16 pipes (4 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 68, // 8 pipes (8 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 69, // 8 pipes (8 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 70, // 8 pipes (8 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 71, // 8 pipes (8 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 72, // 8 pipes (8 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 73, // 16 pipes (8 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 74, // 16 pipes (8 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 75, // 16 pipes (8 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 76, // 16 pipes (8 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 77, // 16 pipes (8 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 78, // 32 pipes (8 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 79, // 32 pipes (8 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 80, // 32 pipes (8 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 81, // 32 pipes (8 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 82, // 32 pipes (8 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 83, // 16 pipes (16 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 84, // 16 pipes (16 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 85, // 16 pipes (16 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 86, // 16 pipes 
(16 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 87, // 16 pipes (16 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 88, // 32 pipes (16 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 89, // 32 pipes (16 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 90, // 32 pipes (16 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 91, // 32 pipes (16 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 92, // 32 pipes (16 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 93, // 64 pipes (16 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 94, // 64 pipes (16 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 95, // 64 pipes (16 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 96, // 64 pipes (16 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 97, // 64 pipes (16 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 98, // 32 pipes (32 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 99, // 32 pipes (32 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 100, // 32 pipes (32 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 101, // 32 pipes (32 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 102, // 32 pipes (32 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 103, // 64 pipes (32 PKRs) 1 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 104, // 64 pipes (32 PKRs) 2 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 105, // 64 pipes (32 PKRs) 4 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 106, // 64 pipes (32 PKRs) 8 bpe pa @ SW_64K_R_X 1xaa @ RbPlus 107, // 64 pipes (32 PKRs) 16 bpe pa @ SW_64K_R_X 1xaa @ RbPlus }; const UINT_8 HTILE_RBPLUS_PATIDX[] = { 0, // 1xaa ua @ HTILE_64K @ RbPlus 0, // 2xaa ua @ HTILE_64K @ RbPlus 0, // 4xaa ua @ HTILE_64K @ RbPlus 0, // 8xaa ua @ HTILE_64K @ RbPlus 0, // 1 pipes (1-2 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 0, // 1 pipes (1-2 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 0, // 1 pipes (1-2 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 0, // 1 pipes (1-2 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 13, // 2 pipes (1-2 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 13, // 2 pipes (1-2 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 13, // 2 pipes (1-2 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 13, // 2 pipes (1-2 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 14, // 4 pipes (1-2 
PKRs) 1xaa pa @ HTILE_64K @ RbPlus 14, // 4 pipes (1-2 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 14, // 4 pipes (1-2 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 14, // 4 pipes (1-2 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 15, // 8 pipes (1-2 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 15, // 8 pipes (1-2 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 15, // 8 pipes (1-2 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 15, // 8 pipes (1-2 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 13, // 2 pipes (4 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 13, // 2 pipes (4 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 13, // 2 pipes (4 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 13, // 2 pipes (4 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 16, // 4 pipes (4 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 16, // 4 pipes (4 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 16, // 4 pipes (4 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 16, // 4 pipes (4 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 17, // 8 pipes (4 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 17, // 8 pipes (4 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 17, // 8 pipes (4 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 17, // 8 pipes (4 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 18, // 16 pipes (4 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 18, // 16 pipes (4 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 18, // 16 pipes (4 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 18, // 16 pipes (4 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 19, // 4 pipes (8 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 19, // 4 pipes (8 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 19, // 4 pipes (8 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 19, // 4 pipes (8 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 20, // 8 pipes (8 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 20, // 8 pipes (8 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 20, // 8 pipes (8 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 20, // 8 pipes (8 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 21, // 16 pipes (8 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 21, // 16 pipes (8 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 21, // 16 pipes (8 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 21, // 16 pipes (8 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 22, // 32 pipes (8 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 22, // 32 pipes (8 PKRs) 2xaa pa 
@ HTILE_64K @ RbPlus 22, // 32 pipes (8 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 22, // 32 pipes (8 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 23, // 8 pipes (16 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 23, // 8 pipes (16 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 23, // 8 pipes (16 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 23, // 8 pipes (16 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 24, // 16 pipes (16 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 24, // 16 pipes (16 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 24, // 16 pipes (16 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 24, // 16 pipes (16 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 25, // 32 pipes (16 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 25, // 32 pipes (16 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 25, // 32 pipes (16 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 25, // 32 pipes (16 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 26, // 64 pipes (16 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 26, // 64 pipes (16 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 26, // 64 pipes (16 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 26, // 64 pipes (16 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 27, // 16 pipes (32 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 27, // 16 pipes (32 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 27, // 16 pipes (32 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 27, // 16 pipes (32 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 28, // 32 pipes (32 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 28, // 32 pipes (32 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 28, // 32 pipes (32 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 28, // 32 pipes (32 PKRs) 8xaa pa @ HTILE_64K @ RbPlus 29, // 64 pipes (32 PKRs) 1xaa pa @ HTILE_64K @ RbPlus 29, // 64 pipes (32 PKRs) 2xaa pa @ HTILE_64K @ RbPlus 29, // 64 pipes (32 PKRs) 4xaa pa @ HTILE_64K @ RbPlus 29, // 64 pipes (32 PKRs) 8xaa pa @ HTILE_64K @ RbPlus }; const UINT_8 CMASK_64K_RBPLUS_PATIDX[] = { 0, // 1 bpe ua @ CMASK_64K @ RbPlus 0, // 2 bpe ua @ CMASK_64K @ RbPlus 0, // 4 bpe ua @ CMASK_64K @ RbPlus 0, // 8 bpe ua @ CMASK_64K @ RbPlus 0, // 1 pipes (1-2 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 0, // 1 pipes (1-2 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 0, // 1 pipes (1-2 PKRs) 4 bpe pa 
@ CMASK_64K @ RbPlus 0, // 1 pipes (1-2 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 8, // 2 pipes (1-2 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 8, // 2 pipes (1-2 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 8, // 2 pipes (1-2 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 8, // 2 pipes (1-2 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 9, // 4 pipes (1-2 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 9, // 4 pipes (1-2 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 9, // 4 pipes (1-2 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 9, // 4 pipes (1-2 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 10, // 8 pipes (1-2 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 10, // 8 pipes (1-2 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 10, // 8 pipes (1-2 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 10, // 8 pipes (1-2 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 8, // 2 pipes (4 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 8, // 2 pipes (4 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 8, // 2 pipes (4 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 8, // 2 pipes (4 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 11, // 4 pipes (4 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 11, // 4 pipes (4 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 11, // 4 pipes (4 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 11, // 4 pipes (4 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 12, // 8 pipes (4 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 12, // 8 pipes (4 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 12, // 8 pipes (4 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 12, // 8 pipes (4 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 13, // 16 pipes (4 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 13, // 16 pipes (4 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 13, // 16 pipes (4 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 13, // 16 pipes (4 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 14, // 4 pipes (8 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 14, // 4 pipes (8 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 14, // 4 pipes (8 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 14, // 4 pipes (8 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 15, // 8 pipes (8 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 15, // 8 pipes (8 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 15, // 8 pipes (8 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 16, // 8 pipes (8 
PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 15, // 16 pipes (8 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 15, // 16 pipes (8 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 15, // 16 pipes (8 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 17, // 16 pipes (8 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 18, // 32 pipes (8 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 18, // 32 pipes (8 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 18, // 32 pipes (8 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 19, // 32 pipes (8 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 20, // 8 pipes (16 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 20, // 8 pipes (16 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 20, // 8 pipes (16 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 21, // 8 pipes (16 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 22, // 16 pipes (16 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 22, // 16 pipes (16 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 22, // 16 pipes (16 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 23, // 16 pipes (16 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 22, // 32 pipes (16 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 22, // 32 pipes (16 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 22, // 32 pipes (16 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 24, // 32 pipes (16 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 25, // 64 pipes (16 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 25, // 64 pipes (16 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 25, // 64 pipes (16 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 32, // 64 pipes (16 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 27, // 16 pipes (32 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 27, // 16 pipes (32 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 27, // 16 pipes (32 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 28, // 16 pipes (32 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 29, // 32 pipes (32 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 29, // 32 pipes (32 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 29, // 32 pipes (32 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 33, // 32 pipes (32 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus 29, // 64 pipes (32 PKRs) 1 bpe pa @ CMASK_64K @ RbPlus 29, // 64 pipes (32 PKRs) 2 bpe pa @ CMASK_64K @ RbPlus 29, // 64 pipes (32 PKRs) 4 bpe pa @ CMASK_64K @ RbPlus 34, // 64 pipes 
(32 PKRs) 8 bpe pa @ CMASK_64K @ RbPlus }; const UINT_8 CMASK_VAR_RBPLUS_PATIDX[] = { 0, // 1 bpe ua @ CMASK_VAR @ RbPlus 0, // 2 bpe ua @ CMASK_VAR @ RbPlus 0, // 4 bpe ua @ CMASK_VAR @ RbPlus 0, // 8 bpe ua @ CMASK_VAR @ RbPlus 0, // 1 pipes (1-2 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 0, // 1 pipes (1-2 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 0, // 1 pipes (1-2 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 0, // 1 pipes (1-2 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 8, // 2 pipes (1-2 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 8, // 2 pipes (1-2 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 8, // 2 pipes (1-2 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 8, // 2 pipes (1-2 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 9, // 4 pipes (1-2 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 9, // 4 pipes (1-2 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 9, // 4 pipes (1-2 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 9, // 4 pipes (1-2 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 10, // 8 pipes (1-2 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 10, // 8 pipes (1-2 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 10, // 8 pipes (1-2 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 10, // 8 pipes (1-2 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 8, // 2 pipes (4 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 8, // 2 pipes (4 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 8, // 2 pipes (4 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 8, // 2 pipes (4 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 11, // 4 pipes (4 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 11, // 4 pipes (4 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 11, // 4 pipes (4 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 11, // 4 pipes (4 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 12, // 8 pipes (4 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 12, // 8 pipes (4 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 12, // 8 pipes (4 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 12, // 8 pipes (4 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 13, // 16 pipes (4 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 13, // 16 pipes (4 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 13, // 16 pipes (4 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 13, // 16 pipes (4 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 14, // 4 pipes (8 
PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 14, // 4 pipes (8 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 14, // 4 pipes (8 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 14, // 4 pipes (8 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 15, // 8 pipes (8 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 15, // 8 pipes (8 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 15, // 8 pipes (8 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 16, // 8 pipes (8 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 15, // 16 pipes (8 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 15, // 16 pipes (8 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 15, // 16 pipes (8 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 17, // 16 pipes (8 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 18, // 32 pipes (8 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 18, // 32 pipes (8 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 18, // 32 pipes (8 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 19, // 32 pipes (8 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 20, // 8 pipes (16 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 20, // 8 pipes (16 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 20, // 8 pipes (16 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 21, // 8 pipes (16 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 22, // 16 pipes (16 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 22, // 16 pipes (16 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 22, // 16 pipes (16 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 23, // 16 pipes (16 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 22, // 32 pipes (16 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 22, // 32 pipes (16 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 22, // 32 pipes (16 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 24, // 32 pipes (16 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 25, // 64 pipes (16 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 25, // 64 pipes (16 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 25, // 64 pipes (16 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 26, // 64 pipes (16 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 27, // 16 pipes (32 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 27, // 16 pipes (32 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 27, // 16 pipes (32 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 28, // 16 pipes (32 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 29, // 32 pipes (32 PKRs) 1 
bpe pa @ CMASK_VAR @ RbPlus 29, // 32 pipes (32 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 29, // 32 pipes (32 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 30, // 32 pipes (32 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus 29, // 64 pipes (32 PKRs) 1 bpe pa @ CMASK_VAR @ RbPlus 29, // 64 pipes (32 PKRs) 2 bpe pa @ CMASK_VAR @ RbPlus 29, // 64 pipes (32 PKRs) 4 bpe pa @ CMASK_VAR @ RbPlus 31, // 64 pipes (32 PKRs) 8 bpe pa @ CMASK_VAR @ RbPlus }; const UINT_64 DCC_64K_R_X_SW_PATTERN[][17] = { {0, X4, Y4, X5, Y5, X6, Y6, X7, Y7, X8, Y8, X9, Y9, 0, 0, 0, 0, }, //0 {0, Y3, X4, Y4, X5, Y5, X6, Y6, X7, Y7, X8, Y8, X9, 0, 0, 0, 0, }, //1 {0, X3, Y3, X4, Y4, X5, Y5, X6, Y6, X7, Y7, X8, Y8, 0, 0, 0, 0, }, //2 {0, Y2, X3, Y3, X4, Y4, X5, Y5, X6, Y6, X7, Y7, X8, 0, 0, 0, 0, }, //3 {0, X2, Y2, X3, Y3, X4, Y4, X5, Y5, X6, Y6, X7, Y7, 0, 0, 0, 0, }, //4 {0, X3^Y3, X4, X5, Y5, X6, Y6, X7, Y7, X8, Y8, X9, Y9, 0, 0, 0, 0, }, //5 {0, X3^Y3, X4, Y4, X5, Y5, X6, Y6, X7, Y7, X8, Y8, X9, 0, 0, 0, 0, }, //6 {0, X3^Y3, X4^Y4, X5, Y5, X6, Y6, X7, Y7, X8, Y8, X9, Y9, 0, 0, 0, 0, }, //7 {0, X4, X5, Y5, X6, Y6, X7, Y7, X8, Z0^X3^Y3, Y8, X9, Y9, 0, 0, 0, 0, }, //8 {0, Y4, X4, X5, Y5, X6, Y6, X7, Y7, Z0^X3^Y3, X8, Y8, X9, 0, 0, 0, 0, }, //9 {0, X3, Y4, X4, X5, Y5, X6, Y6, X7, Z0^X3^Y3, Y7, X8, Y8, 0, 0, 0, 0, }, //10 {0, Y2, X3, Y4, X4, X5, Y5, X6, Y6, Z0^X3^Y3, X7, Y7, X8, 0, 0, 0, 0, }, //11 {0, X2, Y2, X3, Y4, X4, X5, Y5, X6, Z0^X3^Y3, Y6, X7, Y7, 0, 0, 0, 0, }, //12 {0, X5, Y5, X6, Y6, X7, Y7, X8, Y8, Z1^X3^Y3, Z0^X4^Y4, X9, Y9, 0, 0, 0, 0, }, //13 {0, Y4, X5, Y5, X6, Y6, X7, Y7, X8, Z1^X3^Y3, Z0^X4^Y4, Y8, X9, 0, 0, 0, 0, }, //14 {0, X3, Y4, X5, Y5, X6, Y6, X7, Y7, Z1^X3^Y3, Z0^X4^Y4, X8, Y8, 0, 0, 0, 0, }, //15 {0, Y2, X3, Y4, X5, Y5, X6, Y6, X7, Z1^X3^Y3, Z0^X4^Y4, Y7, X8, 0, 0, 0, 0, }, //16 {0, X2, Y2, X3, Y4, X5, Y5, X6, Y6, Z1^X3^Y3, Z0^X4^Y4, X7, Y7, 0, 0, 0, 0, }, //17 {0, Y5, X6, Y6, X7, Y7, X8, Y8, X9, Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, Y9, 0, 0, 0, 0, }, //18 {0, Y4, Y5, X6, Y6, X7, Y7, X8, Y8, Z2^X3^Y3, 
Z1^X4^Y4, Z0^X5^Y5, X9, 0, 0, 0, 0, }, //19 {0, X3, Y4, Y5, X6, Y6, X7, Y7, X8, Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, Y8, 0, 0, 0, 0, }, //20 {0, Y2, X3, Y4, Y5, X6, Y6, X7, Y7, Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, X8, 0, 0, 0, 0, }, //21 {0, X2, Y2, X3, Y4, Y5, X6, Y6, X7, Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, Y7, 0, 0, 0, 0, }, //22 {0, X6, Y6, X7, Y7, X8, Y8, X9, Y9, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //23 {0, Y4, X6, Y6, X7, Y7, X8, Y8, X9, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //24 {0, X3, Y4, X6, Y6, X7, Y7, X8, Y8, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //25 {0, Y2, X3, Y4, X6, Y6, X7, Y7, X8, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //26 {0, X2, Y2, X3, Y4, X6, Y6, X7, Y7, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //27 {0, Y6, X7, Y7, X8, Y8, X9, Y9, X10, X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //28 {0, Y4, Y6, X7, Y7, X8, Y8, X9, Y9, X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //29 {0, X3, Y4, Y6, X7, Y7, X8, Y8, X9, X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //30 {0, Y2, X3, Y4, Y6, X7, Y7, X8, Y8, X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //31 {0, X2, X3, Y4, Y6, X7, Y7, Y2, X8, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X7, Z0^X5^Y7, Y2^X6^Y6, 0, 0, 0, }, //32 {0, X7, Y7, X8, Y8, X9, Y9, X10, Y10, X3^Y3^Z5, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, Z0^X6^Y7, 0, 0, }, //33 {0, Y4, X7, Y7, X8, Y8, X9, Y9, X10, X3^Y3^Z5, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, Z0^X6^Y7, 0, 0, }, //34 {0, X3, Y4, X7, Y7, X8, Y8, X9, Y9, X3^Y3^Z5, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, Z0^X6^Y7, 0, 0, }, //35 {0, X3, Y4, X7, Y7, X8, Y8, Y2, X9, X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X8, Z1^X5^Y8, Y2^Y6^X7, Z0^X6^Y7, 0, 0, }, //36 {0, X3, Y4, X7, Y7, X8, Y8, X2, Y2, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X8, Z0^X5^Y8, Y2^Y6^X7, X2^X6^Y7, 0, 0, }, //37 {0, Y4, X5, Y5, X6, Y6, X7, Y7, X8, Z0^X4^Y4, Y8, X9, Y9, 0, 0, 0, 0, }, //38 {0, Y3, Y4, X5, Y5, X6, Y6, X7, Y7, Z0^X4^Y4, X8, Y8, X9, 
0, 0, 0, 0, }, //39 {0, X3, Y3, Y4, X5, Y5, X6, Y6, X7, Z0^X4^Y4, Y7, X8, Y8, 0, 0, 0, 0, }, //40 {0, Y2, X3, Y3, Y4, X5, Y5, X6, Y6, Z0^X4^Y4, X7, Y7, X8, 0, 0, 0, 0, }, //41 {0, X2, Y2, X3, Y3, Y4, X5, Y5, X6, Z0^X4^Y4, Y6, X7, Y7, 0, 0, 0, 0, }, //42 {0, X5, Y5, X6, Y6, X7, Y7, X8, Y8, Y4^X5^Y5, Z0^X4^Y4, X9, Y9, 0, 0, 0, 0, }, //43 {0, Y3, X5, Y5, X6, Y6, X7, Y7, X8, Y4^X5^Y5, Z0^X4^Y4, Y8, X9, 0, 0, 0, 0, }, //44 {0, X3, Y3, X5, Y5, X6, Y6, X7, Y7, Y4^X5^Y5, Z0^X4^Y4, X8, Y8, 0, 0, 0, 0, }, //45 {0, Y2, X3, Y3, X5, Y5, X6, Y6, X7, Y4^X5^Y5, Z0^X4^Y4, Y7, X8, 0, 0, 0, 0, }, //46 {0, X2, Y2, X3, Y3, X5, Y5, X6, Y6, Y4^X5^Y5, Z0^X4^Y4, X7, Y7, 0, 0, 0, 0, }, //47 {0, Y5, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, Y9, 0, 0, 0, 0, }, //48 {0, Y3, Y5, X6, Y6, X7, Y7, X8, Y8, Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, X9, 0, 0, 0, 0, }, //49 {0, X3, Y3, Y5, X6, Y6, X7, Y7, X8, Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, Y8, 0, 0, 0, 0, }, //50 {0, Y2, X3, Y3, Y5, X6, Y6, X7, Y7, Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, X8, 0, 0, 0, 0, }, //51 {0, X2, Y2, X3, Y3, Y5, X6, Y6, X7, Y4^X5^Y5, Z0^X4^Y4, X5^X6^Y6, Y7, 0, 0, 0, 0, }, //52 {0, X5, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X6^Y6, Z1^X4^Y4, X5^Y5, Y9, 0, 0, 0, 0, }, //53 {0, Y3, X5, X6, Y6, X7, Y7, X8, Y8, Y4^X6^Y6, Z1^X4^Y4, X5^Y5, X9, 0, 0, 0, 0, }, //54 {0, X3, Y3, X5, X6, Y6, X7, Y7, X8, Y4^X6^Y6, Z1^X4^Y4, X5^Y5, Y8, 0, 0, 0, 0, }, //55 {0, Y2, X3, Y3, X5, X6, Y6, X7, Y7, Y4^X6^Y6, Z1^X4^Y4, X5^Y5, X8, 0, 0, 0, 0, }, //56 {0, X2, Y2, X3, Y3, X5, X6, Y6, X7, Y4^X6^Y6, Z1^X4^Y4, X5^Y5, Y7, 0, 0, 0, 0, }, //57 {0, X5, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, Y9, 0, 0, 0, 0, }, //58 {0, Y3, X5, X6, Y6, X7, Y7, X8, Y8, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X9, 0, 0, 0, 0, }, //59 {0, X3, Y3, X5, X6, Y6, X7, Y7, X8, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, Y8, 0, 0, 0, 0, }, //60 {0, Y2, X3, Y3, X5, X6, Y6, X7, Y7, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X8, 0, 0, 0, 0, }, //61 {0, X2, Y2, X3, Y3, X5, X6, Y6, X7, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, Y7, 0, 0, 0, 
0, }, //62 {0, X6, Y6, X7, Y7, X8, Y8, X9, Y9, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5^X7^Y7, 0, 0, 0, 0, }, //63 {0, Y3, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5^X7^Y7, 0, 0, 0, 0, }, //64 {0, X3, Y3, X6, Y6, X7, Y7, X8, Y8, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5^X7^Y7, 0, 0, 0, 0, }, //65 {0, Y2, X3, Y3, X6, Y6, X7, Y7, X8, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5^X7^Y7, 0, 0, 0, 0, }, //66 {0, X2, Y2, X3, Y3, X6, Y6, X7, Y7, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5^X7^Y7, 0, 0, 0, 0, }, //67 {0, X6, Y6, X7, Y7, X8, Y8, X9, Y9, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, X5^Y6, 0, 0, 0, 0, }, //68 {0, Y3, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, X5^Y6, 0, 0, 0, 0, }, //69 {0, X3, Y3, X6, Y6, X7, Y7, X8, Y8, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, X5^Y6, 0, 0, 0, 0, }, //70 {0, Y2, X3, Y3, X6, Y6, X7, Y7, X8, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, X5^Y6, 0, 0, 0, 0, }, //71 {0, X2, Y2, X3, Y3, X6, Y6, X7, Y7, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, X5^Y6, 0, 0, 0, 0, }, //72 {0, X6, Y6, X7, Y7, X8, Y8, X9, Y9, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //73 {0, Y3, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //74 {0, X3, Y3, X6, Y6, X7, Y7, X8, Y8, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //75 {0, Y2, X3, Y3, X6, Y6, X7, Y7, X8, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //76 {0, X2, Y2, X3, Y3, X6, Y6, X7, Y7, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //77 {0, Y6, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, X6^X8^Y8, 0, 0, 0, }, //78 {0, Y3, Y6, X7, Y7, X8, Y8, X9, Y9, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, X6^X8^Y8, 0, 0, 0, }, //79 {0, X3, Y3, Y6, X7, Y7, X8, Y8, X9, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, X6^X8^Y8, 0, 0, 0, }, //80 {0, Y2, X3, Y3, Y6, X7, Y7, X8, Y8, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, X6^X8^Y8, 0, 0, 0, }, //81 {0, X2, Y2, Y3, X6, Y6, X7, Y7, X8, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, X3^X8^Y8, 0, 0, 0, }, //82 {0, X6, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X8^Y8, 
Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, X6^Y6, 0, 0, 0, }, //83 {0, Y3, X6, X7, Y7, X8, Y8, X9, Y9, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, X6^Y6, 0, 0, 0, }, //84 {0, X3, Y3, X6, X7, Y7, X8, Y8, X9, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, X6^Y6, 0, 0, 0, }, //85 {0, Y2, X3, Y3, X6, X7, Y7, X8, Y8, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, X6^Y6, 0, 0, 0, }, //86 {0, X2, X3, Y3, X6, X7, Y7, Y2, X8, Y4^X8^Y8, Z2^X4^Y4, Z1^Y5^X7, Z0^X5^Y7, X6^Y6, 0, 0, 0, }, //87 {0, X6, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //88 {0, Y3, X6, X7, Y7, X8, Y8, X9, Y9, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //89 {0, X3, Y3, X6, X7, Y7, X8, Y8, X9, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //90 {0, Y2, X3, Y3, X6, X7, Y7, X8, Y8, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //91 {0, X2, X3, Y3, X6, X7, Y7, Y2, X8, Y4^X8^Y8, Z2^X4^Y4, Z1^Y5^X7, Z0^X5^Y7, Y2^X6^Y6, 0, 0, 0, }, //92 {0, X7, Y7, X8, Y8, X9, Y9, X10, Y10, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, X6^X9^Y9, 0, 0, }, //93 {0, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, X6^X9^Y9, 0, 0, }, //94 {0, X3, Y3, X7, Y7, X8, Y8, X9, Y9, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, X6^X9^Y9, 0, 0, }, //95 {0, Y2, Y3, X6, X7, Y7, X8, Y8, X9, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, X3^X9^Y9, 0, 0, }, //96 {0, X2, Y3, X6, X7, Y7, X8, Y2, Y8, Y4^X8^Y8, Z2^X4^Y4, Z1^Y5^X7, Z0^X5^Y7, Y2^X6^Y6, X3^X9^Y9, 0, 0, }, //97 {0, X7, Y7, X8, Y8, X9, Y9, X10, Y10, Y4^X9^Y9, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, X6^Y7, 0, 0, }, //98 {0, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X9^Y9, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, X6^Y7, 0, 0, }, //99 {0, X3, Y3, X7, Y7, X8, Y8, X9, Y9, Y4^X9^Y9, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, X6^Y7, 0, 0, }, //100 {0, X3, Y3, X7, Y7, X8, Y8, Y2, X9, Y4^X9^Y9, Z3^X4^Y4, Z2^Y5^X8, Z1^X5^Y8, Y2^Y6^X7, X6^Y7, 0, 0, }, //101 {0, X3, Y3, X7, Y7, X8, Y8, X2, Y2, 
Y4^X9^Y9, Z2^X4^Y4, Z1^Y5^X8, Z0^X5^Y8, Y2^Y6^X7, X6^Y7, 0, 0, }, //102 {0, X7, Y7, X8, Y8, X9, Y9, X10, Y10, Y4^X9^Y9, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, Z0^X6^Y7, 0, 0, }, //103 {0, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X9^Y9, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, Z0^X6^Y7, 0, 0, }, //104 {0, X3, Y3, X7, Y7, X8, Y8, X9, Y9, Y4^X9^Y9, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, Z0^X6^Y7, 0, 0, }, //105 {0, X3, Y3, X7, Y7, X8, Y8, Y2, X9, Y4^X9^Y9, Z3^X4^Y4, Z2^Y5^X8, Z1^X5^Y8, Y2^Y6^X7, Z0^X6^Y7, 0, 0, }, //106 {0, X3, Y3, X7, Y7, X8, Y8, X2, Y2, Y4^X9^Y9, Z2^X4^Y4, Z1^Y5^X8, Z0^X5^Y8, Y2^Y6^X7, X2^X6^Y7, 0, 0, }, //107 }; const UINT_64 HTILE_SW_PATTERN[][18] = { {0, 0, 0, X3, Y3, X4, Y4, X5, Y5, X6, Y6, X7, Y7, 0, 0, 0, 0, 0, }, //0 {0, 0, 0, X3, Y4, X4, X5, Y5, X6, Z0^X3^Y3, Y6, X7, Y7, 0, 0, 0, 0, 0, }, //1 {0, 0, 0, X3, Y4, X5, Y5, X6, Y6, Z1^X3^Y3, Z0^X4^Y4, X7, Y7, X8, 0, 0, 0, 0, }, //2 {0, 0, 0, X3, Y4, Y5, X6, Y6, X7, Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, Y7, X8, Y8, 0, 0, 0, }, //3 {0, 0, 0, X3, Y4, X6, Y6, X7, Y7, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, X8, Y8, X9, 0, 0, }, //4 {0, 0, 0, X3, Y4, X6, Y6, X7, Y7, Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X6, X5^Y6, X8, Y8, X9, 0, 0, }, //5 {0, 0, 0, X3, Y4, Y6, X7, Y7, X8, X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, Y8, X9, Y9, 0, }, //6 {0, 0, 0, X3, Y4, Y6, X7, Y7, X8, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X7, Z0^X5^Y7, X6^Y6, Y8, X9, Y9, 0, }, //7 {0, 0, 0, X3, Y4, Y6, X7, Y7, X8, Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X7, X5^Y7, X6^Y6, Y8, X9, Y9, 0, }, //8 {0, 0, 0, X3, Y4, X7, Y7, X8, Y8, X3^Y3^Z5, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, Z0^X6^Y7, X9, Y9, X10, }, //9 {0, 0, 0, X3, Y4, X7, Y7, X8, Y8, X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X8, Z1^X5^Y8, Z0^Y6^X7, X6^Y7, X9, Y9, X10, }, //10 {0, 0, 0, X3, Y4, X7, Y7, X8, Y8, X3^Y3^Z3, Z2^X4^Y4, Z1^Y5^X8, Z0^X5^Y8, Y6^X7, X6^Y7, X9, Y9, X10, }, //11 {0, 0, 0, X3, Y4, X7, Y7, X8, Y8, Z2^X3^Y3, Z1^X4^Y4, Z0^Y5^X8, X5^Y8, Y6^X7, X6^Y7, X9, Y9, X10, }, //12 {0, 0, 0, X3, Y3, Y4, X5, Y5, X6, Z0^X4^Y4, Y6, X7, 
Y7, 0, 0, 0, 0, 0, }, //13 {0, 0, 0, X3, Y3, X5, Y5, X6, Y6, Y4^X5^Y5, Z0^X4^Y4, X7, Y7, X8, 0, 0, 0, 0, }, //14 {0, 0, 0, X3, Y3, Y5, X6, Y6, X7, Y4^X5^Y5, Z0^X4^Y4, X5^Y5, Y7, X8, Y8, 0, 0, 0, }, //15 {0, 0, 0, X3, Y3, X5, X6, Y6, X7, Y4^X6^Y6, Z1^X4^Y4, Y7, X8, Y8, X5^Y5, 0, 0, 0, }, //16 {0, 0, 0, X3, Y3, X5, X6, Y6, X7, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, Y7, X8, Y8, 0, 0, 0, }, //17 {0, 0, 0, X3, Y3, X6, Y6, X7, Y7, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5^Y6, X8, Y8, X9, 0, 0, }, //18 {0, 0, 0, X3, Y3, Y4, X5, X6, Y6, Z1^X4^Y4, Z0^X5^Y5, X7, Y7, X8, 0, 0, 0, 0, }, //19 {0, 0, 0, X3, Y3, X6, Y6, X7, Y7, Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, X8, Y8, X9, X5^Y6, 0, 0, }, //20 {0, 0, 0, X3, Y3, X6, Y6, X7, Y7, Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, X5^Y6, X8, Y8, X9, 0, 0, }, //21 {0, 0, 0, X3, Y3, Y6, X7, Y7, X8, Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, X5^Y6, X6^Y6, Y8, X9, Y9, 0, }, //22 {0, 0, 0, X3, Y3, Y4, X6, Y6, X7, Z1^X4^Y4, Z0^Y5^X6, X5^Y6, Y7, X8, Y8, 0, 0, 0, }, //23 {0, 0, 0, X3, Y3, X6, X7, Y7, X8, Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, X5^Y7, Y8, X9, Y9, X6^Y6, 0, }, //24 {0, 0, 0, X3, Y3, X6, X7, Y7, X8, Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, X5^Y7, X6^Y6, Y8, X9, Y9, 0, }, //25 {0, 0, 0, X3, Y3, X7, Y7, X8, Y8, Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, X5^Y7, X6^Y6, X6^Y8, X9, Y9, X10, }, //26 {0, 0, 0, X3, Y3, Y4, X6, X7, Y7, Z1^X4^Y4, Z0^Y5^X7, X5^Y7, X6^Y6, X8, Y8, X9, 0, 0, }, //27 {0, 0, 0, X3, Y3, X7, Y7, X8, Y8, Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, X5^Y8, Y6^X7, X9, Y9, X10, X6^Y7, }, //28 {0, 0, 0, X3, Y3, X7, Y7, X8, Y8, Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, X5^Y8, Y6^X7, X6^Y7, X9, Y9, X10, }, //29 }; const UINT_64 CMASK_SW_PATTERN[][17] = { {X3, Y3, X4, Y4, X5, Y5, X6, Y6, X7, Y7, X8, Y8, X9, 0, 0, 0, 0, }, //0 {X3, Y4, X4, X5, Y5, X6, Y6, X7, Y7, Z0^X3^Y3, X8, Y8, X9, 0, 0, 0, 0, }, //1 {X3, Y4, X5, Y5, X6, Y6, X7, Y7, X8, Z1^X3^Y3, Z0^X4^Y4, Y8, X9, 0, 0, 0, 0, }, //2 {X3, Y4, Y5, X6, Y6, X7, Y7, X8, Y8, Z2^X3^Y3, Z1^X4^Y4, Z0^X5^Y5, X9, 0, 0, 0, 0, }, //3 {X3, Y4, X6, Y6, X7, Y7, X8, Y8, X9, X3^Y3^Z3, Z2^X4^Y4, 
Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //4 {X3, Y4, Y6, X7, Y7, X8, Y8, X9, Y9, X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //5 {X3, Y4, X7, Y7, X8, Y8, X9, Y9, X10, X3^Y3^Z5, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, Z0^X6^Y7, 0, 0, }, //6 {X3, Y4, X7, Y7, X8, Y8, X9, Y9, X10, X3^Y3^Z4, Z3^X4^Y4, Z2^Y5^X8, Z1^X5^Y8, Y6^X7, Z0^X6^Y7, 0, 0, }, //7 {X3, Y3, Y4, X5, Y5, X6, Y6, X7, Y7, Z0^X4^Y4, X8, Y8, X9, 0, 0, 0, 0, }, //8 {X3, Y3, X5, Y5, X6, Y6, X7, Y7, X8, Y4^X5^Y5, Z0^X4^Y4, Y8, X9, 0, 0, 0, 0, }, //9 {X3, Y3, Y5, X6, Y6, X7, Y7, X8, Y8, Y4^X5^Y5, Z0^X4^Y4, X5^Y5, X9, 0, 0, 0, 0, }, //10 {X3, Y3, X5, X6, Y6, X7, Y7, X8, Y8, Y4^X6^Y6, Z1^X4^Y4, X5^Y5, X9, 0, 0, 0, 0, }, //11 {X3, Y3, X5, X6, Y6, X7, Y7, X8, Y8, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X9, 0, 0, 0, 0, }, //12 {X3, Y3, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X6^Y6, Z1^X4^Y4, Z0^X5^Y5, X5^Y6, 0, 0, 0, 0, }, //13 {X3, Y3, Y4, X5, X6, Y6, X7, Y7, X8, Z1^X4^Y4, Z0^X5^Y5, Y8, X9, 0, 0, 0, 0, }, //14 {X3, Y3, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, X5^Y6, 0, 0, 0, 0, }, //15 {X3, Y3, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, X5^Y6, 0, 0, 0, 0, }, //16 {X3, Y3, X6, Y6, X7, Y7, X8, Y8, X9, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, 0, 0, 0, 0, }, //17 {X3, Y3, Y6, X7, Y7, X8, Y8, X9, Y9, Y4^X7^Y7, Z1^X4^Y4, Z0^Y5^X6, X5^Y6, X6^Y6, 0, 0, 0, }, //18 {X3, Y3, Y6, X7, Y7, X8, Y8, X9, Y9, Y4^X7^Y7, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, X6^Y6, 0, 0, 0, }, //19 {X3, Y3, Y4, X6, Y6, X7, Y7, X8, Y8, Z1^X4^Y4, Z0^Y5^X6, X5^Y6, X9, 0, 0, 0, 0, }, //20 {X3, Y3, Y4, X6, Y6, X7, Y7, X8, Y8, Z2^X4^Y4, Z1^Y5^X6, Z0^X5^Y6, X9, 0, 0, 0, 0, }, //21 {X3, Y3, X6, X7, Y7, X8, Y8, X9, Y9, Y4^X8^Y8, Z1^X4^Y4, Z0^Y5^X7, X5^Y7, X6^Y6, 0, 0, 0, }, //22 {X3, Y3, X6, X7, Y7, X8, Y8, X9, Y9, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, X6^Y6, 0, 0, 0, }, //23 {X3, Y3, X6, X7, Y7, X8, Y8, X9, Y9, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, }, //24 {X3, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X8^Y8, Z1^X4^Y4, 
Z0^Y5^X7, X5^Y7, X6^Y6, X6^Y8, 0, 0, }, //25 {X3, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, X6^Y8, 0, 0, }, //26 {X3, Y3, Y4, X6, X7, Y7, X8, Y8, X9, Z1^X4^Y4, Z0^Y5^X7, X5^Y7, X6^Y6, 0, 0, 0, 0, }, //27 {X3, Y3, Y4, X6, X7, Y7, X8, Y8, X9, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, 0, 0, 0, 0, }, //28 {X3, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X9^Y9, Z1^X4^Y4, Z0^Y5^X8, X5^Y8, Y6^X7, X6^Y7, 0, 0, }, //29 {X3, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X9^Y9, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, X6^Y7, 0, 0, }, //30 {X3, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X9^Y9, X4^Y4^Z4, Z3^Y5^X8, Z2^X5^Y8, Z1^Y6^X7, Z0^X6^Y7, 0, 0, }, //31 {X3, Y3, X6, X7, Y7, X8, X9, Y9, X10, Y4^X8^Y8, Z3^X4^Y4, Z2^Y5^X7, Z1^X5^Y7, Z0^X6^Y6, X3^Y8, 0, 0, }, //32 {X3, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X9^Y9, Z3^X4^Y4, Z2^Y5^X8, Z1^X5^Y8, Y6^X7, X6^Y7, 0, 0, }, //33 {X3, Y3, X7, Y7, X8, Y8, X9, Y9, X10, Y4^X9^Y9, Z3^X4^Y4, Z2^Y5^X8, Z1^X5^Y8, Y6^X7, Z0^X6^Y7, 0, 0, }, //34 }; } // V2 } // Addr } // rocr #endif ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/gfx10/gfx10addrlib.cpp000066400000000000000000005044061420110115200245750ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. 
IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** ************************************************************************************************************************ * @file gfx10addrlib.cpp * @brief Contains the implementation of the Gfx10Lib class. ************************************************************************************************************************ */ #include "gfx10addrlib.h" #include "gfx10_gb_reg.h" #include "amdgpu_asic_addr.h" //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// namespace rocr { namespace Addr { /** ************************************************************************************************************************ * Gfx10HwlInit * * @brief * Creates a Gfx10Lib object. * * @return * Returns a Gfx10Lib object pointer.
************************************************************************************************************************ */ Addr::Lib* Gfx10HwlInit(const Client* pClient) { return V2::Gfx10Lib::CreateObj(pClient); } namespace V2 { //////////////////////////////////////////////////////////////////////////////////////////////////// // Static Const Member //////////////////////////////////////////////////////////////////////////////////////////////////// const SwizzleModeFlags Gfx10Lib::SwizzleModeTable[ADDR_SW_MAX_TYPE] = {//Linear 256B 4KB 64KB Var Z Std Disp Rot XOR T RtOpt Reserved {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // ADDR_SW_LINEAR {0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}, // ADDR_SW_256B_S {0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0}, // ADDR_SW_256B_D {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}, // ADDR_SW_4KB_S {0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0}, // ADDR_SW_4KB_D {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0}, // ADDR_SW_64KB_S {0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0}, // ADDR_SW_64KB_D {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0}, // ADDR_SW_64KB_S_T {0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0}, // ADDR_SW_64KB_D_T {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0}, // ADDR_SW_4KB_S_X {0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0}, // ADDR_SW_4KB_D_X {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0}, // ADDR_SW_64KB_Z_X {0, 0, 
0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0}, // ADDR_SW_64KB_S_X {0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0}, // ADDR_SW_64KB_D_X {0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0}, // ADDR_SW_64KB_R_X {0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0}, // ADDR_SW_VAR_Z_X {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0}, // ADDR_SW_VAR_R_X {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // ADDR_SW_LINEAR_GENERAL }; const Dim3d Gfx10Lib::Block256_3d[] = {{8, 4, 8}, {4, 4, 8}, {4, 4, 4}, {4, 2, 4}, {2, 2, 4}}; const Dim3d Gfx10Lib::Block64K_Log2_3d[] = {{6, 5, 5}, {5, 5, 5}, {5, 5, 4}, {5, 4, 4}, {4, 4, 4}}; const Dim3d Gfx10Lib::Block4K_Log2_3d[] = {{4, 4, 4}, {3, 4, 4}, {3, 4, 3}, {3, 3, 3}, {2, 3, 3}}; /** ************************************************************************************************************************ * Gfx10Lib::Gfx10Lib * * @brief * Constructor * ************************************************************************************************************************ */ Gfx10Lib::Gfx10Lib(const Client* pClient) : Lib(pClient), m_colorBaseIndex(0), m_xmaskBaseIndex(0), m_dccBaseIndex(0) { m_class = AI_ADDRLIB; memset(&m_settings, 0, sizeof(m_settings)); memcpy(m_swizzleModeTable, SwizzleModeTable, sizeof(SwizzleModeTable)); } /** ************************************************************************************************************************ * Gfx10Lib::~Gfx10Lib * * @brief * Destructor ************************************************************************************************************************ */ Gfx10Lib::~Gfx10Lib() { } /** ************************************************************************************************************************ * Gfx10Lib::HwlComputeHtileInfo * * @brief * Interface function stub of AddrComputeHtileInfo * * @return * ADDR_E_RETURNCODE
************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlComputeHtileInfo( const ADDR2_COMPUTE_HTILE_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_HTILE_INFO_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE ret = ADDR_OK; if (((pIn->swizzleMode != ADDR_SW_64KB_Z_X) && ((pIn->swizzleMode != ADDR_SW_VAR_Z_X) || (m_blockVarSizeLog2 == 0))) || (pIn->hTileFlags.pipeAligned != TRUE)) { ret = ADDR_INVALIDPARAMS; } else { Dim3d metaBlk = {0}; const UINT_32 metaBlkSize = GetMetaBlkSize(Gfx10DataDepthStencil, ADDR_RSRC_TEX_2D, pIn->swizzleMode, 0, 0, TRUE, &metaBlk); pOut->pitch = PowTwoAlign(pIn->unalignedWidth, metaBlk.w); pOut->height = PowTwoAlign(pIn->unalignedHeight, metaBlk.h); pOut->baseAlign = Max(metaBlkSize, 1u << (m_pipesLog2 + 11u)); pOut->metaBlkWidth = metaBlk.w; pOut->metaBlkHeight = metaBlk.h; if (pIn->numMipLevels > 1) { ADDR_ASSERT(pIn->firstMipIdInTail <= pIn->numMipLevels); UINT_32 offset = (pIn->firstMipIdInTail == pIn->numMipLevels) ? 
0 : metaBlkSize; for (INT_32 i = static_cast<INT_32>(pIn->firstMipIdInTail) - 1; i >= 0; i--) { UINT_32 mipWidth, mipHeight; GetMipSize(pIn->unalignedWidth, pIn->unalignedHeight, 1, i, &mipWidth, &mipHeight); mipWidth = PowTwoAlign(mipWidth, metaBlk.w); mipHeight = PowTwoAlign(mipHeight, metaBlk.h); const UINT_32 pitchInM = mipWidth / metaBlk.w; const UINT_32 heightInM = mipHeight / metaBlk.h; const UINT_32 mipSliceSize = pitchInM * heightInM * metaBlkSize; if (pOut->pMipInfo != NULL) { pOut->pMipInfo[i].inMiptail = FALSE; pOut->pMipInfo[i].offset = offset; pOut->pMipInfo[i].sliceSize = mipSliceSize; } offset += mipSliceSize; } pOut->sliceSize = offset; pOut->metaBlkNumPerSlice = offset / metaBlkSize; pOut->htileBytes = pOut->sliceSize * pIn->numSlices; if (pOut->pMipInfo != NULL) { for (UINT_32 i = pIn->firstMipIdInTail; i < pIn->numMipLevels; i++) { pOut->pMipInfo[i].inMiptail = TRUE; pOut->pMipInfo[i].offset = 0; pOut->pMipInfo[i].sliceSize = 0; } if (pIn->firstMipIdInTail != pIn->numMipLevels) { pOut->pMipInfo[pIn->firstMipIdInTail].sliceSize = metaBlkSize; } } } else { const UINT_32 pitchInM = pOut->pitch / metaBlk.w; const UINT_32 heightInM = pOut->height / metaBlk.h; pOut->metaBlkNumPerSlice = pitchInM * heightInM; pOut->sliceSize = pOut->metaBlkNumPerSlice * metaBlkSize; pOut->htileBytes = pOut->sliceSize * pIn->numSlices; if (pOut->pMipInfo != NULL) { pOut->pMipInfo[0].inMiptail = FALSE; pOut->pMipInfo[0].offset = 0; pOut->pMipInfo[0].sliceSize = pOut->sliceSize; } } } return ret; } /** ************************************************************************************************************************ * Gfx10Lib::HwlComputeCmaskInfo * * @brief * Interface function stub of AddrComputeCmaskInfo * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlComputeCmaskInfo( const ADDR2_COMPUTE_CMASK_INFO_INPUT* pIn, ///< [in] input structure
    ADDR2_COMPUTE_CMASK_INFO_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE ret = ADDR_OK;

    if ((pIn->resourceType != ADDR_RSRC_TEX_2D) ||
        (pIn->cMaskFlags.pipeAligned != TRUE)   ||
        ((pIn->swizzleMode != ADDR_SW_64KB_Z_X) &&
         ((pIn->swizzleMode != ADDR_SW_VAR_Z_X) || (m_blockVarSizeLog2 == 0))))
    {
        ret = ADDR_INVALIDPARAMS;
    }
    else
    {
        Dim3d         metaBlk     = {0};
        const UINT_32 metaBlkSize = GetMetaBlkSize(Gfx10DataFmask,
                                                   ADDR_RSRC_TEX_2D,
                                                   pIn->swizzleMode,
                                                   0,
                                                   0,
                                                   TRUE,
                                                   &metaBlk);

        pOut->pitch         = PowTwoAlign(pIn->unalignedWidth, metaBlk.w);
        pOut->height        = PowTwoAlign(pIn->unalignedHeight, metaBlk.h);
        pOut->baseAlign     = metaBlkSize;
        pOut->metaBlkWidth  = metaBlk.w;
        pOut->metaBlkHeight = metaBlk.h;

        if (pIn->numMipLevels > 1)
        {
            ADDR_ASSERT(pIn->firstMipIdInTail <= pIn->numMipLevels);

            UINT_32 metaBlkPerSlice = (pIn->firstMipIdInTail == pIn->numMipLevels) ? 0 : 1;

            for (INT_32 i = static_cast<INT_32>(pIn->firstMipIdInTail) - 1; i >= 0; i--)
            {
                UINT_32 mipWidth, mipHeight;

                GetMipSize(pIn->unalignedWidth, pIn->unalignedHeight, 1, i, &mipWidth, &mipHeight);

                mipWidth  = PowTwoAlign(mipWidth, metaBlk.w);
                mipHeight = PowTwoAlign(mipHeight, metaBlk.h);

                const UINT_32 pitchInM  = mipWidth / metaBlk.w;
                const UINT_32 heightInM = mipHeight / metaBlk.h;

                if (pOut->pMipInfo != NULL)
                {
                    pOut->pMipInfo[i].inMiptail = FALSE;
                    pOut->pMipInfo[i].offset    = metaBlkPerSlice * metaBlkSize;
                    pOut->pMipInfo[i].sliceSize = pitchInM * heightInM * metaBlkSize;
                }

                metaBlkPerSlice += pitchInM * heightInM;
            }

            pOut->metaBlkNumPerSlice = metaBlkPerSlice;

            if (pOut->pMipInfo != NULL)
            {
                for (UINT_32 i = pIn->firstMipIdInTail; i < pIn->numMipLevels; i++)
                {
                    pOut->pMipInfo[i].inMiptail = TRUE;
                    pOut->pMipInfo[i].offset    = 0;
                    pOut->pMipInfo[i].sliceSize = 0;
                }

                if (pIn->firstMipIdInTail != pIn->numMipLevels)
                {
                    pOut->pMipInfo[pIn->firstMipIdInTail].sliceSize = metaBlkSize;
                }
            }
        }
        else
        {
            const UINT_32 pitchInM  = pOut->pitch / metaBlk.w;
            const UINT_32 heightInM = pOut->height / metaBlk.h;

            pOut->metaBlkNumPerSlice = pitchInM * heightInM;

            if (pOut->pMipInfo != NULL)
            {
                pOut->pMipInfo[0].inMiptail = FALSE;
                pOut->pMipInfo[0].offset    = 0;
                pOut->pMipInfo[0].sliceSize = pOut->metaBlkNumPerSlice * metaBlkSize;
            }
        }

        pOut->sliceSize  = pOut->metaBlkNumPerSlice * metaBlkSize;
        pOut->cmaskBytes = pOut->sliceSize * pIn->numSlices;
    }

    return ret;
}

/**
************************************************************************************************************************
*   Gfx10Lib::HwlComputeDccInfo
*
*   @brief
*       Interface function to compute DCC key info
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx10Lib::HwlComputeDccInfo(
    const ADDR2_COMPUTE_DCCINFO_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_DCCINFO_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE ret = ADDR_OK;

    if (pIn->swizzleMode != ADDR_SW_64KB_Z_X && pIn->swizzleMode != ADDR_SW_64KB_R_X)
    {
        // Hardware does not support DCC for this swizzle mode.
        ret = ADDR_INVALIDPARAMS;
    }
    else if (m_settings.dccUnsup3DSwDis && IsTex3d(pIn->resourceType) && IsDisplaySwizzle(pIn->swizzleMode))
    {
        // DCC is not supported on 3D Display surfaces for GFX10.0 and GFX10.1
        ret = ADDR_INVALIDPARAMS;
    }
    else
    {
        // only SW_*_R_X surfaces may be DCC compressed when attached to the CB
        ADDR_ASSERT(IsRtOptSwizzle(pIn->swizzleMode));

        Dim3d         metaBlk     = {0};
        const UINT_32 elemLog2    = Log2(pIn->bpp >> 3);
        const UINT_32 numFragLog2 = Log2(pIn->numFrags);
        const UINT_32 metaBlkSize = GetMetaBlkSize(Gfx10DataColor,
                                                   pIn->resourceType,
                                                   pIn->swizzleMode,
                                                   elemLog2,
                                                   numFragLog2,
                                                   pIn->dccKeyFlags.pipeAligned,
                                                   &metaBlk);
        const BOOL_32 isThick     = IsThick(pIn->resourceType, pIn->swizzleMode);

        pOut->compressBlkWidth  = isThick ? Block256_3d[elemLog2].w : Block256_2d[elemLog2].w;
        pOut->compressBlkHeight = isThick ? Block256_3d[elemLog2].h : Block256_2d[elemLog2].h;
        pOut->compressBlkDepth  = isThick ?
                                  Block256_3d[elemLog2].d : 1;

        pOut->dccRamBaseAlign = metaBlkSize;
        pOut->metaBlkWidth    = metaBlk.w;
        pOut->metaBlkHeight   = metaBlk.h;
        pOut->metaBlkDepth    = metaBlk.d;

        pOut->pitch  = PowTwoAlign(pIn->unalignedWidth, metaBlk.w);
        pOut->height = PowTwoAlign(pIn->unalignedHeight, metaBlk.h);
        pOut->depth  = PowTwoAlign(pIn->numSlices, metaBlk.d);

        if (pIn->numMipLevels > 1)
        {
            ADDR_ASSERT(pIn->firstMipIdInTail <= pIn->numMipLevels);

            UINT_32 offset = (pIn->firstMipIdInTail == pIn->numMipLevels) ? 0 : metaBlkSize;

            for (INT_32 i = static_cast<INT_32>(pIn->firstMipIdInTail) - 1; i >= 0; i--)
            {
                UINT_32 mipWidth, mipHeight;

                GetMipSize(pIn->unalignedWidth, pIn->unalignedHeight, 1, i, &mipWidth, &mipHeight);

                mipWidth  = PowTwoAlign(mipWidth, metaBlk.w);
                mipHeight = PowTwoAlign(mipHeight, metaBlk.h);

                const UINT_32 pitchInM     = mipWidth / metaBlk.w;
                const UINT_32 heightInM    = mipHeight / metaBlk.h;
                const UINT_32 mipSliceSize = pitchInM * heightInM * metaBlkSize;

                if (pOut->pMipInfo != NULL)
                {
                    pOut->pMipInfo[i].inMiptail = FALSE;
                    pOut->pMipInfo[i].offset    = offset;
                    pOut->pMipInfo[i].sliceSize = mipSliceSize;
                }

                offset += mipSliceSize;
            }

            pOut->dccRamSliceSize    = offset;
            pOut->metaBlkNumPerSlice = offset / metaBlkSize;
            pOut->dccRamSize         = pOut->dccRamSliceSize * (pOut->depth / metaBlk.d);

            if (pOut->pMipInfo != NULL)
            {
                for (UINT_32 i = pIn->firstMipIdInTail; i < pIn->numMipLevels; i++)
                {
                    pOut->pMipInfo[i].inMiptail = TRUE;
                    pOut->pMipInfo[i].offset    = 0;
                    pOut->pMipInfo[i].sliceSize = 0;
                }

                if (pIn->firstMipIdInTail != pIn->numMipLevels)
                {
                    pOut->pMipInfo[pIn->firstMipIdInTail].sliceSize = metaBlkSize;
                }
            }
        }
        else
        {
            const UINT_32 pitchInM  = pOut->pitch / metaBlk.w;
            const UINT_32 heightInM = pOut->height / metaBlk.h;

            pOut->metaBlkNumPerSlice = pitchInM * heightInM;
            pOut->dccRamSliceSize    = pOut->metaBlkNumPerSlice * metaBlkSize;
            pOut->dccRamSize         = pOut->dccRamSliceSize * (pOut->depth / metaBlk.d);

            if (pOut->pMipInfo != NULL)
            {
                pOut->pMipInfo[0].inMiptail = FALSE;
                pOut->pMipInfo[0].offset    = 0;
                pOut->pMipInfo[0].sliceSize = pOut->dccRamSliceSize;
            }
        }
    }

    return ret;
}

/**
************************************************************************************************************************
*   Gfx10Lib::HwlComputeCmaskAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeCmaskAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx10Lib::HwlComputeCmaskAddrFromCoord(
    const ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn,   ///< [in] input structure
    ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT*      pOut)  ///< [out] output structure
{
    // Only support pipe aligned CMask
    ADDR_ASSERT(pIn->cMaskFlags.pipeAligned == TRUE);

    ADDR2_COMPUTE_CMASK_INFO_INPUT input = {};
    input.size            = sizeof(input);
    input.cMaskFlags      = pIn->cMaskFlags;
    input.colorFlags      = pIn->colorFlags;
    input.unalignedWidth  = Max(pIn->unalignedWidth,  1u);
    input.unalignedHeight = Max(pIn->unalignedHeight, 1u);
    input.numSlices       = Max(pIn->numSlices,       1u);
    input.swizzleMode     = pIn->swizzleMode;
    input.resourceType    = pIn->resourceType;

    ADDR2_COMPUTE_CMASK_INFO_OUTPUT output = {};
    output.size = sizeof(output);

    ADDR_E_RETURNCODE returnCode = ComputeCmaskInfo(&input, &output);

    if (returnCode == ADDR_OK)
    {
        const UINT_32 fmaskBpp      = GetFmaskBpp(pIn->numSamples, pIn->numFrags);
        const UINT_32 fmaskElemLog2 = Log2(fmaskBpp >> 3);
        const UINT_32 pipeMask      = (1 << m_pipesLog2) - 1;
        const UINT_32 index         = m_xmaskBaseIndex + fmaskElemLog2;
        const UINT_8* patIdxTable   = (pIn->swizzleMode == ADDR_SW_VAR_Z_X) ? CMASK_VAR_RBPLUS_PATIDX :
                                      (m_settings.supportRbPlus ?
                                       CMASK_64K_RBPLUS_PATIDX : CMASK_64K_PATIDX);
        const UINT_32 blkSizeLog2   = Log2(output.metaBlkWidth) + Log2(output.metaBlkHeight) - 7;
        const UINT_32 blkMask       = (1 << blkSizeLog2) - 1;
        const UINT_32 blkOffset     = ComputeOffsetFromSwizzlePattern(CMASK_SW_PATTERN[patIdxTable[index]],
                                                                      blkSizeLog2 + 1, // +1 for nibble offset
                                                                      pIn->x,
                                                                      pIn->y,
                                                                      pIn->slice,
                                                                      0);
        const UINT_32 xb            = pIn->x / output.metaBlkWidth;
        const UINT_32 yb            = pIn->y / output.metaBlkHeight;
        const UINT_32 pb            = output.pitch / output.metaBlkWidth;
        const UINT_32 blkIndex      = (yb * pb) + xb;
        const UINT_32 pipeXor       = ((pIn->pipeXor & pipeMask) << m_pipeInterleaveLog2) & blkMask;

        pOut->addr = (output.sliceSize * pIn->slice) +
                     (blkIndex * (1 << blkSizeLog2)) +
                     ((blkOffset >> 1) ^ pipeXor);
        pOut->bitPosition = (blkOffset & 1) << 2;
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx10Lib::HwlComputeHtileAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeHtileAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx10Lib::HwlComputeHtileAddrFromCoord(
    const ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn,   ///< [in] input structure
    ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT*      pOut)  ///< [out] output structure
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pIn->numMipLevels > 1)
    {
        returnCode = ADDR_NOTIMPLEMENTED;
    }
    else
    {
        ADDR2_COMPUTE_HTILE_INFO_INPUT input = {0};
        input.size            = sizeof(input);
        input.hTileFlags      = pIn->hTileFlags;
        input.depthFlags      = pIn->depthflags;
        input.swizzleMode     = pIn->swizzleMode;
        input.unalignedWidth  = Max(pIn->unalignedWidth,  1u);
        input.unalignedHeight = Max(pIn->unalignedHeight, 1u);
        input.numSlices       = Max(pIn->numSlices,       1u);
        input.numMipLevels    = 1;

        ADDR2_COMPUTE_HTILE_INFO_OUTPUT output = {0};
        output.size = sizeof(output);

        returnCode = ComputeHtileInfo(&input,
                                      &output);

        if (returnCode == ADDR_OK)
        {
            const UINT_32 numSampleLog2 = Log2(pIn->numSamples);
            const UINT_32 pipeMask      = (1 << m_pipesLog2) - 1;
            const UINT_32 index         = m_xmaskBaseIndex + numSampleLog2;
            const UINT_8* patIdxTable   = m_settings.supportRbPlus ? HTILE_RBPLUS_PATIDX : HTILE_PATIDX;
            const UINT_32 blkSizeLog2   = Log2(output.metaBlkWidth) + Log2(output.metaBlkHeight) - 4;
            const UINT_32 blkMask       = (1 << blkSizeLog2) - 1;
            const UINT_32 blkOffset     = ComputeOffsetFromSwizzlePattern(HTILE_SW_PATTERN[patIdxTable[index]],
                                                                          blkSizeLog2 + 1, // +1 for nibble offset
                                                                          pIn->x,
                                                                          pIn->y,
                                                                          pIn->slice,
                                                                          0);
            const UINT_32 xb            = pIn->x / output.metaBlkWidth;
            const UINT_32 yb            = pIn->y / output.metaBlkHeight;
            const UINT_32 pb            = output.pitch / output.metaBlkWidth;
            const UINT_32 blkIndex      = (yb * pb) + xb;
            const UINT_32 pipeXor       = ((pIn->pipeXor & pipeMask) << m_pipeInterleaveLog2) & blkMask;

            pOut->addr = (static_cast<UINT_64>(output.sliceSize) * pIn->slice) +
                         (blkIndex * (1 << blkSizeLog2)) +
                         ((blkOffset >> 1) ^ pipeXor);
        }
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx10Lib::HwlComputeHtileCoordFromAddr
*
*   @brief
*       Interface function stub of AddrComputeHtileCoordFromAddr
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx10Lib::HwlComputeHtileCoordFromAddr(
    const ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn,   ///< [in] input structure
    ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT*      pOut)  ///< [out] output structure
{
    ADDR_NOT_IMPLEMENTED();

    return ADDR_OK;
}

/**
************************************************************************************************************************
*   Gfx10Lib::HwlComputeDccAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeDccAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx10Lib::HwlComputeDccAddrFromCoord(
    const ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT* pIn,   ///< [in] input structure
    ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT*      pOut)  ///< [out] output structure
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if ((pIn->resourceType       != ADDR_RSRC_TEX_2D) ||
        (pIn->swizzleMode        != ADDR_SW_64KB_R_X) ||
        (pIn->dccKeyFlags.linear == TRUE)             ||
        (pIn->numFrags           >  1)                ||
        (pIn->numMipLevels       >  1)                ||
        (pIn->mipId              >  0))
    {
        returnCode = ADDR_NOTSUPPORTED;
    }
    else
    {
        const UINT_32 elemLog2    = Log2(pIn->bpp >> 3);
        const UINT_32 numPipeLog2 = m_pipesLog2;
        const UINT_32 pipeMask    = (1 << numPipeLog2) - 1;
        UINT_32       index       = m_dccBaseIndex + elemLog2;
        const UINT_8* patIdxTable;

        if (m_settings.supportRbPlus)
        {
            patIdxTable = DCC_64K_R_X_RBPLUS_PATIDX;

            if (pIn->dccKeyFlags.pipeAligned)
            {
                index += MaxNumOfBpp;

                if (m_numPkrLog2 < 2)
                {
                    index += m_pipesLog2 * MaxNumOfBpp;
                }
                else
                {
                    // 4 groups for "m_numPkrLog2 < 2" case
                    index += 4 * MaxNumOfBpp;

                    const UINT_32 dccPipePerPkr = 3;

                    index += (m_numPkrLog2 - 2) * dccPipePerPkr * MaxNumOfBpp +
                             (m_pipesLog2 - m_numPkrLog2) * MaxNumOfBpp;
                }
            }
        }
        else
        {
            patIdxTable = DCC_64K_R_X_PATIDX;

            if (pIn->dccKeyFlags.pipeAligned)
            {
                index += (numPipeLog2 + UnalignedDccType) * MaxNumOfBpp;
            }
            else
            {
                index += Min(numPipeLog2, UnalignedDccType - 1) * MaxNumOfBpp;
            }
        }

        const UINT_32 blkSizeLog2 = Log2(pIn->metaBlkWidth) + Log2(pIn->metaBlkHeight) + elemLog2 - 8;
        const UINT_32 blkMask     = (1 << blkSizeLog2) - 1;
        const UINT_32 blkOffset   = ComputeOffsetFromSwizzlePattern(DCC_64K_R_X_SW_PATTERN[patIdxTable[index]],
                                                                    blkSizeLog2 + 1, // +1 for nibble offset
                                                                    pIn->x,
                                                                    pIn->y,
                                                                    pIn->slice,
                                                                    0);
        const UINT_32 xb          = pIn->x / pIn->metaBlkWidth;
        const UINT_32 yb          = pIn->y / pIn->metaBlkHeight;
        const UINT_32 pb          = pIn->pitch / pIn->metaBlkWidth;
        const UINT_32 blkIndex    = (yb * pb) + xb;
        const UINT_32 pipeXor     = ((pIn->pipeXor & pipeMask) <<
                                     m_pipeInterleaveLog2) & blkMask;

        pOut->addr = (static_cast<UINT_64>(pIn->dccRamSliceSize) * pIn->slice) +
                     (blkIndex * (1 << blkSizeLog2)) +
                     ((blkOffset >> 1) ^ pipeXor);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx10Lib::HwlInitGlobalParams
*
*   @brief
*       Initializes global parameters
*
*   @return
*       TRUE if all settings are valid
*
************************************************************************************************************************
*/
BOOL_32 Gfx10Lib::HwlInitGlobalParams(
    const ADDR_CREATE_INPUT* pCreateIn)    ///< [in] create input
{
    BOOL_32        valid = TRUE;
    GB_ADDR_CONFIG gbAddrConfig;

    gbAddrConfig.u32All = pCreateIn->regValue.gbAddrConfig;

    // These values are copied from CModel code
    switch (gbAddrConfig.bits.NUM_PIPES)
    {
        case ADDR_CONFIG_1_PIPE:
            m_pipes     = 1;
            m_pipesLog2 = 0;
            break;
        case ADDR_CONFIG_2_PIPE:
            m_pipes     = 2;
            m_pipesLog2 = 1;
            break;
        case ADDR_CONFIG_4_PIPE:
            m_pipes     = 4;
            m_pipesLog2 = 2;
            break;
        case ADDR_CONFIG_8_PIPE:
            m_pipes     = 8;
            m_pipesLog2 = 3;
            break;
        case ADDR_CONFIG_16_PIPE:
            m_pipes     = 16;
            m_pipesLog2 = 4;
            break;
        case ADDR_CONFIG_32_PIPE:
            m_pipes     = 32;
            m_pipesLog2 = 5;
            break;
        case ADDR_CONFIG_64_PIPE:
            m_pipes     = 64;
            m_pipesLog2 = 6;
            break;
        default:
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
            break;
    }

    switch (gbAddrConfig.bits.PIPE_INTERLEAVE_SIZE)
    {
        case ADDR_CONFIG_PIPE_INTERLEAVE_256B:
            m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_256B;
            m_pipeInterleaveLog2  = 8;
            break;
        case ADDR_CONFIG_PIPE_INTERLEAVE_512B:
            m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_512B;
            m_pipeInterleaveLog2  = 9;
            break;
        case ADDR_CONFIG_PIPE_INTERLEAVE_1KB:
            m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_1KB;
            m_pipeInterleaveLog2  = 10;
            break;
        case ADDR_CONFIG_PIPE_INTERLEAVE_2KB:
            m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_2KB;
            m_pipeInterleaveLog2  = 11;
            break;
        default:
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
            break;
    }

    // Addr::V2::Lib::ComputePipeBankXor()/ComputeSlicePipeBankXor()
    // requires pipe interleave to be exactly 8 bits, and any larger value requires a post-process
    // (left shift) on the output pipeBankXor bits.
    // And more importantly, SW AddrLib doesn't support sw equation/pattern for PI != 256 case.
    ADDR_ASSERT(m_pipeInterleaveBytes == ADDR_PIPEINTERLEAVE_256B);

    switch (gbAddrConfig.bits.MAX_COMPRESSED_FRAGS)
    {
        case ADDR_CONFIG_1_MAX_COMPRESSED_FRAGMENTS:
            m_maxCompFrag     = 1;
            m_maxCompFragLog2 = 0;
            break;
        case ADDR_CONFIG_2_MAX_COMPRESSED_FRAGMENTS:
            m_maxCompFrag     = 2;
            m_maxCompFragLog2 = 1;
            break;
        case ADDR_CONFIG_4_MAX_COMPRESSED_FRAGMENTS:
            m_maxCompFrag     = 4;
            m_maxCompFragLog2 = 2;
            break;
        case ADDR_CONFIG_8_MAX_COMPRESSED_FRAGMENTS:
            m_maxCompFrag     = 8;
            m_maxCompFragLog2 = 3;
            break;
        default:
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
            break;
    }

    {
        // Skip unaligned case
        m_xmaskBaseIndex += MaxNumOfAA;

        m_xmaskBaseIndex += m_pipesLog2 * MaxNumOfAA;
        m_colorBaseIndex += m_pipesLog2 * MaxNumOfBpp;

        if (m_settings.supportRbPlus)
        {
            m_numPkrLog2 = gbAddrConfig.bits.NUM_PKRS;
            m_numSaLog2  = (m_numPkrLog2 > 0) ? (m_numPkrLog2 - 1) : 0;

            ADDR_ASSERT((m_numPkrLog2 <= m_pipesLog2) && ((m_pipesLog2 - m_numPkrLog2) <= 2));

            ADDR_C_ASSERT(sizeof(HTILE_RBPLUS_PATIDX) / sizeof(HTILE_RBPLUS_PATIDX[0]) ==
                          sizeof(CMASK_64K_RBPLUS_PATIDX) / sizeof(CMASK_64K_RBPLUS_PATIDX[0]));

            if (m_numPkrLog2 >= 2)
            {
                m_colorBaseIndex += (2 * m_numPkrLog2 - 2) * MaxNumOfBpp;
                m_xmaskBaseIndex += (m_numPkrLog2 - 1) * 3 * MaxNumOfAA;
            }
        }
        else
        {
            const UINT_32 numPipeType = static_cast<UINT_32>(ADDR_CONFIG_64_PIPE) -
                                        static_cast<UINT_32>(ADDR_CONFIG_1_PIPE)  +
                                        1;

            ADDR_C_ASSERT(sizeof(HTILE_PATIDX) / sizeof(HTILE_PATIDX[0]) == (numPipeType + 1) * MaxNumOfAA);

            ADDR_C_ASSERT(sizeof(HTILE_PATIDX) / sizeof(HTILE_PATIDX[0]) ==
                          sizeof(CMASK_64K_PATIDX) / sizeof(CMASK_64K_PATIDX[0]));
        }
    }

    if (m_settings.supportRbPlus)
    {
        // VAR block size = 16K * num_pipes.
        // For 4 pipe configuration, SW_VAR_* mode swizzle patterns are the same as the
        // corresponding SW_64KB_* mode
        m_blockVarSizeLog2 = m_pipesLog2 + 14;
    }

    if (valid)
    {
        InitEquationTable();
    }

    return valid;
}

/**
************************************************************************************************************************
*   Gfx10Lib::HwlConvertChipFamily
*
*   @brief
*       Convert familyID defined in atiid.h to ChipFamily and set m_chipFamily/m_chipRevision
*   @return
*       ChipFamily
************************************************************************************************************************
*/
ChipFamily Gfx10Lib::HwlConvertChipFamily(
    UINT_32 chipFamily,        ///< [in] chip family defined in atiih.h
    UINT_32 chipRevision)      ///< [in] chip revision defined in "asic_family"_id.h
{
    ChipFamily family = ADDR_CHIP_FAMILY_NAVI;

    m_settings.dccUnsup3DSwDis = 1;

    switch (chipFamily)
    {
        case FAMILY_NV:
            m_settings.isDcn2 = 1;

            if (ASICREV_IS_SIENNA_M(chipRevision))
            {
                m_settings.supportRbPlus   = 1;
                m_settings.dccUnsup3DSwDis = 0;
            }
            break;
        default:
            ADDR_ASSERT(!"Unknown chip family");
            break;
    }

    m_settings.dsMipmapHtileFix = 1;

    if (ASICREV_IS_NAVI10_P(chipRevision))
    {
        m_settings.dsMipmapHtileFix = 0;
    }

    m_configFlags.use32bppFor422Fmt = TRUE;

    return family;
}

/**
************************************************************************************************************************
*   Gfx10Lib::GetBlk256SizeLog2
*
*   @brief
*       Get block 256 size
*
*   @return
*       N/A
************************************************************************************************************************
*/
void Gfx10Lib::GetBlk256SizeLog2(
    AddrResourceType resourceType,      ///< [in] Resource type
    AddrSwizzleMode  swizzleMode,       ///< [in] Swizzle mode
    UINT_32          elemLog2,          ///< [in] element size log2
    UINT_32          numSamplesLog2,    ///< [in] number of samples
    Dim3d*           pBlock             ///< [out] block size
    ) const
{
    if (IsThin(resourceType, swizzleMode))
    {
        UINT_32 blockBits = 8 - elemLog2;

        if (IsZOrderSwizzle(swizzleMode))
        {
            blockBits -=
                numSamplesLog2;
        }

        pBlock->w = (blockBits >> 1) + (blockBits & 1);
        pBlock->h = (blockBits >> 1);
        pBlock->d = 0;
    }
    else
    {
        ADDR_ASSERT(IsThick(resourceType, swizzleMode));

        UINT_32 blockBits = 8 - elemLog2;

        pBlock->d = (blockBits / 3) + (((blockBits % 3) > 0) ? 1 : 0);
        pBlock->w = (blockBits / 3) + (((blockBits % 3) > 1) ? 1 : 0);
        pBlock->h = (blockBits / 3);
    }
}

/**
************************************************************************************************************************
*   Gfx10Lib::GetCompressedBlockSizeLog2
*
*   @brief
*       Get compressed block size
*
*   @return
*       N/A
************************************************************************************************************************
*/
void Gfx10Lib::GetCompressedBlockSizeLog2(
    Gfx10DataType    dataType,          ///< [in] Data type
    AddrResourceType resourceType,      ///< [in] Resource type
    AddrSwizzleMode  swizzleMode,       ///< [in] Swizzle mode
    UINT_32          elemLog2,          ///< [in] element size log2
    UINT_32          numSamplesLog2,    ///< [in] number of samples
    Dim3d*           pBlock             ///< [out] block size
    ) const
{
    if (dataType == Gfx10DataColor)
    {
        GetBlk256SizeLog2(resourceType, swizzleMode, elemLog2, numSamplesLog2, pBlock);
    }
    else
    {
        ADDR_ASSERT((dataType == Gfx10DataDepthStencil) || (dataType == Gfx10DataFmask));
        pBlock->w = 3;
        pBlock->h = 3;
        pBlock->d = 0;
    }
}

/**
************************************************************************************************************************
*   Gfx10Lib::GetMetaOverlapLog2
*
*   @brief
*       Get meta block overlap
*
*   @return
*       N/A
************************************************************************************************************************
*/
INT_32 Gfx10Lib::GetMetaOverlapLog2(
    Gfx10DataType    dataType,          ///< [in] Data type
    AddrResourceType resourceType,      ///< [in] Resource type
    AddrSwizzleMode  swizzleMode,       ///< [in] Swizzle mode
    UINT_32          elemLog2,          ///< [in] element size log2
    UINT_32          numSamplesLog2     ///< [in] number of samples
    ) const
{
    Dim3d compBlock;
    Dim3d microBlock;

    GetCompressedBlockSizeLog2(dataType,
                               resourceType,
                               swizzleMode,
                               elemLog2,
                               numSamplesLog2,
                               &compBlock);

    GetBlk256SizeLog2(resourceType, swizzleMode, elemLog2, numSamplesLog2, &microBlock);

    const INT_32 compSizeLog2   = compBlock.w  + compBlock.h  + compBlock.d;
    const INT_32 blk256SizeLog2 = microBlock.w + microBlock.h + microBlock.d;
    const INT_32 maxSizeLog2    = Max(compSizeLog2, blk256SizeLog2);
    const INT_32 numPipesLog2   = GetEffectiveNumPipes();
    INT_32       overlap        = numPipesLog2 - maxSizeLog2;

    if ((numPipesLog2 > 1) && m_settings.supportRbPlus)
    {
        overlap++;
    }

    // In 16Bpp 8xaa, we lose 1 overlap bit because the block size reduction eats into a pipe anchor bit (y4)
    if ((elemLog2 == 4) && (numSamplesLog2 == 3))
    {
        overlap--;
    }
    overlap = Max(overlap, 0);
    return overlap;
}

/**
************************************************************************************************************************
*   Gfx10Lib::Get3DMetaOverlapLog2
*
*   @brief
*       Get 3d meta block overlap
*
*   @return
*       N/A
************************************************************************************************************************
*/
INT_32 Gfx10Lib::Get3DMetaOverlapLog2(
    AddrResourceType resourceType,      ///< [in] Resource type
    AddrSwizzleMode  swizzleMode,       ///< [in] Swizzle mode
    UINT_32          elemLog2           ///< [in] element size log2
    ) const
{
    Dim3d microBlock;
    GetBlk256SizeLog2(resourceType, swizzleMode, elemLog2, 0, &microBlock);

    INT_32 overlap = GetEffectiveNumPipes() - static_cast<INT_32>(microBlock.w);

    if (m_settings.supportRbPlus)
    {
        overlap++;
    }

    if ((overlap < 0) || (IsStandardSwizzle(resourceType, swizzleMode) == TRUE))
    {
        overlap = 0;
    }
    return overlap;
}

/**
************************************************************************************************************************
*   Gfx10Lib::GetPipeRotateAmount
*
*   @brief
*       Get pipe rotate amount
*
*   @return
*       Pipe rotate amount
************************************************************************************************************************
*/
INT_32 Gfx10Lib::GetPipeRotateAmount(
    AddrResourceType resourceType,
                                        ///< [in] Resource type
    AddrSwizzleMode  swizzleMode        ///< [in] Swizzle mode
    ) const
{
    INT_32 amount = 0;

    if (m_settings.supportRbPlus && (m_pipesLog2 >= (m_numSaLog2 + 1)) && (m_pipesLog2 > 1))
    {
        amount = ((m_pipesLog2 == (m_numSaLog2 + 1)) && IsRbAligned(resourceType, swizzleMode)) ?
                 1 : m_pipesLog2 - (m_numSaLog2 + 1);
    }

    return amount;
}

/**
************************************************************************************************************************
*   Gfx10Lib::GetMetaBlkSize
*
*   @brief
*       Get metadata block size
*
*   @return
*       Meta block size
************************************************************************************************************************
*/
UINT_32 Gfx10Lib::GetMetaBlkSize(
    Gfx10DataType    dataType,          ///< [in] Data type
    AddrResourceType resourceType,      ///< [in] Resource type
    AddrSwizzleMode  swizzleMode,       ///< [in] Swizzle mode
    UINT_32          elemLog2,          ///< [in] element size log2
    UINT_32          numSamplesLog2,    ///< [in] number of samples
    BOOL_32          pipeAlign,         ///< [in] pipe align
    Dim3d*           pBlock             ///< [out] block size
    ) const
{
    INT_32 metablkSizeLog2;

    const INT_32 metaElemSizeLog2   = GetMetaElementSizeLog2(dataType);
    const INT_32 metaCacheSizeLog2  = GetMetaCacheSizeLog2(dataType);
    const INT_32 compBlkSizeLog2    = (dataType == Gfx10DataColor) ? 8 : 6 + numSamplesLog2 + elemLog2;
    const INT_32 metaBlkSamplesLog2 = (dataType == Gfx10DataDepthStencil) ?
                                      numSamplesLog2 : Min(numSamplesLog2, m_maxCompFragLog2);
    const INT_32 dataBlkSizeLog2    = GetBlockSizeLog2(swizzleMode);
    INT_32       numPipesLog2       = m_pipesLog2;

    if (IsThin(resourceType, swizzleMode))
    {
        if ((pipeAlign == FALSE) ||
            (IsStandardSwizzle(resourceType, swizzleMode) == TRUE) ||
            (IsDisplaySwizzle(resourceType, swizzleMode) == TRUE))
        {
            if (pipeAlign)
            {
                metablkSizeLog2 = Max(static_cast<INT_32>(m_pipeInterleaveLog2) + numPipesLog2, 12);
                metablkSizeLog2 = Min(metablkSizeLog2, dataBlkSizeLog2);
            }
            else
            {
                metablkSizeLog2 = Min(dataBlkSizeLog2, 12);
            }
        }
        else
        {
            if (m_settings.supportRbPlus && (m_pipesLog2 == m_numSaLog2 + 1) && (m_pipesLog2 > 1))
            {
                numPipesLog2++;
            }

            INT_32 pipeRotateLog2 = GetPipeRotateAmount(resourceType, swizzleMode);

            if (numPipesLog2 >= 4)
            {
                INT_32 overlapLog2 = GetMetaOverlapLog2(dataType, resourceType, swizzleMode, elemLog2, numSamplesLog2);

                // In 16Bpe 8xaa, we have an extra overlap bit
                if ((pipeRotateLog2 > 0)  &&
                    (elemLog2 == 4)       &&
                    (numSamplesLog2 == 3) &&
                    (IsZOrderSwizzle(swizzleMode) || (GetEffectiveNumPipes() > 3)))
                {
                    overlapLog2++;
                }

                metablkSizeLog2 = metaCacheSizeLog2 + overlapLog2 + numPipesLog2;
                metablkSizeLog2 = Max(metablkSizeLog2, static_cast<INT_32>(m_pipeInterleaveLog2) + numPipesLog2);

                if (m_settings.supportRbPlus    &&
                    IsRtOptSwizzle(swizzleMode) &&
                    (numPipesLog2 == 6)         &&
                    (numSamplesLog2 == 3)       &&
                    (m_maxCompFragLog2 == 3)    &&
                    (metablkSizeLog2 < 15))
                {
                    metablkSizeLog2 = 15;
                }
            }
            else
            {
                metablkSizeLog2 = Max(static_cast<INT_32>(m_pipeInterleaveLog2) + numPipesLog2, 12);
            }

            if (dataType == Gfx10DataDepthStencil)
            {
                // For htile surfaces, pad meta block size to 2K * num_pipes
                metablkSizeLog2 = Max(metablkSizeLog2, 11 + numPipesLog2);
            }

            const INT_32 compFragLog2 = Min(m_maxCompFragLog2, numSamplesLog2);

            if (IsRtOptSwizzle(swizzleMode) && (compFragLog2 > 1) && (pipeRotateLog2 >= 1))
            {
                const INT_32 tmp = 8 + m_pipesLog2 + Max(pipeRotateLog2, compFragLog2 - 1);

                metablkSizeLog2 = Max(metablkSizeLog2, tmp);
            }
        }

        const INT_32 metablkBitsLog2 = metablkSizeLog2 + compBlkSizeLog2
                                       - elemLog2 - metaBlkSamplesLog2 - metaElemSizeLog2;

        pBlock->w = 1 << ((metablkBitsLog2 >> 1) + (metablkBitsLog2 & 1));
        pBlock->h = 1 << (metablkBitsLog2 >> 1);
        pBlock->d = 1;
    }
    else
    {
        ADDR_ASSERT(IsThick(resourceType, swizzleMode));

        if (pipeAlign)
        {
            if (m_settings.supportRbPlus         &&
                (m_pipesLog2 == m_numSaLog2 + 1) &&
                (m_pipesLog2 > 1)                &&
                IsRbAligned(resourceType, swizzleMode))
            {
                numPipesLog2++;
            }

            const INT_32 overlapLog2 = Get3DMetaOverlapLog2(resourceType, swizzleMode, elemLog2);

            metablkSizeLog2 = metaCacheSizeLog2 + overlapLog2 + numPipesLog2;
            metablkSizeLog2 = Max(metablkSizeLog2, static_cast<INT_32>(m_pipeInterleaveLog2) + numPipesLog2);
            metablkSizeLog2 = Max(metablkSizeLog2, 12);
        }
        else
        {
            metablkSizeLog2 = 12;
        }

        const INT_32 metablkBitsLog2 =
            metablkSizeLog2 + compBlkSizeLog2 - elemLog2 - metaBlkSamplesLog2 - metaElemSizeLog2;
        pBlock->w = 1 << ((metablkBitsLog2 / 3) + (((metablkBitsLog2 % 3) > 0) ? 1 : 0));
        pBlock->h = 1 << ((metablkBitsLog2 / 3) + (((metablkBitsLog2 % 3) > 1) ? 1 : 0));
        pBlock->d = 1 << (metablkBitsLog2 / 3);
    }

    return (1 << static_cast<UINT_32>(metablkSizeLog2));
}

/**
************************************************************************************************************************
*   Gfx10Lib::ConvertSwizzlePatternToEquation
*
*   @brief
*       Convert swizzle pattern to equation.
*
*   @return
*       N/A
************************************************************************************************************************
*/
VOID Gfx10Lib::ConvertSwizzlePatternToEquation(
    UINT_32                elemLog2,   ///< [in] element bytes log2
    AddrResourceType       rsrcType,   ///< [in] resource type
    AddrSwizzleMode        swMode,     ///< [in] swizzle mode
    const ADDR_SW_PATINFO* pPatInfo,   ///< [in] swizzle pattern info
    ADDR_EQUATION*         pEquation)  ///< [out] equation converted from swizzle pattern
    const
{
    ADDR_BIT_SETTING fullSwizzlePattern[20];
    GetSwizzlePatternFromPatternInfo(pPatInfo, fullSwizzlePattern);

    const ADDR_BIT_SETTING* pSwizzle      = fullSwizzlePattern;
    const UINT_32           blockSizeLog2 = GetBlockSizeLog2(swMode);

    pEquation->numBits            = blockSizeLog2;
    pEquation->stackedDepthSlices = FALSE;

    for (UINT_32 i = 0; i < elemLog2; i++)
    {
        pEquation->addr[i].channel = 0;
        pEquation->addr[i].valid   = 1;
        pEquation->addr[i].index   = i;
    }

    if (IsXor(swMode) == FALSE)
    {
        for (UINT_32 i = elemLog2; i < blockSizeLog2; i++)
        {
            ADDR_ASSERT(IsPow2(pSwizzle[i].value));

            if (pSwizzle[i].x != 0)
            {
                ADDR_ASSERT(IsPow2(static_cast<UINT_32>(pSwizzle[i].x)));

                pEquation->addr[i].channel = 0;
                pEquation->addr[i].valid   = 1;
                pEquation->addr[i].index   = Log2(pSwizzle[i].x) + elemLog2;
            }
            else if (pSwizzle[i].y != 0)
            {
                ADDR_ASSERT(IsPow2(static_cast<UINT_32>(pSwizzle[i].y)));

                pEquation->addr[i].channel = 1;
                pEquation->addr[i].valid   = 1;
                pEquation->addr[i].index   = Log2(pSwizzle[i].y);
            }
            else
            {
                ADDR_ASSERT(pSwizzle[i].z != 0);
                ADDR_ASSERT(IsPow2(static_cast<UINT_32>(pSwizzle[i].z)));

                pEquation->addr[i].channel = 2;
                pEquation->addr[i].valid   = 1;
                pEquation->addr[i].index   = Log2(pSwizzle[i].z);
            }

            pEquation->xor1[i].value = 0;
            pEquation->xor2[i].value = 0;
        }
    }
    else if (IsThin(rsrcType, swMode))
    {
        Dim3d dim;
        ComputeThinBlockDimension(&dim.w, &dim.h, &dim.d, 8u << elemLog2, 0, rsrcType, swMode);

        const UINT_32 blkXLog2 = Log2(dim.w);
        const UINT_32 blkYLog2 = Log2(dim.h);
        const UINT_32 blkXMask = dim.w - 1;
        const UINT_32 blkYMask = dim.h - 1;

        ADDR_BIT_SETTING swizzle[ADDR_MAX_EQUATION_BIT];
        UINT_32          xMask = 0;
        UINT_32          yMask = 0;
        UINT_32          bMask = (1 << elemLog2) - 1;

        for (UINT_32 i = elemLog2; i < blockSizeLog2; i++)
        {
            if (IsPow2(pSwizzle[i].value))
            {
                if (pSwizzle[i].x != 0)
                {
                    ADDR_ASSERT((xMask & pSwizzle[i].x) == 0);
                    xMask |= pSwizzle[i].x;

                    const UINT_32 xLog2 = Log2(pSwizzle[i].x);

                    ADDR_ASSERT(xLog2 < blkXLog2);

                    pEquation->addr[i].channel = 0;
                    pEquation->addr[i].valid   = 1;
                    pEquation->addr[i].index   = xLog2 + elemLog2;
                }
                else
                {
                    ADDR_ASSERT(pSwizzle[i].y != 0);
                    ADDR_ASSERT((yMask & pSwizzle[i].y) == 0);
                    yMask |= pSwizzle[i].y;

                    pEquation->addr[i].channel = 1;
                    pEquation->addr[i].valid   = 1;
                    pEquation->addr[i].index   = Log2(pSwizzle[i].y);

                    ADDR_ASSERT(pEquation->addr[i].index < blkYLog2);
                }

                swizzle[i].value = 0;
                bMask |= 1 << i;
            }
            else
            {
                if (pSwizzle[i].z != 0)
                {
                    ADDR_ASSERT(IsPow2(static_cast<UINT_32>(pSwizzle[i].z)));

                    pEquation->xor2[i].channel = 2;
                    pEquation->xor2[i].valid   = 1;
                    pEquation->xor2[i].index   = Log2(pSwizzle[i].z);
                }

                swizzle[i].x = pSwizzle[i].x;
                swizzle[i].y = pSwizzle[i].y;
                swizzle[i].z = swizzle[i].s = 0;

                ADDR_ASSERT(IsPow2(swizzle[i].value) == FALSE);

                const UINT_32 xHi = swizzle[i].x & (~blkXMask);

                if (xHi != 0)
                {
                    ADDR_ASSERT(IsPow2(xHi));
                    ADDR_ASSERT(pEquation->xor1[i].value == 0);

                    pEquation->xor1[i].channel = 0;
                    pEquation->xor1[i].valid   = 1;
                    pEquation->xor1[i].index   = Log2(xHi) + elemLog2;

                    swizzle[i].x &= blkXMask;
                }

                const UINT_32 yHi = swizzle[i].y & (~blkYMask);

                if (yHi != 0)
                {
                    ADDR_ASSERT(IsPow2(yHi));

                    if (xHi == 0)
                    {
                        ADDR_ASSERT(pEquation->xor1[i].value == 0);
                        pEquation->xor1[i].channel = 1;
                        pEquation->xor1[i].valid   = 1;
                        pEquation->xor1[i].index   = Log2(yHi);
                    }
                    else
                    {
                        ADDR_ASSERT(pEquation->xor2[i].value == 0);
                        pEquation->xor2[i].channel = 1;
                        pEquation->xor2[i].valid   = 1;
                        pEquation->xor2[i].index   = Log2(yHi);
                    }

                    swizzle[i].y &= blkYMask;
                }

                if (swizzle[i].value == 0)
                {
                    bMask |= 1 << i;
                }
            }
        }

        const UINT_32 pipeIntMask = (1 << m_pipeInterleaveLog2) - 1;
        const UINT_32 blockMask   = (1 << blockSizeLog2) - 1;

        ADDR_ASSERT((bMask & pipeIntMask) == pipeIntMask);

        while (bMask != blockMask)
        {
            for (UINT_32 i = m_pipeInterleaveLog2; i < blockSizeLog2; i++)
            {
                if ((bMask & (1 << i)) == 0)
                {
                    if (IsPow2(swizzle[i].value))
                    {
                        if (swizzle[i].x != 0)
                        {
                            ADDR_ASSERT((xMask & swizzle[i].x) == 0);
                            xMask |= swizzle[i].x;

                            const UINT_32 xLog2 = Log2(swizzle[i].x);

                            ADDR_ASSERT(xLog2 < blkXLog2);

                            pEquation->addr[i].channel = 0;
                            pEquation->addr[i].valid   = 1;
                            pEquation->addr[i].index   = xLog2 + elemLog2;
                        }
                        else
                        {
                            ADDR_ASSERT(swizzle[i].y != 0);
                            ADDR_ASSERT((yMask & swizzle[i].y) == 0);
                            yMask |= swizzle[i].y;

                            pEquation->addr[i].channel = 1;
                            pEquation->addr[i].valid   = 1;
                            pEquation->addr[i].index   = Log2(swizzle[i].y);

                            ADDR_ASSERT(pEquation->addr[i].index < blkYLog2);
                        }

                        swizzle[i].value = 0;
                        bMask |= 1 << i;
                    }
                    else
                    {
                        const UINT_32 x = swizzle[i].x & xMask;
                        const UINT_32 y = swizzle[i].y & yMask;

                        if (x != 0)
                        {
                            ADDR_ASSERT(IsPow2(x));

                            if (pEquation->xor1[i].value == 0)
                            {
                                pEquation->xor1[i].channel = 0;
                                pEquation->xor1[i].valid   = 1;
                                pEquation->xor1[i].index   = Log2(x) + elemLog2;
                            }
                            else
                            {
                                ADDR_ASSERT(pEquation->xor2[i].value == 0);
                                pEquation->xor2[i].channel = 0;
                                pEquation->xor2[i].valid   = 1;
                                pEquation->xor2[i].index   = Log2(x) + elemLog2;
                            }
                        }

                        if (y != 0)
                        {
                            ADDR_ASSERT(IsPow2(y));

                            if (pEquation->xor1[i].value == 0)
                            {
                                pEquation->xor1[i].channel = 1;
                                pEquation->xor1[i].valid   = 1;
                                pEquation->xor1[i].index   = Log2(y);
                            }
                            else
                            {
                                ADDR_ASSERT(pEquation->xor2[i].value == 0);
                                pEquation->xor2[i].channel = 1;
                                pEquation->xor2[i].valid   = 1;
                                pEquation->xor2[i].index   = Log2(y);
                            }
                        }

                        swizzle[i].x &= ~x;
                        swizzle[i].y &= ~y;
                    }
                }
            }
        }

        ADDR_ASSERT((xMask == blkXMask) && (yMask == blkYMask));
    }
    else
    {
        const UINT_32 blkXLog2 = (blockSizeLog2 == 12) ? Block4K_Log2_3d[elemLog2].w : Block64K_Log2_3d[elemLog2].w;
        const UINT_32 blkYLog2 = (blockSizeLog2 == 12) ? Block4K_Log2_3d[elemLog2].h : Block64K_Log2_3d[elemLog2].h;
        const UINT_32 blkZLog2 = (blockSizeLog2 == 12) ?
Block4K_Log2_3d[elemLog2].d : Block64K_Log2_3d[elemLog2].d; const UINT_32 blkXMask = (1 << blkXLog2) - 1; const UINT_32 blkYMask = (1 << blkYLog2) - 1; const UINT_32 blkZMask = (1 << blkZLog2) - 1; ADDR_BIT_SETTING swizzle[ADDR_MAX_EQUATION_BIT]; UINT_32 xMask = 0; UINT_32 yMask = 0; UINT_32 zMask = 0; UINT_32 bMask = (1 << elemLog2) - 1; for (UINT_32 i = elemLog2; i < blockSizeLog2; i++) { if (IsPow2(pSwizzle[i].value)) { if (pSwizzle[i].x != 0) { ADDR_ASSERT((xMask & pSwizzle[i].x) == 0); xMask |= pSwizzle[i].x; const UINT_32 xLog2 = Log2(pSwizzle[i].x); ADDR_ASSERT(xLog2 < blkXLog2); pEquation->addr[i].channel = 0; pEquation->addr[i].valid = 1; pEquation->addr[i].index = xLog2 + elemLog2; } else if (pSwizzle[i].y != 0) { ADDR_ASSERT((yMask & pSwizzle[i].y) == 0); yMask |= pSwizzle[i].y; pEquation->addr[i].channel = 1; pEquation->addr[i].valid = 1; pEquation->addr[i].index = Log2(pSwizzle[i].y); ADDR_ASSERT(pEquation->addr[i].index < blkYLog2); } else { ADDR_ASSERT(pSwizzle[i].z != 0); ADDR_ASSERT((zMask & pSwizzle[i].z) == 0); zMask |= pSwizzle[i].z; pEquation->addr[i].channel = 2; pEquation->addr[i].valid = 1; pEquation->addr[i].index = Log2(pSwizzle[i].z); ADDR_ASSERT(pEquation->addr[i].index < blkZLog2); } swizzle[i].value = 0; bMask |= 1 << i; } else { swizzle[i].x = pSwizzle[i].x; swizzle[i].y = pSwizzle[i].y; swizzle[i].z = pSwizzle[i].z; swizzle[i].s = 0; ADDR_ASSERT(IsPow2(swizzle[i].value) == FALSE); const UINT_32 xHi = swizzle[i].x & (~blkXMask); const UINT_32 yHi = swizzle[i].y & (~blkYMask); const UINT_32 zHi = swizzle[i].z & (~blkZMask); ADDR_ASSERT((xHi == 0) || (yHi== 0) || (zHi == 0)); if (xHi != 0) { ADDR_ASSERT(IsPow2(xHi)); ADDR_ASSERT(pEquation->xor1[i].value == 0); pEquation->xor1[i].channel = 0; pEquation->xor1[i].valid = 1; pEquation->xor1[i].index = Log2(xHi) + elemLog2; swizzle[i].x &= blkXMask; } if (yHi != 0) { ADDR_ASSERT(IsPow2(yHi)); if (pEquation->xor1[i].value == 0) { pEquation->xor1[i].channel = 1; pEquation->xor1[i].valid = 1; 
pEquation->xor1[i].index = Log2(yHi); } else { ADDR_ASSERT(pEquation->xor2[i].value == 0); pEquation->xor2[i].channel = 1; pEquation->xor2[i].valid = 1; pEquation->xor2[i].index = Log2(yHi); } swizzle[i].y &= blkYMask; } if (zHi != 0) { ADDR_ASSERT(IsPow2(zHi)); if (pEquation->xor1[i].value == 0) { pEquation->xor1[i].channel = 2; pEquation->xor1[i].valid = 1; pEquation->xor1[i].index = Log2(zHi); } else { ADDR_ASSERT(pEquation->xor2[i].value == 0); pEquation->xor2[i].channel = 2; pEquation->xor2[i].valid = 1; pEquation->xor2[i].index = Log2(zHi); } swizzle[i].z &= blkZMask; } if (swizzle[i].value == 0) { bMask |= 1 << i; } } } const UINT_32 pipeIntMask = (1 << m_pipeInterleaveLog2) - 1; const UINT_32 blockMask = (1 << blockSizeLog2) - 1; ADDR_ASSERT((bMask & pipeIntMask) == pipeIntMask); while (bMask != blockMask) { for (UINT_32 i = m_pipeInterleaveLog2; i < blockSizeLog2; i++) { if ((bMask & (1 << i)) == 0) { if (IsPow2(swizzle[i].value)) { if (swizzle[i].x != 0) { ADDR_ASSERT((xMask & swizzle[i].x) == 0); xMask |= swizzle[i].x; const UINT_32 xLog2 = Log2(swizzle[i].x); ADDR_ASSERT(xLog2 < blkXLog2); pEquation->addr[i].channel = 0; pEquation->addr[i].valid = 1; pEquation->addr[i].index = xLog2 + elemLog2; } else if (swizzle[i].y != 0) { ADDR_ASSERT((yMask & swizzle[i].y) == 0); yMask |= swizzle[i].y; pEquation->addr[i].channel = 1; pEquation->addr[i].valid = 1; pEquation->addr[i].index = Log2(swizzle[i].y); ADDR_ASSERT(pEquation->addr[i].index < blkYLog2); } else { ADDR_ASSERT(swizzle[i].z != 0); ADDR_ASSERT((zMask & swizzle[i].z) == 0); zMask |= swizzle[i].z; pEquation->addr[i].channel = 2; pEquation->addr[i].valid = 1; pEquation->addr[i].index = Log2(swizzle[i].z); ADDR_ASSERT(pEquation->addr[i].index < blkZLog2); } swizzle[i].value = 0; bMask |= 1 << i; } else { const UINT_32 x = swizzle[i].x & xMask; const UINT_32 y = swizzle[i].y & yMask; const UINT_32 z = swizzle[i].z & zMask; if (x != 0) { ADDR_ASSERT(IsPow2(x)); if (pEquation->xor1[i].value == 0) { 
pEquation->xor1[i].channel = 0; pEquation->xor1[i].valid = 1; pEquation->xor1[i].index = Log2(x) + elemLog2; } else { ADDR_ASSERT(pEquation->xor2[i].value == 0); pEquation->xor2[i].channel = 0; pEquation->xor2[i].valid = 1; pEquation->xor2[i].index = Log2(x) + elemLog2; } } if (y != 0) { ADDR_ASSERT(IsPow2(y)); if (pEquation->xor1[i].value == 0) { pEquation->xor1[i].channel = 1; pEquation->xor1[i].valid = 1; pEquation->xor1[i].index = Log2(y); } else { ADDR_ASSERT(pEquation->xor2[i].value == 0); pEquation->xor2[i].channel = 1; pEquation->xor2[i].valid = 1; pEquation->xor2[i].index = Log2(y); } } if (z != 0) { ADDR_ASSERT(IsPow2(z)); if (pEquation->xor1[i].value == 0) { pEquation->xor1[i].channel = 2; pEquation->xor1[i].valid = 1; pEquation->xor1[i].index = Log2(z); } else { ADDR_ASSERT(pEquation->xor2[i].value == 0); pEquation->xor2[i].channel = 2; pEquation->xor2[i].valid = 1; pEquation->xor2[i].index = Log2(z); } } swizzle[i].x &= ~x; swizzle[i].y &= ~y; swizzle[i].z &= ~z; } } } } ADDR_ASSERT((xMask == blkXMask) && (yMask == blkYMask) && (zMask == blkZMask)); } } /** ************************************************************************************************************************ * Gfx10Lib::InitEquationTable * * @brief * Initialize Equation table. 
* * @return * N/A ************************************************************************************************************************ */ VOID Gfx10Lib::InitEquationTable() { memset(m_equationTable, 0, sizeof(m_equationTable)); for (UINT_32 rsrcTypeIdx = 0; rsrcTypeIdx < MaxRsrcType; rsrcTypeIdx++) { const AddrResourceType rsrcType = static_cast<AddrResourceType>(rsrcTypeIdx + ADDR_RSRC_TEX_2D); for (UINT_32 swModeIdx = 0; swModeIdx < MaxSwModeType; swModeIdx++) { const AddrSwizzleMode swMode = static_cast<AddrSwizzleMode>(swModeIdx); for (UINT_32 elemLog2 = 0; elemLog2 < MaxElementBytesLog2; elemLog2++) { UINT_32 equationIndex = ADDR_INVALID_EQUATION_INDEX; const ADDR_SW_PATINFO* pPatInfo = GetSwizzlePatternInfo(swMode, rsrcType, elemLog2, 1); if (pPatInfo != NULL) { ADDR_ASSERT(IsValidSwMode(swMode)); if (pPatInfo->maxItemCount <= 3) { ADDR_EQUATION equation = {}; ConvertSwizzlePatternToEquation(elemLog2, rsrcType, swMode, pPatInfo, &equation); equationIndex = m_numEquations; ADDR_ASSERT(equationIndex < EquationTableSize); m_equationTable[equationIndex] = equation; m_numEquations++; } else { // We only see an "ill" equation (maxItemCount > 3) for 64/128 BPE + 3D resource + SW_64KB_D_X in the RB+ case ADDR_ASSERT((elemLog2 == 3) || (elemLog2 == 4)); ADDR_ASSERT(rsrcTypeIdx == 1); ADDR_ASSERT(swMode == ADDR_SW_64KB_D_X); ADDR_ASSERT(m_settings.supportRbPlus == 1); } } m_equationLookupTable[rsrcTypeIdx][swModeIdx][elemLog2] = equationIndex; } } } } /** ************************************************************************************************************************ * Gfx10Lib::HwlGetEquationIndex * * @brief * Interface function stub of GetEquationIndex * * @return * Equation index, or ADDR_INVALID_EQUATION_INDEX if no equation applies ************************************************************************************************************************ */ UINT_32 Gfx10Lib::HwlGetEquationIndex( const 
ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { UINT_32 equationIdx = 
ADDR_INVALID_EQUATION_INDEX; if ((pIn->resourceType == ADDR_RSRC_TEX_2D) || (pIn->resourceType == ADDR_RSRC_TEX_3D)) { const UINT_32 rsrcTypeIdx = static_cast<UINT_32>(pIn->resourceType) - 1; const UINT_32 swModeIdx = static_cast<UINT_32>(pIn->swizzleMode); const UINT_32 elemLog2 = Log2(pIn->bpp >> 3); equationIdx = m_equationLookupTable[rsrcTypeIdx][swModeIdx][elemLog2]; } if (pOut->pMipInfo != NULL) { for (UINT_32 i = 0; i < pIn->numMipLevels; i++) { pOut->pMipInfo[i].equationIndex = equationIdx; } } return equationIdx; } /** ************************************************************************************************************************ * Gfx10Lib::IsValidDisplaySwizzleMode * * @brief * Check if a swizzle mode is supported by display engine * * @return * TRUE if swizzle mode is supported by display engine ************************************************************************************************************************ */ BOOL_32 Gfx10Lib::IsValidDisplaySwizzleMode( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn ///< [in] input structure ) const { ADDR_ASSERT(pIn->resourceType == ADDR_RSRC_TEX_2D); BOOL_32 support = FALSE; if (m_settings.isDcn2) { switch (pIn->swizzleMode) { case ADDR_SW_4KB_D: case ADDR_SW_4KB_D_X: case ADDR_SW_64KB_D: case ADDR_SW_64KB_D_T: case ADDR_SW_64KB_D_X: support = (pIn->bpp == 64); break; case ADDR_SW_LINEAR: case ADDR_SW_4KB_S: case ADDR_SW_4KB_S_X: case ADDR_SW_64KB_S: case ADDR_SW_64KB_S_T: case ADDR_SW_64KB_S_X: case ADDR_SW_64KB_R_X: support = (pIn->bpp <= 64); break; default: break; } } else { ADDR_NOT_IMPLEMENTED(); } return support; } /** ************************************************************************************************************************ * Gfx10Lib::GetMaxNumMipsInTail * * @brief * Return max number of mips in tails * * @return * Max number of mips in tails 
************************************************************************************************************************ */ UINT_32
Gfx10Lib::GetMaxNumMipsInTail( UINT_32 blockSizeLog2, ///< block size log2 BOOL_32 isThin ///< is thin or thick ) const { UINT_32 effectiveLog2 = blockSizeLog2; if (isThin == FALSE) { effectiveLog2 -= (blockSizeLog2 - 8) / 3; } return (effectiveLog2 <= 11) ? (1 + (1 << (effectiveLog2 - 9))) : (effectiveLog2 - 4); } /** ************************************************************************************************************************ * Gfx10Lib::HwlComputePipeBankXor * * @brief * Generate a PipeBankXor value to be ORed into bits above pipeInterleaveBits of address * * @return * PipeBankXor value ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlComputePipeBankXor( const ADDR2_COMPUTE_PIPEBANKXOR_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT* pOut ///< [out] output structure ) const { if (IsNonPrtXor(pIn->swizzleMode)) { const UINT_32 blockBits = GetBlockSizeLog2(pIn->swizzleMode); const UINT_32 pipeBits = GetPipeXorBits(blockBits); const UINT_32 bankBits = GetBankXorBits(blockBits); UINT_32 pipeXor = 0; UINT_32 bankXor = 0; if (bankBits != 0) { if (blockBits == 16) { const UINT_32 XorPatternLen = 8; static const UINT_32 XorBank1b[XorPatternLen] = {0x00, 0x80, 0x00, 0x80, 0x00, 0x80, 0x00, 0x80}; static const UINT_32 XorBank2b[XorPatternLen] = {0x00, 0x80, 0x40, 0xC0, 0x80, 0x00, 0xC0, 0x40}; static const UINT_32 XorBank3b[XorPatternLen] = {0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0}; const UINT_32 index = pIn->surfIndex % XorPatternLen; if (bankBits == 1) { bankXor = XorBank1b[index]; } else if (bankBits == 2) { bankXor = XorBank2b[index]; } else { bankXor = XorBank3b[index]; if (bankBits == 4) { bankXor >>= (2 - pipeBits); } } } } pOut->pipeBankXor = bankXor | pipeXor; } else { pOut->pipeBankXor = 0; } return ADDR_OK; } /** 
************************************************************************************************************************ * Gfx10Lib::HwlComputeSlicePipeBankXor * * @brief * Generate slice PipeBankXor value based on base PipeBankXor value and slice id * * @return * PipeBankXor value ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlComputeSlicePipeBankXor( const ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT* pOut ///< [out] output structure ) const { if (IsNonPrtXor(pIn->swizzleMode)) { const UINT_32 blockBits = GetBlockSizeLog2(pIn->swizzleMode); const UINT_32 pipeBits = GetPipeXorBits(blockBits); const UINT_32 pipeXor = ReverseBitVector(pIn->slice, pipeBits); pOut->pipeBankXor = pIn->basePipeBankXor ^ pipeXor; } else { pOut->pipeBankXor = 0; } return ADDR_OK; } /** ************************************************************************************************************************ * Gfx10Lib::HwlComputeSubResourceOffsetForSwizzlePattern * * @brief * Compute sub resource offset to support swizzle pattern * * @return * Offset ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlComputeSubResourceOffsetForSwizzlePattern( const ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT* pOut ///< [out] output structure ) const { ADDR_ASSERT(IsThin(pIn->resourceType, pIn->swizzleMode)); pOut->offset = pIn->slice * pIn->sliceSize + pIn->macroBlockOffset; return ADDR_OK; } /** ************************************************************************************************************************ * Gfx10Lib::ValidateNonSwModeParams * * @brief * Validate compute surface info params except swizzle mode * * @return * 
TRUE if parameters are valid, FALSE otherwise ************************************************************************************************************************ */ BOOL_32 Gfx10Lib::ValidateNonSwModeParams( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const { BOOL_32 valid = TRUE; if ((pIn->bpp == 0) || (pIn->bpp > 128) || (pIn->width == 0) || (pIn->numFrags > 8) || (pIn->numSamples > 16)) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } if (pIn->resourceType >= ADDR_RSRC_MAX_TYPE) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } const ADDR2_SURFACE_FLAGS flags = pIn->flags; const AddrResourceType rsrcType = pIn->resourceType; const BOOL_32 mipmap = (pIn->numMipLevels > 1); const BOOL_32 msaa = (pIn->numFrags > 1); const BOOL_32 display = flags.display; const BOOL_32 tex3d = IsTex3d(rsrcType); const BOOL_32 tex2d = IsTex2d(rsrcType); const BOOL_32 tex1d = IsTex1d(rsrcType); const BOOL_32 stereo = flags.qbStereo; // Resource type check if (tex1d) { if (msaa || display || stereo) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else if (tex2d) { if ((msaa && mipmap) || (stereo && msaa) || (stereo && mipmap)) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else if (tex3d) { if (msaa || display || stereo) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else { ADDR_ASSERT_ALWAYS(); valid = FALSE; } return valid; } /** ************************************************************************************************************************ * Gfx10Lib::ValidateSwModeParams * * @brief * Validate compute surface info related to swizzle mode * * @return * TRUE if parameters are valid, FALSE otherwise ************************************************************************************************************************ */ BOOL_32 Gfx10Lib::ValidateSwModeParams( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const { BOOL_32 valid = TRUE; if ((pIn->swizzleMode >= ADDR_SW_MAX_TYPE) || (IsValidSwMode(pIn->swizzleMode) == FALSE)) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } const ADDR2_SURFACE_FLAGS flags = 
pIn->flags; const AddrResourceType rsrcType = pIn->resourceType; const AddrSwizzleMode swizzle = pIn->swizzleMode; const BOOL_32 msaa = (pIn->numFrags > 1); const BOOL_32 zbuffer = flags.depth || flags.stencil; const BOOL_32 color = flags.color; const BOOL_32 display = flags.display; const BOOL_32 tex3d = IsTex3d(rsrcType); const BOOL_32 tex2d = IsTex2d(rsrcType); const BOOL_32 tex1d = IsTex1d(rsrcType); const BOOL_32 thin3d = flags.view3dAs2dArray; const BOOL_32 linear = IsLinear(swizzle); const BOOL_32 blk256B = IsBlock256b(swizzle); const BOOL_32 blkVar = IsBlockVariable(swizzle); const BOOL_32 isNonPrtXor = IsNonPrtXor(swizzle); const BOOL_32 prt = flags.prt; const BOOL_32 fmask = flags.fmask; // Misc check if ((pIn->numFrags > 1) && (GetBlockSize(swizzle) < (m_pipeInterleaveBytes * pIn->numFrags))) { // MSAA surface must have blk_bytes/pipe_interleave >= num_samples ADDR_ASSERT_ALWAYS(); valid = FALSE; } if (display && (IsValidDisplaySwizzleMode(pIn) == FALSE)) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } if ((pIn->bpp == 96) && (linear == FALSE)) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } const UINT_32 swizzleMask = 1 << swizzle; // Resource type check if (tex1d) { if ((swizzleMask & Gfx10Rsrc1dSwModeMask) == 0) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else if (tex2d) { if (((swizzleMask & Gfx10Rsrc2dSwModeMask) == 0) || (prt && ((swizzleMask & Gfx10Rsrc2dPrtSwModeMask) == 0)) || (fmask && ((swizzleMask & Gfx10ZSwModeMask) == 0))) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else if (tex3d) { if (((swizzleMask & Gfx10Rsrc3dSwModeMask) == 0) || (prt && ((swizzleMask & Gfx10Rsrc3dPrtSwModeMask) == 0)) || (thin3d && ((swizzleMask & Gfx10Rsrc3dThinSwModeMask) == 0))) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } // Swizzle type check if (linear) { if (zbuffer || msaa || (pIn->bpp == 0) || ((pIn->bpp % 8) != 0)) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else if (IsZOrderSwizzle(swizzle)) { if ((pIn->bpp > 64) || (msaa && (color || (pIn->bpp > 32))) || 
ElemLib::IsBlockCompressed(pIn->format) || ElemLib::IsMacroPixelPacked(pIn->format)) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else if (IsStandardSwizzle(rsrcType, swizzle)) { if (zbuffer || msaa) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else if (IsDisplaySwizzle(rsrcType, swizzle)) { if (zbuffer || msaa) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else if (IsRtOptSwizzle(swizzle)) { if (zbuffer) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else { ADDR_ASSERT_ALWAYS(); valid = FALSE; } // Block type check if (blk256B) { if (zbuffer || tex3d || msaa) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } else if (blkVar) { if (m_blockVarSizeLog2 == 0) { ADDR_ASSERT_ALWAYS(); valid = FALSE; } } return valid; } /** ************************************************************************************************************************ * Gfx10Lib::HwlComputeSurfaceInfoSanityCheck * * @brief * Compute surface info sanity check * * @return * ADDR_OK if all parameters are valid, ADDR_INVALIDPARAMS otherwise ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlComputeSurfaceInfoSanityCheck( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn ///< [in] input structure ) const { return ValidateNonSwModeParams(pIn) && ValidateSwModeParams(pIn) ? 
ADDR_OK : ADDR_INVALIDPARAMS; } /** ************************************************************************************************************************ * Gfx10Lib::HwlGetPreferredSurfaceSetting * * @brief * Internal function to get suggested surface information for the client to use * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlGetPreferredSurfaceSetting( const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT* pIn, ///< [in] input structure ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (pIn->flags.fmask) { const BOOL_32 forbid64KbBlockType = pIn->forbiddenBlock.macroThin64KB ? TRUE : FALSE; const BOOL_32 forbidVarBlockType = ((m_blockVarSizeLog2 == 0) || (pIn->forbiddenBlock.var != 0)); if (forbid64KbBlockType && forbidVarBlockType) { // Invalid combination... ADDR_ASSERT_ALWAYS(); returnCode = ADDR_INVALIDPARAMS; } else { pOut->resourceType = ADDR_RSRC_TEX_2D; pOut->validBlockSet.value = 0; pOut->validBlockSet.macroThin64KB = forbid64KbBlockType ? 0 : 1; pOut->validBlockSet.var = forbidVarBlockType ? 0 : 1; pOut->validSwModeSet.value = 0; pOut->validSwModeSet.sw64KB_Z_X = forbid64KbBlockType ? 0 : 1; pOut->validSwModeSet.swVar_Z_X = forbidVarBlockType ? 0 : 1; pOut->canXor = TRUE; pOut->validSwTypeSet.value = AddrSwSetZ; pOut->clientPreferredSwSet = pOut->validSwTypeSet; BOOL_32 use64KbBlockType = (forbid64KbBlockType == FALSE); if ((forbid64KbBlockType == FALSE) && (forbidVarBlockType == FALSE)) { const UINT_8 maxFmaskSwizzleModeType = 2; const UINT_32 ratioLow = pIn->flags.minimizeAlign ? 1 : (pIn->flags.opt4space ? 3 : 2); const UINT_32 ratioHi = pIn->flags.minimizeAlign ? 1 : (pIn->flags.opt4space ?
2 : 1); const UINT_32 fmaskBpp = GetFmaskBpp(pIn->numSamples, pIn->numFrags); const UINT_32 numSlices = Max(pIn->numSlices, 1u); const UINT_32 width = Max(pIn->width, 1u); const UINT_32 height = Max(pIn->height, 1u); const UINT_64 sizeAlignInElement = Max(NextPow2(pIn->minSizeAlign) / (fmaskBpp >> 3), 1u); AddrSwizzleMode swMode[maxFmaskSwizzleModeType] = {ADDR_SW_64KB_Z_X, ADDR_SW_VAR_Z_X}; Dim3d blkDim[maxFmaskSwizzleModeType] = {{0}, {0}}; Dim3d padDim[maxFmaskSwizzleModeType] = {{0}, {0}}; UINT_64 padSize[maxFmaskSwizzleModeType] = {0}; for (UINT_8 i = 0; i < maxFmaskSwizzleModeType; i++) { ComputeBlockDimensionForSurf(&blkDim[i].w, &blkDim[i].h, &blkDim[i].d, fmaskBpp, 1, pOut->resourceType, swMode[i]); padSize[i] = ComputePadSize(&blkDim[i], width, height, numSlices, &padDim[i]); padSize[i] = PowTwoAlign(padSize[i], sizeAlignInElement); } if (GetBlockSizeLog2(swMode[1]) >= GetBlockSizeLog2(swMode[0])) { if ((padSize[1] * ratioHi) <= (padSize[0] * ratioLow)) { use64KbBlockType = FALSE; } } else { if ((padSize[1] * ratioLow) < (padSize[0] * ratioHi)) { use64KbBlockType = FALSE; } } } else if (forbidVarBlockType) { use64KbBlockType = TRUE; } if (use64KbBlockType) { pOut->swizzleMode = ADDR_SW_64KB_Z_X; } else { pOut->swizzleMode = ADDR_SW_VAR_Z_X; } } } else { UINT_32 bpp = pIn->bpp; UINT_32 width = Max(pIn->width, 1u); UINT_32 height = Max(pIn->height, 1u); // Set format to INVALID will skip this conversion if (pIn->format != ADDR_FMT_INVALID) { ElemMode elemMode = ADDR_UNCOMPRESSED; UINT_32 expandX, expandY; // Get compression/expansion factors and element mode which indicates compression/expansion bpp = GetElemLib()->GetBitsPerPixel(pIn->format, &elemMode, &expandX, &expandY); UINT_32 basePitch = 0; GetElemLib()->AdjustSurfaceInfo(elemMode, expandX, expandY, &bpp, &basePitch, &width, &height); } const UINT_32 numSlices = Max(pIn->numSlices, 1u); const UINT_32 numMipLevels = Max(pIn->numMipLevels, 1u); const UINT_32 numSamples = Max(pIn->numSamples, 1u); const 
UINT_32 numFrags = (pIn->numFrags == 0) ? numSamples : pIn->numFrags; const BOOL_32 msaa = (numFrags > 1) || (numSamples > 1); // Pre sanity check on non swizzle mode parameters ADDR2_COMPUTE_SURFACE_INFO_INPUT localIn = {}; localIn.flags = pIn->flags; localIn.resourceType = pIn->resourceType; localIn.format = pIn->format; localIn.bpp = bpp; localIn.width = width; localIn.height = height; localIn.numSlices = numSlices; localIn.numMipLevels = numMipLevels; localIn.numSamples = numSamples; localIn.numFrags = numFrags; if (ValidateNonSwModeParams(&localIn)) { // Forbid swizzle mode(s) by client setting ADDR2_SWMODE_SET allowedSwModeSet = {}; allowedSwModeSet.value |= pIn->forbiddenBlock.linear ? 0 : Gfx10LinearSwModeMask; allowedSwModeSet.value |= pIn->forbiddenBlock.micro ? 0 : Gfx10Blk256BSwModeMask; allowedSwModeSet.value |= pIn->forbiddenBlock.macroThin4KB ? 0 : ((pOut->resourceType == ADDR_RSRC_TEX_3D) ? 0 : Gfx10Blk4KBSwModeMask); allowedSwModeSet.value |= pIn->forbiddenBlock.macroThick4KB ? 0 : ((pOut->resourceType == ADDR_RSRC_TEX_3D) ? Gfx10Rsrc3dThick4KBSwModeMask : 0); allowedSwModeSet.value |= pIn->forbiddenBlock.macroThin64KB ? 0 : ((pOut->resourceType == ADDR_RSRC_TEX_3D) ? Gfx10Rsrc3dThin64KBSwModeMask : Gfx10Blk64KBSwModeMask); allowedSwModeSet.value |= pIn->forbiddenBlock.macroThick64KB ? 0 : ((pOut->resourceType == ADDR_RSRC_TEX_3D) ? Gfx10Rsrc3dThick64KBSwModeMask : 0); allowedSwModeSet.value |= pIn->forbiddenBlock.var ? 0 : (m_blockVarSizeLog2 ? Gfx10BlkVarSwModeMask : 0); if (pIn->preferredSwSet.value != 0) { allowedSwModeSet.value &= pIn->preferredSwSet.sw_Z ? ~0 : ~Gfx10ZSwModeMask; allowedSwModeSet.value &= pIn->preferredSwSet.sw_S ? ~0 : ~Gfx10StandardSwModeMask; allowedSwModeSet.value &= pIn->preferredSwSet.sw_D ? ~0 : ~Gfx10DisplaySwModeMask; allowedSwModeSet.value &= pIn->preferredSwSet.sw_R ? 
~0 : ~Gfx10RenderSwModeMask; } if (pIn->noXor) { allowedSwModeSet.value &= ~Gfx10XorSwModeMask; } if (pIn->maxAlign > 0) { if (pIn->maxAlign < (1u << m_blockVarSizeLog2)) { allowedSwModeSet.value &= ~Gfx10BlkVarSwModeMask; } if (pIn->maxAlign < Size64K) { allowedSwModeSet.value &= ~Gfx10Blk64KBSwModeMask; } if (pIn->maxAlign < Size4K) { allowedSwModeSet.value &= ~Gfx10Blk4KBSwModeMask; } if (pIn->maxAlign < Size256) { allowedSwModeSet.value &= ~Gfx10Blk256BSwModeMask; } } // Filter out invalid swizzle mode(s) by image attributes and HW restrictions switch (pIn->resourceType) { case ADDR_RSRC_TEX_1D: allowedSwModeSet.value &= Gfx10Rsrc1dSwModeMask; break; case ADDR_RSRC_TEX_2D: allowedSwModeSet.value &= pIn->flags.prt ? Gfx10Rsrc2dPrtSwModeMask : Gfx10Rsrc2dSwModeMask; break; case ADDR_RSRC_TEX_3D: allowedSwModeSet.value &= pIn->flags.prt ? Gfx10Rsrc3dPrtSwModeMask : Gfx10Rsrc3dSwModeMask; if (pIn->flags.view3dAs2dArray) { allowedSwModeSet.value &= Gfx10Rsrc3dThinSwModeMask; } break; default: ADDR_ASSERT_ALWAYS(); allowedSwModeSet.value = 0; break; } if (ElemLib::IsBlockCompressed(pIn->format) || ElemLib::IsMacroPixelPacked(pIn->format) || (bpp > 64) || (msaa && ((bpp > 32) || pIn->flags.color || pIn->flags.unordered))) { allowedSwModeSet.value &= ~Gfx10ZSwModeMask; } if (pIn->format == ADDR_FMT_32_32_32) { allowedSwModeSet.value &= Gfx10LinearSwModeMask; } if (msaa) { allowedSwModeSet.value &= Gfx10MsaaSwModeMask; } if (pIn->flags.depth || pIn->flags.stencil) { allowedSwModeSet.value &= Gfx10ZSwModeMask; } if (pIn->flags.display) { if (m_settings.isDcn2) { allowedSwModeSet.value &= (bpp == 64) ? 
Dcn2Bpp64SwModeMask : Dcn2NonBpp64SwModeMask; } else { ADDR_NOT_IMPLEMENTED(); } } if (allowedSwModeSet.value != 0) { #if DEBUG // Post sanity check: AddrLib should at least accept the output it generated itself UINT_32 validateSwModeSet = allowedSwModeSet.value; for (UINT_32 i = 0; validateSwModeSet != 0; i++) { if (validateSwModeSet & 1) { localIn.swizzleMode = static_cast<AddrSwizzleMode>(i); ADDR_ASSERT(ValidateSwModeParams(&localIn)); } validateSwModeSet >>= 1; } #endif pOut->resourceType = pIn->resourceType; pOut->validSwModeSet = allowedSwModeSet; pOut->canXor = (allowedSwModeSet.value & Gfx10XorSwModeMask) ? TRUE : FALSE; pOut->validBlockSet = GetAllowedBlockSet(allowedSwModeSet, pOut->resourceType); pOut->validSwTypeSet = GetAllowedSwSet(allowedSwModeSet); pOut->clientPreferredSwSet = pIn->preferredSwSet; if (pOut->clientPreferredSwSet.value == 0) { pOut->clientPreferredSwSet.value = AddrSwSetAll; } // Apply optional restrictions if ((pIn->flags.depth || pIn->flags.stencil) && msaa && m_configFlags.nonPower2MemConfig) { if ((allowedSwModeSet.value & ~Gfx10BlkVarSwModeMask) != 0) { // MSAA depth in non power of 2 memory configs would suffer from non-local channel accesses from // the GL2 in VAR mode, so it should be avoided. allowedSwModeSet.value &= ~Gfx10BlkVarSwModeMask; } else { // We should still be able to use VAR for non power of 2 memory configs with MSAA z/stencil. // But we have to suffer from low performance because there is no other choice... ADDR_ASSERT_ALWAYS(); } } if (pIn->flags.needEquation) { FilterInvalidEqSwizzleMode(allowedSwModeSet, pIn->resourceType, Log2(bpp >> 3)); } if (allowedSwModeSet.value == Gfx10LinearSwModeMask) { pOut->swizzleMode = ADDR_SW_LINEAR; } else { // Always ignore linear swizzle mode if there is another choice.
allowedSwModeSet.swLinear = 0; ADDR2_BLOCK_SET allowedBlockSet = GetAllowedBlockSet(allowedSwModeSet, pOut->resourceType); // Determine block size if there are 2 or more block type candidates if (IsPow2(allowedBlockSet.value) == FALSE) { AddrSwizzleMode swMode[AddrBlockMaxTiledType] = { ADDR_SW_LINEAR }; if (m_blockVarSizeLog2 != 0) { swMode[AddrBlockVar] = ADDR_SW_VAR_R_X; } if (pOut->resourceType == ADDR_RSRC_TEX_3D) { swMode[AddrBlockThick4KB] = ADDR_SW_4KB_S; swMode[AddrBlockThin64KB] = ADDR_SW_64KB_R_X; swMode[AddrBlockThick64KB] = ADDR_SW_64KB_S; } else { swMode[AddrBlockMicro] = ADDR_SW_256B_S; swMode[AddrBlockThin4KB] = ADDR_SW_4KB_S; swMode[AddrBlockThin64KB] = ADDR_SW_64KB_S; } Dim3d blkDim[AddrBlockMaxTiledType] = {{0}, {0}, {0}, {0}, {0}, {0}}; Dim3d padDim[AddrBlockMaxTiledType] = {{0}, {0}, {0}, {0}, {0}, {0}}; UINT_64 padSize[AddrBlockMaxTiledType] = {0}; const UINT_32 ratioLow = pIn->flags.minimizeAlign ? 1 : (pIn->flags.opt4space ? 3 : 2); const UINT_32 ratioHi = pIn->flags.minimizeAlign ? 1 : (pIn->flags.opt4space ? 2 : 1); const UINT_64 sizeAlignInElement = Max(NextPow2(pIn->minSizeAlign) / (bpp >> 3), 1u); UINT_32 minSizeBlk = AddrBlockMicro; UINT_64 minSize = 0; for (UINT_32 i = AddrBlockMicro; i < AddrBlockMaxTiledType; i++) { if (allowedBlockSet.value & (1 << i)) { ComputeBlockDimensionForSurf(&blkDim[i].w, &blkDim[i].h, &blkDim[i].d, bpp, numFrags, pOut->resourceType, swMode[i]); padSize[i] = ComputePadSize(&blkDim[i], width, height, numSlices, &padDim[i]); padSize[i] = PowTwoAlign(padSize[i] * numFrags, sizeAlignInElement); if (minSize == 0) { minSize = padSize[i]; minSizeBlk = i; } else { // Because VAR block size = 16KB * m_pipes, it is possible that VAR // block size < 64KB. And the ratio[Hi/Low] logic implicitly requires iterating from // smaller block type to bigger block type. So we have to adjust the comparison logic // according to the size of the existing "minimum block" and the size of the incoming // block.
The new logic can also be useful to any future change about AddrBlockType. if (GetBlockSizeLog2(swMode[i]) >= GetBlockSizeLog2(swMode[minSizeBlk])) { if ((padSize[i] * ratioHi) <= (minSize * ratioLow)) { minSize = padSize[i]; minSizeBlk = i; } } else { if ((padSize[i] * ratioLow) < (minSize * ratioHi)) { minSize = padSize[i]; minSizeBlk = i; } } } } } if ((allowedBlockSet.micro == TRUE) && (width <= blkDim[AddrBlockMicro].w) && (height <= blkDim[AddrBlockMicro].h)) { minSizeBlk = AddrBlockMicro; } if (minSizeBlk == AddrBlockMicro) { ADDR_ASSERT(pOut->resourceType != ADDR_RSRC_TEX_3D); allowedSwModeSet.value &= Gfx10Blk256BSwModeMask; } else if (minSizeBlk == AddrBlockThick4KB) { ADDR_ASSERT(pOut->resourceType == ADDR_RSRC_TEX_3D); allowedSwModeSet.value &= Gfx10Rsrc3dThick4KBSwModeMask; } else if (minSizeBlk == AddrBlockThin4KB) { ADDR_ASSERT(pOut->resourceType != ADDR_RSRC_TEX_3D); allowedSwModeSet.value &= Gfx10Blk4KBSwModeMask; } else if (minSizeBlk == AddrBlockThick64KB) { ADDR_ASSERT(pOut->resourceType == ADDR_RSRC_TEX_3D); allowedSwModeSet.value &= Gfx10Rsrc3dThick64KBSwModeMask; } else if (minSizeBlk == AddrBlockThin64KB) { allowedSwModeSet.value &= (pOut->resourceType == ADDR_RSRC_TEX_3D) ? Gfx10Rsrc3dThin64KBSwModeMask : Gfx10Blk64KBSwModeMask; } else { ADDR_ASSERT(minSizeBlk == AddrBlockVar); allowedSwModeSet.value &= Gfx10BlkVarSwModeMask; } } // Block type should be determined. 
ADDR_ASSERT(IsPow2(GetAllowedBlockSet(allowedSwModeSet, pOut->resourceType).value)); ADDR2_SWTYPE_SET allowedSwSet = GetAllowedSwSet(allowedSwModeSet); // Determine swizzle type if there are 2 or more swizzle type candidates if (IsPow2(allowedSwSet.value) == FALSE) { if (ElemLib::IsBlockCompressed(pIn->format)) { if (allowedSwSet.sw_D) { allowedSwModeSet.value &= Gfx10DisplaySwModeMask; } else if (allowedSwSet.sw_S) { allowedSwModeSet.value &= Gfx10StandardSwModeMask; } else { ADDR_ASSERT(allowedSwSet.sw_R); allowedSwModeSet.value &= Gfx10RenderSwModeMask; } } else if (ElemLib::IsMacroPixelPacked(pIn->format)) { if (allowedSwSet.sw_S) { allowedSwModeSet.value &= Gfx10StandardSwModeMask; } else if (allowedSwSet.sw_D) { allowedSwModeSet.value &= Gfx10DisplaySwModeMask; } else { ADDR_ASSERT(allowedSwSet.sw_R); allowedSwModeSet.value &= Gfx10RenderSwModeMask; } } else if (pIn->resourceType == ADDR_RSRC_TEX_3D) { if (pIn->flags.color && GetAllowedBlockSet(allowedSwModeSet, pOut->resourceType).macroThick64KB && allowedSwSet.sw_D) { allowedSwModeSet.value &= Gfx10DisplaySwModeMask; } else if (allowedSwSet.sw_S) { allowedSwModeSet.value &= Gfx10StandardSwModeMask; } else if (allowedSwSet.sw_R) { allowedSwModeSet.value &= Gfx10RenderSwModeMask; } else { ADDR_ASSERT(allowedSwSet.sw_Z); allowedSwModeSet.value &= Gfx10ZSwModeMask; } } else { if (allowedSwSet.sw_R) { allowedSwModeSet.value &= Gfx10RenderSwModeMask; } else if (allowedSwSet.sw_D) { allowedSwModeSet.value &= Gfx10DisplaySwModeMask; } else if (allowedSwSet.sw_S) { allowedSwModeSet.value &= Gfx10StandardSwModeMask; } else { ADDR_ASSERT(allowedSwSet.sw_Z); allowedSwModeSet.value &= Gfx10ZSwModeMask; } } } // Swizzle type should be determined. ADDR_ASSERT(IsPow2(GetAllowedSwSet(allowedSwModeSet).value)); // Determine swizzle mode now. Always select the "largest" swizzle mode for a given block type + // swizzle type combination.
E.g., for AddrBlockThin64KB + ADDR_SW_S, select SW_64KB_S_X(25) if it's // available, or otherwise select SW_64KB_S_T(17) if it's available, or otherwise select SW_64KB_S(9). pOut->swizzleMode = static_cast<AddrSwizzleMode>(Log2NonPow2(allowedSwModeSet.value)); } } else { // Invalid combination... ADDR_ASSERT_ALWAYS(); returnCode = ADDR_INVALIDPARAMS; } } else { // Invalid combination... ADDR_ASSERT_ALWAYS(); returnCode = ADDR_INVALIDPARAMS; } } return returnCode; } /** ************************************************************************************************************************ * Gfx10Lib::ComputeStereoInfo * * @brief * Compute height alignment and right eye pipeBankXor for stereo surface * * @return * Error code * ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::ComputeStereoInfo( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< Compute surface info UINT_32 blkHeight, ///< Block height UINT_32* pAlignY, ///< Stereo requested additional alignment in Y UINT_32* pRightXor ///< Right eye xor ) const { ADDR_E_RETURNCODE ret = ADDR_OK; *pAlignY = 1; *pRightXor = 0; if (IsNonPrtXor(pIn->swizzleMode)) { const UINT_32 blkSizeLog2 = GetBlockSizeLog2(pIn->swizzleMode); const UINT_32 elemLog2 = Log2(pIn->bpp >> 3); const UINT_32 rsrcType = static_cast<UINT_32>(pIn->resourceType) - 1; const UINT_32 swMode = static_cast<UINT_32>(pIn->swizzleMode); const UINT_32 eqIndex = m_equationLookupTable[rsrcType][swMode][elemLog2]; if (eqIndex != ADDR_INVALID_EQUATION_INDEX) { UINT_32 yMax = 0; UINT_32 yPos = 0; for (UINT_32 i = m_pipeInterleaveLog2; i < blkSizeLog2; i++) { if (m_equationTable[eqIndex].xor1[i].value == 0) { break; } ADDR_ASSERT(m_equationTable[eqIndex].xor1[i].valid == 1); if ((m_equationTable[eqIndex].xor1[i].channel == 1) && (m_equationTable[eqIndex].xor1[i].index > yMax)) { yMax = m_equationTable[eqIndex].xor1[i].index; yPos = i; } } const UINT_32 additionalAlign = 1 << yMax; if
(additionalAlign >= blkHeight) { *pAlignY *= (additionalAlign / blkHeight); const UINT_32 alignedHeight = PowTwoAlign(pIn->height, additionalAlign); if ((alignedHeight >> yMax) & 1) { *pRightXor = 1 << (yPos - m_pipeInterleaveLog2); } } } else { ret = ADDR_INVALIDPARAMS; } } return ret; } /** ************************************************************************************************************************ * Gfx10Lib::HwlComputeSurfaceInfoTiled * * @brief * Internal function to calculate alignment for tiled surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlComputeSurfaceInfoTiled( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE ret; if (IsBlock256b(pIn->swizzleMode)) { ret = ComputeSurfaceInfoMicroTiled(pIn, pOut); } else { ret = ComputeSurfaceInfoMacroTiled(pIn, pOut); } return ret; } /** ************************************************************************************************************************ * Gfx10Lib::ComputeSurfaceInfoMicroTiled * * @brief * Internal function to calculate alignment for micro tiled surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::ComputeSurfaceInfoMicroTiled( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE ret = ComputeBlockDimensionForSurf(&pOut->blockWidth, &pOut->blockHeight, &pOut->blockSlices, pIn->bpp, pIn->numFrags, pIn->resourceType, pIn->swizzleMode); if (ret == ADDR_OK) { pOut->mipChainPitch = 0; pOut->mipChainHeight = 0; pOut->mipChainSlice = 0; pOut->epitchIsHeight = FALSE; pOut->mipChainInTail = 
FALSE; pOut->firstMipIdInTail = pIn->numMipLevels; const UINT_32 blockSize = GetBlockSize(pIn->swizzleMode); pOut->pitch = PowTwoAlign(pIn->width, pOut->blockWidth); pOut->height = PowTwoAlign(pIn->height, pOut->blockHeight); pOut->numSlices = pIn->numSlices; pOut->baseAlign = blockSize; if (pIn->numMipLevels > 1) { const UINT_32 mip0Width = pIn->width; const UINT_32 mip0Height = pIn->height; UINT_64 mipSliceSize = 0; for (INT_32 i = static_cast<INT_32>(pIn->numMipLevels) - 1; i >= 0; i--) { UINT_32 mipWidth, mipHeight; GetMipSize(mip0Width, mip0Height, 1, i, &mipWidth, &mipHeight); const UINT_32 mipActualWidth = PowTwoAlign(mipWidth, pOut->blockWidth); const UINT_32 mipActualHeight = PowTwoAlign(mipHeight, pOut->blockHeight); if (pOut->pMipInfo != NULL) { pOut->pMipInfo[i].pitch = mipActualWidth; pOut->pMipInfo[i].height = mipActualHeight; pOut->pMipInfo[i].depth = 1; pOut->pMipInfo[i].offset = mipSliceSize; pOut->pMipInfo[i].mipTailOffset = 0; pOut->pMipInfo[i].macroBlockOffset = mipSliceSize; } mipSliceSize += mipActualWidth * mipActualHeight * (pIn->bpp >> 3); } pOut->sliceSize = mipSliceSize; pOut->surfSize = mipSliceSize * pOut->numSlices; } else { pOut->sliceSize = static_cast<UINT_64>(pOut->pitch) * pOut->height * (pIn->bpp >> 3); pOut->surfSize = pOut->sliceSize * pOut->numSlices; if (pOut->pMipInfo != NULL) { pOut->pMipInfo[0].pitch = pOut->pitch; pOut->pMipInfo[0].height = pOut->height; pOut->pMipInfo[0].depth = 1; pOut->pMipInfo[0].offset = 0; pOut->pMipInfo[0].mipTailOffset = 0; pOut->pMipInfo[0].macroBlockOffset = 0; } } } return ret; } /** ************************************************************************************************************************ * Gfx10Lib::ComputeSurfaceInfoMacroTiled * * @brief * Internal function to calculate alignment for macro tiled surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE
Gfx10Lib::ComputeSurfaceInfoMacroTiled( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ComputeBlockDimensionForSurf(&pOut->blockWidth, &pOut->blockHeight, &pOut->blockSlices, pIn->bpp, pIn->numFrags, pIn->resourceType, pIn->swizzleMode); if (returnCode == ADDR_OK) { UINT_32 heightAlign = pOut->blockHeight; if (pIn->flags.qbStereo) { UINT_32 rightXor = 0; UINT_32 alignY = 1; returnCode = ComputeStereoInfo(pIn, heightAlign, &alignY, &rightXor); if (returnCode == ADDR_OK) { pOut->pStereoInfo->rightSwizzle = rightXor; heightAlign *= alignY; } } if (returnCode == ADDR_OK) { // Mip chain dimension and epitch have no meaning in GFX10, set to default value pOut->mipChainPitch = 0; pOut->mipChainHeight = 0; pOut->mipChainSlice = 0; pOut->epitchIsHeight = FALSE; pOut->mipChainInTail = FALSE; pOut->firstMipIdInTail = pIn->numMipLevels; const UINT_32 blockSizeLog2 = GetBlockSizeLog2(pIn->swizzleMode); const UINT_32 blockSize = 1 << blockSizeLog2; pOut->pitch = PowTwoAlign(pIn->width, pOut->blockWidth); pOut->height = PowTwoAlign(pIn->height, heightAlign); pOut->numSlices = PowTwoAlign(pIn->numSlices, pOut->blockSlices); pOut->baseAlign = blockSize; if (pIn->numMipLevels > 1) { const Dim3d tailMaxDim = GetMipTailDim(pIn->resourceType, pIn->swizzleMode, pOut->blockWidth, pOut->blockHeight, pOut->blockSlices); const UINT_32 mip0Width = pIn->width; const UINT_32 mip0Height = pIn->height; const BOOL_32 isThin = IsThin(pIn->resourceType, pIn->swizzleMode); const UINT_32 mip0Depth = isThin ?
1 : pIn->numSlices; const UINT_32 maxMipsInTail = GetMaxNumMipsInTail(blockSizeLog2, isThin); const UINT_32 index = Log2(pIn->bpp >> 3); UINT_32 firstMipInTail = pIn->numMipLevels; UINT_64 mipChainSliceSize = 0; UINT_64 mipSize[MaxMipLevels]; UINT_64 mipSliceSize[MaxMipLevels]; Dim3d fixedTailMaxDim = tailMaxDim; if (m_settings.dsMipmapHtileFix && IsZOrderSwizzle(pIn->swizzleMode) && (index <= 1)) { fixedTailMaxDim.w /= Block256_2d[index].w / Block256_2d[2].w; fixedTailMaxDim.h /= Block256_2d[index].h / Block256_2d[2].h; } for (UINT_32 i = 0; i < pIn->numMipLevels; i++) { UINT_32 mipWidth, mipHeight, mipDepth; GetMipSize(mip0Width, mip0Height, mip0Depth, i, &mipWidth, &mipHeight, &mipDepth); if (IsInMipTail(fixedTailMaxDim, maxMipsInTail, mipWidth, mipHeight, pIn->numMipLevels - i)) { firstMipInTail = i; mipChainSliceSize += blockSize / pOut->blockSlices; break; } else { const UINT_32 pitch = PowTwoAlign(mipWidth, pOut->blockWidth); const UINT_32 height = PowTwoAlign(mipHeight, pOut->blockHeight); const UINT_32 depth = PowTwoAlign(mipDepth, pOut->blockSlices); const UINT_64 sliceSize = static_cast<UINT_64>(pitch) * height * (pIn->bpp >> 3); mipSize[i] = sliceSize * depth; mipSliceSize[i] = sliceSize * pOut->blockSlices; mipChainSliceSize += sliceSize; if (pOut->pMipInfo != NULL) { pOut->pMipInfo[i].pitch = pitch; pOut->pMipInfo[i].height = height; pOut->pMipInfo[i].depth = depth; } } } pOut->sliceSize = mipChainSliceSize; pOut->surfSize = mipChainSliceSize * pOut->numSlices; pOut->mipChainInTail = (firstMipInTail == 0) ?
TRUE : FALSE; pOut->firstMipIdInTail = firstMipInTail; if (pOut->pMipInfo != NULL) { UINT_64 offset = 0; UINT_64 macroBlkOffset = 0; UINT_32 tailMaxDepth = 0; if (firstMipInTail != pIn->numMipLevels) { UINT_32 mipWidth, mipHeight; GetMipSize(mip0Width, mip0Height, mip0Depth, firstMipInTail, &mipWidth, &mipHeight, &tailMaxDepth); offset = blockSize * PowTwoAlign(tailMaxDepth, pOut->blockSlices) / pOut->blockSlices; macroBlkOffset = blockSize; } for (INT_32 i = firstMipInTail - 1; i >= 0; i--) { pOut->pMipInfo[i].offset = offset; pOut->pMipInfo[i].macroBlockOffset = macroBlkOffset; pOut->pMipInfo[i].mipTailOffset = 0; offset += mipSize[i]; macroBlkOffset += mipSliceSize[i]; } UINT_32 pitch = tailMaxDim.w; UINT_32 height = tailMaxDim.h; UINT_32 depth = isThin ? 1 : PowTwoAlign(tailMaxDepth, Block256_3d[index].d); tailMaxDepth = isThin ? 1 : (depth / Block256_3d[index].d); for (UINT_32 i = firstMipInTail; i < pIn->numMipLevels; i++) { const UINT_32 m = maxMipsInTail - 1 - (i - firstMipInTail); const UINT_32 mipOffset = (m > 6) ? 
(16 << m) : (m << 8); pOut->pMipInfo[i].offset = mipOffset * tailMaxDepth; pOut->pMipInfo[i].mipTailOffset = mipOffset; pOut->pMipInfo[i].macroBlockOffset = 0; pOut->pMipInfo[i].pitch = pitch; pOut->pMipInfo[i].height = height; pOut->pMipInfo[i].depth = depth; UINT_32 mipX = ((mipOffset >> 9) & 1) | ((mipOffset >> 10) & 2) | ((mipOffset >> 11) & 4) | ((mipOffset >> 12) & 8) | ((mipOffset >> 13) & 16) | ((mipOffset >> 14) & 32); UINT_32 mipY = ((mipOffset >> 8) & 1) | ((mipOffset >> 9) & 2) | ((mipOffset >> 10) & 4) | ((mipOffset >> 11) & 8) | ((mipOffset >> 12) & 16) | ((mipOffset >> 13) & 32); if (blockSizeLog2 & 1) { const UINT_32 temp = mipX; mipX = mipY; mipY = temp; if (index & 1) { mipY = (mipY << 1) | (mipX & 1); mipX = mipX >> 1; } } if (isThin) { pOut->pMipInfo[i].mipTailCoordX = mipX * Block256_2d[index].w; pOut->pMipInfo[i].mipTailCoordY = mipY * Block256_2d[index].h; pOut->pMipInfo[i].mipTailCoordZ = 0; pitch = Max(pitch >> 1, Block256_2d[index].w); height = Max(height >> 1, Block256_2d[index].h); depth = 1; } else { pOut->pMipInfo[i].mipTailCoordX = mipX * Block256_3d[index].w; pOut->pMipInfo[i].mipTailCoordY = mipY * Block256_3d[index].h; pOut->pMipInfo[i].mipTailCoordZ = 0; pitch = Max(pitch >> 1, Block256_3d[index].w); height = Max(height >> 1, Block256_3d[index].h); depth = PowTwoAlign(Max(depth >> 1, 1u), Block256_3d[index].d); } } } } else { pOut->sliceSize = static_cast<UINT_64>(pOut->pitch) * pOut->height * (pIn->bpp >> 3) * pIn->numFrags; pOut->surfSize = pOut->sliceSize * pOut->numSlices; if (pOut->pMipInfo != NULL) { pOut->pMipInfo[0].pitch = pOut->pitch; pOut->pMipInfo[0].height = pOut->height; pOut->pMipInfo[0].depth = IsTex3d(pIn->resourceType)?
pOut->numSlices : 1; pOut->pMipInfo[0].offset = 0; pOut->pMipInfo[0].mipTailOffset = 0; pOut->pMipInfo[0].macroBlockOffset = 0; pOut->pMipInfo[0].mipTailCoordX = 0; pOut->pMipInfo[0].mipTailCoordY = 0; pOut->pMipInfo[0].mipTailCoordZ = 0; } } } } return returnCode; } /** ************************************************************************************************************************ * Gfx10Lib::HwlComputeSurfaceAddrFromCoordTiled * * @brief * Internal function to calculate address from coord for tiled swizzle surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlComputeSurfaceAddrFromCoordTiled( const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE ret; if (IsBlock256b(pIn->swizzleMode)) { ret = ComputeSurfaceAddrFromCoordMicroTiled(pIn, pOut); } else { ret = ComputeSurfaceAddrFromCoordMacroTiled(pIn, pOut); } return ret; } /** ************************************************************************************************************************ * Gfx10Lib::ComputeOffsetFromEquation * * @brief * Compute offset from equation * * @return * Offset ************************************************************************************************************************ */ UINT_32 Gfx10Lib::ComputeOffsetFromEquation( const ADDR_EQUATION* pEq, ///< Equation UINT_32 x, ///< x coord in bytes UINT_32 y, ///< y coord in pixel UINT_32 z ///< z coord in slice ) const { UINT_32 offset = 0; for (UINT_32 i = 0; i < pEq->numBits; i++) { UINT_32 v = 0; if (pEq->addr[i].valid) { if (pEq->addr[i].channel == 0) { v ^= (x >> pEq->addr[i].index) & 1; } else if (pEq->addr[i].channel == 1) { v ^= (y >> pEq->addr[i].index) & 1; } else { ADDR_ASSERT(pEq->addr[i].channel == 2); v ^= (z >> pEq->addr[i].index) & 1; } } 
if (pEq->xor1[i].valid) { if (pEq->xor1[i].channel == 0) { v ^= (x >> pEq->xor1[i].index) & 1; } else if (pEq->xor1[i].channel == 1) { v ^= (y >> pEq->xor1[i].index) & 1; } else { ADDR_ASSERT(pEq->xor1[i].channel == 2); v ^= (z >> pEq->xor1[i].index) & 1; } } if (pEq->xor2[i].valid) { if (pEq->xor2[i].channel == 0) { v ^= (x >> pEq->xor2[i].index) & 1; } else if (pEq->xor2[i].channel == 1) { v ^= (y >> pEq->xor2[i].index) & 1; } else { ADDR_ASSERT(pEq->xor2[i].channel == 2); v ^= (z >> pEq->xor2[i].index) & 1; } } offset |= (v << i); } return offset; } /** ************************************************************************************************************************ * Gfx10Lib::ComputeOffsetFromSwizzlePattern * * @brief * Compute offset from swizzle pattern * * @return * Offset ************************************************************************************************************************ */ UINT_32 Gfx10Lib::ComputeOffsetFromSwizzlePattern( const UINT_64* pPattern, ///< Swizzle pattern UINT_32 numBits, ///< Number of bits in pattern UINT_32 x, ///< x coord in pixel UINT_32 y, ///< y coord in pixel UINT_32 z, ///< z coord in slice UINT_32 s ///< sample id ) const { UINT_32 offset = 0; const ADDR_BIT_SETTING* pSwizzlePattern = reinterpret_cast<const ADDR_BIT_SETTING*>(pPattern); for (UINT_32 i = 0; i < numBits; i++) { UINT_32 v = 0; if (pSwizzlePattern[i].x != 0) { UINT_16 mask = pSwizzlePattern[i].x; UINT_32 xBits = x; while (mask != 0) { if (mask & 1) { v ^= xBits & 1; } xBits >>= 1; mask >>= 1; } } if (pSwizzlePattern[i].y != 0) { UINT_16 mask = pSwizzlePattern[i].y; UINT_32 yBits = y; while (mask != 0) { if (mask & 1) { v ^= yBits & 1; } yBits >>= 1; mask >>= 1; } } if (pSwizzlePattern[i].z != 0) { UINT_16 mask = pSwizzlePattern[i].z; UINT_32 zBits = z; while (mask != 0) { if (mask & 1) { v ^= zBits & 1; } zBits >>= 1; mask >>= 1; } } if (pSwizzlePattern[i].s != 0) { UINT_16 mask = pSwizzlePattern[i].s; UINT_32 sBits = s; while (mask != 0) { if (mask & 1) { v ^= sBits &
1; } sBits >>= 1; mask >>= 1; } } offset |= (v << i); } return offset; } /** ************************************************************************************************************************ * Gfx10Lib::GetSwizzlePatternInfo * * @brief * Get swizzle pattern * * @return * Swizzle pattern information ************************************************************************************************************************ */ const ADDR_SW_PATINFO* Gfx10Lib::GetSwizzlePatternInfo( AddrSwizzleMode swizzleMode, ///< Swizzle mode AddrResourceType resourceType, ///< Resource type UINT_32 elemLog2, ///< Element size in bytes log2 UINT_32 numFrag ///< Number of fragments ) const { const UINT_32 index = IsXor(swizzleMode) ? (m_colorBaseIndex + elemLog2) : elemLog2; const ADDR_SW_PATINFO* patInfo = NULL; const UINT_32 swizzleMask = 1 << swizzleMode; if (IsLinear(swizzleMode) == FALSE) { if (IsBlockVariable(swizzleMode)) { if (m_blockVarSizeLog2 != 0) { ADDR_ASSERT(m_settings.supportRbPlus); if (IsRtOptSwizzle(swizzleMode)) { if (numFrag == 1) { patInfo = SW_VAR_R_X_1xaa_RBPLUS_PATINFO; } else if (numFrag == 2) { patInfo = SW_VAR_R_X_2xaa_RBPLUS_PATINFO; } else if (numFrag == 4) { patInfo = SW_VAR_R_X_4xaa_RBPLUS_PATINFO; } else { ADDR_ASSERT(numFrag == 8); patInfo = SW_VAR_R_X_8xaa_RBPLUS_PATINFO; } } else if (IsZOrderSwizzle(swizzleMode)) { if (numFrag == 1) { patInfo = SW_VAR_Z_X_1xaa_RBPLUS_PATINFO; } else if (numFrag == 2) { patInfo = SW_VAR_Z_X_2xaa_RBPLUS_PATINFO; } else if (numFrag == 4) { patInfo = SW_VAR_Z_X_4xaa_RBPLUS_PATINFO; } else { ADDR_ASSERT(numFrag == 8); patInfo = SW_VAR_Z_X_8xaa_RBPLUS_PATINFO; } } } } else if (resourceType == ADDR_RSRC_TEX_3D) { ADDR_ASSERT(numFrag == 1); if ((swizzleMask & Gfx10Rsrc3dSwModeMask) != 0) { if (IsRtOptSwizzle(swizzleMode)) { patInfo = m_settings.supportRbPlus ? SW_64K_R_X_1xaa_RBPLUS_PATINFO : SW_64K_R_X_1xaa_PATINFO; } else if (IsZOrderSwizzle(swizzleMode)) { patInfo = m_settings.supportRbPlus ?
SW_64K_Z_X_1xaa_RBPLUS_PATINFO : SW_64K_Z_X_1xaa_PATINFO; } else if (IsDisplaySwizzle(resourceType, swizzleMode)) { ADDR_ASSERT(swizzleMode == ADDR_SW_64KB_D_X); patInfo = m_settings.supportRbPlus ? SW_64K_D3_X_RBPLUS_PATINFO : SW_64K_D3_X_PATINFO; } else { ADDR_ASSERT(IsStandardSwizzle(resourceType, swizzleMode)); if (IsBlock4kb(swizzleMode)) { if (swizzleMode == ADDR_SW_4KB_S) { patInfo = m_settings.supportRbPlus ? SW_4K_S3_RBPLUS_PATINFO : SW_4K_S3_PATINFO; } else { ADDR_ASSERT(swizzleMode == ADDR_SW_4KB_S_X); patInfo = m_settings.supportRbPlus ? SW_4K_S3_X_RBPLUS_PATINFO : SW_4K_S3_X_PATINFO; } } else { if (swizzleMode == ADDR_SW_64KB_S) { patInfo = m_settings.supportRbPlus ? SW_64K_S3_RBPLUS_PATINFO : SW_64K_S3_PATINFO; } else if (swizzleMode == ADDR_SW_64KB_S_X) { patInfo = m_settings.supportRbPlus ? SW_64K_S3_X_RBPLUS_PATINFO : SW_64K_S3_X_PATINFO; } else { ADDR_ASSERT(swizzleMode == ADDR_SW_64KB_S_T); patInfo = m_settings.supportRbPlus ? SW_64K_S3_T_RBPLUS_PATINFO : SW_64K_S3_T_PATINFO; } } } } } else { if ((swizzleMask & Gfx10Rsrc2dSwModeMask) != 0) { if (IsBlock256b(swizzleMode)) { if (swizzleMode == ADDR_SW_256B_S) { patInfo = m_settings.supportRbPlus ? SW_256_S_RBPLUS_PATINFO : SW_256_S_PATINFO; } else { ADDR_ASSERT(swizzleMode == ADDR_SW_256B_D); patInfo = m_settings.supportRbPlus ? SW_256_D_RBPLUS_PATINFO : SW_256_D_PATINFO; } } else if (IsBlock4kb(swizzleMode)) { if (IsStandardSwizzle(resourceType, swizzleMode)) { if (swizzleMode == ADDR_SW_4KB_S) { patInfo = m_settings.supportRbPlus ? SW_4K_S_RBPLUS_PATINFO : SW_4K_S_PATINFO; } else { ADDR_ASSERT(swizzleMode == ADDR_SW_4KB_S_X); patInfo = m_settings.supportRbPlus ? SW_4K_S_X_RBPLUS_PATINFO : SW_4K_S_X_PATINFO; } } else { if (swizzleMode == ADDR_SW_4KB_D) { patInfo = m_settings.supportRbPlus ? SW_4K_D_RBPLUS_PATINFO : SW_4K_D_PATINFO; } else { ADDR_ASSERT(swizzleMode == ADDR_SW_4KB_D_X); patInfo = m_settings.supportRbPlus ? 
SW_4K_D_X_RBPLUS_PATINFO : SW_4K_D_X_PATINFO; } } } else { if (IsRtOptSwizzle(swizzleMode)) { if (numFrag == 1) { patInfo = m_settings.supportRbPlus ? SW_64K_R_X_1xaa_RBPLUS_PATINFO : SW_64K_R_X_1xaa_PATINFO; } else if (numFrag == 2) { patInfo = m_settings.supportRbPlus ? SW_64K_R_X_2xaa_RBPLUS_PATINFO : SW_64K_R_X_2xaa_PATINFO; } else if (numFrag == 4) { patInfo = m_settings.supportRbPlus ? SW_64K_R_X_4xaa_RBPLUS_PATINFO : SW_64K_R_X_4xaa_PATINFO; } else { ADDR_ASSERT(numFrag == 8); patInfo = m_settings.supportRbPlus ? SW_64K_R_X_8xaa_RBPLUS_PATINFO : SW_64K_R_X_8xaa_PATINFO; } } else if (IsZOrderSwizzle(swizzleMode)) { if (numFrag == 1) { patInfo = m_settings.supportRbPlus ? SW_64K_Z_X_1xaa_RBPLUS_PATINFO : SW_64K_Z_X_1xaa_PATINFO; } else if (numFrag == 2) { patInfo = m_settings.supportRbPlus ? SW_64K_Z_X_2xaa_RBPLUS_PATINFO : SW_64K_Z_X_2xaa_PATINFO; } else if (numFrag == 4) { patInfo = m_settings.supportRbPlus ? SW_64K_Z_X_4xaa_RBPLUS_PATINFO : SW_64K_Z_X_4xaa_PATINFO; } else { ADDR_ASSERT(numFrag == 8); patInfo = m_settings.supportRbPlus ? SW_64K_Z_X_8xaa_RBPLUS_PATINFO : SW_64K_Z_X_8xaa_PATINFO; } } else if (IsDisplaySwizzle(resourceType, swizzleMode)) { if (swizzleMode == ADDR_SW_64KB_D) { patInfo = m_settings.supportRbPlus ? SW_64K_D_RBPLUS_PATINFO : SW_64K_D_PATINFO; } else if (swizzleMode == ADDR_SW_64KB_D_X) { patInfo = m_settings.supportRbPlus ? SW_64K_D_X_RBPLUS_PATINFO : SW_64K_D_X_PATINFO; } else { ADDR_ASSERT(swizzleMode == ADDR_SW_64KB_D_T); patInfo = m_settings.supportRbPlus ? SW_64K_D_T_RBPLUS_PATINFO : SW_64K_D_T_PATINFO; } } else { if (swizzleMode == ADDR_SW_64KB_S) { patInfo = m_settings.supportRbPlus ? SW_64K_S_RBPLUS_PATINFO : SW_64K_S_PATINFO; } else if (swizzleMode == ADDR_SW_64KB_S_X) { patInfo = m_settings.supportRbPlus ? SW_64K_S_X_RBPLUS_PATINFO : SW_64K_S_X_PATINFO; } else { ADDR_ASSERT(swizzleMode == ADDR_SW_64KB_S_T); patInfo = m_settings.supportRbPlus ? 
SW_64K_S_T_RBPLUS_PATINFO : SW_64K_S_T_PATINFO; } } } } } } return (patInfo != NULL) ? &patInfo[index] : NULL; } /** ************************************************************************************************************************ * Gfx10Lib::ComputeSurfaceAddrFromCoordMicroTiled * * @brief * Internal function to calculate address from coord for micro tiled swizzle surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::ComputeSurfaceAddrFromCoordMicroTiled( const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut ///< [out] output structure ) const { ADDR2_COMPUTE_SURFACE_INFO_INPUT localIn = {0}; ADDR2_COMPUTE_SURFACE_INFO_OUTPUT localOut = {0}; ADDR2_MIP_INFO mipInfo[MaxMipLevels]; localIn.swizzleMode = pIn->swizzleMode; localIn.flags = pIn->flags; localIn.resourceType = pIn->resourceType; localIn.bpp = pIn->bpp; localIn.width = Max(pIn->unalignedWidth, 1u); localIn.height = Max(pIn->unalignedHeight, 1u); localIn.numSlices = Max(pIn->numSlices, 1u); localIn.numMipLevels = Max(pIn->numMipLevels, 1u); localIn.numSamples = Max(pIn->numSamples, 1u); localIn.numFrags = Max(pIn->numFrags, 1u); localOut.pMipInfo = mipInfo; ADDR_E_RETURNCODE ret = ComputeSurfaceInfoMicroTiled(&localIn, &localOut); if (ret == ADDR_OK) { const UINT_32 elemLog2 = Log2(pIn->bpp >> 3); const UINT_32 rsrcType = static_cast<UINT_32>(pIn->resourceType) - 1; const UINT_32 swMode = static_cast<UINT_32>(pIn->swizzleMode); const UINT_32 eqIndex = m_equationLookupTable[rsrcType][swMode][elemLog2]; if (eqIndex != ADDR_INVALID_EQUATION_INDEX) { const UINT_32 pb = mipInfo[pIn->mipId].pitch / localOut.blockWidth; const UINT_32 yb = pIn->y / localOut.blockHeight; const UINT_32 xb = pIn->x / localOut.blockWidth; const UINT_32 blockIndex = yb * pb + xb; const UINT_32 blockSize = 256; const UINT_32 blk256Offset =
ComputeOffsetFromEquation(&m_equationTable[eqIndex], pIn->x << elemLog2, pIn->y, 0); pOut->addr = localOut.sliceSize * pIn->slice + mipInfo[pIn->mipId].macroBlockOffset + (blockIndex * blockSize) + blk256Offset; } else { ret = ADDR_INVALIDPARAMS; } } return ret; } /** ************************************************************************************************************************ * Gfx10Lib::ComputeSurfaceAddrFromCoordMacroTiled * * @brief * Internal function to calculate address from coord for macro tiled swizzle surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::ComputeSurfaceAddrFromCoordMacroTiled( const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut ///< [out] output structure ) const { ADDR2_COMPUTE_SURFACE_INFO_INPUT localIn = {0}; ADDR2_COMPUTE_SURFACE_INFO_OUTPUT localOut = {0}; ADDR2_MIP_INFO mipInfo[MaxMipLevels]; localIn.swizzleMode = pIn->swizzleMode; localIn.flags = pIn->flags; localIn.resourceType = pIn->resourceType; localIn.bpp = pIn->bpp; localIn.width = Max(pIn->unalignedWidth, 1u); localIn.height = Max(pIn->unalignedHeight, 1u); localIn.numSlices = Max(pIn->numSlices, 1u); localIn.numMipLevels = Max(pIn->numMipLevels, 1u); localIn.numSamples = Max(pIn->numSamples, 1u); localIn.numFrags = Max(pIn->numFrags, 1u); localOut.pMipInfo = mipInfo; ADDR_E_RETURNCODE ret = ComputeSurfaceInfoMacroTiled(&localIn, &localOut); if (ret == ADDR_OK) { const UINT_32 elemLog2 = Log2(pIn->bpp >> 3); const UINT_32 blkSizeLog2 = GetBlockSizeLog2(pIn->swizzleMode); const UINT_32 blkMask = (1 << blkSizeLog2) - 1; const UINT_32 pipeMask = (1 << m_pipesLog2) - 1; const UINT_32 bankMask = ((1 << GetBankXorBits(blkSizeLog2)) - 1) << (m_pipesLog2 + ColumnBits); const UINT_32 pipeBankXor = IsXor(pIn->swizzleMode) ? 
(((pIn->pipeBankXor & (pipeMask | bankMask)) << m_pipeInterleaveLog2) & blkMask) : 0; if (localIn.numFrags > 1) { const ADDR_SW_PATINFO* pPatInfo = GetSwizzlePatternInfo(pIn->swizzleMode, pIn->resourceType, elemLog2, localIn.numFrags); if (pPatInfo != NULL) { const UINT_32 pb = localOut.pitch / localOut.blockWidth; const UINT_32 yb = pIn->y / localOut.blockHeight; const UINT_32 xb = pIn->x / localOut.blockWidth; const UINT_64 blkIdx = yb * pb + xb; ADDR_BIT_SETTING fullSwizzlePattern[20]; GetSwizzlePatternFromPatternInfo(pPatInfo, fullSwizzlePattern); const UINT_32 blkOffset = ComputeOffsetFromSwizzlePattern(reinterpret_cast<const UINT_64*>(fullSwizzlePattern), blkSizeLog2, pIn->x, pIn->y, pIn->slice, pIn->sample); pOut->addr = (localOut.sliceSize * pIn->slice) + (blkIdx << blkSizeLog2) + (blkOffset ^ pipeBankXor); } else { ret = ADDR_INVALIDPARAMS; } } else { const UINT_32 rsrcIdx = (pIn->resourceType == ADDR_RSRC_TEX_3D) ? 1 : 0; const UINT_32 swMode = static_cast<UINT_32>(pIn->swizzleMode); const UINT_32 eqIndex = m_equationLookupTable[rsrcIdx][swMode][elemLog2]; if (eqIndex != ADDR_INVALID_EQUATION_INDEX) { const BOOL_32 inTail = (mipInfo[pIn->mipId].mipTailOffset != 0) ? TRUE : FALSE; const BOOL_32 isThin = IsThin(pIn->resourceType, pIn->swizzleMode); const UINT_64 sliceSize = isThin ? localOut.sliceSize : (localOut.sliceSize * localOut.blockSlices); const UINT_32 sliceId = isThin ? pIn->slice : (pIn->slice / localOut.blockSlices); const UINT_32 x = inTail ? (pIn->x + mipInfo[pIn->mipId].mipTailCoordX) : pIn->x; const UINT_32 y = inTail ? (pIn->y + mipInfo[pIn->mipId].mipTailCoordY) : pIn->y; const UINT_32 z = inTail ?
(pIn->slice + mipInfo[pIn->mipId].mipTailCoordZ) : pIn->slice; const UINT_32 pb = mipInfo[pIn->mipId].pitch / localOut.blockWidth; const UINT_32 yb = pIn->y / localOut.blockHeight; const UINT_32 xb = pIn->x / localOut.blockWidth; const UINT_64 blkIdx = yb * pb + xb; const UINT_32 blkOffset = ComputeOffsetFromEquation(&m_equationTable[eqIndex], x << elemLog2, y, z); pOut->addr = sliceSize * sliceId + mipInfo[pIn->mipId].macroBlockOffset + (blkIdx << blkSizeLog2) + (blkOffset ^ pipeBankXor); } else { ret = ADDR_INVALIDPARAMS; } } } return ret; } /** ************************************************************************************************************************ * Gfx10Lib::HwlComputeMaxBaseAlignments * * @brief * Gets maximum alignments * @return * maximum alignments ************************************************************************************************************************ */ UINT_32 Gfx10Lib::HwlComputeMaxBaseAlignments() const { return m_blockVarSizeLog2 ? Max(Size64K, 1u << m_blockVarSizeLog2) : Size64K; } /** ************************************************************************************************************************ * Gfx10Lib::HwlComputeMaxMetaBaseAlignments * * @brief * Gets maximum alignments for metadata * @return * maximum alignments for metadata ************************************************************************************************************************ */ UINT_32 Gfx10Lib::HwlComputeMaxMetaBaseAlignments() const { Dim3d metaBlk; const AddrSwizzleMode ValidSwizzleModeForXmask[] = { ADDR_SW_64KB_Z_X, m_blockVarSizeLog2 ? 
ADDR_SW_VAR_Z_X : ADDR_SW_64KB_Z_X, }; UINT_32 maxBaseAlignHtile = 0; UINT_32 maxBaseAlignCmask = 0; for (UINT_32 swIdx = 0; swIdx < sizeof(ValidSwizzleModeForXmask) / sizeof(ValidSwizzleModeForXmask[0]); swIdx++) { for (UINT_32 bppLog2 = 0; bppLog2 < 3; bppLog2++) { for (UINT_32 numFragLog2 = 0; numFragLog2 < 4; numFragLog2++) { // Max base alignment for Htile const UINT_32 metaBlkSizeHtile = GetMetaBlkSize(Gfx10DataDepthStencil, ADDR_RSRC_TEX_2D, ValidSwizzleModeForXmask[swIdx], bppLog2, numFragLog2, TRUE, &metaBlk); maxBaseAlignHtile = Max(maxBaseAlignHtile, metaBlkSizeHtile); } } // Max base alignment for Cmask const UINT_32 metaBlkSizeCmask = GetMetaBlkSize(Gfx10DataFmask, ADDR_RSRC_TEX_2D, ValidSwizzleModeForXmask[swIdx], 0, 0, TRUE, &metaBlk); maxBaseAlignCmask = Max(maxBaseAlignCmask, metaBlkSizeCmask); } // Max base alignment for 2D Dcc const AddrSwizzleMode ValidSwizzleModeForDcc2D[] = { ADDR_SW_64KB_S_X, ADDR_SW_64KB_D_X, ADDR_SW_64KB_R_X, m_blockVarSizeLog2 ? ADDR_SW_VAR_R_X : ADDR_SW_64KB_R_X, }; UINT_32 maxBaseAlignDcc2D = 0; for (UINT_32 swIdx = 0; swIdx < sizeof(ValidSwizzleModeForDcc2D) / sizeof(ValidSwizzleModeForDcc2D[0]); swIdx++) { for (UINT_32 bppLog2 = 0; bppLog2 < MaxNumOfBpp; bppLog2++) { for (UINT_32 numFragLog2 = 0; numFragLog2 < 4; numFragLog2++) { const UINT_32 metaBlkSize2D = GetMetaBlkSize(Gfx10DataColor, ADDR_RSRC_TEX_2D, ValidSwizzleModeForDcc2D[swIdx], bppLog2, numFragLog2, TRUE, &metaBlk); maxBaseAlignDcc2D = Max(maxBaseAlignDcc2D, metaBlkSize2D); } } } // Max base alignment for 3D Dcc const AddrSwizzleMode ValidSwizzleModeForDcc3D[] = { ADDR_SW_64KB_Z_X, ADDR_SW_64KB_S_X, ADDR_SW_64KB_D_X, ADDR_SW_64KB_R_X, m_blockVarSizeLog2 ? 
ADDR_SW_VAR_R_X : ADDR_SW_64KB_R_X, }; UINT_32 maxBaseAlignDcc3D = 0; for (UINT_32 swIdx = 0; swIdx < sizeof(ValidSwizzleModeForDcc3D) / sizeof(ValidSwizzleModeForDcc3D[0]); swIdx++) { for (UINT_32 bppLog2 = 0; bppLog2 < MaxNumOfBpp; bppLog2++) { const UINT_32 metaBlkSize3D = GetMetaBlkSize(Gfx10DataColor, ADDR_RSRC_TEX_3D, ValidSwizzleModeForDcc3D[swIdx], bppLog2, 0, TRUE, &metaBlk); maxBaseAlignDcc3D = Max(maxBaseAlignDcc3D, metaBlkSize3D); } } return Max(Max(maxBaseAlignHtile, maxBaseAlignCmask), Max(maxBaseAlignDcc2D, maxBaseAlignDcc3D)); } /** ************************************************************************************************************************ * Gfx10Lib::GetMetaElementSizeLog2 * * @brief * Gets meta data element size log2 * @return * Meta data element size log2 ************************************************************************************************************************ */ INT_32 Gfx10Lib::GetMetaElementSizeLog2( Gfx10DataType dataType) ///< Data surface type { INT_32 elemSizeLog2 = 0; if (dataType == Gfx10DataColor) { elemSizeLog2 = 0; } else if (dataType == Gfx10DataDepthStencil) { elemSizeLog2 = 2; } else { ADDR_ASSERT(dataType == Gfx10DataFmask); elemSizeLog2 = -1; } return elemSizeLog2; } /** ************************************************************************************************************************ * Gfx10Lib::GetMetaCacheSizeLog2 * * @brief * Gets meta data cache line size log2 * @return * Meta data cache line size log2 ************************************************************************************************************************ */ INT_32 Gfx10Lib::GetMetaCacheSizeLog2( Gfx10DataType dataType) ///< Data surface type { INT_32 cacheSizeLog2 = 0; if (dataType == Gfx10DataColor) { cacheSizeLog2 = 6; } else if (dataType == Gfx10DataDepthStencil) { cacheSizeLog2 = 8; } else { ADDR_ASSERT(dataType == Gfx10DataFmask); cacheSizeLog2 = 8; } return cacheSizeLog2; } /** 
************************************************************************************************************************ * Gfx10Lib::HwlComputeSurfaceInfoLinear * * @brief * Internal function to calculate alignment for linear surface * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx10Lib::HwlComputeSurfaceInfoLinear( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; if (IsTex1d(pIn->resourceType) && (pIn->height > 1)) { returnCode = ADDR_INVALIDPARAMS; } else { const UINT_32 elementBytes = pIn->bpp >> 3; const UINT_32 pitchAlign = (pIn->swizzleMode == ADDR_SW_LINEAR_GENERAL) ? 1 : (256 / elementBytes); const UINT_32 mipDepth = (pIn->resourceType == ADDR_RSRC_TEX_3D) ? pIn->numSlices : 1; UINT_32 pitch = PowTwoAlign(pIn->width, pitchAlign); UINT_32 actualHeight = pIn->height; UINT_64 sliceSize = 0; if (pIn->numMipLevels > 1) { for (INT_32 i = static_cast<INT_32>(pIn->numMipLevels) - 1; i >= 0; i--) { UINT_32 mipWidth, mipHeight; GetMipSize(pIn->width, pIn->height, 1, i, &mipWidth, &mipHeight); const UINT_32 mipActualWidth = PowTwoAlign(mipWidth, pitchAlign); if (pOut->pMipInfo != NULL) { pOut->pMipInfo[i].pitch = mipActualWidth; pOut->pMipInfo[i].height = mipHeight; pOut->pMipInfo[i].depth = mipDepth; pOut->pMipInfo[i].offset = sliceSize; pOut->pMipInfo[i].mipTailOffset = 0; pOut->pMipInfo[i].macroBlockOffset = sliceSize; } sliceSize += static_cast<UINT_64>(mipActualWidth) * mipHeight * elementBytes; } } else { returnCode = ApplyCustomizedPitchHeight(pIn, elementBytes, pitchAlign, &pitch, &actualHeight); if (returnCode == ADDR_OK) { sliceSize = static_cast<UINT_64>(pitch) * actualHeight * elementBytes; if (pOut->pMipInfo != NULL) { pOut->pMipInfo[0].pitch = pitch; pOut->pMipInfo[0].height = actualHeight; pOut->pMipInfo[0].depth =
mipDepth; pOut->pMipInfo[0].offset = 0; pOut->pMipInfo[0].mipTailOffset = 0; pOut->pMipInfo[0].macroBlockOffset = 0; } } } if (returnCode == ADDR_OK) { pOut->pitch = pitch; pOut->height = actualHeight; pOut->numSlices = pIn->numSlices; pOut->sliceSize = sliceSize; pOut->surfSize = sliceSize * pOut->numSlices; pOut->baseAlign = (pIn->swizzleMode == ADDR_SW_LINEAR_GENERAL) ? elementBytes : 256; pOut->blockWidth = pitchAlign; pOut->blockHeight = 1; pOut->blockSlices = 1; // The following members are unused on GFX10 pOut->mipChainPitch = 0; pOut->mipChainHeight = 0; pOut->mipChainSlice = 0; pOut->epitchIsHeight = FALSE; // Post-calculation validation ADDR_ASSERT(pOut->sliceSize > 0); } } return returnCode; } } // V2 } // Addr } // rocr ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/gfx10/gfx10addrlib.h000066400000000000000000000541241420110115200242370ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE.
* * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** ************************************************************************************************************************ * @file gfx10addrlib.h * @brief Contains the Gfx10Lib class definition. ************************************************************************************************************************ */ #ifndef __GFX10_ADDR_LIB_H__ #define __GFX10_ADDR_LIB_H__ #include "addrlib2.h" #include "coord.h" #include "gfx10SwizzlePattern.h" namespace rocr { namespace Addr { namespace V2 { /** ************************************************************************************************************************ * @brief GFX10 specific settings structure. ************************************************************************************************************************ */ struct Gfx10ChipSettings { struct { UINT_32 reserved1 : 32; // Misc configuration bits UINT_32 isDcn2 : 1; UINT_32 supportRbPlus : 1; UINT_32 dsMipmapHtileFix : 1; UINT_32 dccUnsup3DSwDis : 1; UINT_32 reserved2 : 28; }; }; /** ************************************************************************************************************************ * @brief GFX10 data surface type. 
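 *
 * Example (a minimal standalone sketch, not addrlib code): the per-type metadata
 * parameters returned by Gfx10Lib::GetMetaElementSizeLog2() and
 * Gfx10Lib::GetMetaCacheSizeLog2() (implemented in gfx10addrlib.cpp), restated as
 * a single constexpr lookup. DataType, MetaParams and MetaParamsFor are
 * hypothetical names used only for illustration; the numeric values are copied
 * from those two helpers.
 *
```cpp
#include <cstdint>

// Standalone stand-in for Gfx10DataType (illustration only).
enum class DataType { Color, DepthStencil, Fmask };

// Hypothetical pair of log2 values; MetaParams is not an addrlib type.
struct MetaParams {
    int32_t elemSizeLog2;   // value GetMetaElementSizeLog2() returns
    int32_t cacheSizeLog2;  // value GetMetaCacheSizeLog2() returns
};

// Restates the if/else chains of the two helpers as one lookup.
constexpr MetaParams MetaParamsFor(DataType t) {
    return (t == DataType::Color)        ? MetaParams{0, 6}
         : (t == DataType::DepthStencil) ? MetaParams{2, 8}
                                         : MetaParams{-1, 8};  // Fmask
}
```
 *
 * Fmask is the only type with a negative element-size log2, which is presumably
 * why both helpers return INT_32 rather than UINT_32.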
************************************************************************************************************************ */ enum Gfx10DataType { Gfx10DataColor, Gfx10DataDepthStencil, Gfx10DataFmask }; const UINT_32 Gfx10LinearSwModeMask = (1u << ADDR_SW_LINEAR); const UINT_32 Gfx10Blk256BSwModeMask = (1u << ADDR_SW_256B_S) | (1u << ADDR_SW_256B_D); const UINT_32 Gfx10Blk4KBSwModeMask = (1u << ADDR_SW_4KB_S) | (1u << ADDR_SW_4KB_D) | (1u << ADDR_SW_4KB_S_X) | (1u << ADDR_SW_4KB_D_X); const UINT_32 Gfx10Blk64KBSwModeMask = (1u << ADDR_SW_64KB_S) | (1u << ADDR_SW_64KB_D) | (1u << ADDR_SW_64KB_S_T) | (1u << ADDR_SW_64KB_D_T) | (1u << ADDR_SW_64KB_Z_X) | (1u << ADDR_SW_64KB_S_X) | (1u << ADDR_SW_64KB_D_X) | (1u << ADDR_SW_64KB_R_X); const UINT_32 Gfx10BlkVarSwModeMask = (1u << ADDR_SW_VAR_Z_X) | (1u << ADDR_SW_VAR_R_X); const UINT_32 Gfx10ZSwModeMask = (1u << ADDR_SW_64KB_Z_X) | (1u << ADDR_SW_VAR_Z_X); const UINT_32 Gfx10StandardSwModeMask = (1u << ADDR_SW_256B_S) | (1u << ADDR_SW_4KB_S) | (1u << ADDR_SW_64KB_S) | (1u << ADDR_SW_64KB_S_T) | (1u << ADDR_SW_4KB_S_X) | (1u << ADDR_SW_64KB_S_X); const UINT_32 Gfx10DisplaySwModeMask = (1u << ADDR_SW_256B_D) | (1u << ADDR_SW_4KB_D) | (1u << ADDR_SW_64KB_D) | (1u << ADDR_SW_64KB_D_T) | (1u << ADDR_SW_4KB_D_X) | (1u << ADDR_SW_64KB_D_X); const UINT_32 Gfx10RenderSwModeMask = (1u << ADDR_SW_64KB_R_X) | (1u << ADDR_SW_VAR_R_X); const UINT_32 Gfx10XSwModeMask = (1u << ADDR_SW_4KB_S_X) | (1u << ADDR_SW_4KB_D_X) | (1u << ADDR_SW_64KB_Z_X) | (1u << ADDR_SW_64KB_S_X) | (1u << ADDR_SW_64KB_D_X) | (1u << ADDR_SW_64KB_R_X) | Gfx10BlkVarSwModeMask; const UINT_32 Gfx10TSwModeMask = (1u << ADDR_SW_64KB_S_T) | (1u << ADDR_SW_64KB_D_T); const UINT_32 Gfx10XorSwModeMask = Gfx10XSwModeMask | Gfx10TSwModeMask; const UINT_32 Gfx10Rsrc1dSwModeMask = Gfx10LinearSwModeMask | Gfx10RenderSwModeMask | Gfx10ZSwModeMask; const UINT_32 Gfx10Rsrc2dSwModeMask = Gfx10LinearSwModeMask | Gfx10Blk256BSwModeMask | Gfx10Blk4KBSwModeMask | Gfx10Blk64KBSwModeMask 
| Gfx10BlkVarSwModeMask; const UINT_32 Gfx10Rsrc3dSwModeMask = (1u << ADDR_SW_LINEAR) | (1u << ADDR_SW_4KB_S) | (1u << ADDR_SW_64KB_S) | (1u << ADDR_SW_64KB_S_T) | (1u << ADDR_SW_4KB_S_X) | (1u << ADDR_SW_64KB_Z_X) | (1u << ADDR_SW_64KB_S_X) | (1u << ADDR_SW_64KB_D_X) | (1u << ADDR_SW_64KB_R_X) | Gfx10BlkVarSwModeMask; const UINT_32 Gfx10Rsrc2dPrtSwModeMask = (Gfx10Blk4KBSwModeMask | Gfx10Blk64KBSwModeMask) & ~Gfx10XSwModeMask; const UINT_32 Gfx10Rsrc3dPrtSwModeMask = Gfx10Rsrc2dPrtSwModeMask & ~Gfx10DisplaySwModeMask; const UINT_32 Gfx10Rsrc3dThin64KBSwModeMask = (1u << ADDR_SW_64KB_Z_X) | (1u << ADDR_SW_64KB_R_X); const UINT_32 Gfx10Rsrc3dThinSwModeMask = Gfx10Rsrc3dThin64KBSwModeMask | Gfx10BlkVarSwModeMask; const UINT_32 Gfx10Rsrc3dThickSwModeMask = Gfx10Rsrc3dSwModeMask & ~(Gfx10Rsrc3dThinSwModeMask | Gfx10LinearSwModeMask); const UINT_32 Gfx10Rsrc3dThick4KBSwModeMask = Gfx10Rsrc3dThickSwModeMask & Gfx10Blk4KBSwModeMask; const UINT_32 Gfx10Rsrc3dThick64KBSwModeMask = Gfx10Rsrc3dThickSwModeMask & Gfx10Blk64KBSwModeMask; const UINT_32 Gfx10MsaaSwModeMask = Gfx10ZSwModeMask | Gfx10RenderSwModeMask; const UINT_32 Dcn2NonBpp64SwModeMask = (1u << ADDR_SW_LINEAR) | (1u << ADDR_SW_4KB_S) | (1u << ADDR_SW_64KB_S) | (1u << ADDR_SW_64KB_S_T) | (1u << ADDR_SW_4KB_S_X) | (1u << ADDR_SW_64KB_S_X) | (1u << ADDR_SW_64KB_R_X); const UINT_32 Dcn2Bpp64SwModeMask = (1u << ADDR_SW_4KB_D) | (1u << ADDR_SW_64KB_D) | (1u << ADDR_SW_64KB_D_T) | (1u << ADDR_SW_4KB_D_X) | (1u << ADDR_SW_64KB_D_X) | Dcn2NonBpp64SwModeMask; /** ************************************************************************************************************************ * @brief This class is the GFX10 specific address library * function set. 
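 *
 * Example (a minimal standalone sketch, not addrlib code): how the swizzle-mode
 * class masks defined above are queried. The enumerator values are an assumption
 * based on the ADDR_SW_* ordering used by the addrlib swizzle-mode tables and
 * should be checked against addrtypes.h; SwMode, Bit, kStdMask, kDispMask and
 * InMask are hypothetical names.
 *
```cpp
#include <cstdint>

// Subset of AddrSwizzleMode values. ASSUMPTION: numbering follows the
// ADDR_SW_* table order in addrlib; verify against addrtypes.h before reuse.
enum SwMode : uint32_t {
    SW_LINEAR   = 0,
    SW_256B_S   = 1,  SW_256B_D   = 2,
    SW_4KB_S    = 5,  SW_4KB_D    = 6,
    SW_64KB_S   = 9,  SW_64KB_D   = 10,
    SW_64KB_S_T = 17, SW_64KB_D_T = 18,
    SW_4KB_S_X  = 21, SW_4KB_D_X  = 22,
    SW_64KB_S_X = 25, SW_64KB_D_X = 26,
};

// One bit per swizzle mode, as in the (1u << ADDR_SW_...) terms above.
constexpr uint32_t Bit(uint32_t mode) { return 1u << mode; }

// Same composition as Gfx10StandardSwModeMask and Gfx10DisplaySwModeMask.
constexpr uint32_t kStdMask  = Bit(SW_256B_S) | Bit(SW_4KB_S) | Bit(SW_64KB_S) |
                               Bit(SW_64KB_S_T) | Bit(SW_4KB_S_X) | Bit(SW_64KB_S_X);
constexpr uint32_t kDispMask = Bit(SW_256B_D) | Bit(SW_4KB_D) | Bit(SW_64KB_D) |
                               Bit(SW_64KB_D_T) | Bit(SW_4KB_D_X) | Bit(SW_64KB_D_X);

// GetAllowedSwSet() ANDs a whole set of allowed modes against each mask;
// for a set holding a single mode, that reduces to this bit test.
constexpr bool InMask(uint32_t mask, SwMode mode) { return (mask & Bit(mode)) != 0; }
```
 *
 * Because each class is a plain bitmask, unions and exclusions (for example the
 * PRT and 3D masks above) are built with | and & ~ rather than with tables.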
************************************************************************************************************************ */ class Gfx10Lib : public Lib { public: /// Creates Gfx10Lib object static Addr::Lib* CreateObj(const Client* pClient) { VOID* pMem = Object::ClientAlloc(sizeof(Gfx10Lib), pClient); return (pMem != NULL) ? new (pMem) Gfx10Lib(pClient) : NULL; } protected: Gfx10Lib(const Client* pClient); virtual ~Gfx10Lib(); virtual BOOL_32 HwlIsStandardSwizzle( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isStd; } virtual BOOL_32 HwlIsDisplaySwizzle( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { return m_swizzleModeTable[swizzleMode].isDisp; } virtual BOOL_32 HwlIsThin( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { return ((IsTex1d(resourceType) == TRUE) || (IsTex2d(resourceType) == TRUE) || ((IsTex3d(resourceType) == TRUE) && (m_swizzleModeTable[swizzleMode].isStd == FALSE) && (m_swizzleModeTable[swizzleMode].isDisp == FALSE))); } virtual BOOL_32 HwlIsThick( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { return ((IsTex3d(resourceType) == TRUE) && (m_swizzleModeTable[swizzleMode].isStd || m_swizzleModeTable[swizzleMode].isDisp)); } virtual ADDR_E_RETURNCODE HwlComputeHtileInfo( const ADDR2_COMPUTE_HTILE_INFO_INPUT* pIn, ADDR2_COMPUTE_HTILE_INFO_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeCmaskInfo( const ADDR2_COMPUTE_CMASK_INFO_INPUT* pIn, ADDR2_COMPUTE_CMASK_INFO_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeDccInfo( const ADDR2_COMPUTE_DCCINFO_INPUT* pIn, ADDR2_COMPUTE_DCCINFO_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeCmaskAddrFromCoord( const ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT* pOut); virtual ADDR_E_RETURNCODE HwlComputeHtileAddrFromCoord( const ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT* pOut); 
virtual ADDR_E_RETURNCODE HwlComputeHtileCoordFromAddr( const ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn, ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT* pOut); virtual ADDR_E_RETURNCODE HwlComputeDccAddrFromCoord( const ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT* pOut); virtual UINT_32 HwlGetEquationIndex( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; virtual UINT_32 HwlGetEquationTableInfo(const ADDR_EQUATION** ppEquationTable) const { *ppEquationTable = m_equationTable; return m_numEquations; } virtual ADDR_E_RETURNCODE HwlComputePipeBankXor( const ADDR2_COMPUTE_PIPEBANKXOR_INPUT* pIn, ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeSlicePipeBankXor( const ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT* pIn, ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeSubResourceOffsetForSwizzlePattern( const ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT* pIn, ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlGetPreferredSurfaceSetting( const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT* pIn, ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfoSanityCheck( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const; virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfoTiled( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfoLinear( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeSurfaceAddrFromCoordTiled( const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut) const; virtual UINT_32 HwlComputeMaxBaseAlignments() const; virtual UINT_32 HwlComputeMaxMetaBaseAlignments() const; virtual BOOL_32 HwlInitGlobalParams(const 
ADDR_CREATE_INPUT* pCreateIn); virtual ChipFamily HwlConvertChipFamily(UINT_32 uChipFamily, UINT_32 uChipRevision); // Initialize equation table VOID InitEquationTable(); ADDR_E_RETURNCODE ComputeSurfaceInfoMacroTiled( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeSurfaceInfoMicroTiled( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeSurfaceAddrFromCoordMacroTiled( const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut) const; ADDR_E_RETURNCODE ComputeSurfaceAddrFromCoordMicroTiled( const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut) const; private: UINT_32 ComputeOffsetFromSwizzlePattern( const UINT_64* pPattern, UINT_32 numBits, UINT_32 x, UINT_32 y, UINT_32 z, UINT_32 s) const; UINT_32 ComputeOffsetFromEquation( const ADDR_EQUATION* pEq, UINT_32 x, UINT_32 y, UINT_32 z) const; ADDR_E_RETURNCODE ComputeStereoInfo( const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn, UINT_32 blkHeight, UINT_32* pAlignY, UINT_32* pRightXor) const; Dim3d GetDccCompressBlk( AddrResourceType resourceType, AddrSwizzleMode swizzleMode, UINT_32 bpp) const { UINT_32 index = Log2(bpp >> 3); Dim3d compressBlkDim; if (IsThin(resourceType, swizzleMode)) { compressBlkDim.w = Block256_2d[index].w; compressBlkDim.h = Block256_2d[index].h; compressBlkDim.d = 1; } else { compressBlkDim = Block256_3d[index]; } return compressBlkDim; } static void GetMipSize( UINT_32 mip0Width, UINT_32 mip0Height, UINT_32 mip0Depth, UINT_32 mipId, UINT_32* pMipWidth, UINT_32* pMipHeight, UINT_32* pMipDepth = NULL) { *pMipWidth = ShiftCeil(Max(mip0Width, 1u), mipId); *pMipHeight = ShiftCeil(Max(mip0Height, 1u), mipId); if (pMipDepth != NULL) { *pMipDepth = ShiftCeil(Max(mip0Depth, 1u), mipId); } } const ADDR_SW_PATINFO* GetSwizzlePatternInfo( AddrSwizzleMode swizzleMode, AddrResourceType 
resourceType, UINT_32 log2Elem, UINT_32 numFrag) const; VOID GetSwizzlePatternFromPatternInfo( const ADDR_SW_PATINFO* pPatInfo, ADDR_BIT_SETTING (&pSwizzle)[20]) const { memcpy(pSwizzle, GFX10_SW_PATTERN_NIBBLE01[pPatInfo->nibble01Idx], sizeof(GFX10_SW_PATTERN_NIBBLE01[pPatInfo->nibble01Idx])); memcpy(&pSwizzle[8], GFX10_SW_PATTERN_NIBBLE2[pPatInfo->nibble2Idx], sizeof(GFX10_SW_PATTERN_NIBBLE2[pPatInfo->nibble2Idx])); memcpy(&pSwizzle[12], GFX10_SW_PATTERN_NIBBLE3[pPatInfo->nibble3Idx], sizeof(GFX10_SW_PATTERN_NIBBLE3[pPatInfo->nibble3Idx])); memcpy(&pSwizzle[16], GFX10_SW_PATTERN_NIBBLE4[pPatInfo->nibble4Idx], sizeof(GFX10_SW_PATTERN_NIBBLE4[pPatInfo->nibble4Idx])); } VOID ConvertSwizzlePatternToEquation( UINT_32 elemLog2, AddrResourceType rsrcType, AddrSwizzleMode swMode, const ADDR_SW_PATINFO* pPatInfo, ADDR_EQUATION* pEquation) const; static INT_32 GetMetaElementSizeLog2(Gfx10DataType dataType); static INT_32 GetMetaCacheSizeLog2(Gfx10DataType dataType); void GetBlk256SizeLog2( AddrResourceType resourceType, AddrSwizzleMode swizzleMode, UINT_32 elemLog2, UINT_32 numSamplesLog2, Dim3d* pBlock) const; void GetCompressedBlockSizeLog2( Gfx10DataType dataType, AddrResourceType resourceType, AddrSwizzleMode swizzleMode, UINT_32 elemLog2, UINT_32 numSamplesLog2, Dim3d* pBlock) const; INT_32 GetMetaOverlapLog2( Gfx10DataType dataType, AddrResourceType resourceType, AddrSwizzleMode swizzleMode, UINT_32 elemLog2, UINT_32 numSamplesLog2) const; INT_32 Get3DMetaOverlapLog2( AddrResourceType resourceType, AddrSwizzleMode swizzleMode, UINT_32 elemLog2) const; UINT_32 GetMetaBlkSize( Gfx10DataType dataType, AddrResourceType resourceType, AddrSwizzleMode swizzleMode, UINT_32 elemLog2, UINT_32 numSamplesLog2, BOOL_32 pipeAlign, Dim3d* pBlock) const; INT_32 GetPipeRotateAmount( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const; INT_32 GetEffectiveNumPipes() const { return ((m_settings.supportRbPlus == FALSE) || ((m_numSaLog2 + 1) >= m_pipesLog2)) ? 
m_pipesLog2 : m_numSaLog2 + 1; } BOOL_32 IsRbAligned( AddrResourceType resourceType, AddrSwizzleMode swizzleMode) const { const BOOL_32 isRtopt = IsRtOptSwizzle(swizzleMode); const BOOL_32 isZ = IsZOrderSwizzle(swizzleMode); const BOOL_32 isDisplay = IsDisplaySwizzle(swizzleMode); return (IsTex2d(resourceType) && (isRtopt || isZ)) || (IsTex3d(resourceType) && isDisplay); } BOOL_32 IsValidDisplaySwizzleMode(const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const; UINT_32 GetMaxNumMipsInTail(UINT_32 blockSizeLog2, BOOL_32 isThin) const; static ADDR2_BLOCK_SET GetAllowedBlockSet(ADDR2_SWMODE_SET allowedSwModeSet, AddrResourceType rsrcType) { ADDR2_BLOCK_SET allowedBlockSet = {}; allowedBlockSet.micro = (allowedSwModeSet.value & Gfx10Blk256BSwModeMask) ? TRUE : FALSE; allowedBlockSet.linear = (allowedSwModeSet.value & Gfx10LinearSwModeMask) ? TRUE : FALSE; allowedBlockSet.var = (allowedSwModeSet.value & Gfx10BlkVarSwModeMask) ? TRUE : FALSE; if (rsrcType == ADDR_RSRC_TEX_3D) { allowedBlockSet.macroThick4KB = (allowedSwModeSet.value & Gfx10Rsrc3dThick4KBSwModeMask) ? TRUE : FALSE; allowedBlockSet.macroThin64KB = (allowedSwModeSet.value & Gfx10Rsrc3dThin64KBSwModeMask) ? TRUE : FALSE; allowedBlockSet.macroThick64KB = (allowedSwModeSet.value & Gfx10Rsrc3dThick64KBSwModeMask) ? TRUE : FALSE; } else { allowedBlockSet.macroThin4KB = (allowedSwModeSet.value & Gfx10Blk4KBSwModeMask) ? TRUE : FALSE; allowedBlockSet.macroThin64KB = (allowedSwModeSet.value & Gfx10Blk64KBSwModeMask) ? TRUE : FALSE; } return allowedBlockSet; } static ADDR2_SWTYPE_SET GetAllowedSwSet(ADDR2_SWMODE_SET allowedSwModeSet) { ADDR2_SWTYPE_SET allowedSwSet = {}; allowedSwSet.sw_Z = (allowedSwModeSet.value & Gfx10ZSwModeMask) ? TRUE : FALSE; allowedSwSet.sw_S = (allowedSwModeSet.value & Gfx10StandardSwModeMask) ? TRUE : FALSE; allowedSwSet.sw_D = (allowedSwModeSet.value & Gfx10DisplaySwModeMask) ? TRUE : FALSE; allowedSwSet.sw_R = (allowedSwModeSet.value & Gfx10RenderSwModeMask) ? 
TRUE : FALSE; return allowedSwSet; } BOOL_32 IsInMipTail( Dim3d mipTailDim, UINT_32 maxNumMipsInTail, UINT_32 mipWidth, UINT_32 mipHeight, UINT_32 numMipsToTheEnd) const { BOOL_32 inTail = ((mipWidth <= mipTailDim.w) && (mipHeight <= mipTailDim.h) && (numMipsToTheEnd <= maxNumMipsInTail)); return inTail; } UINT_32 GetBankXorBits(UINT_32 blockBits) const { return (blockBits > m_pipeInterleaveLog2 + m_pipesLog2 + ColumnBits) ? Min(blockBits - m_pipeInterleaveLog2 - m_pipesLog2 - ColumnBits, BankBits) : 0; } BOOL_32 ValidateNonSwModeParams(const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const; BOOL_32 ValidateSwModeParams(const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const; static const UINT_32 ColumnBits = 2; static const UINT_32 BankBits = 4; static const UINT_32 UnalignedDccType = 3; static const Dim3d Block256_3d[MaxNumOfBpp]; static const Dim3d Block64K_Log2_3d[MaxNumOfBpp]; static const Dim3d Block4K_Log2_3d[MaxNumOfBpp]; static const SwizzleModeFlags SwizzleModeTable[ADDR_SW_MAX_TYPE]; // Number of packers log2 UINT_32 m_numPkrLog2; // Number of shader array log2 UINT_32 m_numSaLog2; Gfx10ChipSettings m_settings; UINT_32 m_colorBaseIndex; UINT_32 m_xmaskBaseIndex; UINT_32 m_dccBaseIndex; }; } // V2 } // Addr } // rocr #endif ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/gfx9/000077500000000000000000000000001420110115200215415ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/gfx9/gfx9addrlib.cpp000066400000000000000000005401241420110115200244520ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. 
* * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** ************************************************************************************************************************ * @file gfx9addrlib.cpp * @brief Contains the implementation for the Gfx9Lib class. ************************************************************************************************************************ */ #include "gfx9addrlib.h" #include "gfx9_gb_reg.h" #include "amdgpu_asic_addr.h" #include "util/macros.h" //////////////////////////////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////////////////////////////// namespace rocr { namespace Addr { /** ************************************************************************************************************************ * Gfx9HwlInit * * @brief * Creates a Gfx9Lib object.
* * @return * Returns a Gfx9Lib object pointer. ************************************************************************************************************************ */ Addr::Lib* Gfx9HwlInit(const Client* pClient) { return V2::Gfx9Lib::CreateObj(pClient); } namespace V2 { //////////////////////////////////////////////////////////////////////////////////////////////////// // Static Const Member //////////////////////////////////////////////////////////////////////////////////////////////////// const SwizzleModeFlags Gfx9Lib::SwizzleModeTable[ADDR_SW_MAX_TYPE] = {//Linear 256B 4KB 64KB Var Z Std Disp Rot XOR T RtOpt Reserved {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // ADDR_SW_LINEAR {0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}, // ADDR_SW_256B_S {0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0}, // ADDR_SW_256B_D {0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, // ADDR_SW_256B_R {0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0}, // ADDR_SW_4KB_Z {0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}, // ADDR_SW_4KB_S {0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0}, // ADDR_SW_4KB_D {0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, // ADDR_SW_4KB_R {0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0}, // ADDR_SW_64KB_Z {0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0}, // ADDR_SW_64KB_S {0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0}, // ADDR_SW_64KB_D {0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0}, // ADDR_SW_64KB_R {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0}, // ADDR_SW_64KB_Z_T {0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0}, // ADDR_SW_64KB_S_T {0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0}, // ADDR_SW_64KB_D_T {0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0}, // ADDR_SW_64KB_R_T {0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0}, // ADDR_SW_4KB_Z_X {0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0}, // ADDR_SW_4KB_S_X {0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0}, // ADDR_SW_4KB_D_X {0, 0, 1, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0}, // ADDR_SW_4KB_R_X {0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0}, // ADDR_SW_64KB_Z_X {0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0}, // ADDR_SW_64KB_S_X {0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0}, // ADDR_SW_64KB_D_X {0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0}, // ADDR_SW_64KB_R_X {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // Reserved {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // ADDR_SW_LINEAR_GENERAL }; const UINT_32 Gfx9Lib::MipTailOffset256B[] = {2048, 1024, 512, 256, 128, 64, 32, 16, 8, 6, 5, 4, 3, 2, 1, 0}; const Dim3d Gfx9Lib::Block256_3dS[] = {{16, 4, 4}, {8, 4, 4}, {4, 4, 4}, {2, 4, 4}, {1, 4, 4}}; const Dim3d Gfx9Lib::Block256_3dZ[] = {{8, 4, 8}, {4, 4, 8}, {4, 4, 4}, {4, 2, 4}, {2, 2, 4}}; /** ************************************************************************************************************************ * Gfx9Lib::Gfx9Lib * * @brief * Constructor * ************************************************************************************************************************ */ Gfx9Lib::Gfx9Lib(const Client* pClient) : Lib(pClient) { m_class = AI_ADDRLIB; memset(&m_settings, 0, sizeof(m_settings)); memcpy(m_swizzleModeTable, SwizzleModeTable, sizeof(SwizzleModeTable)); memset(m_cachedMetaEqKey, 0, sizeof(m_cachedMetaEqKey)); m_metaEqOverrideIndex = 0; } /** ************************************************************************************************************************ * Gfx9Lib::~Gfx9Lib * * @brief * Destructor ************************************************************************************************************************ */ Gfx9Lib::~Gfx9Lib() { } /** ************************************************************************************************************************ * Gfx9Lib::HwlComputeHtileInfo * * @brief * Interface function stub of AddrComputeHtileInfo * * @return *
ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx9Lib::HwlComputeHtileInfo( const ADDR2_COMPUTE_HTILE_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_HTILE_INFO_OUTPUT* pOut ///< [out] output structure ) const { UINT_32 numPipeTotal = GetPipeNumForMetaAddressing(pIn->hTileFlags.pipeAligned, pIn->swizzleMode); UINT_32 numRbTotal = pIn->hTileFlags.rbAligned ? m_se * m_rbPerSe : 1; UINT_32 numCompressBlkPerMetaBlk, numCompressBlkPerMetaBlkLog2; if ((numPipeTotal == 1) && (numRbTotal == 1)) { numCompressBlkPerMetaBlkLog2 = 10; } else { if (m_settings.applyAliasFix) { numCompressBlkPerMetaBlkLog2 = m_seLog2 + m_rbPerSeLog2 + Max(10u, m_pipeInterleaveLog2); } else { numCompressBlkPerMetaBlkLog2 = m_seLog2 + m_rbPerSeLog2 + 10; } } numCompressBlkPerMetaBlk = 1 << numCompressBlkPerMetaBlkLog2; Dim3d metaBlkDim = {8, 8, 1}; UINT_32 totalAmpBits = numCompressBlkPerMetaBlkLog2; UINT_32 widthAmp = (pIn->numMipLevels > 1) ? 
(totalAmpBits >> 1) : RoundHalf(totalAmpBits); UINT_32 heightAmp = totalAmpBits - widthAmp; metaBlkDim.w <<= widthAmp; metaBlkDim.h <<= heightAmp; #if DEBUG Dim3d metaBlkDimDbg = {8, 8, 1}; for (UINT_32 index = 0; index < numCompressBlkPerMetaBlkLog2; index++) { if ((metaBlkDimDbg.h < metaBlkDimDbg.w) || ((pIn->numMipLevels > 1) && (metaBlkDimDbg.h == metaBlkDimDbg.w))) { metaBlkDimDbg.h <<= 1; } else { metaBlkDimDbg.w <<= 1; } } ADDR_ASSERT((metaBlkDimDbg.w == metaBlkDim.w) && (metaBlkDimDbg.h == metaBlkDim.h)); #endif UINT_32 numMetaBlkX; UINT_32 numMetaBlkY; UINT_32 numMetaBlkZ; GetMetaMipInfo(pIn->numMipLevels, &metaBlkDim, FALSE, pOut->pMipInfo, pIn->unalignedWidth, pIn->unalignedHeight, pIn->numSlices, &numMetaBlkX, &numMetaBlkY, &numMetaBlkZ); const UINT_32 metaBlkSize = numCompressBlkPerMetaBlk << 2; UINT_32 align = numPipeTotal * numRbTotal * m_pipeInterleaveBytes; if ((IsXor(pIn->swizzleMode) == FALSE) && (numPipeTotal > 2)) { align *= (numPipeTotal >> 1); } align = Max(align, metaBlkSize); if (m_settings.metaBaseAlignFix) { align = Max(align, GetBlockSize(pIn->swizzleMode)); } if (m_settings.htileAlignFix) { const INT_32 metaBlkSizeLog2 = numCompressBlkPerMetaBlkLog2 + 2; const INT_32 htileCachelineSizeLog2 = 11; const INT_32 maxNumOfRbMaskBits = 1 + Log2(numPipeTotal) + Log2(numRbTotal); INT_32 rbMaskPadding = Max(0, htileCachelineSizeLog2 - (metaBlkSizeLog2 - maxNumOfRbMaskBits)); align <<= rbMaskPadding; } pOut->pitch = numMetaBlkX * metaBlkDim.w; pOut->height = numMetaBlkY * metaBlkDim.h; pOut->sliceSize = numMetaBlkX * numMetaBlkY * metaBlkSize; pOut->metaBlkWidth = metaBlkDim.w; pOut->metaBlkHeight = metaBlkDim.h; pOut->metaBlkNumPerSlice = numMetaBlkX * numMetaBlkY; pOut->baseAlign = align; pOut->htileBytes = PowTwoAlign(pOut->sliceSize * numMetaBlkZ, align); return ADDR_OK; } /** ************************************************************************************************************************ * Gfx9Lib::HwlComputeCmaskInfo * * @brief * 
Interface function stub of AddrComputeCmaskInfo * * @return * ADDR_E_RETURNCODE ************************************************************************************************************************ */ ADDR_E_RETURNCODE Gfx9Lib::HwlComputeCmaskInfo( const ADDR2_COMPUTE_CMASK_INFO_INPUT* pIn, ///< [in] input structure ADDR2_COMPUTE_CMASK_INFO_OUTPUT* pOut ///< [out] output structure ) const { ADDR_ASSERT(pIn->resourceType == ADDR_RSRC_TEX_2D); UINT_32 numPipeTotal = GetPipeNumForMetaAddressing(pIn->cMaskFlags.pipeAligned, pIn->swizzleMode); UINT_32 numRbTotal = pIn->cMaskFlags.rbAligned ? m_se * m_rbPerSe : 1; UINT_32 numCompressBlkPerMetaBlkLog2, numCompressBlkPerMetaBlk; if ((numPipeTotal == 1) && (numRbTotal == 1)) { numCompressBlkPerMetaBlkLog2 = 13; } else { if (m_settings.applyAliasFix) { numCompressBlkPerMetaBlkLog2 = m_seLog2 + m_rbPerSeLog2 + Max(10u, m_pipeInterleaveLog2); } else { numCompressBlkPerMetaBlkLog2 = m_seLog2 + m_rbPerSeLog2 + 10; } numCompressBlkPerMetaBlkLog2 = Max(numCompressBlkPerMetaBlkLog2, 13u); } numCompressBlkPerMetaBlk = 1 << numCompressBlkPerMetaBlkLog2; Dim2d metaBlkDim = {8, 8}; UINT_32 totalAmpBits = numCompressBlkPerMetaBlkLog2; UINT_32 heightAmp = totalAmpBits >> 1; UINT_32 widthAmp = totalAmpBits - heightAmp; metaBlkDim.w <<= widthAmp; metaBlkDim.h <<= heightAmp; #if DEBUG Dim2d metaBlkDimDbg = {8, 8}; for (UINT_32 index = 0; index < numCompressBlkPerMetaBlkLog2; index++) { if (metaBlkDimDbg.h < metaBlkDimDbg.w) { metaBlkDimDbg.h <<= 1; } else { metaBlkDimDbg.w <<= 1; } } ADDR_ASSERT((metaBlkDimDbg.w == metaBlkDim.w) && (metaBlkDimDbg.h == metaBlkDim.h)); #endif UINT_32 numMetaBlkX = (pIn->unalignedWidth + metaBlkDim.w - 1) / metaBlkDim.w; UINT_32 numMetaBlkY = (pIn->unalignedHeight + metaBlkDim.h - 1) / metaBlkDim.h; UINT_32 numMetaBlkZ = Max(pIn->numSlices, 1u); UINT_32 sizeAlign = numPipeTotal * numRbTotal * m_pipeInterleaveBytes; if (m_settings.metaBaseAlignFix) { sizeAlign = Max(sizeAlign, 
                                GetBlockSize(pIn->swizzleMode));
    }

    pOut->pitch      = numMetaBlkX * metaBlkDim.w;
    pOut->height     = numMetaBlkY * metaBlkDim.h;
    pOut->sliceSize  = (numMetaBlkX * numMetaBlkY * numCompressBlkPerMetaBlk) >> 1;
    pOut->cmaskBytes = PowTwoAlign(pOut->sliceSize * numMetaBlkZ, sizeAlign);
    pOut->baseAlign  = Max(numCompressBlkPerMetaBlk >> 1, sizeAlign);

    pOut->metaBlkWidth       = metaBlkDim.w;
    pOut->metaBlkHeight      = metaBlkDim.h;
    pOut->metaBlkNumPerSlice = numMetaBlkX * numMetaBlkY;

    return ADDR_OK;
}

/**
************************************************************************************************************************
*   Gfx9Lib::GetMetaMipInfo
*
*   @brief
*       Get meta mip info
*
*   @return
*       N/A
************************************************************************************************************************
*/
VOID Gfx9Lib::GetMetaMipInfo(
    UINT_32 numMipLevels,          ///< [in]  number of mip levels
    Dim3d* pMetaBlkDim,            ///< [in]  meta block dimension
    BOOL_32 dataThick,             ///< [in]  data surface is thick
    ADDR2_META_MIP_INFO* pInfo,    ///< [out] meta mip info
    UINT_32 mip0Width,             ///< [in]  mip0 width
    UINT_32 mip0Height,            ///< [in]  mip0 height
    UINT_32 mip0Depth,             ///< [in]  mip0 depth
    UINT_32* pNumMetaBlkX,         ///< [out] number of metablock X in mipchain
    UINT_32* pNumMetaBlkY,         ///< [out] number of metablock Y in mipchain
    UINT_32* pNumMetaBlkZ)         ///< [out] number of metablock Z in mipchain
    const
{
    UINT_32 numMetaBlkX = (mip0Width + pMetaBlkDim->w - 1) / pMetaBlkDim->w;
    UINT_32 numMetaBlkY = (mip0Height + pMetaBlkDim->h - 1) / pMetaBlkDim->h;
    UINT_32 numMetaBlkZ = (mip0Depth + pMetaBlkDim->d - 1) / pMetaBlkDim->d;
    UINT_32 tailWidth   = pMetaBlkDim->w;
    UINT_32 tailHeight  = pMetaBlkDim->h >> 1;
    UINT_32 tailDepth   = pMetaBlkDim->d;
    BOOL_32 inTail      = FALSE;
    AddrMajorMode major = ADDR_MAJOR_MAX_TYPE;

    if (numMipLevels > 1)
    {
        if (dataThick && (numMetaBlkZ > numMetaBlkX) && (numMetaBlkZ > numMetaBlkY))
        {
            // Z major
            major = ADDR_MAJOR_Z;
        }
        else if (numMetaBlkX >= numMetaBlkY)
        {
            // X major
            major = ADDR_MAJOR_X;
        }
        else
        {
            // Y major
            major = ADDR_MAJOR_Y;
        }

        inTail = ((mip0Width <= tailWidth) &&
                  (mip0Height <= tailHeight) &&
                  ((dataThick == FALSE) || (mip0Depth <= tailDepth)));

        if (inTail == FALSE)
        {
            UINT_32 orderLimit;
            UINT_32 *pMipDim;
            UINT_32 *pOrderDim;

            if (major == ADDR_MAJOR_Z)
            {
                // Z major
                pMipDim    = &numMetaBlkY;
                pOrderDim  = &numMetaBlkZ;
                orderLimit = 4;
            }
            else if (major == ADDR_MAJOR_X)
            {
                // X major
                pMipDim    = &numMetaBlkY;
                pOrderDim  = &numMetaBlkX;
                orderLimit = 4;
            }
            else
            {
                // Y major
                pMipDim    = &numMetaBlkX;
                pOrderDim  = &numMetaBlkY;
                orderLimit = 2;
            }

            if ((*pMipDim < 3) && (*pOrderDim > orderLimit) && (numMipLevels > 3))
            {
                *pMipDim += 2;
            }
            else
            {
                *pMipDim += ((*pMipDim / 2) + (*pMipDim & 1));
            }
        }
    }

    if (pInfo != NULL)
    {
        UINT_32 mipWidth  = mip0Width;
        UINT_32 mipHeight = mip0Height;
        UINT_32 mipDepth  = mip0Depth;
        Dim3d   mipCoord  = {0};

        for (UINT_32 mip = 0; mip < numMipLevels; mip++)
        {
            if (inTail)
            {
                GetMetaMiptailInfo(&pInfo[mip], mipCoord, numMipLevels - mip,
                                   pMetaBlkDim);
                break;
            }
            else
            {
                mipWidth  = PowTwoAlign(mipWidth, pMetaBlkDim->w);
                mipHeight = PowTwoAlign(mipHeight, pMetaBlkDim->h);
                mipDepth  = PowTwoAlign(mipDepth, pMetaBlkDim->d);

                pInfo[mip].inMiptail = FALSE;
                pInfo[mip].startX    = mipCoord.w;
                pInfo[mip].startY    = mipCoord.h;
                pInfo[mip].startZ    = mipCoord.d;
                pInfo[mip].width     = mipWidth;
                pInfo[mip].height    = mipHeight;
                pInfo[mip].depth     = dataThick ?
                                       mipDepth : 1;

                if ((mip >= 3) || (mip & 1))
                {
                    switch (major)
                    {
                        case ADDR_MAJOR_X:
                            mipCoord.w += mipWidth;
                            break;
                        case ADDR_MAJOR_Y:
                            mipCoord.h += mipHeight;
                            break;
                        case ADDR_MAJOR_Z:
                            mipCoord.d += mipDepth;
                            break;
                        default:
                            break;
                    }
                }
                else
                {
                    switch (major)
                    {
                        case ADDR_MAJOR_X:
                            mipCoord.h += mipHeight;
                            break;
                        case ADDR_MAJOR_Y:
                            mipCoord.w += mipWidth;
                            break;
                        case ADDR_MAJOR_Z:
                            mipCoord.h += mipHeight;
                            break;
                        default:
                            break;
                    }
                }

                mipWidth  = Max(mipWidth >> 1, 1u);
                mipHeight = Max(mipHeight >> 1, 1u);
                mipDepth  = Max(mipDepth >> 1, 1u);

                inTail = ((mipWidth <= tailWidth) &&
                          (mipHeight <= tailHeight) &&
                          ((dataThick == FALSE) || (mipDepth <= tailDepth)));
            }
        }
    }

    *pNumMetaBlkX = numMetaBlkX;
    *pNumMetaBlkY = numMetaBlkY;
    *pNumMetaBlkZ = numMetaBlkZ;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeDccInfo
*
*   @brief
*       Interface function to compute DCC key info
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeDccInfo(
    const ADDR2_COMPUTE_DCCINFO_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_DCCINFO_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    BOOL_32 dataLinear  = IsLinear(pIn->swizzleMode);
    BOOL_32 metaLinear  = pIn->dccKeyFlags.linear;
    BOOL_32 pipeAligned = pIn->dccKeyFlags.pipeAligned;

    if (dataLinear)
    {
        metaLinear = TRUE;
    }
    else if (metaLinear == TRUE)
    {
        pipeAligned = FALSE;
    }

    UINT_32 numPipeTotal = GetPipeNumForMetaAddressing(pipeAligned, pIn->swizzleMode);

    if (metaLinear)
    {
        // Linear metadata support was removed for GFX9; this feature cannot be used on GFX9.
        ADDR_ASSERT_ALWAYS();

        pOut->dccRamBaseAlign = numPipeTotal * m_pipeInterleaveBytes;
        pOut->dccRamSize      = PowTwoAlign((pIn->dataSurfaceSize / 256), pOut->dccRamBaseAlign);
    }
    else
    {
        BOOL_32 dataThick = IsThick(pIn->resourceType, pIn->swizzleMode);

        UINT_32 minMetaBlkSize = dataThick ? 65536 : 4096;

        UINT_32 numFrags  = Max(pIn->numFrags, 1u);
        UINT_32 numSlices = Max(pIn->numSlices, 1u);

        minMetaBlkSize /= numFrags;

        UINT_32 numCompressBlkPerMetaBlk = minMetaBlkSize;

        UINT_32 numRbTotal = pIn->dccKeyFlags.rbAligned ? m_se * m_rbPerSe : 1;

        if ((numPipeTotal > 1) || (numRbTotal > 1))
        {
            const UINT_32 thinBlkSize = 1 << (m_settings.applyAliasFix ? Max(10u, m_pipeInterleaveLog2) : 10);

            numCompressBlkPerMetaBlk =
                Max(numCompressBlkPerMetaBlk, m_se * m_rbPerSe * (dataThick ? 262144 : thinBlkSize));

            if (numCompressBlkPerMetaBlk > 65536 * pIn->bpp)
            {
                numCompressBlkPerMetaBlk = 65536 * pIn->bpp;
            }
        }

        Dim3d compressBlkDim = GetDccCompressBlk(pIn->resourceType, pIn->swizzleMode, pIn->bpp);
        Dim3d metaBlkDim     = compressBlkDim;

        for (UINT_32 index = 1; index < numCompressBlkPerMetaBlk; index <<= 1)
        {
            if ((metaBlkDim.h < metaBlkDim.w) ||
                ((pIn->numMipLevels > 1) && (metaBlkDim.h == metaBlkDim.w)))
            {
                if ((dataThick == FALSE) || (metaBlkDim.h <= metaBlkDim.d))
                {
                    metaBlkDim.h <<= 1;
                }
                else
                {
                    metaBlkDim.d <<= 1;
                }
            }
            else
            {
                if ((dataThick == FALSE) || (metaBlkDim.w <= metaBlkDim.d))
                {
                    metaBlkDim.w <<= 1;
                }
                else
                {
                    metaBlkDim.d <<= 1;
                }
            }
        }

        UINT_32 numMetaBlkX;
        UINT_32 numMetaBlkY;
        UINT_32 numMetaBlkZ;

        GetMetaMipInfo(pIn->numMipLevels, &metaBlkDim, dataThick, pOut->pMipInfo,
                       pIn->unalignedWidth, pIn->unalignedHeight, numSlices,
                       &numMetaBlkX, &numMetaBlkY, &numMetaBlkZ);

        UINT_32 sizeAlign = numPipeTotal * numRbTotal * m_pipeInterleaveBytes;

        if (numFrags > m_maxCompFrag)
        {
            sizeAlign *= (numFrags / m_maxCompFrag);
        }

        if (m_settings.metaBaseAlignFix)
        {
            sizeAlign = Max(sizeAlign, GetBlockSize(pIn->swizzleMode));
        }

        pOut->dccRamSize = numMetaBlkX * numMetaBlkY * numMetaBlkZ *
                           numCompressBlkPerMetaBlk *
                           numFrags;
        pOut->dccRamSize      = PowTwoAlign(pOut->dccRamSize, sizeAlign);
        pOut->dccRamBaseAlign = Max(numCompressBlkPerMetaBlk, sizeAlign);

        pOut->pitch  = numMetaBlkX * metaBlkDim.w;
        pOut->height = numMetaBlkY * metaBlkDim.h;
        pOut->depth  = numMetaBlkZ * metaBlkDim.d;

        pOut->compressBlkWidth  = compressBlkDim.w;
        pOut->compressBlkHeight = compressBlkDim.h;
        pOut->compressBlkDepth  = compressBlkDim.d;

        pOut->metaBlkWidth  = metaBlkDim.w;
        pOut->metaBlkHeight = metaBlkDim.h;
        pOut->metaBlkDepth  = metaBlkDim.d;

        pOut->metaBlkNumPerSlice    = numMetaBlkX * numMetaBlkY;
        pOut->fastClearSizePerSlice =
            pOut->metaBlkNumPerSlice * numCompressBlkPerMetaBlk * Min(numFrags, m_maxCompFrag);
    }

    return ADDR_OK;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeMaxBaseAlignments
*
*   @brief
*       Gets maximum alignments
*   @return
*       maximum alignments
************************************************************************************************************************
*/
UINT_32 Gfx9Lib::HwlComputeMaxBaseAlignments() const
{
    return Size64K;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeMaxMetaBaseAlignments
*
*   @brief
*       Gets maximum alignments for metadata
*   @return
*       maximum alignments for metadata
************************************************************************************************************************
*/
UINT_32 Gfx9Lib::HwlComputeMaxMetaBaseAlignments() const
{
    // Max base alignment for Htile
    const UINT_32 maxNumPipeTotal = GetPipeNumForMetaAddressing(TRUE, ADDR_SW_64KB_Z);
    const UINT_32 maxNumRbTotal   = m_se * m_rbPerSe;

    // If applyAliasFix was set, the extra bits should be MAX(10u, m_pipeInterleaveLog2),
    // but we have never seen an ASIC whose m_pipeInterleaveLog2 != 8, so just assert and simplify the logic.
    ADDR_ASSERT((m_settings.applyAliasFix == FALSE) || (m_pipeInterleaveLog2 <= 10u));
    const UINT_32 maxNumCompressBlkPerMetaBlk = 1u << (m_seLog2 + m_rbPerSeLog2 + 10u);

    UINT_32 maxBaseAlignHtile = maxNumPipeTotal * maxNumRbTotal * m_pipeInterleaveBytes;

    if (maxNumPipeTotal > 2)
    {
        maxBaseAlignHtile *= (maxNumPipeTotal >> 1);
    }

    maxBaseAlignHtile = Max(maxNumCompressBlkPerMetaBlk << 2, maxBaseAlignHtile);

    if (m_settings.metaBaseAlignFix)
    {
        maxBaseAlignHtile = Max(maxBaseAlignHtile, Size64K);
    }

    if (m_settings.htileAlignFix)
    {
        maxBaseAlignHtile *= maxNumPipeTotal;
    }

    // Max base alignment for Cmask will not be larger than that for Htile, no need to calculate

    // Max base alignment for 2D Dcc will not be larger than that for 3D, no need to calculate
    UINT_32 maxBaseAlignDcc3D = 65536;

    if ((maxNumPipeTotal > 1) || (maxNumRbTotal > 1))
    {
        maxBaseAlignDcc3D = Min(m_se * m_rbPerSe * 262144, 65536 * 128u);
    }

    // Max base alignment for Msaa Dcc
    UINT_32 maxBaseAlignDccMsaa = maxNumPipeTotal * maxNumRbTotal * m_pipeInterleaveBytes * (8 / m_maxCompFrag);

    if (m_settings.metaBaseAlignFix)
    {
        maxBaseAlignDccMsaa = Max(maxBaseAlignDccMsaa, Size64K);
    }

    return Max(maxBaseAlignHtile, Max(maxBaseAlignDccMsaa, maxBaseAlignDcc3D));
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeCmaskAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeCmaskAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeCmaskAddrFromCoord(
    const ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT*      pOut)   ///< [out] output structure
{
    ADDR2_COMPUTE_CMASK_INFO_INPUT input = {0};
    input.size            = sizeof(input);
    input.cMaskFlags      = pIn->cMaskFlags;
    input.colorFlags      = pIn->colorFlags;
    input.unalignedWidth  =
        Max(pIn->unalignedWidth, 1u);
    input.unalignedHeight = Max(pIn->unalignedHeight, 1u);
    input.numSlices       = Max(pIn->numSlices, 1u);
    input.swizzleMode     = pIn->swizzleMode;
    input.resourceType    = pIn->resourceType;

    ADDR2_COMPUTE_CMASK_INFO_OUTPUT output = {0};
    output.size = sizeof(output);

    ADDR_E_RETURNCODE returnCode = ComputeCmaskInfo(&input, &output);

    if (returnCode == ADDR_OK)
    {
        UINT_32 fmaskBpp              = GetFmaskBpp(pIn->numSamples, pIn->numFrags);
        UINT_32 fmaskElementBytesLog2 = Log2(fmaskBpp >> 3);
        UINT_32 metaBlkWidthLog2      = Log2(output.metaBlkWidth);
        UINT_32 metaBlkHeightLog2     = Log2(output.metaBlkHeight);

        MetaEqParams metaEqParams = {0, fmaskElementBytesLog2, 0, pIn->cMaskFlags,
                                     Gfx9DataFmask, pIn->swizzleMode, pIn->resourceType,
                                     metaBlkWidthLog2, metaBlkHeightLog2, 0, 3, 3, 0};

        const CoordEq* pMetaEq = GetMetaEquation(metaEqParams);

        UINT_32 xb = pIn->x / output.metaBlkWidth;
        UINT_32 yb = pIn->y / output.metaBlkHeight;
        UINT_32 zb = pIn->slice;

        UINT_32 pitchInBlock     = output.pitch / output.metaBlkWidth;
        UINT_32 sliceSizeInBlock = (output.height / output.metaBlkHeight) * pitchInBlock;
        UINT_32 blockIndex       = zb * sliceSizeInBlock + yb * pitchInBlock + xb;

        UINT_32 coords[] = { pIn->x, pIn->y, pIn->slice, 0, blockIndex };
        UINT_64 address  = pMetaEq->solve(coords);

        pOut->addr        = address >> 1;
        pOut->bitPosition = static_cast<UINT_32>((address & 1) << 2);

        UINT_32 numPipeBits = GetPipeLog2ForMetaAddressing(pIn->cMaskFlags.pipeAligned,
                                                           pIn->swizzleMode);

        UINT_64 pipeXor = static_cast<UINT_64>(pIn->pipeXor & ((1 << numPipeBits) - 1));

        pOut->addr ^= (pipeXor << m_pipeInterleaveLog2);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeHtileAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeHtileAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE
Gfx9Lib::HwlComputeHtileAddrFromCoord(
    const ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT*      pOut)   ///< [out] output structure
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pIn->numMipLevels > 1)
    {
        returnCode = ADDR_NOTIMPLEMENTED;
    }
    else
    {
        ADDR2_COMPUTE_HTILE_INFO_INPUT input = {0};
        input.size            = sizeof(input);
        input.hTileFlags      = pIn->hTileFlags;
        input.depthFlags      = pIn->depthflags;
        input.swizzleMode     = pIn->swizzleMode;
        input.unalignedWidth  = Max(pIn->unalignedWidth, 1u);
        input.unalignedHeight = Max(pIn->unalignedHeight, 1u);
        input.numSlices       = Max(pIn->numSlices, 1u);
        input.numMipLevels    = Max(pIn->numMipLevels, 1u);

        ADDR2_COMPUTE_HTILE_INFO_OUTPUT output = {0};
        output.size = sizeof(output);

        returnCode = ComputeHtileInfo(&input, &output);

        if (returnCode == ADDR_OK)
        {
            UINT_32 elementBytesLog2  = Log2(pIn->bpp >> 3);
            UINT_32 metaBlkWidthLog2  = Log2(output.metaBlkWidth);
            UINT_32 metaBlkHeightLog2 = Log2(output.metaBlkHeight);
            UINT_32 numSamplesLog2    = Log2(pIn->numSamples);

            MetaEqParams metaEqParams = {0, elementBytesLog2, numSamplesLog2, pIn->hTileFlags,
                                         Gfx9DataDepthStencil, pIn->swizzleMode, ADDR_RSRC_TEX_2D,
                                         metaBlkWidthLog2, metaBlkHeightLog2, 0, 3, 3, 0};

            const CoordEq* pMetaEq = GetMetaEquation(metaEqParams);

            UINT_32 xb = pIn->x / output.metaBlkWidth;
            UINT_32 yb = pIn->y / output.metaBlkHeight;
            UINT_32 zb = pIn->slice;

            UINT_32 pitchInBlock     = output.pitch / output.metaBlkWidth;
            UINT_32 sliceSizeInBlock = (output.height / output.metaBlkHeight) * pitchInBlock;
            UINT_32 blockIndex       = zb * sliceSizeInBlock + yb * pitchInBlock + xb;

            UINT_32 coords[] = { pIn->x, pIn->y, pIn->slice, 0, blockIndex };
            UINT_64 address  = pMetaEq->solve(coords);

            pOut->addr = address >> 1;

            UINT_32 numPipeBits = GetPipeLog2ForMetaAddressing(pIn->hTileFlags.pipeAligned,
                                                               pIn->swizzleMode);

            UINT_64 pipeXor = static_cast<UINT_64>(pIn->pipeXor & ((1 << numPipeBits) - 1));

            pOut->addr ^= (pipeXor << m_pipeInterleaveLog2);
        }
    }

    return returnCode;
}
/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeHtileCoordFromAddr
*
*   @brief
*       Interface function stub of AddrComputeHtileCoordFromAddr
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeHtileCoordFromAddr(
    const ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn,    ///< [in] input structure
    ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT*      pOut)   ///< [out] output structure
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (pIn->numMipLevels > 1)
    {
        returnCode = ADDR_NOTIMPLEMENTED;
    }
    else
    {
        ADDR2_COMPUTE_HTILE_INFO_INPUT input = {0};
        input.size            = sizeof(input);
        input.hTileFlags      = pIn->hTileFlags;
        input.swizzleMode     = pIn->swizzleMode;
        input.unalignedWidth  = Max(pIn->unalignedWidth, 1u);
        input.unalignedHeight = Max(pIn->unalignedHeight, 1u);
        input.numSlices       = Max(pIn->numSlices, 1u);
        input.numMipLevels    = Max(pIn->numMipLevels, 1u);

        ADDR2_COMPUTE_HTILE_INFO_OUTPUT output = {0};
        output.size = sizeof(output);

        returnCode = ComputeHtileInfo(&input, &output);

        if (returnCode == ADDR_OK)
        {
            UINT_32 elementBytesLog2  = Log2(pIn->bpp >> 3);
            UINT_32 metaBlkWidthLog2  = Log2(output.metaBlkWidth);
            UINT_32 metaBlkHeightLog2 = Log2(output.metaBlkHeight);
            UINT_32 numSamplesLog2    = Log2(pIn->numSamples);

            MetaEqParams metaEqParams = {0, elementBytesLog2, numSamplesLog2, pIn->hTileFlags,
                                         Gfx9DataDepthStencil, pIn->swizzleMode, ADDR_RSRC_TEX_2D,
                                         metaBlkWidthLog2, metaBlkHeightLog2, 0, 3, 3, 0};

            const CoordEq* pMetaEq = GetMetaEquation(metaEqParams);

            UINT_32 numPipeBits = GetPipeLog2ForMetaAddressing(pIn->hTileFlags.pipeAligned,
                                                               pIn->swizzleMode);

            UINT_64 pipeXor = static_cast<UINT_64>(pIn->pipeXor & ((1 << numPipeBits) - 1));

            UINT_64 nibbleAddress = (pIn->addr ^ (pipeXor << m_pipeInterleaveLog2)) << 1;

            UINT_32 pitchInBlock     = output.pitch / output.metaBlkWidth;
            UINT_32
                    sliceSizeInBlock = (output.height / output.metaBlkHeight) * pitchInBlock;

            UINT_32 coords[NUM_DIMS];
            pMetaEq->solveAddr(nibbleAddress, sliceSizeInBlock, coords);

            pOut->slice = coords[DIM_M] / sliceSizeInBlock;
            pOut->y     = ((coords[DIM_M] % sliceSizeInBlock) / pitchInBlock) * output.metaBlkHeight + coords[DIM_Y];
            pOut->x     = (coords[DIM_M] % pitchInBlock) * output.metaBlkWidth + coords[DIM_X];
        }
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeDccAddrFromCoord
*
*   @brief
*       Interface function stub of AddrComputeDccAddrFromCoord
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeDccAddrFromCoord(
    const ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT* pIn,
    ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT*      pOut)
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if ((pIn->numMipLevels > 1) || (pIn->mipId > 1) || pIn->dccKeyFlags.linear)
    {
        returnCode = ADDR_NOTIMPLEMENTED;
    }
    else
    {
        UINT_32 elementBytesLog2  = Log2(pIn->bpp >> 3);
        UINT_32 numSamplesLog2    = Log2(pIn->numFrags);
        UINT_32 metaBlkWidthLog2  = Log2(pIn->metaBlkWidth);
        UINT_32 metaBlkHeightLog2 = Log2(pIn->metaBlkHeight);
        UINT_32 metaBlkDepthLog2  = Log2(pIn->metaBlkDepth);
        UINT_32 compBlkWidthLog2  = Log2(pIn->compressBlkWidth);
        UINT_32 compBlkHeightLog2 = Log2(pIn->compressBlkHeight);
        UINT_32 compBlkDepthLog2  = Log2(pIn->compressBlkDepth);

        MetaEqParams metaEqParams = {pIn->mipId, elementBytesLog2, numSamplesLog2, pIn->dccKeyFlags,
                                     Gfx9DataColor, pIn->swizzleMode, pIn->resourceType,
                                     metaBlkWidthLog2, metaBlkHeightLog2, metaBlkDepthLog2,
                                     compBlkWidthLog2, compBlkHeightLog2, compBlkDepthLog2};

        const CoordEq* pMetaEq = GetMetaEquation(metaEqParams);

        UINT_32 xb = pIn->x / pIn->metaBlkWidth;
        UINT_32 yb = pIn->y / pIn->metaBlkHeight;
        UINT_32 zb = pIn->slice / pIn->metaBlkDepth;

        UINT_32 pitchInBlock
                                 = pIn->pitch / pIn->metaBlkWidth;
        UINT_32 sliceSizeInBlock = (pIn->height / pIn->metaBlkHeight) * pitchInBlock;
        UINT_32 blockIndex       = zb * sliceSizeInBlock + yb * pitchInBlock + xb;

        UINT_32 coords[] = { pIn->x, pIn->y, pIn->slice, pIn->sample, blockIndex };
        UINT_64 address  = pMetaEq->solve(coords);

        pOut->addr = address >> 1;

        UINT_32 numPipeBits = GetPipeLog2ForMetaAddressing(pIn->dccKeyFlags.pipeAligned,
                                                           pIn->swizzleMode);

        UINT_64 pipeXor = static_cast<UINT_64>(pIn->pipeXor & ((1 << numPipeBits) - 1));

        pOut->addr ^= (pipeXor << m_pipeInterleaveLog2);
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlInitGlobalParams
*
*   @brief
*       Initializes global parameters
*
*   @return
*       TRUE if all settings are valid
*
************************************************************************************************************************
*/
BOOL_32 Gfx9Lib::HwlInitGlobalParams(
    const ADDR_CREATE_INPUT* pCreateIn)    ///< [in] create input
{
    BOOL_32 valid = TRUE;

    if (m_settings.isArcticIsland)
    {
        GB_ADDR_CONFIG_gfx9 gbAddrConfig;

        gbAddrConfig.u32All = pCreateIn->regValue.gbAddrConfig;

        // These values are copied from CModel code
        switch (gbAddrConfig.bits.NUM_PIPES)
        {
            case ADDR_CONFIG_1_PIPE:
                m_pipes     = 1;
                m_pipesLog2 = 0;
                break;
            case ADDR_CONFIG_2_PIPE:
                m_pipes     = 2;
                m_pipesLog2 = 1;
                break;
            case ADDR_CONFIG_4_PIPE:
                m_pipes     = 4;
                m_pipesLog2 = 2;
                break;
            case ADDR_CONFIG_8_PIPE:
                m_pipes     = 8;
                m_pipesLog2 = 3;
                break;
            case ADDR_CONFIG_16_PIPE:
                m_pipes     = 16;
                m_pipesLog2 = 4;
                break;
            case ADDR_CONFIG_32_PIPE:
                m_pipes     = 32;
                m_pipesLog2 = 5;
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                break;
        }

        switch (gbAddrConfig.bits.PIPE_INTERLEAVE_SIZE)
        {
            case ADDR_CONFIG_PIPE_INTERLEAVE_256B:
                m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_256B;
                m_pipeInterleaveLog2  = 8;
                break;
            case ADDR_CONFIG_PIPE_INTERLEAVE_512B:
                m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_512B;
                m_pipeInterleaveLog2  = 9;
                break;
            case
                 ADDR_CONFIG_PIPE_INTERLEAVE_1KB:
                m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_1KB;
                m_pipeInterleaveLog2  = 10;
                break;
            case ADDR_CONFIG_PIPE_INTERLEAVE_2KB:
                m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_2KB;
                m_pipeInterleaveLog2  = 11;
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                break;
        }

        // Addr::V2::Lib::ComputePipeBankXor()/ComputeSlicePipeBankXor() requires pipe interleave to be exactly 8 bits,
        // and any larger value requires a post-process (left shift) on the output pipeBankXor bits.
        ADDR_ASSERT(m_pipeInterleaveBytes == ADDR_PIPEINTERLEAVE_256B);

        switch (gbAddrConfig.bits.NUM_BANKS)
        {
            case ADDR_CONFIG_1_BANK:
                m_banks     = 1;
                m_banksLog2 = 0;
                break;
            case ADDR_CONFIG_2_BANK:
                m_banks     = 2;
                m_banksLog2 = 1;
                break;
            case ADDR_CONFIG_4_BANK:
                m_banks     = 4;
                m_banksLog2 = 2;
                break;
            case ADDR_CONFIG_8_BANK:
                m_banks     = 8;
                m_banksLog2 = 3;
                break;
            case ADDR_CONFIG_16_BANK:
                m_banks     = 16;
                m_banksLog2 = 4;
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                break;
        }

        switch (gbAddrConfig.bits.NUM_SHADER_ENGINES)
        {
            case ADDR_CONFIG_1_SHADER_ENGINE:
                m_se     = 1;
                m_seLog2 = 0;
                break;
            case ADDR_CONFIG_2_SHADER_ENGINE:
                m_se     = 2;
                m_seLog2 = 1;
                break;
            case ADDR_CONFIG_4_SHADER_ENGINE:
                m_se     = 4;
                m_seLog2 = 2;
                break;
            case ADDR_CONFIG_8_SHADER_ENGINE:
                m_se     = 8;
                m_seLog2 = 3;
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                break;
        }

        switch (gbAddrConfig.bits.NUM_RB_PER_SE)
        {
            case ADDR_CONFIG_1_RB_PER_SHADER_ENGINE:
                m_rbPerSe     = 1;
                m_rbPerSeLog2 = 0;
                break;
            case ADDR_CONFIG_2_RB_PER_SHADER_ENGINE:
                m_rbPerSe     = 2;
                m_rbPerSeLog2 = 1;
                break;
            case ADDR_CONFIG_4_RB_PER_SHADER_ENGINE:
                m_rbPerSe     = 4;
                m_rbPerSeLog2 = 2;
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                break;
        }

        switch (gbAddrConfig.bits.MAX_COMPRESSED_FRAGS)
        {
            case ADDR_CONFIG_1_MAX_COMPRESSED_FRAGMENTS:
                m_maxCompFrag     = 1;
                m_maxCompFragLog2 = 0;
                break;
            case ADDR_CONFIG_2_MAX_COMPRESSED_FRAGMENTS:
                m_maxCompFrag     = 2;
                m_maxCompFragLog2 = 1;
                break;
            case ADDR_CONFIG_4_MAX_COMPRESSED_FRAGMENTS:
                m_maxCompFrag     = 4;
                m_maxCompFragLog2 = 2;
                break;
            case ADDR_CONFIG_8_MAX_COMPRESSED_FRAGMENTS:
                m_maxCompFrag     = 8;
                m_maxCompFragLog2 = 3;
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                break;
        }

        if ((m_rbPerSeLog2 == 1) &&
            (((m_pipesLog2 == 1) && ((m_seLog2 == 2) || (m_seLog2 == 3))) ||
             ((m_pipesLog2 == 2) && ((m_seLog2 == 1) || (m_seLog2 == 2)))))
        {
            ADDR_ASSERT(m_settings.isVega10 == FALSE);
            ADDR_ASSERT(m_settings.isRaven == FALSE);
            ADDR_ASSERT(m_settings.isVega20 == FALSE);

            if (m_settings.isVega12)
            {
                m_settings.htileCacheRbConflict = 1;
            }
        }

        // For simplicity we never allow VAR swizzle mode for GFX9; the actual value is 18 on GFX9
        m_blockVarSizeLog2 = 0;
    }
    else
    {
        valid = FALSE;
        ADDR_NOT_IMPLEMENTED();
    }

    if (valid)
    {
        InitEquationTable();
    }

    return valid;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlConvertChipFamily
*
*   @brief
*       Convert familyID defined in atiid.h to ChipFamily and set m_chipFamily/m_chipRevision
*   @return
*       ChipFamily
************************************************************************************************************************
*/
ChipFamily Gfx9Lib::HwlConvertChipFamily(
    UINT_32 uChipFamily,        ///< [in] chip family defined in atiid.h
    UINT_32 uChipRevision)      ///< [in] chip revision defined in "asic_family"_id.h
{
    ChipFamily family = ADDR_CHIP_FAMILY_AI;

    switch (uChipFamily)
    {
        case FAMILY_AI:
            m_settings.isArcticIsland = 1;
            m_settings.isVega10       = ASICREV_IS_VEGA10_P(uChipRevision);
            m_settings.isVega12       = ASICREV_IS_VEGA12_P(uChipRevision);
            m_settings.isVega20       = ASICREV_IS_VEGA20_P(uChipRevision);
            m_settings.isDce12        = 1;

            if (m_settings.isVega10 == 0)
            {
                m_settings.htileAlignFix = 1;
                m_settings.applyAliasFix = 1;
            }

            m_settings.metaBaseAlignFix = 1;

            m_settings.depthPipeXorDisable = 1;
            break;
        case FAMILY_RV:
            m_settings.isArcticIsland = 1;

            if (ASICREV_IS_RAVEN(uChipRevision))
            {
                m_settings.isRaven = 1;

                m_settings.depthPipeXorDisable = 1;
            }

            if (ASICREV_IS_RAVEN2(uChipRevision))
            {
                m_settings.isRaven = 1;
            }

            if (m_settings.isRaven == 0)
            {
                m_settings.htileAlignFix =
                    1;
                m_settings.applyAliasFix = 1;
            }

            if (ASICREV_IS_RENOIR(uChipRevision))
            {
                m_settings.isRaven = 1;
            }

            m_settings.isDcn1 = m_settings.isRaven;

            m_settings.metaBaseAlignFix = 1;
            break;

        default:
            ADDR_ASSERT(!"This should be a Fusion");
            break;
    }

    return family;
}

/**
************************************************************************************************************************
*   Gfx9Lib::GetRbEquation
*
*   @brief
*       Get RB equation
*   @return
*       N/A
************************************************************************************************************************
*/
VOID Gfx9Lib::GetRbEquation(
    CoordEq* pRbEq,             ///< [out] rb equation
    UINT_32  numRbPerSeLog2,    ///< [in] log2 of number of RBs per shader engine
    UINT_32  numSeLog2)         ///< [in] log2 of number of shader engines
    const
{
    // RBs are distributed on 16x16, except when we have 1 RB per SE, in which case it's 32x32
    UINT_32 rbRegion = (numRbPerSeLog2 == 0) ? 5 : 4;
    Coordinate cx(DIM_X, rbRegion);
    Coordinate cy(DIM_Y, rbRegion);

    UINT_32 start = 0;
    UINT_32 numRbTotalLog2 = numRbPerSeLog2 + numSeLog2;

    // Clear the rb equation
    pRbEq->resize(0);
    pRbEq->resize(numRbTotalLog2);

    if ((numSeLog2 > 0) && (numRbPerSeLog2 == 1))
    {
        // Special case when more than 1 SE, and 2 RB per SE
        (*pRbEq)[0].add(cx);
        (*pRbEq)[0].add(cy);
        cx++;
        cy++;

        if (m_settings.applyAliasFix == false)
        {
            (*pRbEq)[0].add(cy);
        }

        (*pRbEq)[0].add(cy);
        start++;
    }

    UINT_32 numBits = 2 * (numRbTotalLog2 - start);

    for (UINT_32 i = 0; i < numBits; i++)
    {
        UINT_32 idx =
            start + (((start + i) >= numRbTotalLog2) ?
                     (2 * (numRbTotalLog2 - start) - i - 1) : i);

        if ((i % 2) == 1)
        {
            (*pRbEq)[idx].add(cx);
            cx++;
        }
        else
        {
            (*pRbEq)[idx].add(cy);
            cy++;
        }
    }
}

/**
************************************************************************************************************************
*   Gfx9Lib::GetDataEquation
*
*   @brief
*       Get data equation for color, fmask and Z
*   @return
*       N/A
************************************************************************************************************************
*/
VOID Gfx9Lib::GetDataEquation(
    CoordEq*         pDataEq,           ///< [out] data surface equation
    Gfx9DataType     dataSurfaceType,   ///< [in] data surface type
    AddrSwizzleMode  swizzleMode,       ///< [in] data surface swizzle mode
    AddrResourceType resourceType,      ///< [in] data surface resource type
    UINT_32          elementBytesLog2,  ///< [in] data surface element bytes
    UINT_32          numSamplesLog2)    ///< [in] data surface sample count
    const
{
    Coordinate cx(DIM_X, 0);
    Coordinate cy(DIM_Y, 0);
    Coordinate cz(DIM_Z, 0);
    Coordinate cs(DIM_S, 0);

    // Clear the equation
    pDataEq->resize(0);
    pDataEq->resize(27);

    if (dataSurfaceType == Gfx9DataColor)
    {
        if (IsLinear(swizzleMode))
        {
            Coordinate cm(DIM_M, 0);

            pDataEq->resize(49);

            for (UINT_32 i = 0; i < 49; i++)
            {
                (*pDataEq)[i].add(cm);
                cm++;
            }
        }
        else if (IsThick(resourceType, swizzleMode))
        {
            // Color 3d_S and 3d_Z modes, 3d_D is same as color 2d
            UINT_32 i;
            if (IsStandardSwizzle(resourceType, swizzleMode))
            {
                // Standard 3d swizzle
                // Fill in bottom x bits
                for (i = elementBytesLog2; i < 4; i++)
                {
                    (*pDataEq)[i].add(cx);
                    cx++;
                }
                // Fill in 2 bits of y and then z
                for (i = 4; i < 6; i++)
                {
                    (*pDataEq)[i].add(cy);
                    cy++;
                }
                for (i = 6; i < 8; i++)
                {
                    (*pDataEq)[i].add(cz);
                    cz++;
                }
                if (elementBytesLog2 < 2)
                {
                    // fill in z & y bit
                    (*pDataEq)[8].add(cz);
                    (*pDataEq)[9].add(cy);
                    cz++;
                    cy++;
                }
                else if (elementBytesLog2 == 2)
                {
                    // fill in y and x bit
                    (*pDataEq)[8].add(cy);
                    (*pDataEq)[9].add(cx);
                    cy++;
                    cx++;
                }
                else
                {
                    // fill in 2 x bits
                    (*pDataEq)[8].add(cx);
                    cx++;
                    (*pDataEq)[9].add(cx);
                    cx++;
                }
            }
            else
            {
                // Z 3d swizzle
                UINT_32 m2dEnd = (elementBytesLog2 == 0) ? 3 : ((elementBytesLog2 < 4) ? 4 : 5);
                UINT_32 numZs  = (elementBytesLog2 == 0 || elementBytesLog2 == 4) ?
                                 2 : ((elementBytesLog2 == 1) ? 3 : 1);
                pDataEq->mort2d(cx, cy, elementBytesLog2, m2dEnd);
                for (i = m2dEnd + 1; i <= m2dEnd + numZs; i++)
                {
                    (*pDataEq)[i].add(cz);
                    cz++;
                }
                if ((elementBytesLog2 == 0) || (elementBytesLog2 == 3))
                {
                    // add an x and z
                    (*pDataEq)[6].add(cx);
                    (*pDataEq)[7].add(cz);
                    cx++;
                    cz++;
                }
                else if (elementBytesLog2 == 2)
                {
                    // add a y and z
                    (*pDataEq)[6].add(cy);
                    (*pDataEq)[7].add(cz);
                    cy++;
                    cz++;
                }
                // add y and x
                (*pDataEq)[8].add(cy);
                (*pDataEq)[9].add(cx);
                cy++;
                cx++;
            }
            // Fill in bit 10 and up
            pDataEq->mort3d(cz, cy, cx, 10);
        }
        else if (IsThin(resourceType, swizzleMode))
        {
            UINT_32 blockSizeLog2 = GetBlockSizeLog2(swizzleMode);
            // Color 2D
            UINT_32 microYBits     = (8 - elementBytesLog2) / 2;
            UINT_32 tileSplitStart = blockSizeLog2 - numSamplesLog2;
            UINT_32 i;
            // Fill in bottom x bits
            for (i = elementBytesLog2; i < 4; i++)
            {
                (*pDataEq)[i].add(cx);
                cx++;
            }
            // Fill in bottom y bits
            for (i = 4; i < 4 + microYBits; i++)
            {
                (*pDataEq)[i].add(cy);
                cy++;
            }
            // Fill in last of the micro_x bits
            for (i = 4 + microYBits; i < 8; i++)
            {
                (*pDataEq)[i].add(cx);
                cx++;
            }
            // Fill in x/y bits below sample split
            pDataEq->mort2d(cy, cx, 8, tileSplitStart - 1);
            // Fill in sample bits
            for (i = 0; i < numSamplesLog2; i++)
            {
                cs.set(DIM_S, i);
                (*pDataEq)[tileSplitStart + i].add(cs);
            }
            // Fill in x/y bits above sample split
            if ((numSamplesLog2 & 1) ^ (blockSizeLog2 & 1))
            {
                pDataEq->mort2d(cx, cy, blockSizeLog2);
            }
            else
            {
                pDataEq->mort2d(cy, cx, blockSizeLog2);
            }
        }
        else
        {
            ADDR_ASSERT_ALWAYS();
        }
    }
    else
    {
        // Fmask or depth
        UINT_32 sampleStart = elementBytesLog2;
        UINT_32 pixelStart  = elementBytesLog2 + numSamplesLog2;
        UINT_32 ymajStart   = 6 + numSamplesLog2;

        for (UINT_32 s = 0; s < numSamplesLog2; s++)
        {
            cs.set(DIM_S, s);
            (*pDataEq)[sampleStart + s].add(cs);
        }

        // Put in the x-major order pixel bits
        pDataEq->mort2d(cx, cy, pixelStart,
                        ymajStart - 1);
        // Put in the y-major order pixel bits
        pDataEq->mort2d(cy, cx, ymajStart);
    }
}

/**
************************************************************************************************************************
*   Gfx9Lib::GetPipeEquation
*
*   @brief
*       Get pipe equation
*   @return
*       N/A
************************************************************************************************************************
*/
VOID Gfx9Lib::GetPipeEquation(
    CoordEq*         pPipeEq,            ///< [out] pipe equation
    CoordEq*         pDataEq,            ///< [in] data equation
    UINT_32          pipeInterleaveLog2, ///< [in] pipe interleave
    UINT_32          numPipeLog2,        ///< [in] number of pipes
    UINT_32          numSamplesLog2,     ///< [in] data surface sample count
    Gfx9DataType     dataSurfaceType,    ///< [in] data surface type
    AddrSwizzleMode  swizzleMode,        ///< [in] data surface swizzle mode
    AddrResourceType resourceType        ///< [in] data surface resource type
    ) const
{
    UINT_32 blockSizeLog2 = GetBlockSizeLog2(swizzleMode);
    CoordEq dataEq;

    pDataEq->copy(dataEq);

    if (dataSurfaceType == Gfx9DataColor)
    {
        INT_32 shift = static_cast<INT_32>(numSamplesLog2);
        dataEq.shift(-shift, blockSizeLog2 - numSamplesLog2);
    }

    dataEq.copy(*pPipeEq, pipeInterleaveLog2, numPipeLog2);

    // This section should only apply to z/stencil, maybe fmask
    // If the pipe bit is below the comp block size,
    // then keep moving up the address until we find a bit that is above
    UINT_32 pipeStart = 0;

    if (dataSurfaceType != Gfx9DataColor)
    {
        Coordinate tileMin(DIM_X, 3);

        while (dataEq[pipeInterleaveLog2 + pipeStart][0] < tileMin)
        {
            pipeStart++;
        }

        // If pipeStart is 0, the first pipe bit is already above the comp block size,
        // so nothing needs to be done.
        // Note: this condition is not strictly necessary; executing the loop when pipeStart == 0
        // yields the same pipe equation.
        if (pipeStart != 0)
        {
            for (UINT_32 i = 0; i < numPipeLog2; i++)
            {
                // Copy the i-th bit above pipe interleave to the current pipe equation bit
                dataEq[pipeInterleaveLog2 + pipeStart + i].copyto((*pPipeEq)[i]);
            }
        }
    }

    if
       (IsPrt(swizzleMode))
    {
        // Clear out bits above the block size if PRTs are enabled
        dataEq.resize(blockSizeLog2);
        dataEq.resize(48);
    }

    if (IsXor(swizzleMode))
    {
        CoordEq xorMask;

        if (IsThick(resourceType, swizzleMode))
        {
            CoordEq xorMask2;

            dataEq.copy(xorMask2, pipeInterleaveLog2 + numPipeLog2, 2 * numPipeLog2);

            xorMask.resize(numPipeLog2);

            for (UINT_32 pipeIdx = 0; pipeIdx < numPipeLog2; pipeIdx++)
            {
                xorMask[pipeIdx].add(xorMask2[2 * pipeIdx]);
                xorMask[pipeIdx].add(xorMask2[2 * pipeIdx + 1]);
            }
        }
        else
        {
            // Xor in the bits above the pipe+gpu bits
            dataEq.copy(xorMask, pipeInterleaveLog2 + pipeStart + numPipeLog2, numPipeLog2);

            if ((numSamplesLog2 == 0) && (IsPrt(swizzleMode) == FALSE))
            {
                Coordinate co;
                CoordEq xorMask2;
                // if 1xaa and not prt, then xor in the z bits
                xorMask2.resize(0);
                xorMask2.resize(numPipeLog2);
                for (UINT_32 pipeIdx = 0; pipeIdx < numPipeLog2; pipeIdx++)
                {
                    co.set(DIM_Z, numPipeLog2 - 1 - pipeIdx);
                    xorMask2[pipeIdx].add(co);
                }

                pPipeEq->xorin(xorMask2);
            }
        }

        xorMask.reverse();
        pPipeEq->xorin(xorMask);
    }
}

/**
************************************************************************************************************************
*   Gfx9Lib::GetMetaEquation
*
*   @brief
*       Get meta equation for cmask/htile/DCC
*   @return
*       Pointer to a calculated meta equation
************************************************************************************************************************
*/
const CoordEq* Gfx9Lib::GetMetaEquation(
    const MetaEqParams& metaEqParams)
{
    UINT_32 cachedMetaEqIndex;

    for (cachedMetaEqIndex = 0; cachedMetaEqIndex < MaxCachedMetaEq; cachedMetaEqIndex++)
    {
        if (memcmp(&metaEqParams,
                   &m_cachedMetaEqKey[cachedMetaEqIndex],
                   static_cast<UINT_32>(sizeof(metaEqParams))) == 0)
        {
            break;
        }
    }

    CoordEq* pMetaEq = NULL;

    if (cachedMetaEqIndex < MaxCachedMetaEq)
    {
        pMetaEq = &m_cachedMetaEq[cachedMetaEqIndex];
    }
    else
    {
        m_cachedMetaEqKey[m_metaEqOverrideIndex] = metaEqParams;

        pMetaEq = &m_cachedMetaEq[m_metaEqOverrideIndex++];

        m_metaEqOverrideIndex %= MaxCachedMetaEq;
        GenMetaEquation(pMetaEq,
                        metaEqParams.maxMip,
                        metaEqParams.elementBytesLog2,
                        metaEqParams.numSamplesLog2,
                        metaEqParams.metaFlag,
                        metaEqParams.dataSurfaceType,
                        metaEqParams.swizzleMode,
                        metaEqParams.resourceType,
                        metaEqParams.metaBlkWidthLog2,
                        metaEqParams.metaBlkHeightLog2,
                        metaEqParams.metaBlkDepthLog2,
                        metaEqParams.compBlkWidthLog2,
                        metaEqParams.compBlkHeightLog2,
                        metaEqParams.compBlkDepthLog2);
    }

    return pMetaEq;
}

/**
************************************************************************************************************************
*   Gfx9Lib::GenMetaEquation
*
*   @brief
*       Generate meta equation for cmask/htile/DCC
*   @return
*       N/A
************************************************************************************************************************
*/
VOID Gfx9Lib::GenMetaEquation(
    CoordEq*         pMetaEq,            ///< [out] meta equation
    UINT_32          maxMip,             ///< [in] max mip Id
    UINT_32          elementBytesLog2,   ///< [in] data surface element bytes
    UINT_32          numSamplesLog2,     ///< [in] data surface sample count
    ADDR2_META_FLAGS metaFlag,           ///< [in] meta flag
    Gfx9DataType     dataSurfaceType,    ///< [in] data surface type
    AddrSwizzleMode  swizzleMode,        ///< [in] data surface swizzle mode
    AddrResourceType resourceType,       ///< [in] data surface resource type
    UINT_32          metaBlkWidthLog2,   ///< [in] meta block width
    UINT_32          metaBlkHeightLog2,  ///< [in] meta block height
    UINT_32          metaBlkDepthLog2,   ///< [in] meta block depth
    UINT_32          compBlkWidthLog2,   ///< [in] compress block width
    UINT_32          compBlkHeightLog2,  ///< [in] compress block height
    UINT_32          compBlkDepthLog2)   ///< [in] compress block depth
    const
{
    UINT_32 numPipeTotalLog2   = GetPipeLog2ForMetaAddressing(metaFlag.pipeAligned, swizzleMode);
    UINT_32 pipeInterleaveLog2 = m_pipeInterleaveLog2;

    // Get the correct data address and rb equation
    CoordEq dataEq;
    GetDataEquation(&dataEq, dataSurfaceType, swizzleMode, resourceType,
                    elementBytesLog2, numSamplesLog2);

    // Get pipe and rb equations
    CoordEq pipeEquation;
    GetPipeEquation(&pipeEquation, &dataEq, pipeInterleaveLog2, numPipeTotalLog2,
                    numSamplesLog2, dataSurfaceType, swizzleMode, resourceType);
    numPipeTotalLog2 = pipeEquation.getsize();

    if (metaFlag.linear)
    {
        // Linear metadata support was removed for GFX9, so this path must never be taken.
        ADDR_ASSERT_ALWAYS();

        ADDR_ASSERT(dataSurfaceType == Gfx9DataColor);

        dataEq.copy(*pMetaEq);

        if (IsLinear(swizzleMode))
        {
            if (metaFlag.pipeAligned)
            {
                // Remove the pipe bits
                INT_32 shift = static_cast<INT_32>(numPipeTotalLog2);
                pMetaEq->shift(-shift, pipeInterleaveLog2);
            }
            // Divide by comp block size, which for linear (which is always color) is 256 B
            pMetaEq->shift(-8);

            if (metaFlag.pipeAligned)
            {
                // Put pipe bits back in
                pMetaEq->shift(numPipeTotalLog2, pipeInterleaveLog2);

                for (UINT_32 i = 0; i < numPipeTotalLog2; i++)
                {
                    pipeEquation[i].copyto((*pMetaEq)[pipeInterleaveLog2 + i]);
                }
            }
        }

        pMetaEq->shift(1);
    }
    else
    {
        UINT_32 maxCompFragLog2 = static_cast<UINT_32>(m_maxCompFragLog2);
        UINT_32 compFragLog2 =
            ((dataSurfaceType == Gfx9DataColor) && (numSamplesLog2 > maxCompFragLog2)) ?
            maxCompFragLog2 : numSamplesLog2;

        UINT_32 uncompFragLog2 = numSamplesLog2 - compFragLog2;

        // Make sure the metaaddr is cleared
        pMetaEq->resize(0);
        pMetaEq->resize(27);

        if (IsThick(resourceType, swizzleMode))
        {
            Coordinate cx(DIM_X, 0);
            Coordinate cy(DIM_Y, 0);
            Coordinate cz(DIM_Z, 0);

            if (maxMip > 0)
            {
                pMetaEq->mort3d(cy, cx, cz);
            }
            else
            {
                pMetaEq->mort3d(cx, cy, cz);
            }
        }
        else
        {
            Coordinate cx(DIM_X, 0);
            Coordinate cy(DIM_Y, 0);
            Coordinate cs;

            if (maxMip > 0)
            {
                pMetaEq->mort2d(cy, cx, compFragLog2);
            }
            else
            {
                pMetaEq->mort2d(cx, cy, compFragLog2);
            }

            //------------------------------------------------------------------------------------------------------------------------
            // Put the compressible fragments at the lsb
            // the uncompressible frags will be at the msb of the micro address
            //------------------------------------------------------------------------------------------------------------------------
            for (UINT_32 s = 0; s < compFragLog2; s++)
            {
                cs.set(DIM_S, s);
                (*pMetaEq)[s].add(cs);
            }
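            // Worked example, assuming a hypothetical m_maxCompFragLog2 of 2: for an 8-sample color surface,
            // numSamplesLog2 = 3, so compFragLog2 = 2 and uncompFragLog2 = 1. The loop above adds sample
            // coordinates s0 and s1 to meta bits 0 and 1, and s2 is appended near the end of this function,
            // above the channel bits and the remaining rb bits.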
        }

        // Keep a copy of the pipe equations
        CoordEq origPipeEquation;
        pipeEquation.copy(origPipeEquation);

        Coordinate co;
        // filter out everything under the compressed block size
        co.set(DIM_X, compBlkWidthLog2);
        pMetaEq->Filter('<', co, 0, DIM_X);
        co.set(DIM_Y, compBlkHeightLog2);
        pMetaEq->Filter('<', co, 0, DIM_Y);
        co.set(DIM_Z, compBlkDepthLog2);
        pMetaEq->Filter('<', co, 0, DIM_Z);

        // For non-color, filter out sample bits
        if (dataSurfaceType != Gfx9DataColor)
        {
            co.set(DIM_X, 0);
            pMetaEq->Filter('<', co, 0, DIM_S);
        }

        // filter out everything above the metablock size
        co.set(DIM_X, metaBlkWidthLog2 - 1);
        pMetaEq->Filter('>', co, 0, DIM_X);
        co.set(DIM_Y, metaBlkHeightLog2 - 1);
        pMetaEq->Filter('>', co, 0, DIM_Y);
        co.set(DIM_Z, metaBlkDepthLog2 - 1);
        pMetaEq->Filter('>', co, 0, DIM_Z);

        // filter out everything above the metablock size for the channel bits
        co.set(DIM_X, metaBlkWidthLog2 - 1);
        pipeEquation.Filter('>', co, 0, DIM_X);
        co.set(DIM_Y, metaBlkHeightLog2 - 1);
        pipeEquation.Filter('>', co, 0, DIM_Y);
        co.set(DIM_Z, metaBlkDepthLog2 - 1);
        pipeEquation.Filter('>', co, 0, DIM_Z);

        // Make sure we still have the same number of channel bits
        if (pipeEquation.getsize() != numPipeTotalLog2)
        {
            ADDR_ASSERT_ALWAYS();
        }

        // Loop through all channel and rb bits,
        // and make sure these components exist in the metadata address
        for (UINT_32 i = 0; i < numPipeTotalLog2; i++)
        {
            for (UINT_32 j = pipeEquation[i].getsize(); j > 0; j--)
            {
                if (pMetaEq->Exists(pipeEquation[i][j - 1]) == FALSE)
                {
                    ADDR_ASSERT_ALWAYS();
                }
            }
        }

        const UINT_32 numSeLog2      = metaFlag.rbAligned ? m_seLog2      : 0;
        const UINT_32 numRbPeSeLog2  = metaFlag.rbAligned ? m_rbPerSeLog2 : 0;
        const UINT_32 numRbTotalLog2 = numRbPeSeLog2 + numSeLog2;
        CoordEq       origRbEquation;

        GetRbEquation(&origRbEquation, numRbPeSeLog2, numSeLog2);

        CoordEq rbEquation = origRbEquation;

        for (UINT_32 i = 0; i < numRbTotalLog2; i++)
        {
            for (UINT_32 j = rbEquation[i].getsize(); j > 0; j--)
            {
                if (pMetaEq->Exists(rbEquation[i][j - 1]) == FALSE)
                {
                    ADDR_ASSERT_ALWAYS();
                }
            }
        }

        if (m_settings.applyAliasFix)
        {
            co.set(DIM_Z, -1);
        }

        // Loop through each rb id bit; if it is equal to any of the filtered channel bits, clear it
        for (UINT_32 i = 0; i < numRbTotalLog2; i++)
        {
            for (UINT_32 j = 0; j < numPipeTotalLog2; j++)
            {
                BOOL_32 isRbEquationInPipeEquation = FALSE;

                if (m_settings.applyAliasFix)
                {
                    CoordTerm filteredPipeEq;
                    filteredPipeEq = pipeEquation[j];

                    filteredPipeEq.Filter('>', co, 0, DIM_Z);

                    isRbEquationInPipeEquation = (rbEquation[i] == filteredPipeEq);
                }
                else
                {
                    isRbEquationInPipeEquation = (rbEquation[i] == pipeEquation[j]);
                }

                if (isRbEquationInPipeEquation)
                {
                    rbEquation[i].Clear();
                }
            }
        }

        bool rbAppendedWithPipeBits[1 << (MaxSeLog2 + MaxRbPerSeLog2)] = {};

        // Loop through each bit of the channel, get the smallest coordinate,
        // and remove it from the metaaddr, and rb_equation
        for (UINT_32 i = 0; i < numPipeTotalLog2; i++)
        {
            pipeEquation[i].getsmallest(co);

            UINT_32 old_size = pMetaEq->getsize();
            pMetaEq->Filter('=', co);
            UINT_32 new_size = pMetaEq->getsize();
            if (new_size != old_size - 1)
            {
                ADDR_ASSERT_ALWAYS();
            }
            pipeEquation.remove(co);
            for (UINT_32 j = 0; j < numRbTotalLog2; j++)
            {
                if (rbEquation[j].remove(co))
                {
                    // if we actually removed something from this bit, then add the remaining
                    // channel bits, as these can be removed for this bit
                    for (UINT_32 k = 0; k < pipeEquation[i].getsize(); k++)
                    {
                        if (pipeEquation[i][k] != co)
                        {
                            rbEquation[j].add(pipeEquation[i][k]);
                            rbAppendedWithPipeBits[j] = true;
                        }
                    }
                }
            }
        }

        // Loop through the rb bits and see what remains;
        // filter out the smallest coordinate if it remains
        UINT_32 rbBitsLeft = 0;
        for (UINT_32 i = 0; i < numRbTotalLog2; i++)
        {
            BOOL_32 isRbEqAppended = FALSE;

            if (m_settings.applyAliasFix)
            {
                isRbEqAppended = (rbEquation[i].getsize() > (rbAppendedWithPipeBits[i] ? 1 : 0));
            }
            else
            {
                isRbEqAppended = (rbEquation[i].getsize() > 0);
            }

            if (isRbEqAppended)
            {
                rbBitsLeft++;
                rbEquation[i].getsmallest(co);
                UINT_32 old_size = pMetaEq->getsize();
                pMetaEq->Filter('=', co);
                UINT_32 new_size = pMetaEq->getsize();
                if (new_size != old_size - 1)
                {
                    // assert warning
                }
                for (UINT_32 j = i + 1; j < numRbTotalLog2; j++)
                {
                    if (rbEquation[j].remove(co))
                    {
                        // if we actually removed something from this bit, then add the remaining
                        // rb bits, as these can be removed for this bit
                        for (UINT_32 k = 0; k < rbEquation[i].getsize(); k++)
                        {
                            if (rbEquation[i][k] != co)
                            {
                                rbEquation[j].add(rbEquation[i][k]);
                                rbAppendedWithPipeBits[j] |= rbAppendedWithPipeBits[i];
                            }
                        }
                    }
                }
            }
        }

        // capture the size of the metaaddr
        UINT_32 metaSize = pMetaEq->getsize();
        // resize to 49 bits...make this a nibble address
        pMetaEq->resize(49);
        // Concatenate the macro address above the current address
        for (UINT_32 i = metaSize, j = 0; i < 49; i++, j++)
        {
            co.set(DIM_M, j);
            (*pMetaEq)[i].add(co);
        }

        // Multiply by meta element size (in nibbles)
        if (dataSurfaceType == Gfx9DataColor)
        {
            pMetaEq->shift(1);
        }
        else if (dataSurfaceType == Gfx9DataDepthStencil)
        {
            pMetaEq->shift(3);
        }

        //------------------------------------------------------------------------------------------
        // Note the pipeInterleaveLog2+1 is because address is a nibble address
        // Shift up from pipe interleave number of channel
        // and rb bits left, and uncompressed fragments
        //------------------------------------------------------------------------------------------

        pMetaEq->shift(numPipeTotalLog2 + rbBitsLeft + uncompFragLog2, pipeInterleaveLog2 + 1);

        // Put in the channel bits
        for (UINT_32 i = 0; i < numPipeTotalLog2; i++)
        {
            origPipeEquation[i].copyto((*pMetaEq)[pipeInterleaveLog2 + 1 + i]);
        }

        // Put in remaining rb bits
        for (UINT_32 i = 0, j = 0; j < rbBitsLeft; i = (i + 1) % numRbTotalLog2)
        {
            BOOL_32 isRbEqAppended = FALSE;

            if (m_settings.applyAliasFix)
            {
                isRbEqAppended = (rbEquation[i].getsize() > (rbAppendedWithPipeBits[i] ? 1 : 0));
            }
            else
            {
                isRbEqAppended = (rbEquation[i].getsize() > 0);
            }

            if (isRbEqAppended)
            {
                origRbEquation[i].copyto((*pMetaEq)[pipeInterleaveLog2 + 1 + numPipeTotalLog2 + j]);
                // Mark any rb bit we add in to the rb mask
                j++;
            }
        }

        //------------------------------------------------------------------------------------------
        // Put in the uncompressed fragment bits
        //------------------------------------------------------------------------------------------
        for (UINT_32 i = 0; i < uncompFragLog2; i++)
        {
            co.set(DIM_S, compFragLog2 + i);
            (*pMetaEq)[pipeInterleaveLog2 + 1 + numPipeTotalLog2 + rbBitsLeft + i].add(co);
        }
    }
}

/**
************************************************************************************************************************
*   Gfx9Lib::IsEquationSupported
*
*   @brief
*       Check if equation is supported for given swizzle mode and resource type.
*
*   @return
*       TRUE if supported
************************************************************************************************************************
*/
BOOL_32 Gfx9Lib::IsEquationSupported(
    AddrResourceType rsrcType,
    AddrSwizzleMode  swMode,
    UINT_32          elementBytesLog2) const
{
    BOOL_32 supported = (elementBytesLog2 < MaxElementBytesLog2) &&
                        (IsValidSwMode(swMode) == TRUE) &&
                        (IsLinear(swMode) == FALSE) &&
                        (((IsTex2d(rsrcType) == TRUE) &&
                          ((elementBytesLog2 < 4) ||
                           ((IsRotateSwizzle(swMode) == FALSE) &&
                            (IsZOrderSwizzle(swMode) == FALSE)))) ||
                         ((IsTex3d(rsrcType) == TRUE) &&
                          (IsRotateSwizzle(swMode) == FALSE) &&
                          (IsBlock256b(swMode) == FALSE)));

    return supported;
}

/**
************************************************************************************************************************
*   Gfx9Lib::InitEquationTable
*
*   @brief
*       Initialize Equation table.
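*
*       Illustrative example: 2D resources use rsrcTypeIdx 0 (rsrcType - ADDR_RSRC_TEX_2D), so the entry for a
*       2D resource using ADDR_SW_64KB_S_X with 4-byte elements (elementBytesLog2 = 2) is
*       m_equationLookupTable[0][ADDR_SW_64KB_S_X][2]. HwlGetEquationIndex reads the same slot through
*       "rsrcType - 1", which assumes ADDR_RSRC_TEX_2D == 1. Combinations rejected by IsEquationSupported
*       keep ADDR_INVALID_EQUATION_INDEX.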
*
*   @return
*       N/A
************************************************************************************************************************
*/
VOID Gfx9Lib::InitEquationTable()
{
    memset(m_equationTable, 0, sizeof(m_equationTable));

    // Loop over all possible resource types (2D/3D)
    for (UINT_32 rsrcTypeIdx = 0; rsrcTypeIdx < MaxRsrcType; rsrcTypeIdx++)
    {
        AddrResourceType rsrcType = static_cast<AddrResourceType>(rsrcTypeIdx + ADDR_RSRC_TEX_2D);

        // Loop over all possible swizzle modes
        for (UINT_32 swModeIdx = 0; swModeIdx < MaxSwModeType; swModeIdx++)
        {
            AddrSwizzleMode swMode = static_cast<AddrSwizzleMode>(swModeIdx);

            // Loop over all possible element sizes (log2 of bytes per element)
            for (UINT_32 bppIdx = 0; bppIdx < MaxElementBytesLog2; bppIdx++)
            {
                UINT_32 equationIndex = ADDR_INVALID_EQUATION_INDEX;

                // Check if the input is supported
                if (IsEquationSupported(rsrcType, swMode, bppIdx))
                {
                    ADDR_EQUATION     equation;
                    ADDR_E_RETURNCODE retCode;

                    memset(&equation, 0, sizeof(ADDR_EQUATION));

                    // Generate the equation
                    if (IsBlock256b(swMode) && IsTex2d(rsrcType))
                    {
                        retCode = ComputeBlock256Equation(rsrcType, swMode, bppIdx, &equation);
                    }
                    else if (IsThin(rsrcType, swMode))
                    {
                        retCode = ComputeThinEquation(rsrcType, swMode, bppIdx, &equation);
                    }
                    else
                    {
                        retCode = ComputeThickEquation(rsrcType, swMode, bppIdx, &equation);
                    }

                    // Only add the equation to the table if the return code is ADDR_OK.
                    // Any other return code means the input is not valid, so nothing is added
                    // and the invalid equation index is left in the lookup table.
                    if (retCode == ADDR_OK)
                    {
                        equationIndex = m_numEquations;
                        ADDR_ASSERT(equationIndex < EquationTableSize);

                        m_equationTable[equationIndex] = equation;

                        m_numEquations++;
                    }
                    else
                    {
                        ADDR_ASSERT_ALWAYS();
                    }
                }

                // Fill the index into the lookup table; if the combination is not supported,
                // this is the invalid equation index
                m_equationLookupTable[rsrcTypeIdx][swModeIdx][bppIdx] = equationIndex;
            }
        }
    }
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlGetEquationIndex
*
*   @brief
*       Interface function stub of GetEquationIndex
*
*   @return
*       Equation index, or ADDR_INVALID_EQUATION_INDEX if the combination is not supported
************************************************************************************************************************
*/
UINT_32 Gfx9Lib::HwlGetEquationIndex(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
    ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut
    ) const
{
    AddrResourceType rsrcType         = pIn->resourceType;
    AddrSwizzleMode  swMode           = pIn->swizzleMode;
    UINT_32          elementBytesLog2 = Log2(pIn->bpp >> 3);
    UINT_32          index            = ADDR_INVALID_EQUATION_INDEX;

    if (IsEquationSupported(rsrcType, swMode, elementBytesLog2))
    {
        UINT_32 rsrcTypeIdx = static_cast<UINT_32>(rsrcType) - 1;
        UINT_32 swModeIdx   = static_cast<UINT_32>(swMode);

        index = m_equationLookupTable[rsrcTypeIdx][swModeIdx][elementBytesLog2];
    }

    if (pOut->pMipInfo != NULL)
    {
        for (UINT_32 i = 0; i < pIn->numMipLevels; i++)
        {
            pOut->pMipInfo[i].equationIndex = index;
        }
    }

    return index;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeBlock256Equation
*
*   @brief
*       Interface function stub of ComputeBlock256Equation
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeBlock256Equation(
    AddrResourceType rsrcType,
    AddrSwizzleMode  swMode,
    UINT_32          elementBytesLog2,
    ADDR_EQUATION*   pEquation) const
{
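    // Worked example, assuming Block256_2d[2] is 8x8: for standard swizzle with 4-byte elements
    // (elementBytesLog2 = 2), address bits 0-1 select the byte within an element and bits 2-7 come from
    // x[0], x[1], y[0], y[1], y[2], x[2]. Since x[i] is channel-0 bit (elementBytesLog2 + i), the highest
    // x bit is 4 and the highest y bit is 2, so the post validation at the end of this function checks
    // (2u << 4) == 8 * 4 and (2u << 2) == 8, i.e. an 8x8 block of 4-byte elements (256 bytes).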
    ADDR_E_RETURNCODE ret = ADDR_OK;

    pEquation->numBits = 8;

    UINT_32 i = 0;
    for (; i < elementBytesLog2; i++)
    {
        InitChannel(1, 0, i, &pEquation->addr[i]);
    }

    ADDR_CHANNEL_SETTING* pixelBit = &pEquation->addr[elementBytesLog2];

    const UINT_32        maxBitsUsed = 4;
    ADDR_CHANNEL_SETTING x[maxBitsUsed] = {};
    ADDR_CHANNEL_SETTING y[maxBitsUsed] = {};

    for (i = 0; i < maxBitsUsed; i++)
    {
        InitChannel(1, 0, elementBytesLog2 + i, &x[i]);
        InitChannel(1, 1, i, &y[i]);
    }

    if (IsStandardSwizzle(rsrcType, swMode))
    {
        switch (elementBytesLog2)
        {
            case 0:
                pixelBit[0] = x[0];
                pixelBit[1] = x[1];
                pixelBit[2] = x[2];
                pixelBit[3] = x[3];
                pixelBit[4] = y[0];
                pixelBit[5] = y[1];
                pixelBit[6] = y[2];
                pixelBit[7] = y[3];
                break;
            case 1:
                pixelBit[0] = x[0];
                pixelBit[1] = x[1];
                pixelBit[2] = x[2];
                pixelBit[3] = y[0];
                pixelBit[4] = y[1];
                pixelBit[5] = y[2];
                pixelBit[6] = x[3];
                break;
            case 2:
                pixelBit[0] = x[0];
                pixelBit[1] = x[1];
                pixelBit[2] = y[0];
                pixelBit[3] = y[1];
                pixelBit[4] = y[2];
                pixelBit[5] = x[2];
                break;
            case 3:
                pixelBit[0] = x[0];
                pixelBit[1] = y[0];
                pixelBit[2] = y[1];
                pixelBit[3] = x[1];
                pixelBit[4] = x[2];
                break;
            case 4:
                pixelBit[0] = y[0];
                pixelBit[1] = y[1];
                pixelBit[2] = x[0];
                pixelBit[3] = x[1];
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                ret = ADDR_INVALIDPARAMS;
                break;
        }
    }
    else if (IsDisplaySwizzle(rsrcType, swMode))
    {
        switch (elementBytesLog2)
        {
            case 0:
                pixelBit[0] = x[0];
                pixelBit[1] = x[1];
                pixelBit[2] = x[2];
                pixelBit[3] = y[1];
                pixelBit[4] = y[0];
                pixelBit[5] = y[2];
                pixelBit[6] = x[3];
                pixelBit[7] = y[3];
                break;
            case 1:
                pixelBit[0] = x[0];
                pixelBit[1] = x[1];
                pixelBit[2] = x[2];
                pixelBit[3] = y[0];
                pixelBit[4] = y[1];
                pixelBit[5] = y[2];
                pixelBit[6] = x[3];
                break;
            case 2:
                pixelBit[0] = x[0];
                pixelBit[1] = x[1];
                pixelBit[2] = y[0];
                pixelBit[3] = x[2];
                pixelBit[4] = y[1];
                pixelBit[5] = y[2];
                break;
            case 3:
                pixelBit[0] = x[0];
                pixelBit[1] = y[0];
                pixelBit[2] = x[1];
                pixelBit[3] = x[2];
                pixelBit[4] = y[1];
                break;
            case 4:
                pixelBit[0] = x[0];
                pixelBit[1] = y[0];
                pixelBit[2] = x[1];
                pixelBit[3] = y[1];
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                ret = ADDR_INVALIDPARAMS;
                break;
        }
    }
    else if (IsRotateSwizzle(swMode))
    {
        switch (elementBytesLog2)
        {
            case 0:
                pixelBit[0] = y[0];
                pixelBit[1] = y[1];
                pixelBit[2] = y[2];
                pixelBit[3] = x[1];
                pixelBit[4] = x[0];
                pixelBit[5] = x[2];
                pixelBit[6] = x[3];
                pixelBit[7] = y[3];
                break;
            case 1:
                pixelBit[0] = y[0];
                pixelBit[1] = y[1];
                pixelBit[2] = y[2];
                pixelBit[3] = x[0];
                pixelBit[4] = x[1];
                pixelBit[5] = x[2];
                pixelBit[6] = x[3];
                break;
            case 2:
                pixelBit[0] = y[0];
                pixelBit[1] = y[1];
                pixelBit[2] = x[0];
                pixelBit[3] = y[2];
                pixelBit[4] = x[1];
                pixelBit[5] = x[2];
                break;
            case 3:
                pixelBit[0] = y[0];
                pixelBit[1] = x[0];
                pixelBit[2] = y[1];
                pixelBit[3] = x[1];
                pixelBit[4] = x[2];
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                // Fall through
            case 4:
                ret = ADDR_INVALIDPARAMS;
                break;
        }
    }
    else
    {
        ADDR_ASSERT_ALWAYS();
        ret = ADDR_INVALIDPARAMS;
    }

    // Post validation
    if (ret == ADDR_OK)
    {
        ASSERTED Dim2d microBlockDim = Block256_2d[elementBytesLog2];
        ADDR_ASSERT((2u << GetMaxValidChannelIndex(pEquation->addr, 8, 0)) ==
                    (microBlockDim.w * (1 << elementBytesLog2)));
        ADDR_ASSERT((2u << GetMaxValidChannelIndex(pEquation->addr, 8, 1)) == microBlockDim.h);
    }

    return ret;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeThinEquation
*
*   @brief
*       Interface function stub of ComputeThinEquation
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeThinEquation(
    AddrResourceType rsrcType,
    AddrSwizzleMode  swMode,
    UINT_32          elementBytesLog2,
    ADDR_EQUATION*   pEquation) const
{
    ADDR_E_RETURNCODE ret = ADDR_OK;

    UINT_32 blockSizeLog2 = GetBlockSizeLog2(swMode);

    UINT_32 maxXorBits = blockSizeLog2;
    if (IsNonPrtXor(swMode))
    {
        // For non-prt-xor, maybe need to initialize some more bits for xor
        // The highest xor bit used in equation will be the max of the following 3 items:
        // 1. m_pipeInterleaveLog2 + 2 * pipeXorBits
        // 2. m_pipeInterleaveLog2 + pipeXorBits + 2 * bankXorBits
        // 3. blockSizeLog2
        maxXorBits = Max(maxXorBits, m_pipeInterleaveLog2 + 2 * GetPipeXorBits(blockSizeLog2));
        maxXorBits = Max(maxXorBits, m_pipeInterleaveLog2 +
                                     GetPipeXorBits(blockSizeLog2) +
                                     2 * GetBankXorBits(blockSizeLog2));
    }

    const UINT_32 maxBitsUsed = 14;
    ADDR_ASSERT((2 * maxBitsUsed) >= maxXorBits);
    ADDR_CHANNEL_SETTING x[maxBitsUsed] = {};
    ADDR_CHANNEL_SETTING y[maxBitsUsed] = {};

    const UINT_32 extraXorBits = 16;
    ADDR_ASSERT(extraXorBits >= maxXorBits - blockSizeLog2);
    ADDR_CHANNEL_SETTING xorExtra[extraXorBits] = {};

    for (UINT_32 i = 0; i < maxBitsUsed; i++)
    {
        InitChannel(1, 0, elementBytesLog2 + i, &x[i]);
        InitChannel(1, 1, i, &y[i]);
    }

    ADDR_CHANNEL_SETTING* pixelBit = pEquation->addr;

    for (UINT_32 i = 0; i < elementBytesLog2; i++)
    {
        InitChannel(1, 0, i, &pixelBit[i]);
    }

    UINT_32 xIdx    = 0;
    UINT_32 yIdx    = 0;
    UINT_32 lowBits = 0;

    if (IsZOrderSwizzle(swMode))
    {
        if (elementBytesLog2 <= 3)
        {
            for (UINT_32 i = elementBytesLog2; i < 6; i++)
            {
                pixelBit[i] = (((i - elementBytesLog2) & 1) == 0) ? x[xIdx++] : y[yIdx++];
            }

            lowBits = 6;
        }
        else
        {
            ret = ADDR_INVALIDPARAMS;
        }
    }
    else
    {
        ret = HwlComputeBlock256Equation(rsrcType, swMode, elementBytesLog2, pEquation);

        if (ret == ADDR_OK)
        {
            Dim2d microBlockDim = Block256_2d[elementBytesLog2];
            xIdx    = Log2(microBlockDim.w);
            yIdx    = Log2(microBlockDim.h);
            lowBits = 8;
        }
    }

    if (ret == ADDR_OK)
    {
        for (UINT_32 i = lowBits; i < blockSizeLog2; i++)
        {
            pixelBit[i] = ((i & 1) == 0) ? y[yIdx++] : x[xIdx++];
        }

        for (UINT_32 i = blockSizeLog2; i < maxXorBits; i++)
        {
            xorExtra[i - blockSizeLog2] = ((i & 1) == 0) ? y[yIdx++] : x[xIdx++];
        }

        if (IsXor(swMode))
        {
            // Fill XOR bits
            UINT_32 pipeStart   = m_pipeInterleaveLog2;
            UINT_32 pipeXorBits = GetPipeXorBits(blockSizeLog2);

            UINT_32 bankStart   = pipeStart + pipeXorBits;
            UINT_32 bankXorBits = GetBankXorBits(blockSizeLog2);

            for (UINT_32 i = 0; i < pipeXorBits; i++)
            {
                UINT_32               xor1BitPos = pipeStart + 2 * pipeXorBits - 1 - i;
                ADDR_CHANNEL_SETTING* pXor1Src   = (xor1BitPos < blockSizeLog2) ?
                                                   &pEquation->addr[xor1BitPos] :
                                                   &xorExtra[xor1BitPos - blockSizeLog2];

                InitChannel(&pEquation->xor1[pipeStart + i], pXor1Src);
            }

            for (UINT_32 i = 0; i < bankXorBits; i++)
            {
                UINT_32               xor1BitPos = bankStart + 2 * bankXorBits - 1 - i;
                ADDR_CHANNEL_SETTING* pXor1Src   = (xor1BitPos < blockSizeLog2) ?
                                                   &pEquation->addr[xor1BitPos] :
                                                   &xorExtra[xor1BitPos - blockSizeLog2];

                InitChannel(&pEquation->xor1[bankStart + i], pXor1Src);
            }

            if (IsPrt(swMode) == FALSE)
            {
                for (UINT_32 i = 0; i < pipeXorBits; i++)
                {
                    InitChannel(1, 2, pipeXorBits - i - 1, &pEquation->xor2[pipeStart + i]);
                }

                for (UINT_32 i = 0; i < bankXorBits; i++)
                {
                    InitChannel(1, 2, bankXorBits - i - 1 + pipeXorBits, &pEquation->xor2[bankStart + i]);
                }
            }
        }

        pEquation->numBits = blockSizeLog2;
    }

    return ret;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeThickEquation
*
*   @brief
*       Interface function stub of ComputeThickEquation
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeThickEquation(
    AddrResourceType rsrcType,
    AddrSwizzleMode  swMode,
    UINT_32          elementBytesLog2,
    ADDR_EQUATION*   pEquation) const
{
    ADDR_E_RETURNCODE ret = ADDR_OK;

    ADDR_ASSERT(IsTex3d(rsrcType));

    UINT_32 blockSizeLog2 = GetBlockSizeLog2(swMode);

    UINT_32 maxXorBits = blockSizeLog2;
    if (IsNonPrtXor(swMode))
    {
        // For non-prt-xor, maybe need to initialize some more bits for xor
        // The highest xor bit used in equation will be the max of the following 3 items:
        // 1. m_pipeInterleaveLog2 + 3 * pipeXorBits
        // 2. m_pipeInterleaveLog2 + pipeXorBits + 3 * bankXorBits
        // 3. blockSizeLog2
        maxXorBits = Max(maxXorBits, m_pipeInterleaveLog2 + 3 * GetPipeXorBits(blockSizeLog2));
        maxXorBits = Max(maxXorBits, m_pipeInterleaveLog2 +
                                     GetPipeXorBits(blockSizeLog2) +
                                     3 * GetBankXorBits(blockSizeLog2));
    }

    for (UINT_32 i = 0; i < elementBytesLog2; i++)
    {
        InitChannel(1, 0, i, &pEquation->addr[i]);
    }

    ADDR_CHANNEL_SETTING* pixelBit = &pEquation->addr[elementBytesLog2];

    const UINT_32 maxBitsUsed = 12;
    ADDR_ASSERT((3 * maxBitsUsed) >= maxXorBits);
    ADDR_CHANNEL_SETTING x[maxBitsUsed] = {};
    ADDR_CHANNEL_SETTING y[maxBitsUsed] = {};
    ADDR_CHANNEL_SETTING z[maxBitsUsed] = {};

    const UINT_32 extraXorBits = 24;
    ADDR_ASSERT(extraXorBits >= maxXorBits - blockSizeLog2);
    ADDR_CHANNEL_SETTING xorExtra[extraXorBits] = {};

    for (UINT_32 i = 0; i < maxBitsUsed; i++)
    {
        InitChannel(1, 0, elementBytesLog2 + i, &x[i]);
        InitChannel(1, 1, i, &y[i]);
        InitChannel(1, 2, i, &z[i]);
    }

    if (IsZOrderSwizzle(swMode))
    {
        switch (elementBytesLog2)
        {
            case 0:
                pixelBit[0] = x[0];
                pixelBit[1] = y[0];
                pixelBit[2] = x[1];
                pixelBit[3] = y[1];
                pixelBit[4] = z[0];
                pixelBit[5] = z[1];
                pixelBit[6] = x[2];
                pixelBit[7] = z[2];
                pixelBit[8] = y[2];
                pixelBit[9] = x[3];
                break;
            case 1:
                pixelBit[0] = x[0];
                pixelBit[1] = y[0];
                pixelBit[2] = x[1];
                pixelBit[3] = y[1];
                pixelBit[4] = z[0];
                pixelBit[5] = z[1];
                pixelBit[6] = z[2];
                pixelBit[7] = y[2];
                pixelBit[8] = x[2];
                break;
            case 2:
                pixelBit[0] = x[0];
                pixelBit[1] = y[0];
                pixelBit[2] = x[1];
                pixelBit[3] = z[0];
                pixelBit[4] = y[1];
                pixelBit[5] = z[1];
                pixelBit[6] = y[2];
                pixelBit[7] = x[2];
                break;
            case 3:
                pixelBit[0] = x[0];
                pixelBit[1] = y[0];
                pixelBit[2] = z[0];
                pixelBit[3] = x[1];
                pixelBit[4] = z[1];
                pixelBit[5] = y[1];
                pixelBit[6] = x[2];
                break;
            case 4:
                pixelBit[0] = x[0];
                pixelBit[1] = y[0];
                pixelBit[2] = z[0];
                pixelBit[3] = z[1];
                pixelBit[4] = y[1];
                pixelBit[5] = x[1];
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                ret = ADDR_INVALIDPARAMS;
                break;
        }
    }
    else if (IsStandardSwizzle(rsrcType, swMode))
    {
        switch (elementBytesLog2)
        {
            case 0:
                pixelBit[0] = x[0];
                pixelBit[1] = x[1];
                pixelBit[2] = x[2];
                pixelBit[3] = x[3];
                pixelBit[4] = y[0];
                pixelBit[5] = y[1];
                pixelBit[6] = z[0];
                pixelBit[7] = z[1];
                pixelBit[8] = z[2];
                pixelBit[9] = y[2];
                break;
            case 1:
                pixelBit[0] = x[0];
                pixelBit[1] = x[1];
                pixelBit[2] = x[2];
                pixelBit[3] = y[0];
                pixelBit[4] = y[1];
                pixelBit[5] = z[0];
                pixelBit[6] = z[1];
                pixelBit[7] = z[2];
                pixelBit[8] = y[2];
                break;
            case 2:
                pixelBit[0] = x[0];
                pixelBit[1] = x[1];
                pixelBit[2] = y[0];
                pixelBit[3] = y[1];
                pixelBit[4] = z[0];
                pixelBit[5] = z[1];
                pixelBit[6] = y[2];
                pixelBit[7] = x[2];
                break;
            case 3:
                pixelBit[0] = x[0];
                pixelBit[1] = y[0];
                pixelBit[2] = y[1];
                pixelBit[3] = z[0];
                pixelBit[4] = z[1];
                pixelBit[5] = x[1];
                pixelBit[6] = x[2];
                break;
            case 4:
                pixelBit[0] = y[0];
                pixelBit[1] = y[1];
                pixelBit[2] = z[0];
                pixelBit[3] = z[1];
                pixelBit[4] = x[0];
                pixelBit[5] = x[1];
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                ret = ADDR_INVALIDPARAMS;
                break;
        }
    }
    else
    {
        ADDR_ASSERT_ALWAYS();
        ret = ADDR_INVALIDPARAMS;
    }

    if (ret == ADDR_OK)
    {
        Dim3d   microBlockDim = Block1K_3d[elementBytesLog2];
        UINT_32 xIdx          = Log2(microBlockDim.w);
        UINT_32 yIdx          = Log2(microBlockDim.h);
        UINT_32 zIdx          = Log2(microBlockDim.d);

        pixelBit = pEquation->addr;

        const UINT_32 lowBits = 10;
        ADDR_ASSERT(pEquation->addr[lowBits - 1].valid == 1);
        ADDR_ASSERT(pEquation->addr[lowBits].valid == 0);

        for (UINT_32 i = lowBits; i < blockSizeLog2; i++)
        {
            if ((i % 3) == 0)
            {
                pixelBit[i] = x[xIdx++];
            }
            else if ((i % 3) == 1)
            {
                pixelBit[i] = z[zIdx++];
            }
            else
            {
                pixelBit[i] = y[yIdx++];
            }
        }

        for (UINT_32 i = blockSizeLog2; i < maxXorBits; i++)
        {
            if ((i % 3) == 0)
            {
                xorExtra[i - blockSizeLog2] = x[xIdx++];
            }
            else if ((i % 3) == 1)
            {
                xorExtra[i - blockSizeLog2] = z[zIdx++];
            }
            else
            {
                xorExtra[i - blockSizeLog2] = y[yIdx++];
            }
        }

        if (IsXor(swMode))
        {
            // Fill XOR bits
            UINT_32 pipeStart   = m_pipeInterleaveLog2;
            UINT_32 pipeXorBits = GetPipeXorBits(blockSizeLog2);
            for (UINT_32 i = 0; i < pipeXorBits; i++)
            {
                UINT_32               xor1BitPos = pipeStart + (3 * pipeXorBits) - 1 - (2 * i);
                ADDR_CHANNEL_SETTING* pXor1Src   = (xor1BitPos < blockSizeLog2) ?
                                                   &pEquation->addr[xor1BitPos] :
                                                   &xorExtra[xor1BitPos - blockSizeLog2];

                InitChannel(&pEquation->xor1[pipeStart + i], pXor1Src);

                UINT_32               xor2BitPos = pipeStart + (3 * pipeXorBits) - 2 - (2 * i);
                ADDR_CHANNEL_SETTING* pXor2Src   = (xor2BitPos < blockSizeLog2) ?
                                                   &pEquation->addr[xor2BitPos] :
                                                   &xorExtra[xor2BitPos - blockSizeLog2];

                InitChannel(&pEquation->xor2[pipeStart + i], pXor2Src);
            }

            UINT_32 bankStart   = pipeStart + pipeXorBits;
            UINT_32 bankXorBits = GetBankXorBits(blockSizeLog2);
            for (UINT_32 i = 0; i < bankXorBits; i++)
            {
                UINT_32               xor1BitPos = bankStart + (3 * bankXorBits) - 1 - (2 * i);
                ADDR_CHANNEL_SETTING* pXor1Src   = (xor1BitPos < blockSizeLog2) ?
                                                   &pEquation->addr[xor1BitPos] :
                                                   &xorExtra[xor1BitPos - blockSizeLog2];

                InitChannel(&pEquation->xor1[bankStart + i], pXor1Src);

                UINT_32               xor2BitPos = bankStart + (3 * bankXorBits) - 2 - (2 * i);
                ADDR_CHANNEL_SETTING* pXor2Src   = (xor2BitPos < blockSizeLog2) ?
                                                   &pEquation->addr[xor2BitPos] :
                                                   &xorExtra[xor2BitPos - blockSizeLog2];

                InitChannel(&pEquation->xor2[bankStart + i], pXor2Src);
            }
        }

        pEquation->numBits = blockSizeLog2;
    }

    return ret;
}

/**
************************************************************************************************************************
*   Gfx9Lib::IsValidDisplaySwizzleMode
*
*   @brief
*       Check if a swizzle mode is supported by display engine
*
*   @return
*       TRUE if swizzle mode is supported by display engine
************************************************************************************************************************
*/
BOOL_32 Gfx9Lib::IsValidDisplaySwizzleMode(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const
{
    BOOL_32 support = FALSE;

    if (m_settings.isDce12)
    {
        switch (pIn->swizzleMode)
        {
            case ADDR_SW_256B_D:
            case ADDR_SW_256B_R:
                support = (pIn->bpp == 32);
                break;

            case ADDR_SW_LINEAR:
            case ADDR_SW_4KB_D:
            case ADDR_SW_4KB_R:
            case ADDR_SW_64KB_D:
            case ADDR_SW_64KB_R:
            case ADDR_SW_4KB_D_X:
            case ADDR_SW_4KB_R_X:
            case ADDR_SW_64KB_D_X:
            case ADDR_SW_64KB_R_X:
                support = (pIn->bpp <= 64);
                break;

            default:
                break;
        }
    }
    else if (m_settings.isDcn1)
    {
        switch (pIn->swizzleMode)
        {
            case ADDR_SW_4KB_D:
            case ADDR_SW_64KB_D:
            case ADDR_SW_64KB_D_T:
            case ADDR_SW_4KB_D_X:
            case ADDR_SW_64KB_D_X:
                support = (pIn->bpp == 64);
                break;

            case ADDR_SW_LINEAR:
            case ADDR_SW_4KB_S:
            case ADDR_SW_64KB_S:
            case ADDR_SW_64KB_S_T:
            case ADDR_SW_4KB_S_X:
            case ADDR_SW_64KB_S_X:
                support = (pIn->bpp <= 64);
                break;

            default:
                break;
        }
    }
    else
    {
        ADDR_NOT_IMPLEMENTED();
    }

    return support;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputePipeBankXor
*
*   @brief
*       Generate a PipeBankXor value to be ORed into bits above pipeInterleaveBits of address
*
*   @return
*       PipeBankXor value
************************************************************************************************************************
*/
ADDR_E_RETURNCODE
Gfx9Lib::HwlComputePipeBankXor(
    const ADDR2_COMPUTE_PIPEBANKXOR_INPUT* pIn,
    ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT*      pOut) const
{
    if (IsXor(pIn->swizzleMode))
    {
        UINT_32 macroBlockBits = GetBlockSizeLog2(pIn->swizzleMode);
        UINT_32 pipeBits       = GetPipeXorBits(macroBlockBits);
        UINT_32 bankBits       = GetBankXorBits(macroBlockBits);

        UINT_32 pipeXor = 0;
        UINT_32 bankXor = 0;

        const UINT_32 bankMask = (1 << bankBits) - 1;
        const UINT_32 index    = pIn->surfIndex & bankMask;

        const UINT_32 bpp = pIn->flags.fmask ?
                            GetFmaskBpp(pIn->numSamples, pIn->numFrags) :
                            GetElemLib()->GetBitsPerPixel(pIn->format);
        if (bankBits == 4)
        {
            static const UINT_32 BankXorSmallBpp[] = {0, 7, 4, 3, 8, 15, 12, 11, 1, 6, 5, 2, 9, 14, 13, 10};
            static const UINT_32 BankXorLargeBpp[] = {0, 7, 8, 15, 4, 3, 12, 11, 1, 6, 9, 14, 5, 2, 13, 10};

            bankXor = (bpp <= 32) ? BankXorSmallBpp[index] : BankXorLargeBpp[index];
        }
        else if (bankBits > 0)
        {
            UINT_32 bankIncrease = (1 << (bankBits - 1)) - 1;
            bankIncrease = (bankIncrease == 0) ? 1 : bankIncrease;
            bankXor = (index * bankIncrease) & bankMask;
        }

        pOut->pipeBankXor = (bankXor << pipeBits) | pipeXor;
    }
    else
    {
        pOut->pipeBankXor = 0;
    }

    return ADDR_OK;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeSlicePipeBankXor
*
*   @brief
*       Generate slice PipeBankXor value based on base PipeBankXor value and slice id
*
*   @return
*       PipeBankXor value
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeSlicePipeBankXor(
    const ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT* pIn,
    ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT*      pOut) const
{
    UINT_32 macroBlockBits = GetBlockSizeLog2(pIn->swizzleMode);
    UINT_32 pipeBits       = GetPipeXorBits(macroBlockBits);
    UINT_32 bankBits       = GetBankXorBits(macroBlockBits);

    UINT_32 pipeXor = ReverseBitVector(pIn->slice, pipeBits);
    UINT_32 bankXor = ReverseBitVector(pIn->slice >> pipeBits, bankBits);

    pOut->pipeBankXor = pIn->basePipeBankXor ^ (pipeXor | (bankXor << pipeBits));

    return ADDR_OK;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeSubResourceOffsetForSwizzlePattern
*
*   @brief
*       Compute sub resource offset to support swizzle pattern
*
*   @return
*       Offset
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeSubResourceOffsetForSwizzlePattern(
    const ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT* pIn,
    ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT*      pOut) const
{
    ADDR_ASSERT(IsThin(pIn->resourceType, pIn->swizzleMode));

    UINT_32 macroBlockBits = GetBlockSizeLog2(pIn->swizzleMode);
    UINT_32 pipeBits       = GetPipeXorBits(macroBlockBits);
    UINT_32 bankBits       = GetBankXorBits(macroBlockBits);
    UINT_32 pipeXor        = ReverseBitVector(pIn->slice, pipeBits);
    UINT_32 bankXor        = ReverseBitVector(pIn->slice >> pipeBits, bankBits);
    UINT_32 pipeBankXor    = ((pipeXor | (bankXor << pipeBits)) ^ (pIn->pipeBankXor)) << m_pipeInterleaveLog2;

    pOut->offset = pIn->slice * pIn->sliceSize +
                   pIn->macroBlockOffset +
                   (pIn->mipTailOffset ^ pipeBankXor) -
                   static_cast<UINT_64>(pipeBankXor);
    return ADDR_OK;
}

/**
************************************************************************************************************************
*   Gfx9Lib::ValidateNonSwModeParams
*
*   @brief
*       Validate compute surface info params except swizzle mode
*
*   @return
*       TRUE if parameters are valid, FALSE otherwise
************************************************************************************************************************
*/
BOOL_32 Gfx9Lib::ValidateNonSwModeParams(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const
{
    BOOL_32 valid = TRUE;

    if ((pIn->bpp == 0) || (pIn->bpp > 128) || (pIn->width == 0) || (pIn->numFrags > 8) || (pIn->numSamples > 16))
    {
        ADDR_ASSERT_ALWAYS();
        valid = FALSE;
    }

    if (pIn->resourceType >= ADDR_RSRC_MAX_TYPE)
    {
        ADDR_ASSERT_ALWAYS();
        valid = FALSE;
    }

    const BOOL_32 mipmap = (pIn->numMipLevels > 1);
    const BOOL_32 msaa   = (pIn->numFrags > 1);
    const BOOL_32 isBc   = ElemLib::IsBlockCompressed(pIn->format);

    const AddrResourceType rsrcType = pIn->resourceType;
    const BOOL_32          tex3d    = IsTex3d(rsrcType);
    const BOOL_32          tex2d    = IsTex2d(rsrcType);
    const BOOL_32          tex1d    = IsTex1d(rsrcType);

    const ADDR2_SURFACE_FLAGS flags   = pIn->flags;
    const BOOL_32             zbuffer = flags.depth || flags.stencil;
    const BOOL_32             display = flags.display || flags.rotated;
    const BOOL_32             stereo  = flags.qbStereo;
    const BOOL_32             fmask   = flags.fmask;

    // Resource type check
    if (tex1d)
    {
        if (msaa || zbuffer || display || stereo || isBc || fmask)
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }
    else if (tex2d)
    {
        if ((msaa && mipmap) || (stereo && msaa) || (stereo && mipmap))
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }
    else if (tex3d)
    {
        if (msaa || zbuffer || display || stereo || fmask)
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }
    else
    {
        ADDR_ASSERT_ALWAYS();
        valid = FALSE;
    }

    return valid;
}

/**
************************************************************************************************************************
*   Gfx9Lib::ValidateSwModeParams
*
*   @brief
*       Validate compute surface info related to swizzle mode
*
*   @return
*       TRUE if parameters are valid, FALSE otherwise
************************************************************************************************************************
*/
BOOL_32 Gfx9Lib::ValidateSwModeParams(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const
{
    BOOL_32 valid = TRUE;

    if ((pIn->swizzleMode >= ADDR_SW_MAX_TYPE) || (IsValidSwMode(pIn->swizzleMode) == FALSE))
    {
        ADDR_ASSERT_ALWAYS();
        valid = FALSE;
    }

    const BOOL_32 mipmap = (pIn->numMipLevels > 1);
    const BOOL_32 msaa   = (pIn->numFrags > 1);
    const BOOL_32 isBc   = ElemLib::IsBlockCompressed(pIn->format);
    const BOOL_32 is422  = ElemLib::IsMacroPixelPacked(pIn->format);

    const AddrResourceType rsrcType
        = pIn->resourceType;
    const BOOL_32 tex3d = IsTex3d(rsrcType);
    const BOOL_32 tex2d = IsTex2d(rsrcType);
    const BOOL_32 tex1d = IsTex1d(rsrcType);

    const AddrSwizzleMode swizzle     = pIn->swizzleMode;
    const BOOL_32         linear      = IsLinear(swizzle);
    const BOOL_32         blk256B     = IsBlock256b(swizzle);
    const BOOL_32         isNonPrtXor = IsNonPrtXor(swizzle);

    const ADDR2_SURFACE_FLAGS flags   = pIn->flags;
    const BOOL_32             zbuffer = flags.depth || flags.stencil;
    const BOOL_32             color   = flags.color;
    const BOOL_32             texture = flags.texture;
    const BOOL_32             display = flags.display || flags.rotated;
    const BOOL_32             prt     = flags.prt;
    const BOOL_32             fmask   = flags.fmask;

    const BOOL_32 thin3d  = tex3d && flags.view3dAs2dArray;
    const BOOL_32 zMaxMip = tex3d && mipmap &&
                            (pIn->numSlices >= pIn->width) && (pIn->numSlices >= pIn->height);

    // Misc check
    if (msaa && (GetBlockSize(swizzle) < (m_pipeInterleaveBytes * pIn->numFrags)))
    {
        // MSAA surface must have blk_bytes/pipe_interleave >= num_samples
        ADDR_ASSERT_ALWAYS();
        valid = FALSE;
    }

    if (display && (IsValidDisplaySwizzleMode(pIn) == FALSE))
    {
        ADDR_ASSERT_ALWAYS();
        valid = FALSE;
    }

    if ((pIn->bpp == 96) && (linear == FALSE))
    {
        ADDR_ASSERT_ALWAYS();
        valid = FALSE;
    }

    if (prt && isNonPrtXor)
    {
        ADDR_ASSERT_ALWAYS();
        valid = FALSE;
    }

    // Resource type check
    if (tex1d)
    {
        if (linear == FALSE)
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }

    // Swizzle type check
    if (linear)
    {
        if (((tex1d == FALSE) && prt) || zbuffer || msaa || (pIn->bpp == 0) ||
            ((pIn->bpp % 8) != 0) || (isBc && texture) || fmask)
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }
    else if (IsZOrderSwizzle(swizzle))
    {
        if ((color && msaa) || thin3d || isBc || is422 || (tex2d && (pIn->bpp > 64)) || (msaa && (pIn->bpp > 32)))
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }
    else if (IsStandardSwizzle(swizzle))
    {
        if (zbuffer || thin3d || (tex3d && (pIn->bpp == 128) && color) || fmask)
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }
    else if (IsDisplaySwizzle(swizzle))
    {
        if (zbuffer || (prt && tex3d) || fmask || zMaxMip)
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }
    else if (IsRotateSwizzle(swizzle))
    {
        if (zbuffer || (pIn->bpp > 64) || tex3d || isBc || fmask)
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }
    else
    {
        ADDR_ASSERT_ALWAYS();
        valid = FALSE;
    }

    // Block type check
    if (blk256B)
    {
        if (prt || zbuffer || tex3d || mipmap || msaa)
        {
            ADDR_ASSERT_ALWAYS();
            valid = FALSE;
        }
    }

    return valid;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeSurfaceInfoSanityCheck
*
*   @brief
*       Compute surface info sanity check
*
*   @return
*       ADDR_OK if parameters are valid, ADDR_INVALIDPARAMS otherwise
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeSurfaceInfoSanityCheck(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const
{
    return ValidateNonSwModeParams(pIn) && ValidateSwModeParams(pIn) ? ADDR_OK : ADDR_INVALIDPARAMS;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlGetPreferredSurfaceSetting
*
*   @brief
*       Internal function to get suggested surface information for the client to use
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlGetPreferredSurfaceSetting(
    const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT* pIn,
    ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT*      pOut) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_INVALIDPARAMS;
    ElemLib*          pElemLib   = GetElemLib();

    UINT_32 bpp        = pIn->bpp;
    UINT_32 width      = Max(pIn->width, 1u);
    UINT_32 height     = Max(pIn->height, 1u);
    UINT_32 numSamples = Max(pIn->numSamples, 1u);
    UINT_32 numFrags   = (pIn->numFrags == 0) ?
                         numSamples : pIn->numFrags;

    if (pIn->flags.fmask)
    {
        bpp                = GetFmaskBpp(numSamples, numFrags);
        numFrags           = 1;
        numSamples         = 1;
        pOut->resourceType = ADDR_RSRC_TEX_2D;
    }
    else
    {
        // Setting format to ADDR_FMT_INVALID skips this conversion
        if (pIn->format != ADDR_FMT_INVALID)
        {
            UINT_32 expandX, expandY;

            // Don't care for this case
            ElemMode elemMode = ADDR_UNCOMPRESSED;

            // Get compression/expansion factors and element mode which indicates compression/expansion
            bpp = pElemLib->GetBitsPerPixel(pIn->format,
                                            &elemMode,
                                            &expandX,
                                            &expandY);

            UINT_32 basePitch = 0;
            GetElemLib()->AdjustSurfaceInfo(elemMode,
                                            expandX,
                                            expandY,
                                            &bpp,
                                            &basePitch,
                                            &width,
                                            &height);
        }

        // The output may change for volume (3D) texture resources in the future
        pOut->resourceType = pIn->resourceType;
    }

    const UINT_32 numSlices    = Max(pIn->numSlices, 1u);
    const UINT_32 numMipLevels = Max(pIn->numMipLevels, 1u);
    const BOOL_32 msaa         = (numFrags > 1) || (numSamples > 1);
    const BOOL_32 displayRsrc  = pIn->flags.display || pIn->flags.rotated;

    // Pre sanity check on non swizzle mode parameters
    ADDR2_COMPUTE_SURFACE_INFO_INPUT localIn = {};
    localIn.flags        = pIn->flags;
    localIn.resourceType = pOut->resourceType;
    localIn.format       = pIn->format;
    localIn.bpp          = bpp;
    localIn.width        = width;
    localIn.height       = height;
    localIn.numSlices    = numSlices;
    localIn.numMipLevels = numMipLevels;
    localIn.numSamples   = numSamples;
    localIn.numFrags     = numFrags;

    if (ValidateNonSwModeParams(&localIn))
    {
        // Forbid swizzle mode(s) by client setting
        ADDR2_SWMODE_SET allowedSwModeSet = {};
        allowedSwModeSet.value |= pIn->forbiddenBlock.linear ? 0 : Gfx9LinearSwModeMask;
        allowedSwModeSet.value |= pIn->forbiddenBlock.micro  ? 0 : Gfx9Blk256BSwModeMask;
        allowedSwModeSet.value |=
            pIn->forbiddenBlock.macroThin4KB ? 0 :
            ((pOut->resourceType == ADDR_RSRC_TEX_3D) ? Gfx9Rsrc3dThin4KBSwModeMask : Gfx9Blk4KBSwModeMask);
        allowedSwModeSet.value |=
            pIn->forbiddenBlock.macroThick4KB ? 0 :
            ((pOut->resourceType == ADDR_RSRC_TEX_3D) ?
            Gfx9Rsrc3dThick4KBSwModeMask : 0);
        allowedSwModeSet.value |=
            pIn->forbiddenBlock.macroThin64KB ? 0 :
            ((pOut->resourceType == ADDR_RSRC_TEX_3D) ? Gfx9Rsrc3dThin64KBSwModeMask : Gfx9Blk64KBSwModeMask);
        allowedSwModeSet.value |=
            pIn->forbiddenBlock.macroThick64KB ? 0 :
            ((pOut->resourceType == ADDR_RSRC_TEX_3D) ? Gfx9Rsrc3dThick64KBSwModeMask : 0);

        if (pIn->preferredSwSet.value != 0)
        {
            allowedSwModeSet.value &= pIn->preferredSwSet.sw_Z ? ~0 : ~Gfx9ZSwModeMask;
            allowedSwModeSet.value &= pIn->preferredSwSet.sw_S ? ~0 : ~Gfx9StandardSwModeMask;
            allowedSwModeSet.value &= pIn->preferredSwSet.sw_D ? ~0 : ~Gfx9DisplaySwModeMask;
            allowedSwModeSet.value &= pIn->preferredSwSet.sw_R ? ~0 : ~Gfx9RotateSwModeMask;
        }

        if (pIn->noXor)
        {
            allowedSwModeSet.value &= ~Gfx9XorSwModeMask;
        }

        if (pIn->maxAlign > 0)
        {
            if (pIn->maxAlign < Size64K)
            {
                allowedSwModeSet.value &= ~Gfx9Blk64KBSwModeMask;
            }

            if (pIn->maxAlign < Size4K)
            {
                allowedSwModeSet.value &= ~Gfx9Blk4KBSwModeMask;
            }

            if (pIn->maxAlign < Size256)
            {
                allowedSwModeSet.value &= ~Gfx9Blk256BSwModeMask;
            }
        }

        // Filter out invalid swizzle mode(s) by image attributes and HW restrictions
        switch (pOut->resourceType)
        {
            case ADDR_RSRC_TEX_1D:
                allowedSwModeSet.value &= Gfx9Rsrc1dSwModeMask;
                break;

            case ADDR_RSRC_TEX_2D:
                allowedSwModeSet.value &= pIn->flags.prt ? Gfx9Rsrc2dPrtSwModeMask : Gfx9Rsrc2dSwModeMask;

                if (bpp > 64)
                {
                    allowedSwModeSet.value &= ~(Gfx9RotateSwModeMask | Gfx9ZSwModeMask);
                }
                break;

            case ADDR_RSRC_TEX_3D:
                allowedSwModeSet.value &= pIn->flags.prt ?
                                          Gfx9Rsrc3dPrtSwModeMask : Gfx9Rsrc3dSwModeMask;

                if ((numMipLevels > 1) && (numSlices >= width) && (numSlices >= height))
                {
                    // SW_*_D for 3D mipmaps (maxmip > 0) is only supported for Xmajor or Ymajor mipmap
                    // When depth (Z) is the maximum dimension then must use one of the SW_*_S
                    // or SW_*_Z modes if mipmapping is desired on a 3D surface
                    allowedSwModeSet.value &= ~Gfx9DisplaySwModeMask;
                }

                if ((bpp == 128) && pIn->flags.color)
                {
                    allowedSwModeSet.value &= ~Gfx9StandardSwModeMask;
                }

                if (pIn->flags.view3dAs2dArray)
                {
                    allowedSwModeSet.value &= Gfx9Rsrc3dThinSwModeMask | Gfx9LinearSwModeMask;
                }
                break;

            default:
                ADDR_ASSERT_ALWAYS();
                allowedSwModeSet.value = 0;
                break;
        }

        if (pIn->format == ADDR_FMT_32_32_32)
        {
            allowedSwModeSet.value &= Gfx9LinearSwModeMask;
        }

        if (ElemLib::IsBlockCompressed(pIn->format))
        {
            if (pIn->flags.texture)
            {
                allowedSwModeSet.value &= Gfx9StandardSwModeMask | Gfx9DisplaySwModeMask;
            }
            else
            {
                allowedSwModeSet.value &= Gfx9StandardSwModeMask | Gfx9DisplaySwModeMask | Gfx9LinearSwModeMask;
            }
        }

        if (ElemLib::IsMacroPixelPacked(pIn->format) ||
            (msaa && ((bpp > 32) || pIn->flags.color || pIn->flags.unordered)))
        {
            allowedSwModeSet.value &= ~Gfx9ZSwModeMask;
        }

        if (pIn->flags.fmask || pIn->flags.depth || pIn->flags.stencil)
        {
            allowedSwModeSet.value &= Gfx9ZSwModeMask;

            if (pIn->flags.noMetadata == FALSE)
            {
                if (pIn->flags.depth &&
                    pIn->flags.texture &&
                    (((bpp == 16) && (numFrags >= 4)) || ((bpp == 32) && (numFrags >= 2))))
                {
                    // When _X/_T swizzle mode was used for MSAA depth texture, TC will get zplane
                    // equation from wrong address within memory range a tile covered and use the
                    // garbage data for compressed Z reading which finally leads to corruption.
                    allowedSwModeSet.value &= ~Gfx9XorSwModeMask;
                }

                if (m_settings.htileCacheRbConflict &&
                    (pIn->flags.depth || pIn->flags.stencil) &&
                    (numSlices > 1) &&
                    (pIn->flags.metaRbUnaligned == FALSE) &&
                    (pIn->flags.metaPipeUnaligned == FALSE))
                {
                    // Z_X 2D array with Rb/Pipe aligned HTile won't have metadata cache coherency
                    allowedSwModeSet.value &= ~Gfx9XSwModeMask;
                }
            }
        }

        if (msaa)
        {
            allowedSwModeSet.value &= Gfx9MsaaSwModeMask;
        }

        if ((numFrags > 1) &&
            (Size4K < (m_pipeInterleaveBytes * numFrags)))
        {
            // MSAA surface must have blk_bytes/pipe_interleave >= num_samples
            allowedSwModeSet.value &= Gfx9Blk64KBSwModeMask;
        }

        if (numMipLevels > 1)
        {
            allowedSwModeSet.value &= ~Gfx9Blk256BSwModeMask;
        }

        if (displayRsrc)
        {
            if (m_settings.isDce12)
            {
                allowedSwModeSet.value &= (bpp == 32) ? Dce12Bpp32SwModeMask : Dce12NonBpp32SwModeMask;
            }
            else if (m_settings.isDcn1)
            {
                allowedSwModeSet.value &= (bpp == 64) ? Dcn1Bpp64SwModeMask : Dcn1NonBpp64SwModeMask;
            }
            else
            {
                ADDR_NOT_IMPLEMENTED();
            }
        }

        if (allowedSwModeSet.value != 0)
        {
#if DEBUG
            // Post sanity check: AddrLib should at least accept the output it generates itself
            UINT_32 validateSwModeSet = allowedSwModeSet.value;
            for (UINT_32 i = 0; validateSwModeSet != 0; i++)
            {
                if (validateSwModeSet & 1)
                {
                    localIn.swizzleMode = static_cast<AddrSwizzleMode>(i);
                    ADDR_ASSERT(ValidateSwModeParams(&localIn));
                }

                validateSwModeSet >>= 1;
            }
#endif

            pOut->validSwModeSet = allowedSwModeSet;
            pOut->canXor         = (allowedSwModeSet.value & Gfx9XorSwModeMask) ?
                                   TRUE : FALSE;
            pOut->validBlockSet  = GetAllowedBlockSet(allowedSwModeSet, pOut->resourceType);
            pOut->validSwTypeSet = GetAllowedSwSet(allowedSwModeSet);

            pOut->clientPreferredSwSet = pIn->preferredSwSet;

            if (pOut->clientPreferredSwSet.value == 0)
            {
                pOut->clientPreferredSwSet.value = AddrSwSetAll;
            }

            // Apply optional restrictions
            if (pIn->flags.needEquation)
            {
                FilterInvalidEqSwizzleMode(allowedSwModeSet, pIn->resourceType, Log2(bpp >> 3));
            }

            if (allowedSwModeSet.value == Gfx9LinearSwModeMask)
            {
                pOut->swizzleMode = ADDR_SW_LINEAR;
            }
            else
            {
                // Always ignore linear swizzle mode if there is any other choice.
                allowedSwModeSet.swLinear = 0;

                ADDR2_BLOCK_SET allowedBlockSet = GetAllowedBlockSet(allowedSwModeSet, pOut->resourceType);

                // Determine block size if there are two or more block type candidates
                if (IsPow2(allowedBlockSet.value) == FALSE)
                {
                    AddrSwizzleMode swMode[AddrBlockMaxTiledType] = { ADDR_SW_LINEAR };

                    swMode[AddrBlockMicro]    = ADDR_SW_256B_D;
                    swMode[AddrBlockThin4KB]  = ADDR_SW_4KB_D;
                    swMode[AddrBlockThin64KB] = ADDR_SW_64KB_D;

                    if (pOut->resourceType == ADDR_RSRC_TEX_3D)
                    {
                        swMode[AddrBlockThick4KB]  = ADDR_SW_4KB_S;
                        swMode[AddrBlockThick64KB] = ADDR_SW_64KB_S;
                    }

                    Dim3d   blkDim[AddrBlockMaxTiledType]  = {{0}, {0}, {0}, {0}, {0}, {0}};
                    Dim3d   padDim[AddrBlockMaxTiledType]  = {{0}, {0}, {0}, {0}, {0}, {0}};
                    UINT_64 padSize[AddrBlockMaxTiledType] = {0};

                    const UINT_32 ratioLow = pIn->flags.minimizeAlign ? 1 : (pIn->flags.opt4space ? 3 : 2);
                    const UINT_32 ratioHi  = pIn->flags.minimizeAlign ? 1 : (pIn->flags.opt4space ?
                                             2 : 1);
                    const UINT_64 sizeAlignInElement = Max(NextPow2(pIn->minSizeAlign) / (bpp >> 3), 1u);
                    UINT_32       minSizeBlk         = AddrBlockMicro;
                    UINT_64       minSize            = 0;

                    for (UINT_32 i = AddrBlockMicro; i < AddrBlockMaxTiledType; i++)
                    {
                        if (allowedBlockSet.value & (1 << i))
                        {
                            ComputeBlockDimensionForSurf(&blkDim[i].w,
                                                         &blkDim[i].h,
                                                         &blkDim[i].d,
                                                         bpp,
                                                         numFrags,
                                                         pOut->resourceType,
                                                         swMode[i]);

                            if (displayRsrc)
                            {
                                blkDim[i].w = PowTwoAlign(blkDim[i].w, 32);
                            }

                            padSize[i] = ComputePadSize(&blkDim[i], width, height, numSlices, &padDim[i]);
                            padSize[i] = PowTwoAlign(padSize[i] * numFrags, sizeAlignInElement);

                            if ((minSize == 0) ||
                                ((padSize[i] * ratioHi) <= (minSize * ratioLow)))
                            {
                                minSize    = padSize[i];
                                minSizeBlk = i;
                            }
                        }
                    }

                    if ((allowedBlockSet.micro == TRUE)      &&
                        (width  <= blkDim[AddrBlockMicro].w) &&
                        (height <= blkDim[AddrBlockMicro].h) &&
                        (NextPow2(pIn->minSizeAlign) <= Size256))
                    {
                        minSizeBlk = AddrBlockMicro;
                    }

                    if (minSizeBlk == AddrBlockMicro)
                    {
                        ADDR_ASSERT(pOut->resourceType != ADDR_RSRC_TEX_3D);
                        allowedSwModeSet.value &= Gfx9Blk256BSwModeMask;
                    }
                    else if (minSizeBlk == AddrBlockThick4KB)
                    {
                        ADDR_ASSERT(pOut->resourceType == ADDR_RSRC_TEX_3D);
                        allowedSwModeSet.value &= Gfx9Rsrc3dThick4KBSwModeMask;
                    }
                    else if (minSizeBlk == AddrBlockThin4KB)
                    {
                        allowedSwModeSet.value &= (pOut->resourceType == ADDR_RSRC_TEX_3D) ?
                                                  Gfx9Rsrc3dThin4KBSwModeMask : Gfx9Blk4KBSwModeMask;
                    }
                    else if (minSizeBlk == AddrBlockThick64KB)
                    {
                        ADDR_ASSERT(pOut->resourceType == ADDR_RSRC_TEX_3D);
                        allowedSwModeSet.value &= Gfx9Rsrc3dThick64KBSwModeMask;
                    }
                    else
                    {
                        ADDR_ASSERT(minSizeBlk == AddrBlockThin64KB);
                        allowedSwModeSet.value &= (pOut->resourceType == ADDR_RSRC_TEX_3D) ?
                                                  Gfx9Rsrc3dThin64KBSwModeMask : Gfx9Blk64KBSwModeMask;
                    }
                }

                // Block type should be determined.
                ADDR_ASSERT(IsPow2(GetAllowedBlockSet(allowedSwModeSet, pOut->resourceType).value));

                ADDR2_SWTYPE_SET allowedSwSet = GetAllowedSwSet(allowedSwModeSet);

                // Determine swizzle type if there are two or more swizzle type candidates
                if (IsPow2(allowedSwSet.value) == FALSE)
                {
                    if (ElemLib::IsBlockCompressed(pIn->format))
                    {
                        if (allowedSwSet.sw_D)
                        {
                            allowedSwModeSet.value &= Gfx9DisplaySwModeMask;
                        }
                        else
                        {
                            ADDR_ASSERT(allowedSwSet.sw_S);
                            allowedSwModeSet.value &= Gfx9StandardSwModeMask;
                        }
                    }
                    else if (ElemLib::IsMacroPixelPacked(pIn->format))
                    {
                        if (allowedSwSet.sw_S)
                        {
                            allowedSwModeSet.value &= Gfx9StandardSwModeMask;
                        }
                        else if (allowedSwSet.sw_D)
                        {
                            allowedSwModeSet.value &= Gfx9DisplaySwModeMask;
                        }
                        else
                        {
                            ADDR_ASSERT(allowedSwSet.sw_R);
                            allowedSwModeSet.value &= Gfx9RotateSwModeMask;
                        }
                    }
                    else if (pOut->resourceType == ADDR_RSRC_TEX_3D)
                    {
                        if (pIn->flags.color && allowedSwSet.sw_D)
                        {
                            allowedSwModeSet.value &= Gfx9DisplaySwModeMask;
                        }
                        else if (allowedSwSet.sw_Z)
                        {
                            allowedSwModeSet.value &= Gfx9ZSwModeMask;
                        }
                        else
                        {
                            ADDR_ASSERT(allowedSwSet.sw_S);
                            allowedSwModeSet.value &= Gfx9StandardSwModeMask;
                        }
                    }
                    else
                    {
                        if (pIn->flags.rotated && allowedSwSet.sw_R)
                        {
                            allowedSwModeSet.value &= Gfx9RotateSwModeMask;
                        }
                        else if (allowedSwSet.sw_D)
                        {
                            allowedSwModeSet.value &= Gfx9DisplaySwModeMask;
                        }
                        else if (allowedSwSet.sw_S)
                        {
                            allowedSwModeSet.value &= Gfx9StandardSwModeMask;
                        }
                        else
                        {
                            ADDR_ASSERT(allowedSwSet.sw_Z);
                            allowedSwModeSet.value &= Gfx9ZSwModeMask;
                        }
                    }
                }

                // Swizzle type should be determined.
                ADDR_ASSERT(IsPow2(GetAllowedSwSet(allowedSwModeSet).value));

                // Determine swizzle mode now. Always select the "largest" swizzle mode for a given block type +
                // swizzle type combination. For example, for AddrBlockThin64KB + ADDR_SW_S, select SW_64KB_S_X(25)
                // if it's available, or otherwise select SW_64KB_S_T(17) if it's available, or otherwise select
                // SW_64KB_S(9).
                pOut->swizzleMode = static_cast<AddrSwizzleMode>(Log2NonPow2(allowedSwModeSet.value));
            }

            returnCode = ADDR_OK;
        }
        else
        {
            // Invalid combination...
            ADDR_ASSERT_ALWAYS();
        }
    }
    else
    {
        // Invalid combination...
        ADDR_ASSERT_ALWAYS();
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx9Lib::ComputeStereoInfo
*
*   @brief
*       Compute height alignment and right eye pipeBankXor for stereo surface
*
*   @return
*       Error code
*
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::ComputeStereoInfo(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
    ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut,
    UINT_32*                                pHeightAlign
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    UINT_32 eqIndex = HwlGetEquationIndex(pIn, pOut);

    if (eqIndex < m_numEquations)
    {
        if (IsXor(pIn->swizzleMode))
        {
            const UINT_32        blkSizeLog2       = GetBlockSizeLog2(pIn->swizzleMode);
            const UINT_32        numPipeBits       = GetPipeXorBits(blkSizeLog2);
            const UINT_32        numBankBits       = GetBankXorBits(blkSizeLog2);
            const UINT_32        bppLog2           = Log2(pIn->bpp >> 3);
            const UINT_32        maxYCoordBlock256 = Log2(Block256_2d[bppLog2].h) - 1;
            const ADDR_EQUATION *pEqToCheck        = &m_equationTable[eqIndex];

            ADDR_ASSERT(maxYCoordBlock256 ==
                        GetMaxValidChannelIndex(&pEqToCheck->addr[0], Log2Size256, 1));

            const UINT_32 maxYCoordInBaseEquation =
                (blkSizeLog2 - Log2Size256) / 2 + maxYCoordBlock256;

            ADDR_ASSERT(maxYCoordInBaseEquation ==
                        GetMaxValidChannelIndex(&pEqToCheck->addr[0], blkSizeLog2, 1));

            const UINT_32 maxYCoordInPipeXor =
                (numPipeBits == 0) ? 0 : maxYCoordBlock256 + numPipeBits;

            ADDR_ASSERT(maxYCoordInPipeXor ==
                        GetMaxValidChannelIndex(&pEqToCheck->xor1[m_pipeInterleaveLog2], numPipeBits, 1));

            const UINT_32 maxYCoordInBankXor =
                (numBankBits == 0) ?
                0 : maxYCoordBlock256 + (numPipeBits + 1) / 2 + numBankBits;

            ADDR_ASSERT(maxYCoordInBankXor ==
                        GetMaxValidChannelIndex(&pEqToCheck->xor1[m_pipeInterleaveLog2 + numPipeBits], numBankBits, 1));

            const UINT_32 maxYCoordInPipeBankXor = Max(maxYCoordInPipeXor, maxYCoordInBankXor);

            if (maxYCoordInPipeBankXor > maxYCoordInBaseEquation)
            {
                *pHeightAlign = 1u << maxYCoordInPipeBankXor;

                if (pOut->pStereoInfo != NULL)
                {
                    pOut->pStereoInfo->rightSwizzle = 0;

                    if ((PowTwoAlign(pIn->height, *pHeightAlign) % (*pHeightAlign * 2)) != 0)
                    {
                        if (maxYCoordInPipeXor == maxYCoordInPipeBankXor)
                        {
                            pOut->pStereoInfo->rightSwizzle |= (1u << 1);
                        }

                        if (maxYCoordInBankXor == maxYCoordInPipeBankXor)
                        {
                            pOut->pStereoInfo->rightSwizzle |=
                                1u << ((numPipeBits % 2) ? numPipeBits : numPipeBits + 1);
                        }

                        ADDR_ASSERT(pOut->pStereoInfo->rightSwizzle ==
                                    GetCoordActiveMask(&pEqToCheck->xor1[m_pipeInterleaveLog2],
                                                       numPipeBits + numBankBits, 1, maxYCoordInPipeBankXor));
                    }
                }
            }
        }
    }
    else
    {
        ADDR_ASSERT_ALWAYS();
        returnCode = ADDR_ERROR;
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeSurfaceInfoTiled
*
*   @brief
*       Internal function to calculate alignment for tiled surface
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeSurfaceInfoTiled(
     const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,    ///< [in] input structure
     ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut    ///< [out] output structure
     ) const
{
    ADDR_E_RETURNCODE returnCode = ComputeBlockDimensionForSurf(&pOut->blockWidth,
                                                                &pOut->blockHeight,
                                                                &pOut->blockSlices,
                                                                pIn->bpp,
                                                                pIn->numFrags,
                                                                pIn->resourceType,
                                                                pIn->swizzleMode);

    if (returnCode == ADDR_OK)
    {
        UINT_32 pitchAlignInElement = pOut->blockWidth;

        if ((IsTex2d(pIn->resourceType) == TRUE) &&
            (pIn->flags.display || pIn->flags.rotated) &&
            (pIn->numMipLevels <= 1) &&
            (pIn->numSamples <= 1) &&
            (pIn->numFrags <= 1))
        {
            // Display engine needs pitch align to be at least 32 pixels.
            pitchAlignInElement = PowTwoAlign(pitchAlignInElement, 32);
        }

        pOut->pitch = PowTwoAlign(pIn->width, pitchAlignInElement);

        if ((pIn->numMipLevels <= 1) && (pIn->pitchInElement > 0))
        {
            if ((pIn->pitchInElement % pitchAlignInElement) != 0)
            {
                returnCode = ADDR_INVALIDPARAMS;
            }
            else if (pIn->pitchInElement < pOut->pitch)
            {
                returnCode = ADDR_INVALIDPARAMS;
            }
            else
            {
                pOut->pitch = pIn->pitchInElement;
            }
        }

        UINT_32 heightAlign = 0;

        if (pIn->flags.qbStereo)
        {
            returnCode = ComputeStereoInfo(pIn, pOut, &heightAlign);
        }

        if (returnCode == ADDR_OK)
        {
            pOut->height = PowTwoAlign(pIn->height, pOut->blockHeight);

            if (heightAlign > 1)
            {
                pOut->height = PowTwoAlign(pOut->height, heightAlign);
            }

            pOut->numSlices = PowTwoAlign(pIn->numSlices, pOut->blockSlices);

            pOut->epitchIsHeight   = FALSE;
            pOut->mipChainInTail   = FALSE;
            pOut->firstMipIdInTail = pIn->numMipLevels;

            pOut->mipChainPitch  = pOut->pitch;
            pOut->mipChainHeight = pOut->height;
            pOut->mipChainSlice  = pOut->numSlices;

            if (pIn->numMipLevels > 1)
            {
                pOut->firstMipIdInTail = GetMipChainInfo(pIn->resourceType,
                                                         pIn->swizzleMode,
                                                         pIn->bpp,
                                                         pIn->width,
                                                         pIn->height,
                                                         pIn->numSlices,
                                                         pOut->blockWidth,
                                                         pOut->blockHeight,
                                                         pOut->blockSlices,
                                                         pIn->numMipLevels,
                                                         pOut->pMipInfo);

                const UINT_32 endingMipId = Min(pOut->firstMipIdInTail, pIn->numMipLevels - 1);

                if (endingMipId == 0)
                {
                    const Dim3d tailMaxDim = GetMipTailDim(pIn->resourceType,
                                                           pIn->swizzleMode,
                                                           pOut->blockWidth,
                                                           pOut->blockHeight,
                                                           pOut->blockSlices);

                    pOut->epitchIsHeight = TRUE;
                    pOut->pitch          = tailMaxDim.w;
                    pOut->height         = tailMaxDim.h;
                    pOut->numSlices      = IsThick(pIn->resourceType, pIn->swizzleMode) ?
                                           tailMaxDim.d : pIn->numSlices;
                    pOut->mipChainInTail = TRUE;
                }
                else
                {
                    UINT_32 mip0WidthInBlk  = pOut->pitch  / pOut->blockWidth;
                    UINT_32 mip0HeightInBlk = pOut->height / pOut->blockHeight;

                    AddrMajorMode majorMode = GetMajorMode(pIn->resourceType,
                                                           pIn->swizzleMode,
                                                           mip0WidthInBlk,
                                                           mip0HeightInBlk,
                                                           pOut->numSlices / pOut->blockSlices);
                    if (majorMode == ADDR_MAJOR_Y)
                    {
                        UINT_32 mip1WidthInBlk = RoundHalf(mip0WidthInBlk);

                        if ((mip1WidthInBlk == 1) && (endingMipId > 2))
                        {
                            mip1WidthInBlk++;
                        }

                        pOut->mipChainPitch += (mip1WidthInBlk * pOut->blockWidth);

                        pOut->epitchIsHeight = FALSE;
                    }
                    else
                    {
                        UINT_32 mip1HeightInBlk = RoundHalf(mip0HeightInBlk);

                        if ((mip1HeightInBlk == 1) && (endingMipId > 2))
                        {
                            mip1HeightInBlk++;
                        }

                        pOut->mipChainHeight += (mip1HeightInBlk * pOut->blockHeight);

                        pOut->epitchIsHeight = TRUE;
                    }
                }

                if (pOut->pMipInfo != NULL)
                {
                    UINT_32 elementBytesLog2 = Log2(pIn->bpp >> 3);

                    for (UINT_32 i = 0; i < pIn->numMipLevels; i++)
                    {
                        Dim3d   mipStartPos          = {0};
                        UINT_32 mipTailOffsetInBytes = 0;

                        mipStartPos = GetMipStartPos(pIn->resourceType,
                                                     pIn->swizzleMode,
                                                     pOut->pitch,
                                                     pOut->height,
                                                     pOut->numSlices,
                                                     pOut->blockWidth,
                                                     pOut->blockHeight,
                                                     pOut->blockSlices,
                                                     i,
                                                     elementBytesLog2,
                                                     &mipTailOffsetInBytes);

                        UINT_32 pitchInBlock     =
                            pOut->mipChainPitch / pOut->blockWidth;
                        UINT_32 sliceInBlock     =
                            (pOut->mipChainHeight / pOut->blockHeight) * pitchInBlock;
                        UINT_64 blockIndex       =
                            mipStartPos.d * sliceInBlock + mipStartPos.h * pitchInBlock + mipStartPos.w;
                        UINT_64 macroBlockOffset =
                            blockIndex << GetBlockSizeLog2(pIn->swizzleMode);

                        pOut->pMipInfo[i].macroBlockOffset = macroBlockOffset;
                        pOut->pMipInfo[i].mipTailOffset    = mipTailOffsetInBytes;
                    }
                }
            }
            else if (pOut->pMipInfo != NULL)
            {
                pOut->pMipInfo[0].pitch  = pOut->pitch;
                pOut->pMipInfo[0].height = pOut->height;
                pOut->pMipInfo[0].depth  = IsTex3d(pIn->resourceType)?
                                           pOut->numSlices : 1;
                pOut->pMipInfo[0].offset = 0;
            }

            pOut->sliceSize = static_cast<UINT_64>(pOut->mipChainPitch) * pOut->mipChainHeight *
                              (pIn->bpp >> 3) * pIn->numFrags;

            pOut->surfSize  = pOut->sliceSize * pOut->mipChainSlice;
            pOut->baseAlign = ComputeSurfaceBaseAlignTiled(pIn->swizzleMode);

            if ((IsBlock256b(pIn->swizzleMode) == FALSE) &&
                (pIn->flags.color || pIn->flags.depth || pIn->flags.stencil || pIn->flags.fmask) &&
                (pIn->flags.texture == TRUE) &&
                (pIn->flags.noMetadata == FALSE) &&
                (pIn->flags.metaPipeUnaligned == FALSE))
            {
                // Assume client requires pipe aligned metadata, which is TcCompatible and will be accessed by TC...
                // Then we need extra padding for base surface. Otherwise, metadata and data surface for same pixel will
                // be flushed to different pipes, but texture engine only uses pipe id of data surface to fetch both of
                // them, which may cause invalid metadata to be fetched.
                pOut->baseAlign = Max(pOut->baseAlign, m_pipeInterleaveBytes * m_pipes * m_se);
            }

            if (pIn->flags.prt)
            {
                pOut->baseAlign = Max(pOut->baseAlign, PrtAlignment);
            }
        }
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeSurfaceInfoLinear
*
*   @brief
*       Internal function to calculate alignment for linear surface
*
*   @return
*       ADDR_E_RETURNCODE
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeSurfaceInfoLinear(
     const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,    ///< [in] input structure
     ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut    ///< [out] output structure
     ) const
{
    ADDR_E_RETURNCODE returnCode   = ADDR_OK;
    UINT_32           pitch        = 0;
    UINT_32           actualHeight = 0;
    UINT_32           elementBytes = pIn->bpp >> 3;
    const UINT_32     alignment    = pIn->flags.prt ?
                                     PrtAlignment : 256;

    if (IsTex1d(pIn->resourceType))
    {
        if (pIn->height > 1)
        {
            returnCode = ADDR_INVALIDPARAMS;
        }
        else
        {
            const UINT_32 pitchAlignInElement = alignment / elementBytes;

            pitch        = PowTwoAlign(pIn->width, pitchAlignInElement);
            actualHeight = pIn->numMipLevels;

            if (pIn->flags.prt == FALSE)
            {
                returnCode = ApplyCustomizedPitchHeight(pIn, elementBytes, pitchAlignInElement,
                                                        &pitch, &actualHeight);
            }

            if (returnCode == ADDR_OK)
            {
                if (pOut->pMipInfo != NULL)
                {
                    for (UINT_32 i = 0; i < pIn->numMipLevels; i++)
                    {
                        pOut->pMipInfo[i].offset = pitch * elementBytes * i;
                        pOut->pMipInfo[i].pitch  = pitch;
                        pOut->pMipInfo[i].height = 1;
                        pOut->pMipInfo[i].depth  = 1;
                    }
                }
            }
        }
    }
    else
    {
        returnCode = ComputeSurfaceLinearPadding(pIn, &pitch, &actualHeight, pOut->pMipInfo);
    }

    if ((pitch == 0) || (actualHeight == 0))
    {
        returnCode = ADDR_INVALIDPARAMS;
    }

    if (returnCode == ADDR_OK)
    {
        pOut->pitch          = pitch;
        pOut->height         = pIn->height;
        pOut->numSlices      = pIn->numSlices;
        pOut->mipChainPitch  = pitch;
        pOut->mipChainHeight = actualHeight;
        pOut->mipChainSlice  = pOut->numSlices;
        pOut->epitchIsHeight = (pIn->numMipLevels > 1) ? TRUE : FALSE;
        pOut->sliceSize      = static_cast<UINT_64>(pOut->pitch) * actualHeight * elementBytes;
        pOut->surfSize       = pOut->sliceSize * pOut->numSlices;
        pOut->baseAlign      = (pIn->swizzleMode == ADDR_SW_LINEAR_GENERAL) ? (pIn->bpp / 8) : alignment;
        pOut->blockWidth     = (pIn->swizzleMode == ADDR_SW_LINEAR_GENERAL) ?
                               1 : (256 / elementBytes);
        pOut->blockHeight    = 1;
        pOut->blockSlices    = 1;
    }

    // Post-calculation validation
    ADDR_ASSERT(pOut->sliceSize > 0);

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx9Lib::GetMipChainInfo
*
*   @brief
*       Internal function to get information about the mip chain
*
*   @return
*       Smaller value between the ID of the first mip that fits in the mip tail and the max ID of the mips being
*       created
************************************************************************************************************************
*/
UINT_32 Gfx9Lib::GetMipChainInfo(
    AddrResourceType  resourceType,
    AddrSwizzleMode   swizzleMode,
    UINT_32           bpp,
    UINT_32           mip0Width,
    UINT_32           mip0Height,
    UINT_32           mip0Depth,
    UINT_32           blockWidth,
    UINT_32           blockHeight,
    UINT_32           blockDepth,
    UINT_32           numMipLevel,
    ADDR2_MIP_INFO*   pMipInfo) const
{
    const Dim3d tailMaxDim =
        GetMipTailDim(resourceType, swizzleMode, blockWidth, blockHeight, blockDepth);

    UINT_32 mipPitch  = mip0Width;
    UINT_32 mipHeight = mip0Height;
    UINT_32 mipDepth  = IsTex3d(resourceType) ?
                        mip0Depth : 1;
    UINT_32 offset    = 0;

    UINT_32 firstMipIdInTail = numMipLevel;
    BOOL_32 inTail           = FALSE;
    BOOL_32 finalDim         = FALSE;
    BOOL_32 is3dThick        = IsThick(resourceType, swizzleMode);
    BOOL_32 is3dThin         = IsTex3d(resourceType) && (is3dThick == FALSE);

    for (UINT_32 mipId = 0; mipId < numMipLevel; mipId++)
    {
        if (inTail)
        {
            if (finalDim == FALSE)
            {
                UINT_32 mipSize;

                if (is3dThick)
                {
                    mipSize = mipPitch * mipHeight * mipDepth * (bpp >> 3);
                }
                else
                {
                    mipSize = mipPitch * mipHeight * (bpp >> 3);
                }

                if (mipSize <= 256)
                {
                    UINT_32 index = Log2(bpp >> 3);

                    if (is3dThick)
                    {
                        mipPitch  = Block256_3dZ[index].w;
                        mipHeight = Block256_3dZ[index].h;
                        mipDepth  = Block256_3dZ[index].d;
                    }
                    else
                    {
                        mipPitch  = Block256_2d[index].w;
                        mipHeight = Block256_2d[index].h;
                    }

                    finalDim = TRUE;
                }
            }
        }
        else
        {
            inTail = IsInMipTail(resourceType, swizzleMode, tailMaxDim,
                                 mipPitch, mipHeight, mipDepth);

            if (inTail)
            {
                firstMipIdInTail = mipId;
                mipPitch         = tailMaxDim.w;
                mipHeight        = tailMaxDim.h;

                if (is3dThick)
                {
                    mipDepth = tailMaxDim.d;
                }
            }
            else
            {
                mipPitch  = PowTwoAlign(mipPitch, blockWidth);
                mipHeight = PowTwoAlign(mipHeight, blockHeight);

                if (is3dThick)
                {
                    mipDepth = PowTwoAlign(mipDepth, blockDepth);
                }
            }
        }

        if (pMipInfo != NULL)
        {
            pMipInfo[mipId].pitch  = mipPitch;
            pMipInfo[mipId].height = mipHeight;
            pMipInfo[mipId].depth  = mipDepth;
            pMipInfo[mipId].offset = offset;
        }

        offset += (mipPitch * mipHeight * mipDepth * (bpp >> 3));

        if (finalDim)
        {
            if (is3dThin)
            {
                mipDepth = Max(mipDepth >> 1, 1u);
            }
        }
        else
        {
            mipPitch  = Max(mipPitch >> 1, 1u);
            mipHeight = Max(mipHeight >> 1, 1u);

            if (is3dThick || is3dThin)
            {
                mipDepth = Max(mipDepth >> 1, 1u);
            }
        }
    }

    return firstMipIdInTail;
}

/**
************************************************************************************************************************
*   Gfx9Lib::GetMetaMiptailInfo
*
*   @brief
*       Get mip tail coordinate information.
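*
*       Illustrative sketch only, not part of AddrLib: a standalone restatement of the thin (depth == 1) path of
*       this function, assuming hypothetical names (MipRect, MetaMiptail2d) and fixed-width cstdint types in place
*       of Dim3d and ADDR2_META_MIP_INFO. It records each tail mip's start and size, then advances the coordinate
*       the way the loop below does; the kDx/kDy tables mirror the switch used once a mip is 32 or fewer wide.
*
```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ADDR2_META_MIP_INFO: start position and size only.
struct MipRect
{
    uint32_t x, y, w, h;
};

// Sketch of the thin (depth == 1) branch of Gfx9Lib::GetMetaMiptailInfo.
// blkW/blkH play the role of pMetaBlkDim->w/h; (x, y) is the mip tail base coordinate.
std::vector<MipRect> MetaMiptail2d(uint32_t blkW, uint32_t blkH, uint32_t x, uint32_t y, uint32_t numMipInTail)
{
    // Offsets from the first mip that is at most 32 wide (cases 0..8 of the switch).
    static const uint32_t kDx[9] = {32, 0, 16, 32, 48, 0, 16, 32, 48};
    static const uint32_t kDy[9] = {0, 32, 32, 32, 32, 48, 48, 48, 48};

    const uint32_t minInc = (blkH >= 1024) ? 256 : ((blkH == 512) ? 128 : 64);
    uint32_t mipW  = blkW;
    uint32_t mipH  = blkH >> 1;   // the first tail mip is half a meta block tall
    uint32_t blk32 = UINT32_MAX;  // index of the first mip with width <= 32

    std::vector<MipRect> out;
    for (uint32_t mip = 0; mip < numMipInTail; mip++)
    {
        out.push_back({x, y, mipW, mipH});

        if (mipW <= 32)
        {
            if (blk32 == UINT32_MAX)
            {
                blk32 = mip;
            }
            const uint32_t k = mip - blk32;
            x = out[blk32].x;
            y = out[blk32].y;
            if (k < 9)  // AddrLib asserts for k >= 9 and adds no offset
            {
                x += kDx[k];
                y += kDy[k];
            }
            mipW = (k == 0) ? 16 : 8;
            mipH = mipW;
        }
        else
        {
            if (mipW <= minInc)
            {
                if ((mipW * 2) == minInc)
                {
                    x -= minInc;  // two mips below minInc: step back in x, down in y
                    y += minInc;
                }
                else
                {
                    x += minInc;  // otherwise step across in x
                }
            }
            else if (mip & 1)
            {
                x += mipW;        // odd mip: go across
            }
            else
            {
                y += mipH;        // even mip: go down
            }
            mipW >>= 1;
            mipH = mipW;          // tail mips after the first are square
        }
    }
    return out;
}
```
*
*       Under these assumptions, MetaMiptail2d(128, 128, 0, 0, 6) places the six tail mips at (0,0) 128x64,
*       (0,64) 64x64, (64,64) 32x32, (96,64) 16x16, (64,96) 8x8 and (80,96) 8x8.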
*
*   @return
*       N/A
************************************************************************************************************************
*/
VOID Gfx9Lib::GetMetaMiptailInfo(
    ADDR2_META_MIP_INFO*    pInfo,          ///< [out] output structure to store per mip coord
    Dim3d                   mipCoord,       ///< [in] mip tail base coord
    UINT_32                 numMipInTail,   ///< [in] number of mips in tail
    Dim3d*                  pMetaBlkDim     ///< [in] meta block width/height/depth
    ) const
{
    BOOL_32 isThick   = (pMetaBlkDim->d > 1);
    UINT_32 mipWidth  = pMetaBlkDim->w;
    UINT_32 mipHeight = pMetaBlkDim->h >> 1;
    UINT_32 mipDepth  = pMetaBlkDim->d;
    UINT_32 minInc;

    if (isThick)
    {
        minInc = (pMetaBlkDim->h >= 512) ? 128 : ((pMetaBlkDim->h == 256) ? 64 : 32);
    }
    else if (pMetaBlkDim->h >= 1024)
    {
        minInc = 256;
    }
    else if (pMetaBlkDim->h == 512)
    {
        minInc = 128;
    }
    else
    {
        minInc = 64;
    }

    UINT_32 blk32MipId = 0xFFFFFFFF;

    for (UINT_32 mip = 0; mip < numMipInTail; mip++)
    {
        pInfo[mip].inMiptail = TRUE;
        pInfo[mip].startX    = mipCoord.w;
        pInfo[mip].startY    = mipCoord.h;
        pInfo[mip].startZ    = mipCoord.d;
        pInfo[mip].width     = mipWidth;
        pInfo[mip].height    = mipHeight;
        pInfo[mip].depth     = mipDepth;

        if (mipWidth <= 32)
        {
            if (blk32MipId == 0xFFFFFFFF)
            {
                blk32MipId = mip;
            }

            mipCoord.w = pInfo[blk32MipId].startX;
            mipCoord.h = pInfo[blk32MipId].startY;
            mipCoord.d = pInfo[blk32MipId].startZ;

            switch (mip - blk32MipId)
            {
                case 0:
                    mipCoord.w += 32;       // 16x16
                    break;
                case 1:
                    mipCoord.h += 32;       // 8x8
                    break;
                case 2:
                    mipCoord.h += 32;       // 4x4
                    mipCoord.w += 16;
                    break;
                case 3:
                    mipCoord.h += 32;       // 2x2
                    mipCoord.w += 32;
                    break;
                case 4:
                    mipCoord.h += 32;       // 1x1
                    mipCoord.w += 48;
                    break;
                // The following are for BC/ASTC formats
                case 5:
                    mipCoord.h += 48;       // 1/2 x 1/2
                    break;
                case 6:
                    mipCoord.h += 48;       // 1/4 x 1/4
                    mipCoord.w += 16;
                    break;
                case 7:
                    mipCoord.h += 48;       // 1/8 x 1/8
                    mipCoord.w += 32;
                    break;
                case 8:
                    mipCoord.h += 48;       // 1/16 x 1/16
                    mipCoord.w += 48;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    break;
            }

            mipWidth = ((mip - blk32MipId) == 0) ?
                       16 : 8;

            mipHeight = mipWidth;

            if (isThick)
            {
                mipDepth = mipWidth;
            }
        }
        else
        {
            if (mipWidth <= minInc)
            {
                // if we're below the minimal increment...
                if (isThick)
                {
                    // For 3d, just go in z direction
                    mipCoord.d += mipDepth;
                }
                else
                {
                    // For 2d, first go across, then down
                    if ((mipWidth * 2) == minInc)
                    {
                        // if we're 2 mips below, that's when we go back in x, and down in y
                        mipCoord.w -= minInc;
                        mipCoord.h += minInc;
                    }
                    else
                    {
                        // otherwise, just go across in x
                        mipCoord.w += minInc;
                    }
                }
            }
            else
            {
                // On even mip, go down, otherwise, go across
                if (mip & 1)
                {
                    mipCoord.w += mipWidth;
                }
                else
                {
                    mipCoord.h += mipHeight;
                }
            }
            // Divide the width by 2
            mipWidth >>= 1;
            // After the first mip in tail, the mip is always a square
            mipHeight = mipWidth;
            // ...or for 3d, a cube
            if (isThick)
            {
                mipDepth = mipWidth;
            }
        }
    }
}

/**
************************************************************************************************************************
*   Gfx9Lib::GetMipStartPos
*
*   @brief
*       Internal function to get the logical start position of a mip level
*
*   @return
*       logical start position, in macro blocks (width/height/depth), of one mip level within one slice
************************************************************************************************************************
*/
Dim3d Gfx9Lib::GetMipStartPos(
    AddrResourceType  resourceType,
    AddrSwizzleMode   swizzleMode,
    UINT_32           width,
    UINT_32           height,
    UINT_32           depth,
    UINT_32           blockWidth,
    UINT_32           blockHeight,
    UINT_32           blockDepth,
    UINT_32           mipId,
    UINT_32           log2ElementBytes,
    UINT_32*          pMipTailBytesOffset) const
{
    Dim3d       mipStartPos = {0};
    const Dim3d tailMaxDim  = GetMipTailDim(resourceType, swizzleMode, blockWidth, blockHeight, blockDepth);

    // Report mip in tail if Mip0 is already in mip tail
    BOOL_32 inMipTail      = IsInMipTail(resourceType, swizzleMode, tailMaxDim, width, height, depth);
    UINT_32 log2BlkSize    = GetBlockSizeLog2(swizzleMode);
    UINT_32 mipIndexInTail = mipId;

    if (inMipTail == FALSE)
    {
        // Mip 0 dimension, unit in block
        UINT_32 mipWidthInBlk = width /
                                blockWidth;
        UINT_32 mipHeightInBlk = height / blockHeight;
        UINT_32 mipDepthInBlk  = depth / blockDepth;
        AddrMajorMode majorMode = GetMajorMode(resourceType,
                                               swizzleMode,
                                               mipWidthInBlk,
                                               mipHeightInBlk,
                                               mipDepthInBlk);

        UINT_32 endingMip = mipId + 1;

        for (UINT_32 i = 1; i <= mipId; i++)
        {
            if ((i == 1) || (i == 3))
            {
                if (majorMode == ADDR_MAJOR_Y)
                {
                    mipStartPos.w += mipWidthInBlk;
                }
                else
                {
                    mipStartPos.h += mipHeightInBlk;
                }
            }
            else
            {
                if (majorMode == ADDR_MAJOR_X)
                {
                    mipStartPos.w += mipWidthInBlk;
                }
                else if (majorMode == ADDR_MAJOR_Y)
                {
                    mipStartPos.h += mipHeightInBlk;
                }
                else
                {
                    mipStartPos.d += mipDepthInBlk;
                }
            }

            BOOL_32 inTail = FALSE;

            if (IsThick(resourceType, swizzleMode))
            {
                UINT_32 dim = log2BlkSize % 3;

                if (dim == 0)
                {
                    inTail = (mipWidthInBlk <= 2) && (mipHeightInBlk == 1) && (mipDepthInBlk <= 2);
                }
                else if (dim == 1)
                {
                    inTail = (mipWidthInBlk == 1) && (mipHeightInBlk <= 2) && (mipDepthInBlk <= 2);
                }
                else
                {
                    inTail = (mipWidthInBlk <= 2) && (mipHeightInBlk <= 2) && (mipDepthInBlk == 1);
                }
            }
            else
            {
                if (log2BlkSize & 1)
                {
                    inTail = (mipWidthInBlk <= 2) && (mipHeightInBlk == 1);
                }
                else
                {
                    inTail = (mipWidthInBlk == 1) && (mipHeightInBlk <= 2);
                }
            }

            if (inTail)
            {
                endingMip = i;
                break;
            }

            mipWidthInBlk  = RoundHalf(mipWidthInBlk);
            mipHeightInBlk = RoundHalf(mipHeightInBlk);
            mipDepthInBlk  = RoundHalf(mipDepthInBlk);
        }

        if (mipId >= endingMip)
        {
            inMipTail      = TRUE;
            mipIndexInTail = mipId - endingMip;
        }
    }

    if (inMipTail)
    {
        UINT_32 index = mipIndexInTail + MaxMacroBits - log2BlkSize;
        ADDR_ASSERT(index < sizeof(MipTailOffset256B) / sizeof(UINT_32));
        *pMipTailBytesOffset = MipTailOffset256B[index] << 8;
    }

    return mipStartPos;
}

/**
************************************************************************************************************************
*   Gfx9Lib::HwlComputeSurfaceAddrFromCoordTiled
*
*   @brief
*       Internal function to calculate address from coord for tiled swizzle surface
*
*   @return
*       ADDR_E_RETURNCODE
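Illustrative sketch (hypothetical names, not the addrlib API). After the in-block offset has been swizzled, the thin path combines it with the index of the macro block that contains (x, y, slice).

The sketch makes these assumptions:
- The struct MacroBlockLayout and the function ComposeThinAddress are invented for this example.
- They mirror only the final arithmetic of the function body. Pipe/bank XOR and mip-tail handling are left out.
- Unlike the function, the sketch widens every term to 64 bits before multiplying.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the values the function reads from localOut.
struct MacroBlockLayout
{
    uint32_t mipChainPitch;   // elements; a multiple of blockWidth
    uint32_t mipChainHeight;  // elements; a multiple of blockHeight
    uint32_t blockWidth;      // macro block width in elements
    uint32_t blockHeight;     // macro block height in elements
    uint32_t log2BlkSize;     // log2 of the block size in bytes, e.g. 16 for 64 KiB
};

// Combines an in-block byte offset with the index of the macro block holding (x, y, slice).
inline uint64_t ComposeThinAddress(const MacroBlockLayout& layout,
                                   uint32_t x, uint32_t y, uint32_t slice,
                                   uint32_t mipStartW, uint32_t mipStartH, uint32_t mipStartD,
                                   uint32_t blockOffset)
{
    assert(blockOffset < (1u << layout.log2BlkSize)); // must stay inside one block

    const uint64_t pitchInBlk = layout.mipChainPitch / layout.blockWidth;
    const uint64_t sliceInBlk = pitchInBlk * (layout.mipChainHeight / layout.blockHeight);

    // Row-major over blocks: slice, then block row, then block column (plus mip start position).
    const uint64_t blockIndex = (uint64_t(slice) + mipStartD) * sliceInBlk +
                                (uint64_t(y / layout.blockHeight) + mipStartH) * pitchInBlk +
                                (uint64_t(x / layout.blockWidth) + mipStartW);

    // Low log2BlkSize bits: position inside the block; high bits: which block.
    return blockOffset | (blockIndex << layout.log2BlkSize);
}
```

Worked example: take a 64 KiB block of 128 x 128 four-byte elements and a 512 x 256 mip chain. That gives 4 blocks per row and 8 per slice. So (x, y, slice) = (300, 130, 1) falls in block 1 * 8 + 1 * 4 + 2 = 14.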
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlComputeSurfaceAddrFromCoordTiled(
     const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn,    ///< [in] input structure
     ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT*      pOut    ///< [out] output structure
     ) const
{
    ADDR2_COMPUTE_SURFACE_INFO_INPUT localIn = {0};
    localIn.swizzleMode  = pIn->swizzleMode;
    localIn.flags        = pIn->flags;
    localIn.resourceType = pIn->resourceType;
    localIn.bpp          = pIn->bpp;
    localIn.width        = Max(pIn->unalignedWidth, 1u);
    localIn.height       = Max(pIn->unalignedHeight, 1u);
    localIn.numSlices    = Max(pIn->numSlices, 1u);
    localIn.numMipLevels = Max(pIn->numMipLevels, 1u);
    localIn.numSamples   = Max(pIn->numSamples, 1u);
    localIn.numFrags     = Max(pIn->numFrags, 1u);
    if (localIn.numMipLevels <= 1)
    {
        localIn.pitchInElement = pIn->pitchInElement;
    }

    ADDR2_COMPUTE_SURFACE_INFO_OUTPUT localOut = {0};
    ADDR_E_RETURNCODE returnCode = ComputeSurfaceInfoTiled(&localIn, &localOut);

    BOOL_32 valid = (returnCode == ADDR_OK) &&
                    (IsThin(pIn->resourceType, pIn->swizzleMode) ||
                     IsThick(pIn->resourceType, pIn->swizzleMode)) &&
                    ((pIn->pipeBankXor == 0) || (IsXor(pIn->swizzleMode)));

    if (valid)
    {
        UINT_32 log2ElementBytes   = Log2(pIn->bpp >> 3);
        Dim3d   mipStartPos        = {0};
        UINT_32 mipTailBytesOffset = 0;

        if (pIn->numMipLevels > 1)
        {
            // Mip-map chain cannot be MSAA surface
            ADDR_ASSERT((pIn->numSamples <= 1) && (pIn->numFrags<= 1));

            mipStartPos = GetMipStartPos(pIn->resourceType,
                                         pIn->swizzleMode,
                                         localOut.pitch,
                                         localOut.height,
                                         localOut.numSlices,
                                         localOut.blockWidth,
                                         localOut.blockHeight,
                                         localOut.blockSlices,
                                         pIn->mipId,
                                         log2ElementBytes,
                                         &mipTailBytesOffset);
        }

        UINT_32 interleaveOffset = 0;
        UINT_32 pipeBits = 0;
        UINT_32 pipeXor = 0;
        UINT_32 bankBits = 0;
        UINT_32 bankXor = 0;

        if (IsThin(pIn->resourceType, pIn->swizzleMode))
        {
            UINT_32 blockOffset = 0;
            UINT_32 log2BlkSize = GetBlockSizeLog2(pIn->swizzleMode);

            if (IsZOrderSwizzle(pIn->swizzleMode))
            {
                // Morton generation
                if ((log2ElementBytes == 0) || (log2ElementBytes == 2))
                {
                    UINT_32 totalLowBits = 6 - log2ElementBytes;
                    UINT_32 mortBits = totalLowBits / 2;
                    UINT_32 lowBitsValue = MortonGen2d(pIn->y, pIn->x, mortBits);
                    // Are 9 bits enough?
                    UINT_32 highBitsValue =
                        MortonGen2d(pIn->x >> mortBits, pIn->y >> mortBits, 9) << totalLowBits;
                    blockOffset = lowBitsValue | highBitsValue;
                    ADDR_ASSERT(blockOffset == lowBitsValue + highBitsValue);
                }
                else
                {
                    blockOffset = MortonGen2d(pIn->y, pIn->x, 13);
                }

                // Fill LSBs with sample bits
                if (pIn->numSamples > 1)
                {
                    blockOffset *= pIn->numSamples;
                    blockOffset |= pIn->sample;
                }

                // Shift according to BytesPP
                blockOffset <<= log2ElementBytes;
            }
            else
            {
                // Micro block offset
                UINT_32 microBlockOffset = ComputeSurface2DMicroBlockOffset(pIn);
                blockOffset = microBlockOffset;

                // Micro block dimension
                ADDR_ASSERT(log2ElementBytes < MaxNumOfBpp);
                Dim2d microBlockDim = Block256_2d[log2ElementBytes];
                // Morton generation; are 12 bits enough?
                blockOffset |= MortonGen2d((pIn->x / microBlockDim.w),
                                           (pIn->y / microBlockDim.h),
                                           12) << 8;

                // Sample bits start location
                UINT_32 sampleStart = log2BlkSize - Log2(pIn->numSamples);
                // Join sample bits information to the highest Macro block bits
                if (IsNonPrtXor(pIn->swizzleMode))
                {
                    // Non-prt-Xor : xor highest Macro block bits with sample bits
                    blockOffset = blockOffset ^ (pIn->sample << sampleStart);
                }
                else
                {
                    // Non-Xor or prt-Xor: replace highest Macro block bits with sample bits
                    // after this op, the blockOffset only contains log2 Macro block size bits
                    blockOffset %= (1 << sampleStart);
                    blockOffset |= (pIn->sample << sampleStart);
                    ADDR_ASSERT((blockOffset >> log2BlkSize) == 0);
                }
            }

            if (IsXor(pIn->swizzleMode))
            {
                // Mask off bits above Macro block bits to keep page synonyms working for prt
                if (IsPrt(pIn->swizzleMode))
                {
                    blockOffset &= ((1 << log2BlkSize) - 1);
                }

                // Preserve offset inside pipe interleave
                interleaveOffset = blockOffset & ((1 << m_pipeInterleaveLog2) - 1);
                blockOffset >>= m_pipeInterleaveLog2;

                // Pipe/Se xor bits
                pipeBits = GetPipeXorBits(log2BlkSize);
                // Pipe xor
                pipeXor = FoldXor2d(blockOffset, pipeBits);
                blockOffset >>= pipeBits;

                // Bank xor bits
                bankBits = GetBankXorBits(log2BlkSize);
                // Bank Xor
                bankXor = FoldXor2d(blockOffset, bankBits);
                blockOffset >>= bankBits;

                // Put all the parts back together
                blockOffset <<= bankBits;
                blockOffset |= bankXor;
                blockOffset <<= pipeBits;
                blockOffset |= pipeXor;
                blockOffset <<= m_pipeInterleaveLog2;
                blockOffset |= interleaveOffset;
            }

            ADDR_ASSERT((blockOffset | mipTailBytesOffset) == (blockOffset + mipTailBytesOffset));
            ADDR_ASSERT((mipTailBytesOffset == 0u) || (blockOffset < (1u << log2BlkSize)));

            blockOffset |= mipTailBytesOffset;

            if (IsNonPrtXor(pIn->swizzleMode) && (pIn->numSamples <= 1))
            {
                // Apply slice xor if not MSAA/PRT
                blockOffset ^= (ReverseBitVector(pIn->slice, pipeBits) << m_pipeInterleaveLog2);
                blockOffset ^= (ReverseBitVector(pIn->slice >> pipeBits, bankBits) <<
                                (m_pipeInterleaveLog2 + pipeBits));
            }

            returnCode = ApplyCustomerPipeBankXor(pIn->swizzleMode, pIn->pipeBankXor,
                                                  bankBits, pipeBits, &blockOffset);

            blockOffset %= (1 << log2BlkSize);

            UINT_32 pitchInMacroBlock        = localOut.mipChainPitch / localOut.blockWidth;
            UINT_32 paddedHeightInMacroBlock = localOut.mipChainHeight / localOut.blockHeight;
            UINT_32 sliceSizeInMacroBlock    = pitchInMacroBlock * paddedHeightInMacroBlock;
            UINT_64 macroBlockIndex =
                (pIn->slice + mipStartPos.d) * sliceSizeInMacroBlock +
                ((pIn->y / localOut.blockHeight) + mipStartPos.h) * pitchInMacroBlock +
                ((pIn->x / localOut.blockWidth) + mipStartPos.w);

            pOut->addr = blockOffset | (macroBlockIndex << log2BlkSize);
        }
        else
        {
            UINT_32 log2BlkSize = GetBlockSizeLog2(pIn->swizzleMode);

            Dim3d microBlockDim = Block1K_3d[log2ElementBytes];

            UINT_32 blockOffset = MortonGen3d((pIn->x / microBlockDim.w),
                                              (pIn->y / microBlockDim.h),
                                              (pIn->slice / microBlockDim.d),
                                              8);

            blockOffset <<= 10;
            blockOffset |= ComputeSurface3DMicroBlockOffset(pIn);

            if (IsXor(pIn->swizzleMode))
            {
                // Mask off bits above Macro block bits to keep page synonyms working for prt
                if (IsPrt(pIn->swizzleMode))
                {
                    blockOffset &= ((1 << log2BlkSize) - 1);
                }

                // Preserve offset inside pipe interleave
                interleaveOffset = blockOffset & ((1 << m_pipeInterleaveLog2) - 1);
                blockOffset >>= m_pipeInterleaveLog2;

                // Pipe/Se xor bits
                pipeBits = GetPipeXorBits(log2BlkSize);
                // Pipe xor
                pipeXor = FoldXor3d(blockOffset, pipeBits);
                blockOffset >>= pipeBits;

                // Bank xor bits
                bankBits = GetBankXorBits(log2BlkSize);
                // Bank Xor
                bankXor = FoldXor3d(blockOffset, bankBits);
                blockOffset >>= bankBits;

                // Put all the parts back together
                blockOffset <<= bankBits;
                blockOffset |= bankXor;
                blockOffset <<= pipeBits;
                blockOffset |= pipeXor;
                blockOffset <<= m_pipeInterleaveLog2;
                blockOffset |= interleaveOffset;
            }

            ADDR_ASSERT((blockOffset | mipTailBytesOffset) == (blockOffset + mipTailBytesOffset));
            ADDR_ASSERT((mipTailBytesOffset == 0u) || (blockOffset < (1u << log2BlkSize)));
            blockOffset |= mipTailBytesOffset;

            returnCode = ApplyCustomerPipeBankXor(pIn->swizzleMode, pIn->pipeBankXor,
                                                  bankBits, pipeBits, &blockOffset);

            blockOffset %= (1 << log2BlkSize);

            UINT_32 xb = pIn->x / localOut.blockWidth + mipStartPos.w;
            UINT_32 yb = pIn->y / localOut.blockHeight + mipStartPos.h;
            UINT_32 zb = pIn->slice / localOut.blockSlices + mipStartPos.d;

            UINT_32 pitchInBlock     = localOut.mipChainPitch / localOut.blockWidth;
            UINT_32 sliceSizeInBlock = (localOut.mipChainHeight / localOut.blockHeight) * pitchInBlock;
            UINT_64 blockIndex       = zb * sliceSizeInBlock + yb * pitchInBlock + xb;

            pOut->addr = blockOffset | (blockIndex << log2BlkSize);
        }
    }
    else
    {
        returnCode = ADDR_INVALIDPARAMS;
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx9Lib::ComputeSurfaceLinearPadding
*
*   @brief
*       Internal function to calculate padding for linear swizzle 2D/3D surface
*
*   @return
*       ADDR_E_RETURNCODE
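Illustrative sketch (hypothetical helper names). For ADDR_SW_LINEAR the function works as follows:
- The pitch is aligned to 256 bytes.
- Every mip level reuses the mip 0 pitch.
- The levels are stacked vertically, so a level's byte offset is pitch times the total height of the levels above it times elementBytes.

The sketch makes these assumptions:
- It leaves out ApplyCustomizedPitchHeight and the ADDR_SW_LINEAR_GENERAL case.
- It assumes a power-of-two element size.
- It assumes RoundHalf rounds up, i.e. ceil(x / 2).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-level record (the real code fills ADDR2_MIP_INFO).
struct LinearMip
{
    uint32_t pitch;   // elements
    uint32_t height;  // rows
    uint64_t offset;  // bytes from the start of the slice
};

// Assumption: equivalent to addrlib's RoundHalf, i.e. ceil(x / 2).
inline uint32_t RoundHalfUp(uint32_t x)
{
    return (x >> 1) + (x & 1);
}

// Round x up to a power-of-two alignment.
inline uint32_t AlignPow2(uint32_t x, uint32_t align)
{
    assert((align != 0) && ((align & (align - 1)) == 0));
    return (x + align - 1) & ~(align - 1);
}

// Returns the padded height of the mip chain and fills one record per level.
inline uint32_t LinearMipChain(uint32_t width, uint32_t height, uint32_t bpp,
                               uint32_t numMips, std::vector<LinearMip>* pMips)
{
    const uint32_t elementBytes = bpp >> 3;
    const uint32_t pitch = AlignPow2(width, 256 / elementBytes); // 256-byte pitch alignment

    uint32_t chainHeight = 0;
    uint32_t mipHeight = height;
    pMips->clear();

    for (uint32_t i = 0; i < numMips; i++)
    {
        // Every level keeps the mip 0 pitch and sits directly below the previous one.
        LinearMip mip = {pitch, mipHeight, uint64_t(pitch) * chainHeight * elementBytes};
        pMips->push_back(mip);

        chainHeight += mipHeight;
        mipHeight = RoundHalfUp(mipHeight);
        if (mipHeight < 1)
        {
            mipHeight = 1;
        }
    }

    // Matches the function: chain height for mipmapped surfaces, slice 0 height otherwise.
    return (numMips > 1) ? chainHeight : height;
}
```

Worked example: a 100 x 60 surface with 32 bpp and three levels gets a pitch of 128 elements. The levels are 60, 30 and 15 rows tall, and the padded height of the chain is 105 rows.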
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::ComputeSurfaceLinearPadding(
    const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,                   ///< [in] input structure
    UINT_32*                                pMipmap0PaddedWidth,   ///< [out] padded width in element
    UINT_32*                                pSlice0PaddedHeight,   ///< [out] padded height for HW
    ADDR2_MIP_INFO*                         pMipInfo               ///< [out] per mip information
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    UINT_32 elementBytes        = pIn->bpp >> 3;
    UINT_32 pitchAlignInElement = 0;

    if (pIn->swizzleMode == ADDR_SW_LINEAR_GENERAL)
    {
        ADDR_ASSERT(pIn->numMipLevels <= 1);
        ADDR_ASSERT(pIn->numSlices <= 1);
        pitchAlignInElement = 1;
    }
    else
    {
        pitchAlignInElement = (256 / elementBytes);
    }

    UINT_32 mipChainWidth      = PowTwoAlign(pIn->width, pitchAlignInElement);
    UINT_32 slice0PaddedHeight = pIn->height;

    returnCode = ApplyCustomizedPitchHeight(pIn, elementBytes, pitchAlignInElement,
                                            &mipChainWidth, &slice0PaddedHeight);

    if (returnCode == ADDR_OK)
    {
        UINT_32 mipChainHeight = 0;
        UINT_32 mipHeight      = pIn->height;
        UINT_32 mipDepth       = (pIn->resourceType == ADDR_RSRC_TEX_3D) ? pIn->numSlices : 1;

        for (UINT_32 i = 0; i < pIn->numMipLevels; i++)
        {
            if (pMipInfo != NULL)
            {
                pMipInfo[i].offset = mipChainWidth * mipChainHeight * elementBytes;
                pMipInfo[i].pitch  = mipChainWidth;
                pMipInfo[i].height = mipHeight;
                pMipInfo[i].depth  = mipDepth;
            }

            mipChainHeight += mipHeight;
            mipHeight = RoundHalf(mipHeight);
            mipHeight = Max(mipHeight, 1u);
        }

        *pMipmap0PaddedWidth = mipChainWidth;
        *pSlice0PaddedHeight = (pIn->numMipLevels > 1) ? mipChainHeight : slice0PaddedHeight;
    }

    return returnCode;
}

/**
************************************************************************************************************************
*   Gfx9Lib::ComputeThinBlockDimension
*
*   @brief
*       Internal function to get thin block width/height/depth in element from surface input params.
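Illustrative sketch (hypothetical names). A thin block is computed in three steps:
1. Start from the 256-byte micro block for the element size.
2. Grow it by (log2BlkSize - 8) doublings, split between width and height, with height taking any odd doubling.
3. With MSAA, shrink the block so that all samples together still fill the same number of bytes.

The kMicro256 table is an assumption. It stands in for Block256_2d, the 256-byte tile shapes for 1- to 16-byte elements.

```cpp
#include <cassert>
#include <cstdint>

struct BlockDim
{
    uint32_t w;
    uint32_t h;
};

// Assumed 256-byte micro block shapes (in elements) for 1, 2, 4, 8 and 16-byte elements.
static const BlockDim kMicro256[] = {{16, 16}, {16, 8}, {8, 8}, {8, 4}, {4, 4}};

inline uint32_t Log2U(uint32_t x)
{
    uint32_t r = 0;
    while (x > 1)
    {
        x >>= 1;
        r++;
    }
    return r;
}

inline BlockDim ThinBlockDim(uint32_t log2BlkSize, uint32_t bpp, uint32_t numSamples)
{
    const uint32_t idx = Log2U(bpp >> 3);
    assert((idx < 5) && (log2BlkSize >= 8));

    const uint32_t log2In256B = log2BlkSize - 8;       // doublings beyond 256 bytes
    const uint32_t widthAmp   = log2In256B / 2;        // width gets the smaller half
    const uint32_t heightAmp  = log2In256B - widthAmp; // height gets any odd doubling

    BlockDim dim = {kMicro256[idx].w << widthAmp, kMicro256[idx].h << heightAmp};

    if (numSamples > 1)
    {
        // Samples share the block: shrink by log2(numSamples) doublings in total.
        const uint32_t log2s = Log2U(numSamples);
        const uint32_t q = log2s >> 1;
        const uint32_t r = log2s & 1;
        if (log2BlkSize & 1)
        {
            dim.w >>= q;
            dim.h >>= (q + r);
        }
        else
        {
            dim.w >>= (q + r);
            dim.h >>= q;
        }
    }

    return dim;
}
```

For a 64 KiB block with 4-byte elements this yields 128 x 128. With 8 samples it becomes 32 x 64, which still covers 64 KiB (32 * 64 * 4 bytes * 8 samples).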
*
*   @return
*       N/A
************************************************************************************************************************
*/
VOID Gfx9Lib::ComputeThinBlockDimension(
    UINT_32*         pWidth,
    UINT_32*         pHeight,
    UINT_32*         pDepth,
    UINT_32          bpp,
    UINT_32          numSamples,
    AddrResourceType resourceType,
    AddrSwizzleMode  swizzleMode) const
{
    ADDR_ASSERT(IsThin(resourceType, swizzleMode));

    const UINT_32 log2BlkSize              = GetBlockSizeLog2(swizzleMode);
    const UINT_32 eleBytes                 = bpp >> 3;
    const UINT_32 microBlockSizeTableIndex = Log2(eleBytes);
    const UINT_32 log2blkSizeIn256B        = log2BlkSize - 8;
    const UINT_32 widthAmp                 = log2blkSizeIn256B / 2;
    const UINT_32 heightAmp                = log2blkSizeIn256B - widthAmp;

    ADDR_ASSERT(microBlockSizeTableIndex < sizeof(Block256_2d) / sizeof(Block256_2d[0]));

    *pWidth  = (Block256_2d[microBlockSizeTableIndex].w << widthAmp);
    *pHeight = (Block256_2d[microBlockSizeTableIndex].h << heightAmp);
    *pDepth  = 1;

    if (numSamples > 1)
    {
        const UINT_32 log2sample = Log2(numSamples);
        const UINT_32 q          = log2sample >> 1;
        const UINT_32 r          = log2sample & 1;

        if (log2BlkSize & 1)
        {
            *pWidth  >>= q;
            *pHeight >>= (q + r);
        }
        else
        {
            *pWidth  >>= (q + r);
            *pHeight >>= q;
        }
    }
}

} // V2
} // Addr
} // rocr
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/gfx9/gfx9addrlib.h
/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
************************************************************************************************************************
* @file  gfx9addrlib.h
* @brief Contains the Gfx9Lib class definition.
************************************************************************************************************************
*/

#ifndef __GFX9_ADDR_LIB_H__
#define __GFX9_ADDR_LIB_H__

#include "addrlib2.h"
#include "coord.h"

namespace rocr {
namespace Addr
{
namespace V2
{

/**
************************************************************************************************************************
* @brief GFX9 specific settings structure.
************************************************************************************************************************
*/
struct Gfx9ChipSettings
{
    struct
    {
        // Asic/Generation name
        UINT_32 isArcticIsland      : 1;
        UINT_32 isVega10            : 1;
        UINT_32 isRaven             : 1;
        UINT_32 isVega12            : 1;
        UINT_32 isVega20            : 1;
        UINT_32 reserved0           : 27;

        // Display engine IP version name
        UINT_32 isDce12             : 1;
        UINT_32 isDcn1              : 1;
        UINT_32 reserved1           : 30;

        // Misc configuration bits
        UINT_32 metaBaseAlignFix    : 1;
        UINT_32 depthPipeXorDisable : 1;
        UINT_32 htileAlignFix       : 1;
        UINT_32 applyAliasFix       : 1;
        UINT_32 htileCacheRbConflict: 1;
        UINT_32 reserved2           : 27;
    };
};

/**
************************************************************************************************************************
* @brief GFX9 data surface type.
************************************************************************************************************************
*/
enum Gfx9DataType
{
    Gfx9DataColor,
    Gfx9DataDepthStencil,
    Gfx9DataFmask
};

const UINT_32 Gfx9LinearSwModeMask = (1u << ADDR_SW_LINEAR);

const UINT_32 Gfx9Blk256BSwModeMask = (1u << ADDR_SW_256B_S) |
                                      (1u << ADDR_SW_256B_D) |
                                      (1u << ADDR_SW_256B_R);

const UINT_32 Gfx9Blk4KBSwModeMask = (1u << ADDR_SW_4KB_Z)   |
                                     (1u << ADDR_SW_4KB_S)   |
                                     (1u << ADDR_SW_4KB_D)   |
                                     (1u << ADDR_SW_4KB_R)   |
                                     (1u << ADDR_SW_4KB_Z_X) |
                                     (1u << ADDR_SW_4KB_S_X) |
                                     (1u << ADDR_SW_4KB_D_X) |
                                     (1u << ADDR_SW_4KB_R_X);

const UINT_32 Gfx9Blk64KBSwModeMask = (1u << ADDR_SW_64KB_Z)   |
                                      (1u << ADDR_SW_64KB_S)   |
                                      (1u << ADDR_SW_64KB_D)   |
                                      (1u << ADDR_SW_64KB_R)   |
                                      (1u << ADDR_SW_64KB_Z_T) |
                                      (1u << ADDR_SW_64KB_S_T) |
                                      (1u << ADDR_SW_64KB_D_T) |
                                      (1u << ADDR_SW_64KB_R_T) |
                                      (1u << ADDR_SW_64KB_Z_X) |
                                      (1u << ADDR_SW_64KB_S_X) |
                                      (1u << ADDR_SW_64KB_D_X) |
                                      (1u << ADDR_SW_64KB_R_X);

const UINT_32 Gfx9ZSwModeMask = (1u << ADDR_SW_4KB_Z)    |
                                (1u << ADDR_SW_64KB_Z)   |
                                (1u << ADDR_SW_64KB_Z_T) |
                                (1u << ADDR_SW_4KB_Z_X)  |
                                (1u << ADDR_SW_64KB_Z_X);

const UINT_32 Gfx9StandardSwModeMask = (1u << ADDR_SW_256B_S)   |
                                       (1u << ADDR_SW_4KB_S)    |
                                       (1u
                                        << ADDR_SW_64KB_S)   |
                                       (1u << ADDR_SW_64KB_S_T) |
                                       (1u << ADDR_SW_4KB_S_X)  |
                                       (1u << ADDR_SW_64KB_S_X);

const UINT_32 Gfx9DisplaySwModeMask = (1u << ADDR_SW_256B_D)   |
                                      (1u << ADDR_SW_4KB_D)    |
                                      (1u << ADDR_SW_64KB_D)   |
                                      (1u << ADDR_SW_64KB_D_T) |
                                      (1u << ADDR_SW_4KB_D_X)  |
                                      (1u << ADDR_SW_64KB_D_X);

const UINT_32 Gfx9RotateSwModeMask = (1u << ADDR_SW_256B_R)   |
                                     (1u << ADDR_SW_4KB_R)    |
                                     (1u << ADDR_SW_64KB_R)   |
                                     (1u << ADDR_SW_64KB_R_T) |
                                     (1u << ADDR_SW_4KB_R_X)  |
                                     (1u << ADDR_SW_64KB_R_X);

const UINT_32 Gfx9XSwModeMask = (1u << ADDR_SW_4KB_Z_X)  |
                                (1u << ADDR_SW_4KB_S_X)  |
                                (1u << ADDR_SW_4KB_D_X)  |
                                (1u << ADDR_SW_4KB_R_X)  |
                                (1u << ADDR_SW_64KB_Z_X) |
                                (1u << ADDR_SW_64KB_S_X) |
                                (1u << ADDR_SW_64KB_D_X) |
                                (1u << ADDR_SW_64KB_R_X);

const UINT_32 Gfx9TSwModeMask = (1u << ADDR_SW_64KB_Z_T) |
                                (1u << ADDR_SW_64KB_S_T) |
                                (1u << ADDR_SW_64KB_D_T) |
                                (1u << ADDR_SW_64KB_R_T);

const UINT_32 Gfx9XorSwModeMask = Gfx9XSwModeMask |
                                  Gfx9TSwModeMask;

const UINT_32 Gfx9AllSwModeMask = Gfx9LinearSwModeMask   |
                                  Gfx9ZSwModeMask        |
                                  Gfx9StandardSwModeMask |
                                  Gfx9DisplaySwModeMask  |
                                  Gfx9RotateSwModeMask;

const UINT_32 Gfx9Rsrc1dSwModeMask = Gfx9LinearSwModeMask;

const UINT_32 Gfx9Rsrc2dSwModeMask = Gfx9AllSwModeMask;

const UINT_32 Gfx9Rsrc3dSwModeMask = Gfx9AllSwModeMask & ~Gfx9Blk256BSwModeMask & ~Gfx9RotateSwModeMask;

const UINT_32 Gfx9Rsrc2dPrtSwModeMask = (Gfx9Blk4KBSwModeMask | Gfx9Blk64KBSwModeMask) & ~Gfx9XSwModeMask;

const UINT_32 Gfx9Rsrc3dPrtSwModeMask = Gfx9Rsrc2dPrtSwModeMask & ~Gfx9RotateSwModeMask & ~Gfx9DisplaySwModeMask;

const UINT_32 Gfx9Rsrc3dThinSwModeMask = Gfx9DisplaySwModeMask & ~Gfx9Blk256BSwModeMask;

const UINT_32 Gfx9Rsrc3dThin4KBSwModeMask = Gfx9Rsrc3dThinSwModeMask & Gfx9Blk4KBSwModeMask;

const UINT_32 Gfx9Rsrc3dThin64KBSwModeMask = Gfx9Rsrc3dThinSwModeMask & Gfx9Blk64KBSwModeMask;

const UINT_32 Gfx9Rsrc3dThickSwModeMask = Gfx9Rsrc3dSwModeMask & ~(Gfx9Rsrc3dThinSwModeMask | Gfx9LinearSwModeMask);

const UINT_32 Gfx9Rsrc3dThick4KBSwModeMask = Gfx9Rsrc3dThickSwModeMask &
                                             Gfx9Blk4KBSwModeMask;

const UINT_32 Gfx9Rsrc3dThick64KBSwModeMask = Gfx9Rsrc3dThickSwModeMask & Gfx9Blk64KBSwModeMask;

const UINT_32 Gfx9MsaaSwModeMask = Gfx9AllSwModeMask & ~Gfx9Blk256BSwModeMask & ~Gfx9LinearSwModeMask;

const UINT_32 Dce12NonBpp32SwModeMask = (1u << ADDR_SW_LINEAR)   |
                                        (1u << ADDR_SW_4KB_D)    |
                                        (1u << ADDR_SW_4KB_R)    |
                                        (1u << ADDR_SW_64KB_D)   |
                                        (1u << ADDR_SW_64KB_R)   |
                                        (1u << ADDR_SW_4KB_D_X)  |
                                        (1u << ADDR_SW_4KB_R_X)  |
                                        (1u << ADDR_SW_64KB_D_X) |
                                        (1u << ADDR_SW_64KB_R_X);

const UINT_32 Dce12Bpp32SwModeMask = (1u << ADDR_SW_256B_D) |
                                     (1u << ADDR_SW_256B_R) |
                                     Dce12NonBpp32SwModeMask;

const UINT_32 Dcn1NonBpp64SwModeMask = (1u << ADDR_SW_LINEAR)   |
                                       (1u << ADDR_SW_4KB_S)    |
                                       (1u << ADDR_SW_64KB_S)   |
                                       (1u << ADDR_SW_64KB_S_T) |
                                       (1u << ADDR_SW_4KB_S_X)  |
                                       (1u << ADDR_SW_64KB_S_X);

const UINT_32 Dcn1Bpp64SwModeMask = (1u << ADDR_SW_4KB_D)    |
                                    (1u << ADDR_SW_64KB_D)   |
                                    (1u << ADDR_SW_64KB_D_T) |
                                    (1u << ADDR_SW_4KB_D_X)  |
                                    (1u << ADDR_SW_64KB_D_X) |
                                    Dcn1NonBpp64SwModeMask;

/**
************************************************************************************************************************
* @brief GFX9 meta equation parameters
************************************************************************************************************************
*/
struct MetaEqParams
{
    UINT_32          maxMip;
    UINT_32          elementBytesLog2;
    UINT_32          numSamplesLog2;
    ADDR2_META_FLAGS metaFlag;
    Gfx9DataType     dataSurfaceType;
    AddrSwizzleMode  swizzleMode;
    AddrResourceType resourceType;
    UINT_32          metaBlkWidthLog2;
    UINT_32          metaBlkHeightLog2;
    UINT_32          metaBlkDepthLog2;
    UINT_32          compBlkWidthLog2;
    UINT_32          compBlkHeightLog2;
    UINT_32          compBlkDepthLog2;
};

/**
************************************************************************************************************************
* @brief This class is the GFX9 specific address library
*        function set.
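Illustrative sketch. The swizzle-mode masks defined before this class are bitsets with one bit per AddrSwizzleMode value. The per-resource masks are derived by clearing whole families of modes; for example, Gfx9Rsrc3dSwModeMask removes the 256B and rotated modes. The toy enum SwMode and its numeric values are hypothetical. They model only eight of the modes and are not claimed to match the real enumeration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy enum: values are illustrative, not AddrSwizzleMode.
enum SwMode : uint32_t
{
    SW_LINEAR, SW_256B_S, SW_256B_D, SW_256B_R,
    SW_4KB_Z,  SW_4KB_S,  SW_4KB_D,  SW_4KB_R
};

// One bit per swizzle mode, as in the Gfx9*SwModeMask constants.
constexpr uint32_t Bit(SwMode m) { return 1u << m; }

constexpr uint32_t kBlk256BMask = Bit(SW_256B_S) | Bit(SW_256B_D) | Bit(SW_256B_R);
constexpr uint32_t kRotateMask  = Bit(SW_256B_R) | Bit(SW_4KB_R);
constexpr uint32_t kAllMask     = 0xFFu; // all eight toy modes

// Same shape as Gfx9Rsrc3dSwModeMask: drop 256B blocks and rotated modes.
constexpr uint32_t kRsrc3dMask = kAllMask & ~kBlk256BMask & ~kRotateMask;

static_assert(kRsrc3dMask == 0x71u, "LINEAR, 4KB_Z, 4KB_S and 4KB_D remain");

// Membership test for a single mode.
inline bool IsModeAllowed(uint32_t mask, SwMode mode)
{
    return (mask & Bit(mode)) != 0;
}
```

GetAllowedSwSet in this class uses the same idea in the other direction. It ANDs a caller's allowed-mode bitset with each family mask to see whether any mode of that family survives.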
************************************************************************************************************************
*/
class Gfx9Lib : public Lib
{
public:
    /// Creates Gfx9Lib object
    static Addr::Lib* CreateObj(const Client* pClient)
    {
        VOID* pMem = Object::ClientAlloc(sizeof(Gfx9Lib), pClient);
        return (pMem != NULL) ? new (pMem) Gfx9Lib(pClient) : NULL;
    }

protected:
    Gfx9Lib(const Client* pClient);
    virtual ~Gfx9Lib();

    virtual BOOL_32 HwlIsStandardSwizzle(
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode) const
    {
        return m_swizzleModeTable[swizzleMode].isStd ||
               (IsTex3d(resourceType) && m_swizzleModeTable[swizzleMode].isDisp);
    }

    virtual BOOL_32 HwlIsDisplaySwizzle(
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode) const
    {
        return IsTex2d(resourceType) && m_swizzleModeTable[swizzleMode].isDisp;
    }

    virtual BOOL_32 HwlIsThin(
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode) const
    {
        return ((IsTex2d(resourceType) == TRUE) ||
                ((IsTex3d(resourceType) == TRUE)                  &&
                 (m_swizzleModeTable[swizzleMode].isZ   == FALSE) &&
                 (m_swizzleModeTable[swizzleMode].isStd == FALSE)));
    }

    virtual BOOL_32 HwlIsThick(
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode) const
    {
        return (IsTex3d(resourceType) &&
                (m_swizzleModeTable[swizzleMode].isZ || m_swizzleModeTable[swizzleMode].isStd));
    }

    virtual ADDR_E_RETURNCODE HwlComputeHtileInfo(
        const ADDR2_COMPUTE_HTILE_INFO_INPUT* pIn,
        ADDR2_COMPUTE_HTILE_INFO_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeCmaskInfo(
        const ADDR2_COMPUTE_CMASK_INFO_INPUT* pIn,
        ADDR2_COMPUTE_CMASK_INFO_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeDccInfo(
        const ADDR2_COMPUTE_DCCINFO_INPUT* pIn,
        ADDR2_COMPUTE_DCCINFO_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeCmaskAddrFromCoord(
        const ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn,
        ADDR2_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT*      pOut);

    virtual ADDR_E_RETURNCODE HwlComputeHtileAddrFromCoord(
        const ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn,
        ADDR2_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT*      pOut);

    virtual ADDR_E_RETURNCODE HwlComputeHtileCoordFromAddr(
        const ADDR2_COMPUTE_HTILE_COORDFROMADDR_INPUT* pIn,
        ADDR2_COMPUTE_HTILE_COORDFROMADDR_OUTPUT*      pOut);

    virtual ADDR_E_RETURNCODE HwlComputeDccAddrFromCoord(
        const ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT* pIn,
        ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT*      pOut);

    virtual UINT_32 HwlGetEquationIndex(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeBlock256Equation(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2,
        ADDR_EQUATION*   pEquation) const;

    virtual ADDR_E_RETURNCODE HwlComputeThinEquation(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2,
        ADDR_EQUATION*   pEquation) const;

    virtual ADDR_E_RETURNCODE HwlComputeThickEquation(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2,
        ADDR_EQUATION*   pEquation) const;

    // Get equation table pointer and number of equations
    virtual UINT_32 HwlGetEquationTableInfo(const ADDR_EQUATION** ppEquationTable) const
    {
        *ppEquationTable = m_equationTable;

        return m_numEquations;
    }

    virtual BOOL_32 IsEquationSupported(
        AddrResourceType rsrcType,
        AddrSwizzleMode  swMode,
        UINT_32          elementBytesLog2) const;

    virtual ADDR_E_RETURNCODE HwlComputePipeBankXor(
        const ADDR2_COMPUTE_PIPEBANKXOR_INPUT* pIn,
        ADDR2_COMPUTE_PIPEBANKXOR_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeSlicePipeBankXor(
        const ADDR2_COMPUTE_SLICE_PIPEBANKXOR_INPUT* pIn,
        ADDR2_COMPUTE_SLICE_PIPEBANKXOR_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeSubResourceOffsetForSwizzlePattern(
        const ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_INPUT* pIn,
        ADDR2_COMPUTE_SUBRESOURCE_OFFSET_FORSWIZZLEPATTERN_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE HwlGetPreferredSurfaceSetting(
        const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT* pIn,
        ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE
        HwlComputeSurfaceInfoSanityCheck(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const;

    virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfoTiled(
         const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
         ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfoLinear(
         const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
         ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeSurfaceAddrFromCoordTiled(
        const ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT*      pOut) const;

    virtual UINT_32 HwlComputeMaxBaseAlignments() const;

    virtual UINT_32 HwlComputeMaxMetaBaseAlignments() const;

    virtual BOOL_32 HwlInitGlobalParams(const ADDR_CREATE_INPUT* pCreateIn);

    virtual ChipFamily HwlConvertChipFamily(UINT_32 uChipFamily, UINT_32 uChipRevision);

    virtual VOID ComputeThinBlockDimension(
        UINT_32*         pWidth,
        UINT_32*         pHeight,
        UINT_32*         pDepth,
        UINT_32          bpp,
        UINT_32          numSamples,
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode) const;

private:
    VOID GetRbEquation(CoordEq* pRbEq, UINT_32 rbPerSeLog2, UINT_32 seLog2) const;

    VOID GetDataEquation(CoordEq* pDataEq, Gfx9DataType dataSurfaceType,
                         AddrSwizzleMode swizzleMode, AddrResourceType resourceType,
                         UINT_32 elementBytesLog2, UINT_32 numSamplesLog2) const;

    VOID GetPipeEquation(CoordEq* pPipeEq, CoordEq* pDataEq,
                         UINT_32 pipeInterleaveLog2, UINT_32 numPipesLog2,
                         UINT_32 numSamplesLog2, Gfx9DataType dataSurfaceType,
                         AddrSwizzleMode swizzleMode, AddrResourceType resourceType) const;

    VOID GenMetaEquation(CoordEq* pMetaEq, UINT_32 maxMip,
                         UINT_32 elementBytesLog2, UINT_32 numSamplesLog2,
                         ADDR2_META_FLAGS metaFlag, Gfx9DataType dataSurfaceType,
                         AddrSwizzleMode swizzleMode, AddrResourceType resourceType,
                         UINT_32 metaBlkWidthLog2, UINT_32 metaBlkHeightLog2,
                         UINT_32 metaBlkDepthLog2, UINT_32 compBlkWidthLog2,
                         UINT_32 compBlkHeightLog2, UINT_32 compBlkDepthLog2) const;

    const CoordEq* GetMetaEquation(const MetaEqParams& metaEqParams);

    VOID GetMetaMipInfo(UINT_32
                        numMipLevels, Dim3d* pMetaBlkDim,
                        BOOL_32 dataThick, ADDR2_META_MIP_INFO* pInfo,
                        UINT_32 mip0Width, UINT_32 mip0Height, UINT_32 mip0Depth,
                        UINT_32* pNumMetaBlkX, UINT_32* pNumMetaBlkY, UINT_32* pNumMetaBlkZ) const;

    BOOL_32 IsValidDisplaySwizzleMode(const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const;

    ADDR_E_RETURNCODE ComputeSurfaceLinearPadding(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        UINT_32*                                pMipmap0PaddedWidth,
        UINT_32*                                pSlice0PaddedHeight,
        ADDR2_MIP_INFO*                         pMipInfo = NULL) const;

    static ADDR2_BLOCK_SET GetAllowedBlockSet(ADDR2_SWMODE_SET allowedSwModeSet, AddrResourceType rsrcType)
    {
        ADDR2_BLOCK_SET allowedBlockSet = {};

        allowedBlockSet.micro  = (allowedSwModeSet.value & Gfx9Blk256BSwModeMask) ? TRUE : FALSE;
        allowedBlockSet.linear = (allowedSwModeSet.value & Gfx9LinearSwModeMask)  ? TRUE : FALSE;

        if (rsrcType == ADDR_RSRC_TEX_3D)
        {
            allowedBlockSet.macroThin4KB   = (allowedSwModeSet.value & Gfx9Rsrc3dThin4KBSwModeMask)   ? TRUE : FALSE;
            allowedBlockSet.macroThick4KB  = (allowedSwModeSet.value & Gfx9Rsrc3dThick4KBSwModeMask)  ? TRUE : FALSE;
            allowedBlockSet.macroThin64KB  = (allowedSwModeSet.value & Gfx9Rsrc3dThin64KBSwModeMask)  ? TRUE : FALSE;
            allowedBlockSet.macroThick64KB = (allowedSwModeSet.value & Gfx9Rsrc3dThick64KBSwModeMask) ? TRUE : FALSE;
        }
        else
        {
            allowedBlockSet.macroThin4KB  = (allowedSwModeSet.value & Gfx9Blk4KBSwModeMask)  ? TRUE : FALSE;
            allowedBlockSet.macroThin64KB = (allowedSwModeSet.value & Gfx9Blk64KBSwModeMask) ? TRUE : FALSE;
        }

        return allowedBlockSet;
    }

    static ADDR2_SWTYPE_SET GetAllowedSwSet(ADDR2_SWMODE_SET allowedSwModeSet)
    {
        ADDR2_SWTYPE_SET allowedSwSet = {};

        allowedSwSet.sw_Z = (allowedSwModeSet.value & Gfx9ZSwModeMask)        ? TRUE : FALSE;
        allowedSwSet.sw_S = (allowedSwModeSet.value & Gfx9StandardSwModeMask) ? TRUE : FALSE;
        allowedSwSet.sw_D = (allowedSwModeSet.value & Gfx9DisplaySwModeMask)  ? TRUE : FALSE;
        allowedSwSet.sw_R = (allowedSwModeSet.value & Gfx9RotateSwModeMask)   ?
                            TRUE : FALSE;

        return allowedSwSet;
    }

    BOOL_32 IsInMipTail(
        AddrResourceType  resourceType,
        AddrSwizzleMode   swizzleMode,
        Dim3d             mipTailDim,
        UINT_32           width,
        UINT_32           height,
        UINT_32           depth) const
    {
        BOOL_32 inTail = ((width <= mipTailDim.w) &&
                          (height <= mipTailDim.h) &&
                          (IsThin(resourceType, swizzleMode) || (depth <= mipTailDim.d)));

        return inTail;
    }

    BOOL_32 ValidateNonSwModeParams(const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const;
    BOOL_32 ValidateSwModeParams(const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn) const;

    UINT_32 GetBankXorBits(UINT_32 macroBlockBits) const
    {
        UINT_32 pipeBits = GetPipeXorBits(macroBlockBits);

        // Bank xor bits
        UINT_32 bankBits = Min(macroBlockBits - pipeBits - m_pipeInterleaveLog2, m_banksLog2);

        return bankBits;
    }

    UINT_32 ComputeSurfaceBaseAlignTiled(AddrSwizzleMode swizzleMode) const
    {
        UINT_32 baseAlign;

        if (IsXor(swizzleMode))
        {
            baseAlign = GetBlockSize(swizzleMode);
        }
        else
        {
            baseAlign = 256;
        }

        return baseAlign;
    }

    // Initialize equation table
    VOID InitEquationTable();

    ADDR_E_RETURNCODE ComputeStereoInfo(
        const ADDR2_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*      pOut,
        UINT_32*                                pHeightAlign) const;

    UINT_32 GetMipChainInfo(
        AddrResourceType  resourceType,
        AddrSwizzleMode   swizzleMode,
        UINT_32           bpp,
        UINT_32           mip0Width,
        UINT_32           mip0Height,
        UINT_32           mip0Depth,
        UINT_32           blockWidth,
        UINT_32           blockHeight,
        UINT_32           blockDepth,
        UINT_32           numMipLevel,
        ADDR2_MIP_INFO*   pMipInfo) const;

    VOID GetMetaMiptailInfo(
        ADDR2_META_MIP_INFO*    pInfo,
        Dim3d                   mipCoord,
        UINT_32                 numMipInTail,
        Dim3d*                  pMetaBlkDim) const;

    Dim3d GetMipStartPos(
        AddrResourceType  resourceType,
        AddrSwizzleMode   swizzleMode,
        UINT_32           width,
        UINT_32           height,
        UINT_32           depth,
        UINT_32           blockWidth,
        UINT_32           blockHeight,
        UINT_32           blockDepth,
        UINT_32           mipId,
        UINT_32           log2ElementBytes,
        UINT_32*          pMipTailBytesOffset) const;

    AddrMajorMode GetMajorMode(
        AddrResourceType resourceType,
        AddrSwizzleMode  swizzleMode,
        UINT_32          mip0WidthInBlk,
        UINT_32          mip0HeightInBlk,
        UINT_32          mip0DepthInBlk) const
    {
        BOOL_32 yMajor =
(mip0WidthInBlk < mip0HeightInBlk); BOOL_32 xMajor = (yMajor == FALSE); if (IsThick(resourceType, swizzleMode)) { yMajor = yMajor && (mip0HeightInBlk >= mip0DepthInBlk); xMajor = xMajor && (mip0WidthInBlk >= mip0DepthInBlk); } AddrMajorMode majorMode; if (xMajor) { majorMode = ADDR_MAJOR_X; } else if (yMajor) { majorMode = ADDR_MAJOR_Y; } else { majorMode = ADDR_MAJOR_Z; } return majorMode; } Dim3d GetDccCompressBlk( AddrResourceType resourceType, AddrSwizzleMode swizzleMode, UINT_32 bpp) const { UINT_32 index = Log2(bpp >> 3); Dim3d compressBlkDim; if (IsThin(resourceType, swizzleMode)) { compressBlkDim.w = Block256_2d[index].w; compressBlkDim.h = Block256_2d[index].h; compressBlkDim.d = 1; } else if (IsStandardSwizzle(resourceType, swizzleMode)) { compressBlkDim = Block256_3dS[index]; } else { compressBlkDim = Block256_3dZ[index]; } return compressBlkDim; } static const UINT_32 MaxSeLog2 = 3; static const UINT_32 MaxRbPerSeLog2 = 2; static const Dim3d Block256_3dS[MaxNumOfBpp]; static const Dim3d Block256_3dZ[MaxNumOfBpp]; static const UINT_32 MipTailOffset256B[]; static const SwizzleModeFlags SwizzleModeTable[ADDR_SW_MAX_TYPE]; static const UINT_32 MaxCachedMetaEq = 2; Gfx9ChipSettings m_settings; CoordEq m_cachedMetaEq[MaxCachedMetaEq]; MetaEqParams m_cachedMetaEqKey[MaxCachedMetaEq]; UINT_32 m_metaEqOverrideIndex; }; } // V2 } // Addr } // rocr #endif ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/r800/000077500000000000000000000000001420110115200213555ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/r800/ciaddrlib.cpp000066400000000000000000002327511420110115200240100ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. 
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
****************************************************************************************************
* @file  ciaddrlib.cpp
* @brief Contains the implementation for the CiLib class.
****************************************************************************************************
*/

#include "ciaddrlib.h"

#include "si_gb_reg.h"

#include "amdgpu_asic_addr.h"

////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////

namespace rocr {
namespace Addr
{

/**
****************************************************************************************************
*   CiHwlInit
*
*   @brief
*       Creates a CiLib object.
*
*   @return
*       Returns a CiLib object pointer.
****************************************************************************************************
*/
Lib* CiHwlInit(const Client* pClient)
{
    return V1::CiLib::CreateObj(pClient);
}

namespace V1
{

/**
****************************************************************************************************
*   Mask
*
*   @brief
*       Gets a mask with the low "width" bits set
*   @return
*       Bit mask
****************************************************************************************************
*/
static UINT_64 Mask(
    UINT_32 width)  ///< Width of bits
{
    UINT_64 ret;

    // Shifting a 64-bit value by 64 or more bits is undefined, so a full-width mask is
    // handled separately
    if (width >= sizeof(UINT_64)*8)
    {
        ret = ~((UINT_64) 0);
    }
    else
    {
        ret = (((UINT_64) 1) << width) - 1;
    }
    return ret;
}

/**
****************************************************************************************************
*   GetBits
*
*   @brief
*       Gets bits within a range of [msb, lsb]
*   @return
*       Bits of this range
****************************************************************************************************
*/
static UINT_64 GetBits(
    UINT_64 bits,   ///< Source bits
    UINT_32 msb,    ///< Most significant bit
    UINT_32 lsb)    ///< Least significant bit
{
    UINT_64 ret = 0;

    if (msb >= lsb)
    {
        ret = (bits >> lsb) & (Mask(1 + msb - lsb));
    }
    return ret;
}

/**
****************************************************************************************************
*   RemoveBits
*
*   @brief
*       Removes bits within the range of [msb, lsb]
*   @return
*       Modified bits
****************************************************************************************************
*/
static UINT_64 RemoveBits(
    UINT_64 bits,   ///< Source bits
    UINT_32 msb,    ///< Most significant bit
    UINT_32 lsb)    ///< Least significant bit
{
    UINT_64 ret = bits;

    if (msb >= lsb)
    {
        // When lsb == 0, (lsb - 1) wraps around and GetBits() ends up calling Mask(0),
        // so no low bits are kept
        ret = GetBits(bits, lsb - 1, 0)                                 // low bits
            | (GetBits(bits, 8 * sizeof(bits) - 1, msb + 1) << lsb);    // high bits
    }
    return ret;
}

/**
****************************************************************************************************
*   InsertBits
*
*   @brief
*       Inserts new bits into the range of [msb, lsb]
*   @return
*       Modified bits
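*   @note
*       Worked example (traced by hand through the code below; illustrative values only):
*       InsertBits(0xB, 0x1, 2, 2) keeps the old low bits [1:0] (0b11), places bit 0 of newBits
*       at bit 2, and moves the old bits [63:2] (0b10) up to start at bit 3, giving
*       0b10111 == 0x17. RemoveBits(0x17, 2, 2) reverses this and returns 0xB.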
****************************************************************************************************
*/
static UINT_64 InsertBits(
    UINT_64 bits,       ///< Source bits
    UINT_64 newBits,    ///< New bits to be inserted
    UINT_32 msb,        ///< Most significant bit
    UINT_32 lsb)        ///< Least significant bit
{
    UINT_64 ret = bits;

    if (msb >= lsb)
    {
        ret = GetBits(bits, lsb - 1, 0)                                 // old low bits
            | (GetBits(newBits, msb - lsb, 0) << lsb)                   // new bits
            | (GetBits(bits, 8 * sizeof(bits) - 1, lsb) << (msb + 1));  // old high bits
    }
    return ret;
}

/**
****************************************************************************************************
*   CiLib::CiLib
*
*   @brief
*       Constructor
*
****************************************************************************************************
*/
CiLib::CiLib(const Client* pClient)
    :
    SiLib(pClient),
    m_noOfMacroEntries(0),
    m_allowNonDispThickModes(FALSE)
{
    m_class = CI_ADDRLIB;
}

/**
****************************************************************************************************
*   CiLib::~CiLib
*
*   @brief
*       Destructor
****************************************************************************************************
*/
CiLib::~CiLib()
{
}

/**
****************************************************************************************************
*   CiLib::HwlComputeDccInfo
*
*   @brief
*       Computes the DCC key size and base alignment
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE CiLib::HwlComputeDccInfo(
    const ADDR_COMPUTE_DCCINFO_INPUT*  pIn,
    ADDR_COMPUTE_DCCINFO_OUTPUT*       pOut) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    if (SupportDccAndTcCompatibility() && IsMacroTiled(pIn->tileMode))
    {
        // The DCC key is 1/256 of the color surface size (one key byte per 256 bytes)
        UINT_64 dccFastClearSize = pIn->colorSurfSize >> 8;

        ADDR_ASSERT(0 == (pIn->colorSurfSize & 0xff));

        if (pIn->numSamples > 1)
        {
            UINT_32 tileSizePerSample = BITS_TO_BYTES(pIn->bpp * MicroTileWidth * MicroTileHeight);
            UINT_32 samplesPerSplit   = pIn->tileInfo.tileSplitBytes /
                                        tileSizePerSample;

            if (samplesPerSplit < pIn->numSamples)
            {
                UINT_32 numSplits = pIn->numSamples / samplesPerSplit;
                UINT_32 fastClearBaseAlign = HwlGetPipes(&pIn->tileInfo) * m_pipeInterleaveBytes;

                ADDR_ASSERT(IsPow2(fastClearBaseAlign));

                dccFastClearSize /= numSplits;

                if (0 != (dccFastClearSize & (fastClearBaseAlign - 1)))
                {
                    // Disable DCC fast clear if the key size of the first sample split
                    // is not aligned to (pipes * pipe interleave bytes)
                    dccFastClearSize = 0;
                }
            }
        }

        pOut->dccRamSize          = pIn->colorSurfSize >> 8;
        pOut->dccRamBaseAlign     = pIn->tileInfo.banks *
                                    HwlGetPipes(&pIn->tileInfo) *
                                    m_pipeInterleaveBytes;
        pOut->dccFastClearSize    = dccFastClearSize;
        pOut->dccRamSizeAligned   = TRUE;

        ADDR_ASSERT(IsPow2(pOut->dccRamBaseAlign));

        if (0 == (pOut->dccRamSize & (pOut->dccRamBaseAlign - 1)))
        {
            pOut->subLvlCompressible = TRUE;
        }
        else
        {
            UINT_64 dccRamSizeAlign = HwlGetPipes(&pIn->tileInfo) * m_pipeInterleaveBytes;

            if (pOut->dccRamSize == pOut->dccFastClearSize)
            {
                pOut->dccFastClearSize = PowTwoAlign(pOut->dccRamSize, dccRamSizeAlign);
            }
            if ((pOut->dccRamSize & (dccRamSizeAlign - 1)) != 0)
            {
                pOut->dccRamSizeAligned = FALSE;
            }
            pOut->dccRamSize          = PowTwoAlign(pOut->dccRamSize, dccRamSizeAlign);
            pOut->subLvlCompressible  = FALSE;
        }
    }
    else
    {
        returnCode = ADDR_NOTSUPPORTED;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   CiLib::HwlComputeCmaskAddrFromCoord
*
*   @brief
*       Computes the TC-compatible CMASK address from an FMASK RAM address
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE CiLib::HwlComputeCmaskAddrFromCoord(
    const ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT*  pIn,  ///< [in] fmask addr/bpp/tile input
    ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT*       pOut  ///< [out] cmask address
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_NOTSUPPORTED;

    if ((SupportDccAndTcCompatibility() == TRUE) &&
        (pIn->flags.tcCompatible == TRUE))
    {
        UINT_32 numOfPipes   = HwlGetPipes(pIn->pTileInfo);
        UINT_32 numOfBanks   = pIn->pTileInfo->banks;
        UINT_64 fmaskAddress = pIn->fmaskAddr;
        UINT_32 elemBits     = pIn->bpp;
        UINT_32 blockByte    = 64 * elemBits / 8;
        UINT_64 metaNibbleAddress = HwlComputeMetadataNibbleAddress(fmaskAddress,
                                                                    0,
                                                                    0,
                                                                    4,   // cmask 4 bits
                                                                    elemBits,
                                                                    blockByte,
                                                                    m_pipeInterleaveBytes,
                                                                    numOfPipes,
                                                                    numOfBanks,
                                                                    1);
        // Two 4-bit CMASK entries share a byte: convert the nibble address to a byte
        // address, and select the upper nibble when the nibble address is odd
        pOut->addr = (metaNibbleAddress >> 1);
        pOut->bitPosition = (metaNibbleAddress % 2) ? 4 : 0;
        returnCode = ADDR_OK;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   CiLib::HwlComputeHtileAddrFromCoord
*
*   @brief
*       Computes the TC-compatible HTILE address from a depth/stencil address
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE CiLib::HwlComputeHtileAddrFromCoord(
    const ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT*  pIn,  ///< [in] depth/stencil addr/bpp/tile input
    ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT*       pOut  ///< [out] htile address
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_NOTSUPPORTED;

    if ((SupportDccAndTcCompatibility() == TRUE) &&
        (pIn->flags.tcCompatible == TRUE))
    {
        UINT_32 numOfPipes   = HwlGetPipes(pIn->pTileInfo);
        UINT_32 numOfBanks   = pIn->pTileInfo->banks;
        UINT_64 zStencilAddr = pIn->zStencilAddr;
        UINT_32 elemBits     = pIn->bpp;
        UINT_32 blockByte    = 64 * elemBits / 8;
        UINT_64 metaNibbleAddress = HwlComputeMetadataNibbleAddress(zStencilAddr,
                                                                    0,
                                                                    0,
                                                                    32,  // htile 32 bits
                                                                    elemBits,
                                                                    blockByte,
                                                                    m_pipeInterleaveBytes,
                                                                    numOfPipes,
                                                                    numOfBanks,
                                                                    1);
        pOut->addr = (metaNibbleAddress >> 1);
        pOut->bitPosition = 0;
        returnCode = ADDR_OK;
    }

    return returnCode;
}

/**
****************************************************************************************************
*   CiLib::HwlConvertChipFamily
*
*   @brief
*       Convert familyID defined in atiid.h to ChipFamily and set m_chipFamily/m_chipRevision
*   @return
*       ChipFamily
****************************************************************************************************
*/
ChipFamily CiLib::HwlConvertChipFamily(
    UINT_32 uChipFamily,        ///< [in] chip family defined in atiid.h
    UINT_32 uChipRevision)      ///< [in] chip revision defined in "asic_family"_id.h
{
    ChipFamily family = ADDR_CHIP_FAMILY_CI;

    switch (uChipFamily)
    {
        case FAMILY_CI:
            m_settings.isSeaIsland  = 1;
            m_settings.isBonaire    = ASICREV_IS_BONAIRE_M(uChipRevision);
            m_settings.isHawaii     = ASICREV_IS_HAWAII_P(uChipRevision);
            break;
        case FAMILY_KV:
            m_settings.isKaveri     = 1;
            m_settings.isSpectre    = ASICREV_IS_SPECTRE(uChipRevision);
            m_settings.isSpooky     = ASICREV_IS_SPOOKY(uChipRevision);
            m_settings.isKalindi    = ASICREV_IS_KALINDI(uChipRevision);
            break;
        case FAMILY_VI:
            m_settings.isVolcanicIslands = 1;
            m_settings.isIceland         = ASICREV_IS_ICELAND_M(uChipRevision);
            m_settings.isTonga           = ASICREV_IS_TONGA_P(uChipRevision);
            m_settings.isFiji            = ASICREV_IS_FIJI_P(uChipRevision);
            m_settings.isPolaris10       = ASICREV_IS_POLARIS10_P(uChipRevision);
            m_settings.isPolaris11       = ASICREV_IS_POLARIS11_M(uChipRevision);
            m_settings.isPolaris12       = ASICREV_IS_POLARIS12_V(uChipRevision);
            m_settings.isVegaM           = ASICREV_IS_VEGAM_P(uChipRevision);
            family = ADDR_CHIP_FAMILY_VI;
            break;
        case FAMILY_CZ:
            m_settings.isCarrizo         = 1;
            m_settings.isVolcanicIslands = 1;
            family = ADDR_CHIP_FAMILY_VI;
            break;
        default:
            ADDR_ASSERT(!"Unexpected chip family");
            break;
    }

    return family;
}

/**
****************************************************************************************************
*   CiLib::HwlInitGlobalParams
*
*   @brief
*       Initializes global parameters
*
*   @return
*       TRUE if all settings are valid
*
****************************************************************************************************
*/
BOOL_32 CiLib::HwlInitGlobalParams(
    const ADDR_CREATE_INPUT* pCreateIn) ///< [in] create input
{
    BOOL_32  valid = TRUE;

    const ADDR_REGISTER_VALUE* pRegValue = &pCreateIn->regValue;

    valid = DecodeGbRegs(pRegValue);

    // The following assignments to m_pipes are only a fail-safe; InitTileSettingTable should
    // read the correct pipe count from the tile mode table
    if (m_settings.isHawaii)
    {
        m_pipes = 16;
    }
    else if (m_settings.isBonaire || m_settings.isSpectre)
    {
        m_pipes = 4;
    }
    else // Treat other KV ASICs as 2-pipe
    {
        m_pipes = 2;
    }

    // @todo: VI
    // Move this to VI code path once created
    if (m_settings.isTonga || m_settings.isPolaris10)
    {
        m_pipes = 8;
    }
    else if (m_settings.isIceland)
    {
        m_pipes = 2;
    }
    else if (m_settings.isFiji)
    {
        m_pipes = 16;
    }
    else if (m_settings.isPolaris11 || m_settings.isPolaris12)
    {
        m_pipes = 4;
    }
    else if (m_settings.isVegaM)
    {
        m_pipes = 16;
    }

    if (valid)
    {
        valid = InitTileSettingTable(pRegValue->pTileConfig, pRegValue->noOfEntries);
    }
    if (valid)
    {
        valid = InitMacroTileCfgTable(pRegValue->pMacroTileConfig, pRegValue->noOfMacroEntries);
    }

    if (valid)
    {
        InitEquationTable();
    }

    return valid;
}

/**
****************************************************************************************************
*   CiLib::HwlPostCheckTileIndex
*
*   @brief
*       Map a tile setting to index if curIndex is invalid, otherwise check if curIndex matches
*       tile mode/type/info and change the index if needed
*   @return
*       Tile index.
****************************************************************************************************
*/
INT_32 CiLib::HwlPostCheckTileIndex(
    const ADDR_TILEINFO* pInfo,     ///< [in] Tile Info
    AddrTileMode         mode,      ///< [in] Tile mode
    AddrTileType         type,      ///< [in] Tile type
    INT                  curIndex   ///< [in] Current index assigned in HwlSetupTileInfo
    ) const
{
    INT_32 index = curIndex;

    if (mode == ADDR_TM_LINEAR_GENERAL)
    {
        index = TileIndexLinearGeneral;
    }
    else
    {
        BOOL_32 macroTiled = IsMacroTiled(mode);

        // We need to find a new index if any of the following is true:
        // 1. curIndex is invalid
        // 2. the tile mode has changed
        // 3. the tile info does not match (macro-tiled modes only)
        if ((index == TileIndexInvalid)         ||
            (mode != m_tileTable[index].mode)   ||
            (macroTiled && pInfo->pipeConfig != m_tileTable[index].info.pipeConfig))
        {
            for (index = 0; index < static_cast<INT_32>(m_noOfEntries); index++)
            {
                if (macroTiled)
                {
                    // macro tile modes need all to match
                    if ((pInfo->pipeConfig == m_tileTable[index].info.pipeConfig) &&
                        (mode == m_tileTable[index].mode)                         &&
                        (type == m_tileTable[index].type))
                    {
                        // tileSplitBytes stored in m_tileTable is only valid for depth entries
                        if (type == ADDR_DEPTH_SAMPLE_ORDER)
                        {
                            if (Min(m_tileTable[index].info.tileSplitBytes,
                                    m_rowSize) == pInfo->tileSplitBytes)
                            {
                                break;
                            }
                        }
                        else // other entries are determined by other 3 fields
                        {
                            break;
                        }
                    }
                }
                else if (mode == ADDR_TM_LINEAR_ALIGNED)
                {
                    // linear mode only needs tile mode to match
                    if (mode == m_tileTable[index].mode)
                    {
                        break;
                    }
                }
                else
                {
                    // micro tile modes only need tile mode and tile type to match
                    if (mode == m_tileTable[index].mode &&
                        type == m_tileTable[index].type)
                    {
                        break;
                    }
                }
            }
        }
    }

    ADDR_ASSERT(index < static_cast<INT_32>(m_noOfEntries));

    if (index >= static_cast<INT_32>(m_noOfEntries))
    {
        index = TileIndexInvalid;
    }

    return index;
}

/**
****************************************************************************************************
*   CiLib::HwlSetupTileCfg
*
*   @brief
*       Maps a tile index to its tile setting.
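*
*       For macro-tiled, non-depth entries the table stores a split factor rather than a byte
*       count, so the tile split is derived as Max(256, splitFactor * tileBytes1x) and then
*       clamped to m_rowSize. Worked example (hypothetical values, traced by hand): bpp = 32 in
*       a THIN1 mode gives tileBytes1x = BITS_TO_BYTES(32 * 64 * 1) = 256 bytes; a stored split
*       factor of 4 then yields Max(256, 4 * 256) = 1024 bytes before the Min() with m_rowSize.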
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE CiLib::HwlSetupTileCfg(
    UINT_32         bpp,            ///< Bits per pixel
    INT_32          index,          ///< Tile index
    INT_32          macroModeIndex, ///< Index in macro tile mode table (CI)
    ADDR_TILEINFO*  pInfo,          ///< [out] Tile Info
    AddrTileMode*   pMode,          ///< [out] Tile mode
    AddrTileType*   pType           ///< [out] Tile type
    ) const
{
    ADDR_E_RETURNCODE returnCode = ADDR_OK;

    // Global flag to control usage of tileIndex
    if (UseTileIndex(index))
    {
        if (index == TileIndexLinearGeneral)
        {
            pInfo->banks = 2;
            pInfo->bankWidth = 1;
            pInfo->bankHeight = 1;
            pInfo->macroAspectRatio = 1;
            pInfo->tileSplitBytes = 64;
            pInfo->pipeConfig = ADDR_PIPECFG_P2;
        }
        else if (static_cast<UINT_32>(index) >= m_noOfEntries)
        {
            returnCode = ADDR_INVALIDPARAMS;
        }
        else
        {
            const TileConfig* pCfgTable = GetTileSetting(index);

            if (pInfo != NULL)
            {
                if (IsMacroTiled(pCfgTable->mode))
                {
                    ADDR_ASSERT((macroModeIndex != TileIndexInvalid) &&
                                (macroModeIndex != TileIndexNoMacroIndex));

                    UINT_32 tileSplit;

                    *pInfo = m_macroTileTable[macroModeIndex];

                    if (pCfgTable->type == ADDR_DEPTH_SAMPLE_ORDER)
                    {
                        tileSplit = pCfgTable->info.tileSplitBytes;
                    }
                    else
                    {
                        if (bpp > 0)
                        {
                            UINT_32 thickness = Thickness(pCfgTable->mode);
                            UINT_32 tileBytes1x = BITS_TO_BYTES(bpp * MicroTilePixels * thickness);
                            // Non-depth entries store a split factor
                            UINT_32 sampleSplit = m_tileTable[index].info.tileSplitBytes;
                            tileSplit = Max(256u, sampleSplit * tileBytes1x);
                        }
                        else
                        {
                            // Return tileBytes instead if not enough info
                            tileSplit = pInfo->tileSplitBytes;
                        }
                    }

                    // Clamp to row_size
                    pInfo->tileSplitBytes = Min(m_rowSize, tileSplit);

                    pInfo->pipeConfig = pCfgTable->info.pipeConfig;
                }
                else // For 1D and linear modes, return the default value stored in the table
                {
                    *pInfo = pCfgTable->info;
                }
            }

            if (pMode != NULL)
            {
                *pMode = pCfgTable->mode;
            }

            if (pType != NULL)
            {
                *pType = pCfgTable->type;
            }
        }
    }

    return returnCode;
}

/**
****************************************************************************************************
*   CiLib::HwlComputeSurfaceInfo
*
*   @brief
*       Entry of CI's ComputeSurfaceInfo
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE CiLib::HwlComputeSurfaceInfo(
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,    ///< [in] input structure
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*       pOut    ///< [out] output structure
    ) const
{
    // If tileIndex is invalid, force macroModeIndex to be invalid, too
    if (pIn->tileIndex == TileIndexInvalid)
    {
        pOut->macroModeIndex = TileIndexInvalid;
    }

    ADDR_E_RETURNCODE retCode = SiLib::HwlComputeSurfaceInfo(pIn, pOut);

    if ((pIn->mipLevel > 0) &&
        (pOut->tcCompatible == TRUE) &&
        (pOut->tileMode != pIn->tileMode) &&
        (SupportDccAndTcCompatibility() == TRUE))
    {
        pOut->tcCompatible = CheckTcCompatibility(pOut->pTileInfo,
                                                  pIn->bpp,
                                                  pOut->tileMode,
                                                  pOut->tileType,
                                                  pOut);
    }

    if (pOut->macroModeIndex == TileIndexNoMacroIndex)
    {
        pOut->macroModeIndex = TileIndexInvalid;
    }

    if ((pIn->flags.matchStencilTileCfg == TRUE) &&
        (pIn->flags.depth == TRUE))
    {
        pOut->stencilTileIdx = TileIndexInvalid;

        if ((MinDepth2DThinIndex <= pOut->tileIndex) &&
            (MaxDepth2DThinIndex >= pOut->tileIndex))
        {
            BOOL_32 depthStencil2DTileConfigMatch = DepthStencilTileCfgMatch(pIn, pOut);

            if ((depthStencil2DTileConfigMatch == FALSE) &&
                (pOut->tcCompatible == TRUE))
            {
                pOut->macroModeIndex = TileIndexInvalid;

                ADDR_COMPUTE_SURFACE_INFO_INPUT localIn = *pIn;
                localIn.tileIndex = TileIndexInvalid;
                localIn.pTileInfo = NULL;
                localIn.flags.tcCompatible = FALSE;

                SiLib::HwlComputeSurfaceInfo(&localIn, pOut);

                ADDR_ASSERT((MinDepth2DThinIndex <= pOut->tileIndex) &&
                            (MaxDepth2DThinIndex >= pOut->tileIndex));

                depthStencil2DTileConfigMatch = DepthStencilTileCfgMatch(pIn, pOut);
            }

            if ((depthStencil2DTileConfigMatch == FALSE) &&
                (pIn->numSamples <= 1))
            {
                pOut->macroModeIndex = TileIndexInvalid;

                ADDR_COMPUTE_SURFACE_INFO_INPUT localIn = *pIn;
                localIn.tileMode = ADDR_TM_1D_TILED_THIN1;
                localIn.tileIndex = TileIndexInvalid;
                localIn.pTileInfo = NULL;

                retCode = SiLib::HwlComputeSurfaceInfo(&localIn, pOut);
            }
        }

        if (pOut->tileIndex == Depth1DThinIndex)
        {
            pOut->stencilTileIdx = Depth1DThinIndex;
        }
    }

    return retCode;
}

/**
****************************************************************************************************
*   CiLib::HwlComputeFmaskInfo
*   @brief
*       Entry of r800's ComputeFmaskInfo
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE CiLib::HwlComputeFmaskInfo(
    const ADDR_COMPUTE_FMASK_INFO_INPUT*    pIn,   ///< [in] input structure
    ADDR_COMPUTE_FMASK_INFO_OUTPUT*         pOut   ///< [out] output structure
    )
{
    ADDR_E_RETURNCODE retCode = ADDR_OK;

    ADDR_TILEINFO tileInfo = {0};
    ADDR_COMPUTE_FMASK_INFO_INPUT fmaskIn;
    fmaskIn = *pIn;

    AddrTileMode tileMode = pIn->tileMode;

    // Use internal tile info if pOut does not have a valid pTileInfo
    if (pOut->pTileInfo == NULL)
    {
        pOut->pTileInfo = &tileInfo;
    }

    ADDR_ASSERT(tileMode == ADDR_TM_2D_TILED_THIN1     ||
                tileMode == ADDR_TM_3D_TILED_THIN1     ||
                tileMode == ADDR_TM_PRT_TILED_THIN1    ||
                tileMode == ADDR_TM_PRT_2D_TILED_THIN1 ||
                tileMode == ADDR_TM_PRT_3D_TILED_THIN1);

    ADDR_ASSERT(m_tileTable[14].mode == ADDR_TM_2D_TILED_THIN1);
    ADDR_ASSERT(m_tileTable[15].mode == ADDR_TM_3D_TILED_THIN1);

    // The only valid tile modes for fmask are 2D_THIN1 and 3D_THIN1 plus non-displayable
    INT_32 tileIndex = tileMode == ADDR_TM_2D_TILED_THIN1 ? 14 : 15;
    ADDR_SURFACE_FLAGS flags = {{0}};
    flags.fmask = 1;

    INT_32 macroModeIndex = TileIndexInvalid;

    UINT_32 numSamples = pIn->numSamples;
    UINT_32 numFrags = pIn->numFrags == 0 ?
                       numSamples : pIn->numFrags;

    // The FMASK element size is bits-per-sample times numSamples: QLog2(numFrags) bits per
    // sample, one more when numSamples > numFrags (EQAA), with 3 bits padded to 4 and the total
    // clamped to at least 8 bpp (e.g. 8 samples / 8 fragments gives 4 * 8 = 32 bpp)
    UINT_32 bpp = QLog2(numFrags);

    // EQAA needs one more bit
    if (numSamples > numFrags)
    {
        bpp++;
    }

    if (bpp == 3)
    {
        bpp = 4;
    }

    bpp = Max(8u, bpp * numSamples);

    macroModeIndex = HwlComputeMacroModeIndex(tileIndex, flags, bpp, numSamples, pOut->pTileInfo);

    fmaskIn.tileIndex = tileIndex;
    fmaskIn.pTileInfo = pOut->pTileInfo;
    pOut->macroModeIndex = macroModeIndex;
    pOut->tileIndex = tileIndex;

    retCode = DispatchComputeFmaskInfo(&fmaskIn, pOut);

    if (retCode == ADDR_OK)
    {
        pOut->tileIndex =
            HwlPostCheckTileIndex(pOut->pTileInfo, pIn->tileMode, ADDR_NON_DISPLAYABLE,
                                  pOut->tileIndex);
    }

    // Resets pTileInfo to NULL if the internal tile info is used
    if (pOut->pTileInfo == &tileInfo)
    {
        pOut->pTileInfo = NULL;
    }

    return retCode;
}

/**
****************************************************************************************************
*   CiLib::HwlFmaskPreThunkSurfInfo
*
*   @brief
*       Some preparation before thunking a ComputeSurfaceInfo call for Fmask
*   @return
*       N/A
****************************************************************************************************
*/
VOID CiLib::HwlFmaskPreThunkSurfInfo(
    const ADDR_COMPUTE_FMASK_INFO_INPUT*    pFmaskIn,   ///< [in] Input of fmask info
    const ADDR_COMPUTE_FMASK_INFO_OUTPUT*   pFmaskOut,  ///< [in] Output of fmask info
    ADDR_COMPUTE_SURFACE_INFO_INPUT*        pSurfIn,    ///< [out] Input of thunked surface info
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*       pSurfOut    ///< [out] Output of thunked surface info
    ) const
{
    pSurfIn->tileIndex = pFmaskIn->tileIndex;
    pSurfOut->macroModeIndex  = pFmaskOut->macroModeIndex;
}

/**
****************************************************************************************************
*   CiLib::HwlFmaskPostThunkSurfInfo
*
*   @brief
*       Copy hwl extra field after calling thunked ComputeSurfaceInfo
*   @return
*       N/A
****************************************************************************************************
*/
VOID CiLib::HwlFmaskPostThunkSurfInfo(
    const ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pSurfOut,   ///< [in] Output of surface info
    ADDR_COMPUTE_FMASK_INFO_OUTPUT*         pFmaskOut   ///< [out] Output of fmask info
    ) const
{
    pFmaskOut->tileIndex = pSurfOut->tileIndex;
    pFmaskOut->macroModeIndex = pSurfOut->macroModeIndex;
}

/**
****************************************************************************************************
*   CiLib::HwlDegradeThickTileMode
*
*   @brief
*       Degrades valid tile mode for thick modes if needed (CI returns the base tile mode
*       unchanged)
*
*   @return
*       Suitable tile mode
****************************************************************************************************
*/
AddrTileMode CiLib::HwlDegradeThickTileMode(
    AddrTileMode        baseTileMode,   ///< [in] base tile mode
    UINT_32             numSlices,      ///< [in] current number of slices
    UINT_32*            pBytesPerTile   ///< [in,out] pointer to bytes per tile
    ) const
{
    return baseTileMode;
}

/**
****************************************************************************************************
*   CiLib::HwlOptimizeTileMode
*
*   @brief
*       Optimize tile mode on CI
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID CiLib::HwlOptimizeTileMode(
    ADDR_COMPUTE_SURFACE_INFO_INPUT*  pInOut      ///< [in,out] input output structure
    ) const
{
    AddrTileMode tileMode = pInOut->tileMode;

    // Override 2D/3D macro tile mode to PRT_* tile mode if
    // the client driver requests that this surface be equation compatible
    if (IsMacroTiled(tileMode) == TRUE)
    {
        if ((pInOut->flags.needEquation == TRUE) &&
            (pInOut->numSamples <= 1) &&
            (IsPrtTileMode(tileMode) == FALSE))
        {
            if ((pInOut->numSlices > 1) && ((pInOut->maxBaseAlign == 0) || (pInOut->maxBaseAlign >= Block64K)))
            {
                UINT_32 thickness = Thickness(tileMode);

                if (thickness == 1)
                {
                    tileMode = ADDR_TM_PRT_TILED_THIN1;
                }
                else
                {
                    static const UINT_32 PrtTileBytes = 0x10000;
                    // First prt thick tile index in the tile mode table
                    static const UINT_32 PrtThickTileIndex = 22;
                    ADDR_TILEINFO tileInfo = {0};

                    HwlComputeMacroModeIndex(PrtThickTileIndex,
                                             pInOut->flags,
                                             pInOut->bpp,
                                             pInOut->numSamples,
                                             &tileInfo);

                    // Bytes in one macro tile: bytes per element * 64 pixels per micro tile *
                    // samples * thickness * pipes * banks * bankWidth * bankHeight
                    UINT_32 macroTileBytes = ((pInOut->bpp) >> 3) * 64 * pInOut->numSamples *
                                             thickness * HwlGetPipes(&tileInfo) *
                                             tileInfo.banks * tileInfo.bankWidth *
                                             tileInfo.bankHeight;

                    if (macroTileBytes <= PrtTileBytes)
                    {
                        tileMode = ADDR_TM_PRT_TILED_THICK;
                    }
                    else
                    {
                        tileMode = ADDR_TM_PRT_TILED_THIN1;
                    }
                }
            }
        }

        if (pInOut->maxBaseAlign != 0)
        {
            pInOut->flags.dccPipeWorkaround = FALSE;
        }
    }

    if (tileMode != pInOut->tileMode)
    {
        pInOut->tileMode = tileMode;
    }
}

/**
****************************************************************************************************
*   CiLib::HwlOverrideTileMode
*
*   @brief
*       Override THICK to THIN, for specific formats on CI
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID CiLib::HwlOverrideTileMode(
    ADDR_COMPUTE_SURFACE_INFO_INPUT*  pInOut      ///< [in,out] input output structure
    ) const
{
    AddrTileMode tileMode = pInOut->tileMode;
    AddrTileType tileType = pInOut->tileType;

    // Currently, CI/VI ASICs do not support ADDR_TM_PRT_2D_TILED_THICK,
    // ADDR_TM_PRT_3D_TILED_THICK, ADDR_TM_PRT_2D_TILED_THIN1 or ADDR_TM_PRT_3D_TILED_THIN1,
    // so these are mapped to the generic PRT modes
    switch (tileMode)
    {
        case ADDR_TM_PRT_2D_TILED_THICK:
        case ADDR_TM_PRT_3D_TILED_THICK:
            tileMode = ADDR_TM_PRT_TILED_THICK;
            break;
        case ADDR_TM_PRT_2D_TILED_THIN1:
        case ADDR_TM_PRT_3D_TILED_THIN1:
            tileMode = ADDR_TM_PRT_TILED_THIN1;
            break;
        default:
            break;
    }

    // UBTS#404321: Bonaire does not need this override, as the THICK+THICK entries were
    // removed from the tile mode table
    if (!m_settings.isBonaire)
    {
        UINT_32 thickness = Thickness(tileMode);

        // tile_thickness = (array_mode == XTHICK) ? 8 : ((array_mode == THICK) ? 4 : 1)
        if (thickness > 1)
        {
            switch (pInOut->format)
            {
                // tcpError("Thick micro tiling is not supported for format...
                case ADDR_FMT_X24_8_32_FLOAT:
                case ADDR_FMT_32_AS_8:
                case ADDR_FMT_32_AS_8_8:
                case ADDR_FMT_32_AS_32_32_32_32:

                // packed formats
                case ADDR_FMT_GB_GR:
                case ADDR_FMT_BG_RG:
                case ADDR_FMT_1_REVERSED:
                case ADDR_FMT_1:
                case ADDR_FMT_BC1:
                case ADDR_FMT_BC2:
                case ADDR_FMT_BC3:
                case ADDR_FMT_BC4:
                case ADDR_FMT_BC5:
                case ADDR_FMT_BC6:
                case ADDR_FMT_BC7:
                    switch (tileMode)
                    {
                        case ADDR_TM_1D_TILED_THICK:
                            tileMode = ADDR_TM_1D_TILED_THIN1;
                            break;

                        case ADDR_TM_2D_TILED_XTHICK:
                        case ADDR_TM_2D_TILED_THICK:
                            tileMode = ADDR_TM_2D_TILED_THIN1;
                            break;

                        case ADDR_TM_3D_TILED_XTHICK:
                        case ADDR_TM_3D_TILED_THICK:
                            tileMode = ADDR_TM_3D_TILED_THIN1;
                            break;

                        case ADDR_TM_PRT_TILED_THICK:
                            tileMode = ADDR_TM_PRT_TILED_THIN1;
                            break;

                        case ADDR_TM_PRT_2D_TILED_THICK:
                            tileMode = ADDR_TM_PRT_2D_TILED_THIN1;
                            break;

                        case ADDR_TM_PRT_3D_TILED_THICK:
                            tileMode = ADDR_TM_PRT_3D_TILED_THIN1;
                            break;

                        default:
                            break;
                    }

                    // Switch tile type from thick to thin
                    if (tileMode != pInOut->tileMode)
                    {
                        // see tileIndex: 13-18
                        tileType = ADDR_NON_DISPLAYABLE;
                    }

                    break;
                default:
                    break;
            }
        }
    }

    if (tileMode != pInOut->tileMode)
    {
        pInOut->tileMode = tileMode;
        pInOut->tileType = tileType;
    }
}

/**
****************************************************************************************************
*   CiLib::HwlSelectTileMode
*
*   @brief
*       Select tile modes.
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID CiLib::HwlSelectTileMode(
    ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut     ///< [in,out] input output structure
    ) const
{
    AddrTileMode tileMode;
    AddrTileType tileType;

    if (pInOut->flags.rotateDisplay)
    {
        tileMode = ADDR_TM_2D_TILED_THIN1;
        tileType = ADDR_ROTATED;
    }
    else if (pInOut->flags.volume)
    {
        BOOL_32 bThin = (m_settings.isBonaire == TRUE) ||
                        ((m_allowNonDispThickModes == TRUE) && (pInOut->flags.color == TRUE));

        if (pInOut->numSlices >= 8)
        {
            tileMode = ADDR_TM_2D_TILED_XTHICK;
            tileType = (bThin == TRUE) ?
                ADDR_NON_DISPLAYABLE : ADDR_THICK;
        }
        else if (pInOut->numSlices >= 4)
        {
            tileMode = ADDR_TM_2D_TILED_THICK;
            tileType = (bThin == TRUE) ? ADDR_NON_DISPLAYABLE : ADDR_THICK;
        }
        else
        {
            tileMode = ADDR_TM_2D_TILED_THIN1;
            tileType = ADDR_NON_DISPLAYABLE;
        }
    }
    else
    {
        tileMode = ADDR_TM_2D_TILED_THIN1;

        if (pInOut->flags.depth || pInOut->flags.stencil)
        {
            tileType = ADDR_DEPTH_SAMPLE_ORDER;
        }
        else if ((pInOut->bpp <= 32) ||
                 (pInOut->flags.display == TRUE) ||
                 (pInOut->flags.overlay == TRUE))
        {
            tileType = ADDR_DISPLAYABLE;
        }
        else
        {
            tileType = ADDR_NON_DISPLAYABLE;
        }
    }

    if (pInOut->flags.prt)
    {
        if (Thickness(tileMode) > 1)
        {
            tileMode = ADDR_TM_PRT_TILED_THICK;
            tileType = (m_settings.isBonaire == TRUE) ? ADDR_NON_DISPLAYABLE : ADDR_THICK;
        }
        else
        {
            tileMode = ADDR_TM_PRT_TILED_THIN1;
        }
    }

    pInOut->tileMode = tileMode;
    pInOut->tileType = tileType;

    if ((pInOut->flags.dccCompatible == FALSE) &&
        (pInOut->flags.tcCompatible == FALSE))
    {
        pInOut->flags.opt4Space = TRUE;
        pInOut->maxBaseAlign = Block64K;
    }

    // Optimize tile mode if possible
    OptimizeTileMode(pInOut);

    HwlOverrideTileMode(pInOut);
}

/**
****************************************************************************************************
*   CiLib::HwlSetPrtTileMode
*
*   @brief
*       Set PRT tile mode.
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID CiLib::HwlSetPrtTileMode(
    ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut     ///< [in,out] input output structure
    ) const
{
    AddrTileMode tileMode = pInOut->tileMode;
    AddrTileType tileType = pInOut->tileType;

    if (Thickness(tileMode) > 1)
    {
        tileMode = ADDR_TM_PRT_TILED_THICK;
        tileType = (m_settings.isBonaire == TRUE) ? ADDR_NON_DISPLAYABLE : ADDR_THICK;
    }
    else
    {
        tileMode = ADDR_TM_PRT_TILED_THIN1;
        tileType = (tileType == ADDR_THICK) ?
            ADDR_NON_DISPLAYABLE : tileType;
    }

    pInOut->tileMode = tileMode;
    pInOut->tileType = tileType;
}

/**
****************************************************************************************************
*   CiLib::HwlSetupTileInfo
*
*   @brief
*       Sets up the default tile info values
****************************************************************************************************
*/
VOID CiLib::HwlSetupTileInfo(
    AddrTileMode                        tileMode,       ///< [in] Tile mode
    ADDR_SURFACE_FLAGS                  flags,          ///< [in] Surface type flags
    UINT_32                             bpp,            ///< [in] Bits per pixel
    UINT_32                             pitch,          ///< [in] Pitch in pixels
    UINT_32                             height,         ///< [in] Height in pixels
    UINT_32                             numSamples,     ///< [in] Number of samples
    ADDR_TILEINFO*                      pTileInfoIn,    ///< [in] Tile info input: NULL for default
    ADDR_TILEINFO*                      pTileInfoOut,   ///< [out] Tile info output
    AddrTileType                        inTileType,     ///< [in] Tile type
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*   pOut            ///< [out] Output
    ) const
{
    UINT_32 thickness = Thickness(tileMode);
    ADDR_TILEINFO* pTileInfo = pTileInfoOut;
    INT index = TileIndexInvalid;
    INT macroModeIndex = TileIndexInvalid;

    // Fail-safe code
    if (IsLinear(tileMode) == FALSE)
    {
        // Thick tile modes must use the thick micro tile mode, but Bonaire does not support
        // them due to old derived netlists (UBTS 404321)
        if (thickness > 1)
        {
            if (m_settings.isBonaire)
            {
                inTileType = ADDR_NON_DISPLAYABLE;
            }
            else if ((m_allowNonDispThickModes == FALSE) ||
                     (inTileType != ADDR_NON_DISPLAYABLE) ||
                     // There is no PRT_THICK + THIN entry in tile mode table except Bonaire
                     (IsPrtTileMode(tileMode) == TRUE))
            {
                inTileType = ADDR_THICK;
            }
        }
        // 128 bpp tiling must be non-displayable.
// Fmask reuses the color buffer's entry, but the bank-height field can come from another entry // To simplify the logic, the fmask entry should be picked from the non-displayable ones else if (bpp == 128 || flags.fmask) { inTileType = ADDR_NON_DISPLAYABLE; } // These two modes only have non-disp entries though they can be other micro tile modes else if (tileMode == ADDR_TM_3D_TILED_THIN1 || tileMode == ADDR_TM_PRT_3D_TILED_THIN1) { inTileType = ADDR_NON_DISPLAYABLE; } if (flags.depth || flags.stencil) { inTileType = ADDR_DEPTH_SAMPLE_ORDER; } } // tcCompatible flag is only meaningful for gfx8. if (SupportDccAndTcCompatibility() == FALSE) { flags.tcCompatible = FALSE; } if (IsTileInfoAllZero(pTileInfo)) { // See table entries 0-4 if (flags.depth || flags.stencil) { // tileSize = thickness * bpp * numSamples * 8 * 8 / 8 UINT_32 tileSize = thickness * bpp * numSamples * 8; // Turn off tc compatible if row_size is smaller than tile size (tile split occurs). if (m_rowSize < tileSize) { flags.tcCompatible = FALSE; } if (flags.nonSplit | flags.tcCompatible | flags.needEquation) { // Texture readable depth surface should not be split switch (tileSize) { case 64: index = 0; break; case 128: index = 1; break; case 256: index = 2; break; case 512: index = 3; break; default: index = 4; break; } } else { // Depth and stencil need to use the same index, thus the pre-defined tile_split // can meet the requirement to choose the same macro mode index // uncompressed depth/stencil are not supported for now switch (numSamples) { case 1: index = 0; break; case 2: case 4: index = 1; break; case 8: index = 2; break; default: break; } } } // See table entries 5-6 if (inTileType == ADDR_DEPTH_SAMPLE_ORDER) { switch (tileMode) { case ADDR_TM_1D_TILED_THIN1: index = 5; break; case ADDR_TM_PRT_TILED_THIN1: index = 6; break; default: break; } } // See table entries 8-12 if (inTileType == ADDR_DISPLAYABLE) { switch (tileMode) { case ADDR_TM_1D_TILED_THIN1: index = 9; break; case ADDR_TM_2D_TILED_THIN1: index =
10; break; case ADDR_TM_PRT_TILED_THIN1: index = 11; break; default: break; } } // See table entries 13-18 if (inTileType == ADDR_NON_DISPLAYABLE) { switch (tileMode) { case ADDR_TM_1D_TILED_THIN1: index = 13; break; case ADDR_TM_2D_TILED_THIN1: index = 14; break; case ADDR_TM_3D_TILED_THIN1: index = 15; break; case ADDR_TM_PRT_TILED_THIN1: index = 16; break; default: break; } } // See table entries 19-26 if (thickness > 1) { switch (tileMode) { case ADDR_TM_1D_TILED_THICK: // special check for Bonaire, for the compatibility between old KMD and new UMD index = ((inTileType == ADDR_THICK) || m_settings.isBonaire) ? 19 : 18; break; case ADDR_TM_2D_TILED_THICK: // special check for Bonaire, for the compatibility between old KMD and new UMD index = ((inTileType == ADDR_THICK) || m_settings.isBonaire) ? 20 : 24; break; case ADDR_TM_3D_TILED_THICK: index = 21; break; case ADDR_TM_PRT_TILED_THICK: index = 22; break; case ADDR_TM_2D_TILED_XTHICK: index = 25; break; case ADDR_TM_3D_TILED_XTHICK: index = 26; break; default: break; } } // See table entries 27-30 if (inTileType == ADDR_ROTATED) { switch (tileMode) { case ADDR_TM_1D_TILED_THIN1: index = 27; break; case ADDR_TM_2D_TILED_THIN1: index = 28; break; case ADDR_TM_PRT_TILED_THIN1: index = 29; break; case ADDR_TM_PRT_2D_TILED_THIN1: index = 30; break; default: break; } } if (m_pipes >= 8) { ADDR_ASSERT((index + 1) < static_cast<INT_32>(m_noOfEntries)); // Only do this when tile mode table is updated.
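// PRT surfaces need a 64 KiB macro tile (PrtTileBytes). Illustrative arithmetic with
// hypothetical values: 32 bpp, 1 sample, thin, 8 pipes, 16 banks, bankWidth = 1 and
// bankHeight = 1 give macroTileBytes = 4 * 64 * 1 * 1 * 8 * 16 * 1 * 1 = 32 KiB, which
// is not 64 KiB, so the check below switches to the next tile mode entry (index + 1).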
if (((tileMode == ADDR_TM_PRT_TILED_THIN1) || (tileMode == ADDR_TM_PRT_TILED_THICK)) && (m_tileTable[index + 1].mode == tileMode)) { static const UINT_32 PrtTileBytes = 0x10000; ADDR_TILEINFO tileInfo = {0}; HwlComputeMacroModeIndex(index, flags, bpp, numSamples, &tileInfo); UINT_32 macroTileBytes = (bpp >> 3) * 64 * numSamples * thickness * HwlGetPipes(&tileInfo) * tileInfo.banks * tileInfo.bankWidth * tileInfo.bankHeight; if (macroTileBytes != PrtTileBytes) { // Switching to next tile mode entry to make sure macro tile size is 64KB index += 1; tileInfo.pipeConfig = m_tileTable[index].info.pipeConfig; macroTileBytes = (bpp >> 3) * 64 * numSamples * thickness * HwlGetPipes(&tileInfo) * tileInfo.banks * tileInfo.bankWidth * tileInfo.bankHeight; ADDR_ASSERT(macroTileBytes == PrtTileBytes); flags.tcCompatible = FALSE; pOut->dccUnsupport = TRUE; } } } } else { // A pre-filled tile info is ready index = pOut->tileIndex; macroModeIndex = pOut->macroModeIndex; // pass tile type back for post tile index compute pOut->tileType = inTileType; if (flags.depth || flags.stencil) { // tileSize = thickness * bpp * numSamples * 8 * 8 / 8 UINT_32 tileSize = thickness * bpp * numSamples * 8; // Turn off tc compatible if row_size is smaller than tile size (tile split occurs). 
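// For example (hypothetical surface): a thin 32 bpp depth surface with 4 samples has
// tileSize = 1 * 32 * 4 * 8 = 1024 bytes, so tc compatibility is kept only when
// m_rowSize is at least 1024 bytes.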
if (m_rowSize < tileSize) { flags.tcCompatible = FALSE; } } UINT_32 numPipes = GetPipePerSurf(pTileInfo->pipeConfig); if (m_pipes != numPipes) { pOut->dccUnsupport = TRUE; } } // We only need to set up tile info if there is a valid index but macroModeIndex is invalid if ((index != TileIndexInvalid) && (macroModeIndex == TileIndexInvalid)) { macroModeIndex = HwlComputeMacroModeIndex(index, flags, bpp, numSamples, pTileInfo); // Copy to pOut->tileType/tileIndex/macroModeIndex pOut->tileIndex = index; pOut->tileType = m_tileTable[index].type; // Or inTileType, the same pOut->macroModeIndex = macroModeIndex; } else if (tileMode == ADDR_TM_LINEAR_GENERAL) { pOut->tileIndex = TileIndexLinearGeneral; // Copy linear-aligned entry?? *pTileInfo = m_tileTable[8].info; } else if (tileMode == ADDR_TM_LINEAR_ALIGNED) { pOut->tileIndex = 8; *pTileInfo = m_tileTable[8].info; } if (flags.tcCompatible) { flags.tcCompatible = CheckTcCompatibility(pTileInfo, bpp, tileMode, inTileType, pOut); } pOut->tcCompatible = flags.tcCompatible; } /** **************************************************************************************************** * CiLib::ReadGbTileMode * * @brief * Convert GB_TILE_MODE HW value to ADDR_TILE_CONFIG.
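*
*   @note
*       pipeConfig is the pipe_config register field plus one. For depth entries
*       (ADDR_DEPTH_SAMPLE_ORDER), tileSplitBytes holds real bytes, 64 << tile_split
*       (e.g. tile_split = 2 gives 256 bytes); other entries hold the sample split
*       factor 1 << sample_split. array_mode values 5, 6, 8, 9, 0xa, 0xb, 0xe and 0xf
*       are remapped to PRT/XTHICK tile modes; all other values are cast directly.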
**************************************************************************************************** */ VOID CiLib::ReadGbTileMode( UINT_32 regValue, ///< [in] GB_TILE_MODE register TileConfig* pCfg ///< [out] output structure ) const { GB_TILE_MODE gbTileMode; gbTileMode.val = regValue; pCfg->type = static_cast<AddrTileType>(gbTileMode.f.micro_tile_mode_new); pCfg->info.pipeConfig = static_cast<AddrPipeCfg>(gbTileMode.f.pipe_config + 1); if (pCfg->type == ADDR_DEPTH_SAMPLE_ORDER) { pCfg->info.tileSplitBytes = 64 << gbTileMode.f.tile_split; } else { pCfg->info.tileSplitBytes = 1 << gbTileMode.f.sample_split; } UINT_32 regArrayMode = gbTileMode.f.array_mode; pCfg->mode = static_cast<AddrTileMode>(regArrayMode); switch (regArrayMode) { case 5: pCfg->mode = ADDR_TM_PRT_TILED_THIN1; break; case 6: pCfg->mode = ADDR_TM_PRT_2D_TILED_THIN1; break; case 8: pCfg->mode = ADDR_TM_2D_TILED_XTHICK; break; case 9: pCfg->mode = ADDR_TM_PRT_TILED_THICK; break; case 0xa: pCfg->mode = ADDR_TM_PRT_2D_TILED_THICK; break; case 0xb: pCfg->mode = ADDR_TM_PRT_3D_TILED_THIN1; break; case 0xe: pCfg->mode = ADDR_TM_3D_TILED_XTHICK; break; case 0xf: pCfg->mode = ADDR_TM_PRT_3D_TILED_THICK; break; default: break; } // Fail-safe code: always convert tile info for these, as the non-macro modes // return the entry of tile mode table directly without looking up macro mode table if (!IsMacroTiled(pCfg->mode)) { pCfg->info.banks = 2; pCfg->info.bankWidth = 1; pCfg->info.bankHeight = 1; pCfg->info.macroAspectRatio = 1; pCfg->info.tileSplitBytes = 64; } } /** **************************************************************************************************** * CiLib::InitTileSettingTable * * @brief * Initialize the ADDR_TILE_CONFIG table.
* @return * TRUE if tile table is correctly initialized **************************************************************************************************** */ BOOL_32 CiLib::InitTileSettingTable( const UINT_32* pCfg, ///< [in] Pointer to table of tile configs UINT_32 noOfEntries ///< [in] Number of entries in the table above ) { BOOL_32 initOk = TRUE; ADDR_ASSERT(noOfEntries <= TileTableSize); memset(m_tileTable, 0, sizeof(m_tileTable)); if (noOfEntries != 0) { m_noOfEntries = noOfEntries; } else { m_noOfEntries = TileTableSize; } if (pCfg) // From Client { for (UINT_32 i = 0; i < m_noOfEntries; i++) { ReadGbTileMode(*(pCfg + i), &m_tileTable[i]); } } else { ADDR_ASSERT_ALWAYS(); initOk = FALSE; } if (initOk) { ADDR_ASSERT(m_tileTable[TILEINDEX_LINEAR_ALIGNED].mode == ADDR_TM_LINEAR_ALIGNED); if (m_settings.isBonaire == FALSE) { // Check if entry 18 is "thick+thin" combination if ((m_tileTable[18].mode == ADDR_TM_1D_TILED_THICK) && (m_tileTable[18].type == ADDR_NON_DISPLAYABLE)) { m_allowNonDispThickModes = TRUE; ADDR_ASSERT(m_tileTable[24].mode == ADDR_TM_2D_TILED_THICK); } } else { m_allowNonDispThickModes = TRUE; } // Assume the first entry is always programmed with full pipes m_pipes = HwlGetPipes(&m_tileTable[0].info); } return initOk; } /** **************************************************************************************************** * CiLib::ReadGbMacroTileCfg * * @brief * Convert GB_MACRO_TILE_CFG HW value to ADDR_TILEINFO.
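*
*   @note
*       Each field holds the log2 of its value, and num_banks is additionally biased by
*       one. For example, bank_width = 1, bank_height = 0, num_banks = 3 and
*       macro_tile_aspect = 2 decode to bankWidth = 2, bankHeight = 1, banks = 16 and
*       macroAspectRatio = 4.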
**************************************************************************************************** */ VOID CiLib::ReadGbMacroTileCfg( UINT_32 regValue, ///< [in] GB_MACROTILE_MODE register ADDR_TILEINFO* pCfg ///< [out] output structure ) const { GB_MACROTILE_MODE gbTileMode; gbTileMode.val = regValue; pCfg->bankHeight = 1 << gbTileMode.f.bank_height; pCfg->bankWidth = 1 << gbTileMode.f.bank_width; pCfg->banks = 1 << (gbTileMode.f.num_banks + 1); pCfg->macroAspectRatio = 1 << gbTileMode.f.macro_tile_aspect; } /** **************************************************************************************************** * CiLib::InitMacroTileCfgTable * * @brief * Initialize the ADDR_MACRO_TILE_CONFIG table. * @return * TRUE if macro tile table is correctly initialized **************************************************************************************************** */ BOOL_32 CiLib::InitMacroTileCfgTable( const UINT_32* pCfg, ///< [in] Pointer to table of macro tile configs UINT_32 noOfMacroEntries ///< [in] Number of entries in the table above ) { BOOL_32 initOk = TRUE; ADDR_ASSERT(noOfMacroEntries <= MacroTileTableSize); memset(m_macroTileTable, 0, sizeof(m_macroTileTable)); if (noOfMacroEntries != 0) { m_noOfMacroEntries = noOfMacroEntries; } else { m_noOfMacroEntries = MacroTileTableSize; } if (pCfg) // From Client { for (UINT_32 i = 0; i < m_noOfMacroEntries; i++) { ReadGbMacroTileCfg(*(pCfg + i), &m_macroTileTable[i]); m_macroTileTable[i].tileSplitBytes = 64 << (i % 8); } } else { ADDR_ASSERT_ALWAYS(); initOk = FALSE; } return initOk; } /** **************************************************************************************************** * CiLib::HwlComputeMacroModeIndex * * @brief * Computes macro tile mode index * @return * Macro tile mode index, or TileIndexNoMacroIndex if the tile mode is not macro tiled **************************************************************************************************** */ INT_32 CiLib::HwlComputeMacroModeIndex( INT_32 tileIndex, ///< [in] Tile mode index
ADDR_SURFACE_FLAGS flags, ///< [in] Surface flags UINT_32 bpp, ///< [in] Bits per pixel UINT_32 numSamples, ///< [in] Number of samples ADDR_TILEINFO* pTileInfo, ///< [out] Pointer to ADDR_TILEINFO AddrTileMode* pTileMode, ///< [out] Pointer to AddrTileMode AddrTileType* pTileType ///< [out] Pointer to AddrTileType ) const { INT_32 macroModeIndex = TileIndexInvalid; AddrTileMode tileMode = m_tileTable[tileIndex].mode; AddrTileType tileType = m_tileTable[tileIndex].type; UINT_32 thickness = Thickness(tileMode); if (!IsMacroTiled(tileMode)) { *pTileInfo = m_tileTable[tileIndex].info; macroModeIndex = TileIndexNoMacroIndex; } else { UINT_32 tileBytes1x = BITS_TO_BYTES(bpp * MicroTilePixels * thickness); UINT_32 tileSplit; if (m_tileTable[tileIndex].type == ADDR_DEPTH_SAMPLE_ORDER) { // Depth entries store real tileSplitBytes tileSplit = m_tileTable[tileIndex].info.tileSplitBytes; } else { // Non-depth entries store a split factor UINT_32 sampleSplit = m_tileTable[tileIndex].info.tileSplitBytes; UINT_32 colorTileSplit = Max(256u, sampleSplit * tileBytes1x); tileSplit = colorTileSplit; } UINT_32 tileSplitC = Min(m_rowSize, tileSplit); UINT_32 tileBytes; if (flags.fmask) { tileBytes = Min(tileSplitC, tileBytes1x); } else { tileBytes = Min(tileSplitC, numSamples * tileBytes1x); } if (tileBytes < 64) { tileBytes = 64; } macroModeIndex = Log2(tileBytes / 64); if (flags.prt || IsPrtTileMode(tileMode)) { macroModeIndex += PrtMacroModeOffset; *pTileInfo = m_macroTileTable[macroModeIndex]; } else { *pTileInfo = m_macroTileTable[macroModeIndex]; } pTileInfo->pipeConfig = m_tileTable[tileIndex].info.pipeConfig; pTileInfo->tileSplitBytes = tileSplitC; } if (NULL != pTileMode) { *pTileMode = tileMode; } if (NULL != pTileType) { *pTileType = tileType; } return macroModeIndex; } /** **************************************************************************************************** * CiLib::HwlComputeTileDataWidthAndHeightLinear * * @brief * Compute the squared cache shape for
per-tile data (CMASK and HTILE) for linear layout * * @note * MacroWidth and macroHeight are measured in pixels **************************************************************************************************** */ VOID CiLib::HwlComputeTileDataWidthAndHeightLinear( UINT_32* pMacroWidth, ///< [out] macro tile width UINT_32* pMacroHeight, ///< [out] macro tile height UINT_32 bpp, ///< [in] bits per pixel ADDR_TILEINFO* pTileInfo ///< [in] tile info ) const { ADDR_ASSERT(pTileInfo != NULL); UINT_32 numTiles; switch (pTileInfo->pipeConfig) { case ADDR_PIPECFG_P16_32x32_8x16: case ADDR_PIPECFG_P16_32x32_16x16: case ADDR_PIPECFG_P8_32x64_32x32: case ADDR_PIPECFG_P8_32x32_16x32: case ADDR_PIPECFG_P8_32x32_16x16: case ADDR_PIPECFG_P8_32x32_8x16: case ADDR_PIPECFG_P4_32x32: numTiles = 8; break; default: numTiles = 4; break; } *pMacroWidth = numTiles * MicroTileWidth; *pMacroHeight = numTiles * MicroTileHeight; } /** **************************************************************************************************** * CiLib::HwlComputeMetadataNibbleAddress * * @brief * calculate meta data address based on input information * * &parameter * uncompressedDataByteAddress - address of a pixel in color surface * dataBaseByteAddress - base address of color surface * metadataBaseByteAddress - base address of meta ram * metadataBitSize - meta key size, 8 for DCC, 4 for cmask * elementBitSize - element size of color surface * blockByteSize - compression block size, 256 for DCC * pipeInterleaveBytes - pipe interleave size * numOfPipes - number of pipes * numOfBanks - number of banks * numOfSamplesPerSplit - number of samples per tile split * @return * meta data nibble address (nibble address is used to support DCC compatible cmask) * **************************************************************************************************** */ UINT_64 CiLib::HwlComputeMetadataNibbleAddress( UINT_64 uncompressedDataByteAddress, UINT_64 dataBaseByteAddress, UINT_64 metadataBaseByteAddress, UINT_32
metadataBitSize, UINT_32 elementBitSize, UINT_32 blockByteSize, UINT_32 pipeInterleaveBytes, UINT_32 numOfPipes, UINT_32 numOfBanks, UINT_32 numOfSamplesPerSplit) const { ///-------------------------------------------------------------------------------------------- /// Get pipe interleave, bank and pipe bits ///-------------------------------------------------------------------------------------------- UINT_32 pipeInterleaveBits = Log2(pipeInterleaveBytes); UINT_32 pipeBits = Log2(numOfPipes); UINT_32 bankBits = Log2(numOfBanks); ///-------------------------------------------------------------------------------------------- /// Clear pipe and bank swizzles ///-------------------------------------------------------------------------------------------- UINT_32 dataMacrotileBits = pipeInterleaveBits + pipeBits + bankBits; UINT_32 metadataMacrotileBits = pipeInterleaveBits + pipeBits + bankBits; UINT_64 dataMacrotileClearMask = ~((1L << dataMacrotileBits) - 1); UINT_64 metadataMacrotileClearMask = ~((1L << metadataMacrotileBits) - 1); UINT_64 dataBaseByteAddressNoSwizzle = dataBaseByteAddress & dataMacrotileClearMask; UINT_64 metadataBaseByteAddressNoSwizzle = metadataBaseByteAddress & metadataMacrotileClearMask; ///-------------------------------------------------------------------------------------------- /// Modify metadata base before adding in so that when final address is divided by data ratio, /// the base address returns to where it should be ///-------------------------------------------------------------------------------------------- ADDR_ASSERT((0 != metadataBitSize)); UINT_64 metadataBaseShifted = metadataBaseByteAddressNoSwizzle * blockByteSize * 8 / metadataBitSize; UINT_64 offset = uncompressedDataByteAddress - dataBaseByteAddressNoSwizzle + metadataBaseShifted; ///-------------------------------------------------------------------------------------------- /// Save bank data bits 
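/// For example, with hypothetical pipeInterleaveBytes = 256, numOfPipes = 8 and
/// numOfBanks = 16: pipeInterleaveBits = 8, pipeBits = 3 and bankBits = 4, so the pipe
/// bits are offset[10:8] and the bank bits are offset[14:11].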
///-------------------------------------------------------------------------------------------- UINT_32 lsb = pipeBits + pipeInterleaveBits; UINT_32 msb = bankBits - 1 + lsb; UINT_64 bankDataBits = GetBits(offset, msb, lsb); ///-------------------------------------------------------------------------------------------- /// Save pipe data bits ///-------------------------------------------------------------------------------------------- lsb = pipeInterleaveBits; msb = pipeBits - 1 + lsb; UINT_64 pipeDataBits = GetBits(offset, msb, lsb); ///-------------------------------------------------------------------------------------------- /// Remove pipe and bank bits ///-------------------------------------------------------------------------------------------- lsb = pipeInterleaveBits; msb = dataMacrotileBits - 1; UINT_64 offsetWithoutPipeBankBits = RemoveBits(offset, msb, lsb); ADDR_ASSERT((0 != blockByteSize)); UINT_64 blockInBankpipe = offsetWithoutPipeBankBits / blockByteSize; UINT_32 tileSize = 8 * 8 * elementBitSize/8 * numOfSamplesPerSplit; UINT_32 blocksInTile = tileSize / blockByteSize; if (0 == blocksInTile) { lsb = 0; } else { lsb = Log2(blocksInTile); } msb = bankBits - 1 + lsb; UINT_64 blockInBankpipeWithBankBits = InsertBits(blockInBankpipe, bankDataBits, msb, lsb); /// NOTE *2 because we are converting to Nibble address in this step UINT_64 metaAddressInPipe = blockInBankpipeWithBankBits * 2 * metadataBitSize / 8; ///-------------------------------------------------------------------------------------------- /// Reinsert pipe bits back into the final address ///-------------------------------------------------------------------------------------------- lsb = pipeInterleaveBits + 1; ///<+1 due to Nibble address now gives interleave bits extra lsb. 
msb = pipeBits - 1 + lsb; UINT_64 metadataAddress = InsertBits(metaAddressInPipe, pipeDataBits, msb, lsb); return metadataAddress; } /** **************************************************************************************************** * CiLib::HwlComputeSurfaceAlignmentsMacroTiled * * @brief * Hardware layer function to compute alignment request for macro tile mode * **************************************************************************************************** */ VOID CiLib::HwlComputeSurfaceAlignmentsMacroTiled( AddrTileMode tileMode, ///< [in] tile mode UINT_32 bpp, ///< [in] bits per pixel ADDR_SURFACE_FLAGS flags, ///< [in] surface flags UINT_32 mipLevel, ///< [in] mip level UINT_32 numSamples, ///< [in] number of samples ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [in,out] Surface output ) const { // This works around a H/W limitation that DCC doesn't work when pipe config is switched to // P4. In theory, all ASICs that have such switching should be patched, but for now we only know what // to pad for Fiji. if ((m_settings.isFiji == TRUE) && (flags.dccPipeWorkaround == TRUE) && (flags.prt == FALSE) && (mipLevel == 0) && (tileMode == ADDR_TM_PRT_TILED_THIN1) && (pOut->dccUnsupport == TRUE)) { pOut->pitchAlign = PowTwoAlign(pOut->pitchAlign, 256); // In case the client still requests DCC usage.
pOut->dccUnsupport = FALSE; } } /** **************************************************************************************************** * CiLib::HwlPadDimensions * * @brief * Helper function to pad dimensions * **************************************************************************************************** */ VOID CiLib::HwlPadDimensions( AddrTileMode tileMode, ///< [in] tile mode UINT_32 bpp, ///< [in] bits per pixel ADDR_SURFACE_FLAGS flags, ///< [in] surface flags UINT_32 numSamples, ///< [in] number of samples ADDR_TILEINFO* pTileInfo, ///< [in] tile info UINT_32 mipLevel, ///< [in] mip level UINT_32* pPitch, ///< [in,out] pitch in pixels UINT_32* pPitchAlign, ///< [in,out] pitch alignment UINT_32 height, ///< [in] height in pixels UINT_32 heightAlign ///< [in] height alignment ) const { if ((SupportDccAndTcCompatibility() == TRUE) && (flags.dccCompatible == TRUE) && (numSamples > 1) && (mipLevel == 0) && (IsMacroTiled(tileMode) == TRUE)) { UINT_32 tileSizePerSample = BITS_TO_BYTES(bpp * MicroTileWidth * MicroTileHeight); UINT_32 samplesPerSplit = pTileInfo->tileSplitBytes / tileSizePerSample; if (samplesPerSplit < numSamples) { UINT_32 dccFastClearByteAlign = HwlGetPipes(pTileInfo) * m_pipeInterleaveBytes * 256; UINT_32 bytesPerSplit = BITS_TO_BYTES((*pPitch) * height * bpp * samplesPerSplit); ADDR_ASSERT(IsPow2(dccFastClearByteAlign)); if (0 != (bytesPerSplit & (dccFastClearByteAlign - 1))) { UINT_32 dccFastClearPixelAlign = dccFastClearByteAlign / BITS_TO_BYTES(bpp) / samplesPerSplit; UINT_32 macroTilePixelAlign = (*pPitchAlign) * heightAlign; if ((dccFastClearPixelAlign >= macroTilePixelAlign) && ((dccFastClearPixelAlign % macroTilePixelAlign) == 0)) { UINT_32 dccFastClearPitchAlignInMacroTile = dccFastClearPixelAlign / macroTilePixelAlign; UINT_32 heightInMacroTile = height / heightAlign; while ((heightInMacroTile > 1) && ((heightInMacroTile % 2) == 0) && (dccFastClearPitchAlignInMacroTile > 1) && ((dccFastClearPitchAlignInMacroTile % 2) == 0)) { 
heightInMacroTile >>= 1; dccFastClearPitchAlignInMacroTile >>= 1; } UINT_32 dccFastClearPitchAlignInPixels = (*pPitchAlign) * dccFastClearPitchAlignInMacroTile; if (IsPow2(dccFastClearPitchAlignInPixels)) { *pPitch = PowTwoAlign((*pPitch), dccFastClearPitchAlignInPixels); } else { *pPitch += (dccFastClearPitchAlignInPixels - 1); *pPitch /= dccFastClearPitchAlignInPixels; *pPitch *= dccFastClearPitchAlignInPixels; } *pPitchAlign = dccFastClearPitchAlignInPixels; } } } } } /** **************************************************************************************************** * CiLib::HwlComputeMaxBaseAlignments * * @brief * Gets maximum alignments * @return * maximum alignments **************************************************************************************************** */ UINT_32 CiLib::HwlComputeMaxBaseAlignments() const { const UINT_32 pipes = HwlGetPipes(&m_tileTable[0].info); // Initial size is 64 KiB for PRT. UINT_32 maxBaseAlign = 64 * 1024; for (UINT_32 i = 0; i < m_noOfMacroEntries; i++) { // The maximum tile size is 16 byte-per-pixel and either 8-sample or 8-slice. 
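// For example (hypothetical entry): tileSplitBytes = 1024 with 8 pipes, 16 banks,
// bankWidth = 1 and bankHeight = 1 gives baseAlign = 1024 * 8 * 16 * 1 * 1 = 128 KiB,
// which would raise maxBaseAlign above its 64 KiB starting value.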
UINT_32 tileSize = m_macroTileTable[i].tileSplitBytes; UINT_32 baseAlign = tileSize * pipes * m_macroTileTable[i].banks * m_macroTileTable[i].bankWidth * m_macroTileTable[i].bankHeight; if (baseAlign > maxBaseAlign) { maxBaseAlign = baseAlign; } } return maxBaseAlign; } /** **************************************************************************************************** * CiLib::HwlComputeMaxMetaBaseAlignments * * @brief * Gets maximum alignments for metadata * @return * maximum alignments for metadata **************************************************************************************************** */ UINT_32 CiLib::HwlComputeMaxMetaBaseAlignments() const { UINT_32 maxBank = 1; for (UINT_32 i = 0; i < m_noOfMacroEntries; i++) { if (SupportDccAndTcCompatibility() && IsMacroTiled(m_tileTable[i].mode)) { maxBank = Max(maxBank, m_macroTileTable[i].banks); } } return SiLib::HwlComputeMaxMetaBaseAlignments() * maxBank; } /** **************************************************************************************************** * CiLib::DepthStencilTileCfgMatch * * @brief * Try to find a tile index for stencil which makes its tile config parameters matches to depth * @return * TRUE if such tile index for stencil can be found **************************************************************************************************** */ BOOL_32 CiLib::DepthStencilTileCfgMatch( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { BOOL_32 depthStencil2DTileConfigMatch = FALSE; for (INT_32 stencilTileIndex = MinDepth2DThinIndex; stencilTileIndex <= MaxDepth2DThinIndex; stencilTileIndex++) { ADDR_TILEINFO tileInfo = {0}; INT_32 stencilMacroIndex = HwlComputeMacroModeIndex(stencilTileIndex, pIn->flags, 8, pIn->numSamples, &tileInfo); if (stencilMacroIndex != TileIndexNoMacroIndex) { if ((m_macroTileTable[stencilMacroIndex].banks == m_macroTileTable[pOut->macroModeIndex].banks) && 
(m_macroTileTable[stencilMacroIndex].bankWidth == m_macroTileTable[pOut->macroModeIndex].bankWidth) && (m_macroTileTable[stencilMacroIndex].bankHeight == m_macroTileTable[pOut->macroModeIndex].bankHeight) && (m_macroTileTable[stencilMacroIndex].macroAspectRatio == m_macroTileTable[pOut->macroModeIndex].macroAspectRatio) && (m_macroTileTable[stencilMacroIndex].pipeConfig == m_macroTileTable[pOut->macroModeIndex].pipeConfig)) { if ((pOut->tcCompatible == FALSE) || (tileInfo.tileSplitBytes >= MicroTileWidth * MicroTileHeight * pIn->numSamples)) { depthStencil2DTileConfigMatch = TRUE; pOut->stencilTileIdx = stencilTileIndex; break; } } } else { ADDR_ASSERT_ALWAYS(); } } return depthStencil2DTileConfigMatch; } /** **************************************************************************************************** * CiLib::CheckTcCompatibility * * @brief * Check if tc compatibility is available * @return * TRUE if tc compatibility is available **************************************************************************************************** */ BOOL_32 CiLib::CheckTcCompatibility( const ADDR_TILEINFO* pTileInfo, ///< [in] input tile info UINT_32 bpp, ///< [in] Bits per pixel AddrTileMode tileMode, ///< [in] input tile mode AddrTileType tileType, ///< [in] input tile type const ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [in] output surf info ) const { BOOL_32 tcCompatible = TRUE; if (IsMacroTiled(tileMode)) { if (tileType != ADDR_DEPTH_SAMPLE_ORDER) { // Turn off tcCompatible for color surface if tileSplit happens. Depth/stencil // tileSplit case was handled at tileIndex selecting time.
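// For example (hypothetical entry): a thin 32 bpp color surface whose entry stores a
// split factor of 2 has tileBytes1x = 256 and colorTileSplit = Max(256, 2 * 256) = 512
// bytes, so tcCompatible is cleared only when m_rowSize is below 512 bytes.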
INT_32 tileIndex = pOut->tileIndex; if ((tileIndex == TileIndexInvalid) && (IsTileInfoAllZero(pTileInfo) == FALSE)) { tileIndex = HwlPostCheckTileIndex(pTileInfo, tileMode, tileType, tileIndex); } if (tileIndex != TileIndexInvalid) { UINT_32 thickness = Thickness(tileMode); ADDR_ASSERT(static_cast<UINT_32>(tileIndex) < TileTableSize); // Non-depth entries store a split factor UINT_32 sampleSplit = m_tileTable[tileIndex].info.tileSplitBytes; UINT_32 tileBytes1x = BITS_TO_BYTES(bpp * MicroTilePixels * thickness); UINT_32 colorTileSplit = Max(256u, sampleSplit * tileBytes1x); if (m_rowSize < colorTileSplit) { tcCompatible = FALSE; } } } } else { // Client should not enable tc compatible for linear and 1D tile modes. tcCompatible = FALSE; } return tcCompatible; } } // V1 } // Addr } // rocr ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/r800/ciaddrlib.h000066400000000000000000000171711420110115200234520ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE.
* * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** **************************************************************************************************** * @file ciaddrlib.h * @brief Contains the CiLib class definition. **************************************************************************************************** */ #ifndef __CI_ADDR_LIB_H__ #define __CI_ADDR_LIB_H__ #include "addrlib1.h" #include "siaddrlib.h" namespace rocr { namespace Addr { namespace V1 { /** **************************************************************************************************** * @brief This class is the CI specific address library * function set. **************************************************************************************************** */ class CiLib : public SiLib { public: /// Creates CiLib object static Addr::Lib* CreateObj(const Client* pClient) { VOID* pMem = Object::ClientAlloc(sizeof(CiLib), pClient); return (pMem != NULL) ? 
            new (pMem) CiLib(pClient) : NULL;
    }

private:
    CiLib(const Client* pClient);
    virtual ~CiLib();

protected:

    // Hwl interface - defined in AddrLib1
    virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfo(
        const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeFmaskInfo(
        const ADDR_COMPUTE_FMASK_INFO_INPUT* pIn,
        ADDR_COMPUTE_FMASK_INFO_OUTPUT* pOut);

    virtual ChipFamily HwlConvertChipFamily(
        UINT_32 uChipFamily, UINT_32 uChipRevision);

    virtual BOOL_32 HwlInitGlobalParams(
        const ADDR_CREATE_INPUT* pCreateIn);

    virtual ADDR_E_RETURNCODE HwlSetupTileCfg(
        UINT_32 bpp, INT_32 index, INT_32 macroModeIndex, ADDR_TILEINFO* pInfo,
        AddrTileMode* pMode = 0, AddrTileType* pType = 0) const;

    virtual VOID HwlComputeTileDataWidthAndHeightLinear(
        UINT_32* pMacroWidth, UINT_32* pMacroHeight,
        UINT_32 bpp, ADDR_TILEINFO* pTileInfo) const;

    virtual INT_32 HwlComputeMacroModeIndex(
        INT_32 tileIndex, ADDR_SURFACE_FLAGS flags, UINT_32 bpp, UINT_32 numSamples,
        ADDR_TILEINFO* pTileInfo, AddrTileMode* pTileMode = NULL, AddrTileType* pTileType = NULL
        ) const;

    // Sub-hwl interface - defined in EgBasedLib
    virtual VOID HwlSetupTileInfo(
        AddrTileMode tileMode, ADDR_SURFACE_FLAGS flags,
        UINT_32 bpp, UINT_32 pitch, UINT_32 height, UINT_32 numSamples,
        ADDR_TILEINFO* inputTileInfo, ADDR_TILEINFO* outputTileInfo,
        AddrTileType inTileType, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    virtual INT_32 HwlPostCheckTileIndex(
        const ADDR_TILEINFO* pInfo, AddrTileMode mode, AddrTileType type,
        INT curIndex = TileIndexInvalid) const;

    virtual VOID HwlFmaskPreThunkSurfInfo(
        const ADDR_COMPUTE_FMASK_INFO_INPUT* pFmaskIn,
        const ADDR_COMPUTE_FMASK_INFO_OUTPUT* pFmaskOut,
        ADDR_COMPUTE_SURFACE_INFO_INPUT* pSurfIn,
        ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pSurfOut) const;

    virtual VOID HwlFmaskPostThunkSurfInfo(
        const ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pSurfOut,
        ADDR_COMPUTE_FMASK_INFO_OUTPUT* pFmaskOut) const;

    virtual AddrTileMode HwlDegradeThickTileMode(
        AddrTileMode baseTileMode, UINT_32 numSlices, UINT_32* pBytesPerTile) const;

    virtual VOID HwlOverrideTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const;

    virtual VOID HwlOptimizeTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const;

    virtual VOID HwlSelectTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const;

    /// Overwrite tile setting to PRT
    virtual VOID HwlSetPrtTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeDccInfo(
        const ADDR_COMPUTE_DCCINFO_INPUT* pIn,
        ADDR_COMPUTE_DCCINFO_OUTPUT* pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeCmaskAddrFromCoord(
        const ADDR_COMPUTE_CMASK_ADDRFROMCOORD_INPUT* pIn,
        ADDR_COMPUTE_CMASK_ADDRFROMCOORD_OUTPUT* pOut) const;

    virtual ADDR_E_RETURNCODE HwlComputeHtileAddrFromCoord(
        const ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn,
        ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT* pOut) const;

    virtual UINT_32 HwlComputeMaxBaseAlignments() const;

    virtual UINT_32 HwlComputeMaxMetaBaseAlignments() const;

    virtual VOID HwlPadDimensions(
        AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags,
        UINT_32 numSamples, ADDR_TILEINFO* pTileInfo, UINT_32 mipLevel,
        UINT_32* pPitch, UINT_32 *PitchAlign, UINT_32 height, UINT_32 heightAlign) const;

    virtual VOID HwlComputeSurfaceAlignmentsMacroTiled(
        AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags,
        UINT_32 mipLevel, UINT_32 numSamples, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

private:
    VOID ReadGbTileMode(
        UINT_32 regValue, TileConfig* pCfg) const;

    VOID ReadGbMacroTileCfg(
        UINT_32 regValue, ADDR_TILEINFO* pCfg) const;

private:
    BOOL_32 InitTileSettingTable(
        const UINT_32 *pSetting, UINT_32 noOfEntries);

    BOOL_32 InitMacroTileCfgTable(
        const UINT_32 *pSetting, UINT_32 noOfEntries);

    UINT_64 HwlComputeMetadataNibbleAddress(
        UINT_64 uncompressedDataByteAddress,
        UINT_64 dataBaseByteAddress,
        UINT_64 metadataBaseByteAddress,
        UINT_32 metadataBitSize,
        UINT_32 elementBitSize,
        UINT_32 blockByteSize,
        UINT_32 pipeInterleaveBytes,
        UINT_32 numOfPipes,
        UINT_32 numOfBanks,
        UINT_32 numOfSamplesPerSplit) const;

    BOOL_32 DepthStencilTileCfgMatch(
        const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    BOOL_32 CheckTcCompatibility(const ADDR_TILEINFO* pTileInfo, UINT_32 bpp, AddrTileMode tileMode,
                                 AddrTileType tileType, const ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    BOOL_32 SupportDccAndTcCompatibility() const
    {
        return ((m_settings.isVolcanicIslands == TRUE) || (m_configFlags.forceDccAndTcCompat == TRUE));
    }

    static const UINT_32 MacroTileTableSize = 16;
    static const UINT_32 PrtMacroModeOffset = MacroTileTableSize / 2;
    static const INT_32  MinDepth2DThinIndex = 0;
    static const INT_32  MaxDepth2DThinIndex = 4;
    static const INT_32  Depth1DThinIndex = 5;

    ADDR_TILEINFO m_macroTileTable[MacroTileTableSize];
    UINT_32       m_noOfMacroEntries;
    BOOL_32       m_allowNonDispThickModes;
};

} // V1
} // Addr
} // rocr

#endif
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/r800/egbaddrlib.cpp000066400000000000000000004345401420110115200241520ustar00rootroot00000000000000
/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
****************************************************************************************************
* @file  egbaddrlib.cpp
* @brief Contains the EgBasedLib class implementation.
****************************************************************************************************
*/

#include "egbaddrlib.h"

#include "util/macros.h"

namespace rocr {
namespace Addr
{
namespace V1
{

/**
****************************************************************************************************
*   EgBasedLib::EgBasedLib
*
*   @brief
*       Constructor
*
*   @note
*
****************************************************************************************************
*/
EgBasedLib::EgBasedLib(const Client* pClient)
    :
    Lib(pClient),
    m_ranks(0),
    m_logicalBanks(0),
    m_bankInterleave(1)
{
}

/**
****************************************************************************************************
*   EgBasedLib::~EgBasedLib
*
*   @brief
*       Destructor
****************************************************************************************************
*/
EgBasedLib::~EgBasedLib()
{
}

/**
****************************************************************************************************
*   EgBasedLib::DispatchComputeSurfaceInfo
*
*   @brief
*       Compute surface sizes, including padded pitch, height, slices and total size in
*       bytes. The output tile mode and base alignment may also be changed by this call.
*       Results are returned through output parameters.
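*
*   @note
*       Illustrative sketch, not part of addrlib: the sample-count and cube handling in this
*       function can be modelled in isolation. EffectiveSampleCount, CubePadDims and the
*       isNiOrLater flag are hypothetical names that only mirror the logic here. A zero
*       numFrags falls back to numSamples. NI and later size an EQAA surface
*       (numFrags != numSamples) by its fragment count. A cube map pads two dimensions at
*       mip level 0.
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mirrors how DispatchComputeSurfaceInfo derives the sample count
// used for sizing. isNiOrLater stands in for (m_chipFamily >= ADDR_CHIP_FAMILY_NI).
uint32_t EffectiveSampleCount(uint32_t numSamples, uint32_t numFrags, bool isNiOrLater)
{
    // A zero fragment count means "same as the sample count".
    const uint32_t frags = (numFrags == 0) ? numSamples : numFrags;

    // Only NI and later size EQAA surfaces by fragments instead of samples.
    return isNiOrLater ? frags : numSamples;
}

// Hypothetical helper: the padDims value chosen for cube maps.
uint32_t CubePadDims(bool isCube, uint32_t mipLevel)
{
    return (isCube && (mipLevel == 0)) ? 2u : 0u;
}
```
*       For example, a surface with 8 samples and 4 fragments is sized as 4 samples on NI
*       and later, but as 8 samples on earlier families.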
*
*   @return
*       TRUE if no error occurs
****************************************************************************************************
*/
BOOL_32 EgBasedLib::DispatchComputeSurfaceInfo(
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,    ///< [in] input structure
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*       pOut    ///< [out] output structure
    ) const
{
    AddrTileMode        tileMode    = pIn->tileMode;
    UINT_32             bpp         = pIn->bpp;
    UINT_32             numSamples  = pIn->numSamples;
    UINT_32             numFrags    = ((pIn->numFrags == 0) ? numSamples : pIn->numFrags);
    UINT_32             pitch       = pIn->width;
    UINT_32             height      = pIn->height;
    UINT_32             numSlices   = pIn->numSlices;
    UINT_32             mipLevel    = pIn->mipLevel;
    ADDR_SURFACE_FLAGS  flags       = pIn->flags;

    ADDR_TILEINFO       tileInfoDef = {0};
    ADDR_TILEINFO*      pTileInfo   = &tileInfoDef;
    UINT_32             padDims     = 0;
    BOOL_32             valid;

    if (pIn->flags.disallowLargeThickDegrade == 0)
    {
        tileMode = DegradeLargeThickTile(tileMode, bpp);
    }

    // Only override numSamples for NI and later
    if (m_chipFamily >= ADDR_CHIP_FAMILY_NI)
    {
        if (numFrags != numSamples) // This means EQAA
        {
            // The real surface size needed is determined by the number of fragments
            numSamples = numFrags;
        }

        // Save altered numSamples in pOut
        pOut->numSamples = numSamples;
    }

    // Caller makes sure pOut->pTileInfo is not NULL, see HwlComputeSurfaceInfo
    ADDR_ASSERT(pOut->pTileInfo);

    if (pOut->pTileInfo != NULL)
    {
        pTileInfo = pOut->pTileInfo;
    }

    // Set default values
    if (pIn->pTileInfo != NULL)
    {
        if (pTileInfo != pIn->pTileInfo)
        {
            *pTileInfo = *pIn->pTileInfo;
        }
    }
    else
    {
        memset(pTileInfo, 0, sizeof(ADDR_TILEINFO));
    }

    // For macro tile mode, we should calculate default tiling parameters
    HwlSetupTileInfo(tileMode,
                     flags,
                     bpp,
                     pitch,
                     height,
                     numSamples,
                     pIn->pTileInfo,
                     pTileInfo,
                     pIn->tileType,
                     pOut);

    if (flags.cube)
    {
        if (mipLevel == 0)
        {
            padDims = 2;
        }

        if (numSlices == 1)
        {
            // This is calculating one face, remove cube flag
            flags.cube = 0;
        }
    }

    switch (tileMode)
    {
        case ADDR_TM_LINEAR_GENERAL://fall through
        case ADDR_TM_LINEAR_ALIGNED:
            valid = ComputeSurfaceInfoLinear(pIn, pOut, padDims);
            break;

        case ADDR_TM_1D_TILED_THIN1://fall through
        case ADDR_TM_1D_TILED_THICK:
            valid = ComputeSurfaceInfoMicroTiled(pIn, pOut, padDims, tileMode);
            break;

        case ADDR_TM_2D_TILED_THIN1:    //fall through
        case ADDR_TM_2D_TILED_THICK:    //fall through
        case ADDR_TM_3D_TILED_THIN1:    //fall through
        case ADDR_TM_3D_TILED_THICK:    //fall through
        case ADDR_TM_2D_TILED_XTHICK:   //fall through
        case ADDR_TM_3D_TILED_XTHICK:   //fall through
        case ADDR_TM_PRT_TILED_THIN1:   //fall through
        case ADDR_TM_PRT_2D_TILED_THIN1://fall through
        case ADDR_TM_PRT_3D_TILED_THIN1://fall through
        case ADDR_TM_PRT_TILED_THICK:   //fall through
        case ADDR_TM_PRT_2D_TILED_THICK://fall through
        case ADDR_TM_PRT_3D_TILED_THICK:
            valid = ComputeSurfaceInfoMacroTiled(pIn, pOut, padDims, tileMode);
            break;

        default:
            valid = FALSE;
            ADDR_ASSERT_ALWAYS();
            break;
    }

    return valid;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceInfoLinear
*
*   @brief
*       Compute linear surface sizes, including padded pitch, height, slices and total size
*       in bytes, as well as the alignments. Because the mode is linear, the output tile
*       mode is not changed here. Results are returned through output parameters.
*
*   @return
*       TRUE if no error occurs
****************************************************************************************************
*/
BOOL_32 EgBasedLib::ComputeSurfaceInfoLinear(
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,    ///< [in] Input structure
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*       pOut,   ///< [out] Output structure
    UINT_32                                 padDims ///< [in] Dimensions to pad
    ) const
{
    UINT_32 expPitch = pIn->width;
    UINT_32 expHeight = pIn->height;
    UINT_32 expNumSlices = pIn->numSlices;

    // No linear MSAA on real H/W, keep this for TGL
    UINT_32 numSamples = pOut->numSamples;

    const UINT_32 microTileThickness = 1;

    //
    // Compute the surface alignments.
    //
    ComputeSurfaceAlignmentsLinear(pIn->tileMode,
                                   pIn->bpp,
                                   pIn->flags,
                                   &pOut->baseAlign,
                                   &pOut->pitchAlign,
                                   &pOut->heightAlign);

    if ((pIn->tileMode == ADDR_TM_LINEAR_GENERAL) && pIn->flags.color && (pIn->height > 1))
    {
#if !ALT_TEST
        // When a linear_general surface is accessed in multiple lines, it requires 8-pixel
        // pitch alignment since PITCH_TILE_MAX is in units of 8 pixels.
        // It is OK if it is accessed per line.
        ADDR_ASSERT((pIn->width % 8) == 0);
#endif
    }

    pOut->depthAlign = microTileThickness;

    expPitch = HwlPreHandleBaseLvl3xPitch(pIn, expPitch);

    //
    // Pad pitch and height to the required granularities.
    //
    PadDimensions(pIn->tileMode,
                  pIn->bpp,
                  pIn->flags,
                  numSamples,
                  pOut->pTileInfo,
                  padDims,
                  pIn->mipLevel,
                  &expPitch, &pOut->pitchAlign,
                  &expHeight, pOut->heightAlign,
                  &expNumSlices, microTileThickness);

    expPitch = HwlPostHandleBaseLvl3xPitch(pIn, expPitch);

    //
    // Adjust per HWL
    //

    UINT_64 logicalSliceSize;

    logicalSliceSize = HwlGetSizeAdjustmentLinear(pIn->tileMode,
                                                  pIn->bpp,
                                                  numSamples,
                                                  pOut->baseAlign,
                                                  pOut->pitchAlign,
                                                  &expPitch,
                                                  &expHeight,
                                                  &pOut->heightAlign);

    if ((pIn->pitchAlign != 0) || (pIn->heightAlign != 0))
    {
        if (pIn->pitchAlign != 0)
        {
            ADDR_ASSERT((pIn->pitchAlign % pOut->pitchAlign) == 0);
            pOut->pitchAlign = pIn->pitchAlign;

            if (IsPow2(pOut->pitchAlign))
            {
                expPitch = PowTwoAlign(expPitch, pOut->pitchAlign);
            }
            else
            {
                expPitch += pOut->pitchAlign - 1;
                expPitch /= pOut->pitchAlign;
                expPitch *= pOut->pitchAlign;
            }
        }

        if (pIn->heightAlign != 0)
        {
            ADDR_ASSERT((pIn->heightAlign % pOut->heightAlign) == 0);
            pOut->heightAlign = pIn->heightAlign;

            if (IsPow2(pOut->heightAlign))
            {
                expHeight = PowTwoAlign(expHeight, pOut->heightAlign);
            }
            else
            {
                expHeight += pOut->heightAlign - 1;
                expHeight /= pOut->heightAlign;
                expHeight *= pOut->heightAlign;
            }
        }

        logicalSliceSize = BITS_TO_BYTES(expPitch * expHeight * pIn->bpp);
    }

    pOut->pitch = expPitch;
    pOut->height = expHeight;
    pOut->depth = expNumSlices;

    pOut->surfSize = logicalSliceSize * expNumSlices;

    pOut->tileMode = pIn->tileMode;

    return TRUE;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceInfoMicroTiled
*
*   @brief
*       Compute 1D/micro-tiled surface sizes, including padded pitch, height, slices and
*       total size in bytes, as well as the alignments. Results are returned through output
*       parameters.
*
*   @return
*       TRUE if no error occurs
****************************************************************************************************
*/
BOOL_32 EgBasedLib::ComputeSurfaceInfoMicroTiled(
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,        ///< [in] Input structure
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*       pOut,       ///< [out] Output structure
    UINT_32                                 padDims,    ///< [in] Dimensions to pad
    AddrTileMode                            expTileMode ///< [in] Expected tile mode
    ) const
{
    BOOL_32 valid = TRUE;

    UINT_32 microTileThickness;
    UINT_32 expPitch = pIn->width;
    UINT_32 expHeight = pIn->height;
    UINT_32 expNumSlices = pIn->numSlices;

    // No 1D MSAA on real H/W, keep this for TGL
    UINT_32 numSamples = pOut->numSamples;

    //
    // Compute the micro tile thickness.
    //
    microTileThickness = Thickness(expTileMode);

    //
    // Extra override for mip levels
    //
    if (pIn->mipLevel > 0)
    {
        //
        // Reduce tiling mode from thick to thin if the number of slices is less than the
        // micro tile thickness.
        //
        if ((expTileMode == ADDR_TM_1D_TILED_THICK) &&
            (expNumSlices < ThickTileThickness))
        {
            expTileMode = HwlDegradeThickTileMode(ADDR_TM_1D_TILED_THICK, expNumSlices, NULL);
            if (expTileMode != ADDR_TM_1D_TILED_THICK)
            {
                microTileThickness = 1;
            }
        }
    }

    //
    // Compute the surface restrictions.
    //
    ComputeSurfaceAlignmentsMicroTiled(expTileMode,
                                       pIn->bpp,
                                       pIn->flags,
                                       pIn->mipLevel,
                                       numSamples,
                                       &pOut->baseAlign,
                                       &pOut->pitchAlign,
                                       &pOut->heightAlign);

    pOut->depthAlign = microTileThickness;

    //
    // Pad pitch and height to the required granularities.
    // Compute surface size.
    // Return parameters.
    //
    PadDimensions(expTileMode,
                  pIn->bpp,
                  pIn->flags,
                  numSamples,
                  pOut->pTileInfo,
                  padDims,
                  pIn->mipLevel,
                  &expPitch, &pOut->pitchAlign,
                  &expHeight, pOut->heightAlign,
                  &expNumSlices, microTileThickness);

    //
    // Get HWL specific pitch adjustment
    //
    UINT_64 logicalSliceSize = HwlGetSizeAdjustmentMicroTiled(microTileThickness,
                                                              pIn->bpp,
                                                              pIn->flags,
                                                              numSamples,
                                                              pOut->baseAlign,
                                                              pOut->pitchAlign,
                                                              &expPitch,
                                                              &expHeight);

    pOut->pitch = expPitch;
    pOut->height = expHeight;
    pOut->depth = expNumSlices;

    pOut->surfSize = logicalSliceSize * expNumSlices;

    pOut->tileMode = expTileMode;

    return valid;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceInfoMacroTiled
*
*   @brief
*       Compute 2D/macro-tiled surface sizes, including padded pitch, height, slices and
*       total size in bytes. A more suitable output tile mode and the alignments may also
*       be changed by this call. Results are returned through output parameters.
*
*   @return
*       TRUE if no error occurs
****************************************************************************************************
*/
BOOL_32 EgBasedLib::ComputeSurfaceInfoMacroTiled(
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,        ///< [in] Input structure
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*       pOut,       ///< [out] Output structure
    UINT_32                                 padDims,    ///< [in] Dimensions to pad
    AddrTileMode                            expTileMode ///< [in] Expected tile mode
    ) const
{
    BOOL_32 valid = TRUE;

    AddrTileMode origTileMode = expTileMode;
    UINT_32 microTileThickness;

    UINT_32 paddedPitch;
    UINT_32 paddedHeight;
    UINT_64 bytesPerSlice;

    UINT_32 expPitch = pIn->width;
    UINT_32 expHeight = pIn->height;
    UINT_32 expNumSlices = pIn->numSlices;

    UINT_32 numSamples = pOut->numSamples;

    //
    // Compute the surface restrictions as base
    // SanityCheckMacroTiled is called in ComputeSurfaceAlignmentsMacroTiled
    //
    valid = ComputeSurfaceAlignmentsMacroTiled(expTileMode,
                                               pIn->bpp,
                                               pIn->flags,
                                               pIn->mipLevel,
                                               numSamples,
                                               pOut);

    if (valid)
    {
        //
        // Compute the micro tile thickness.
        //
        microTileThickness = Thickness(expTileMode);

        //
        // Find the correct tiling mode for mip levels
        //
        if (pIn->mipLevel > 0)
        {
            //
            // Try valid tile mode
            //
            expTileMode = ComputeSurfaceMipLevelTileMode(expTileMode,
                                                         pIn->bpp,
                                                         expPitch,
                                                         expHeight,
                                                         expNumSlices,
                                                         numSamples,
                                                         pOut->blockWidth,
                                                         pOut->blockHeight,
                                                         pOut->pTileInfo);

            if (!IsMacroTiled(expTileMode)) // Downgraded to micro-tiled
            {
                return ComputeSurfaceInfoMicroTiled(pIn, pOut, padDims, expTileMode);
            }
            else if (microTileThickness != Thickness(expTileMode))
            {
                //
                // Re-compute if thickness changed since bank-height may be changed!
                //
                return ComputeSurfaceInfoMacroTiled(pIn, pOut, padDims, expTileMode);
            }
        }

        paddedPitch = expPitch;
        paddedHeight = expHeight;

        //
        // Recalculate alignment
        //
        if (expTileMode != origTileMode) // Tile mode is changed but still macro-tiled
        {
            valid = ComputeSurfaceAlignmentsMacroTiled(expTileMode,
                                                       pIn->bpp,
                                                       pIn->flags,
                                                       pIn->mipLevel,
                                                       numSamples,
                                                       pOut);
        }

        //
        // Do padding
        //
        PadDimensions(expTileMode,
                      pIn->bpp,
                      pIn->flags,
                      numSamples,
                      pOut->pTileInfo,
                      padDims,
                      pIn->mipLevel,
                      &paddedPitch, &pOut->pitchAlign,
                      &paddedHeight, pOut->heightAlign,
                      &expNumSlices, microTileThickness);

        if (pIn->flags.qbStereo &&
            (pOut->pStereoInfo != NULL))
        {
            UINT_32 stereoHeightAlign = HwlStereoCheckRightOffsetPadding(pOut->pTileInfo);

            if (stereoHeightAlign != 0)
            {
                paddedHeight = PowTwoAlign(paddedHeight, stereoHeightAlign);
            }
        }

        if ((pIn->flags.needEquation == TRUE) &&
            (m_chipFamily == ADDR_CHIP_FAMILY_SI) &&
            (pIn->numMipLevels > 1) &&
            (pIn->mipLevel == 0))
        {
            BOOL_32 convertTo1D = FALSE;

            ADDR_ASSERT(Thickness(expTileMode) == 1);

            for (UINT_32 i = 1; i < pIn->numMipLevels; i++)
            {
                UINT_32 mipPitch = Max(1u, paddedPitch >> i);
                UINT_32 mipHeight = Max(1u, pIn->height >> i);
                UINT_32 mipSlices = pIn->flags.volume ?
                                    Max(1u, pIn->numSlices >> i) : pIn->numSlices;
                expTileMode = ComputeSurfaceMipLevelTileMode(expTileMode,
                                                             pIn->bpp,
                                                             mipPitch,
                                                             mipHeight,
                                                             mipSlices,
                                                             numSamples,
                                                             pOut->blockWidth,
                                                             pOut->blockHeight,
                                                             pOut->pTileInfo);

                if (IsMacroTiled(expTileMode))
                {
                    if (PowTwoAlign(mipPitch, pOut->blockWidth) !=
                        PowTwoAlign(mipPitch, pOut->pitchAlign))
                    {
                        convertTo1D = TRUE;
                        break;
                    }
                }
                else
                {
                    break;
                }
            }

            if (convertTo1D)
            {
                return ComputeSurfaceInfoMicroTiled(pIn, pOut, padDims, ADDR_TM_1D_TILED_THIN1);
            }
        }

        pOut->pitch = paddedPitch;
        // This check is placed here to work around special mipmap cases in which the
        // original height is needed.
        // The original height is pre-stored in pOut->height in PostComputeMipLevel and
        // pOut->pitch is needed in HwlCheckLastMacroTiledLvl, too.
        if (m_configFlags.checkLast2DLevel && (numSamples == 1)) // Don't check MSAA
        {
            // Set TRUE in pOut if the next level is the first 1D sub-level
            HwlCheckLastMacroTiledLvl(pIn, pOut);
        }
        pOut->height = paddedHeight;

        pOut->depth = expNumSlices;

        //
        // Compute the size of a slice.
        //
        bytesPerSlice = BITS_TO_BYTES(static_cast<UINT_64>(paddedPitch) *
                                      paddedHeight * NextPow2(pIn->bpp) * numSamples);

        pOut->surfSize = bytesPerSlice * expNumSlices;

        pOut->tileMode = expTileMode;

        pOut->depthAlign = microTileThickness;

    } // if (valid)

    return valid;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceAlignmentsLinear
*
*   @brief
*       Compute linear surface alignment; calculation results are returned through
*       output parameters.
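*
*   @note
*       Illustrative sketch, not part of addrlib. For ADDR_TM_LINEAR_GENERAL the base
*       alignment is one element, that is (bpp > 8) ? bpp / 8 : 1 bytes. When a caller
*       supplies its own pitchAlign or heightAlign, ComputeSurfaceInfoLinear rounds the
*       dimension up. It uses PowTwoAlign for power-of-two alignments and an
*       add/divide/multiply sequence otherwise. LinearGeneralBaseAlign and AlignUp are
*       hypothetical stand-ins for that logic, assuming a non-zero alignment.
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: base alignment in bytes for ADDR_TM_LINEAR_GENERAL.
uint32_t LinearGeneralBaseAlign(uint32_t bpp)
{
    return (bpp > 8) ? bpp / 8 : 1;
}

// Hypothetical helper: round x up to a multiple of align (align must be non-zero).
uint32_t AlignUp(uint32_t x, uint32_t align)
{
    if ((align & (align - 1)) == 0)
    {
        // Power of two: add align - 1, then clear the low bits.
        return (x + align - 1) & ~(align - 1);
    }

    // General case: the add/divide/multiply sequence used for other alignments.
    return ((x + align - 1) / align) * align;
}
```
*       For example, AlignUp(100, 64) is 128 and AlignUp(100, 48) is 144.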
*
*   @return
*       TRUE if no error occurs
****************************************************************************************************
*/
BOOL_32 EgBasedLib::ComputeSurfaceAlignmentsLinear(
    AddrTileMode        tileMode,       ///< [in] tile mode
    UINT_32             bpp,            ///< [in] bits per pixel
    ADDR_SURFACE_FLAGS  flags,          ///< [in] surface flags
    UINT_32*            pBaseAlign,     ///< [out] base address alignment in bytes
    UINT_32*            pPitchAlign,    ///< [out] pitch alignment in pixels
    UINT_32*            pHeightAlign    ///< [out] height alignment in pixels
    ) const
{
    BOOL_32 valid = TRUE;

    switch (tileMode)
    {
        case ADDR_TM_LINEAR_GENERAL:
            //
            // The required base alignment and the pitch and height granularities are all
            // 1 element.
            //
            *pBaseAlign = (bpp > 8) ? bpp / 8 : 1;
            *pPitchAlign = 1;
            *pHeightAlign = 1;
            break;
        case ADDR_TM_LINEAR_ALIGNED:
            //
            // The required alignment for base is the pipe interleave size.
            // The required granularity for pitch is hwl dependent.
            // The required granularity for height is one row.
            //
            *pBaseAlign = m_pipeInterleaveBytes;
            *pPitchAlign = HwlGetPitchAlignmentLinear(bpp, flags);
            *pHeightAlign = 1;
            break;
        default:
            *pBaseAlign = 1;
            *pPitchAlign = 1;
            *pHeightAlign = 1;
            ADDR_UNHANDLED_CASE();
            break;
    }

    AdjustPitchAlignment(flags, pPitchAlign);

    return valid;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceAlignmentsMicroTiled
*
*   @brief
*       Compute 1D tiled surface alignment; calculation results are returned through
*       output parameters.
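*
*   @note
*       Illustrative sketch, not part of addrlib. For czDispCompatible surfaces at mip level
*       0, the Carrizo workaround in this function makes two adjustments. It aligns the
*       base to 4096 bytes. It also rounds the pitch alignment up to a multiple of
*       512 / bytesPerPixel pixels, so that eight rows (8 x pitch x bytesPerPixel) span a
*       multiple of 4096 bytes. For 32 bpp, 512 / 4 = 128 pixels and 8 x 128 x 4 = 4096.
*       CzDispPitchAlign and PowTwoAlignUp are hypothetical helpers. They assume bpp is a
*       multiple of 8 and that both alignments are powers of two.
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round x up to a power-of-two alignment.
uint32_t PowTwoAlignUp(uint32_t x, uint32_t align)
{
    return (x + align - 1) & ~(align - 1);
}

// Hypothetical helper mirroring the czDispCompatible pitch-alignment adjustment.
uint32_t CzDispPitchAlign(uint32_t pitchAlign, uint32_t bpp)
{
    const uint32_t bytesPerPixel = bpp / 8;
    return PowTwoAlignUp(pitchAlign, 512 / bytesPerPixel);
}
```
*       With bpp = 32, CzDispPitchAlign(8, 32) returns 128, while CzDispPitchAlign(256, 32)
*       keeps 256. In both cases eight rows of pitch-aligned pixels cover a multiple of
*       4096 bytes.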
*
*   @return
*       TRUE if no error occurs
****************************************************************************************************
*/
BOOL_32 EgBasedLib::ComputeSurfaceAlignmentsMicroTiled(
    AddrTileMode        tileMode,       ///< [in] tile mode
    UINT_32             bpp,            ///< [in] bits per pixel
    ADDR_SURFACE_FLAGS  flags,          ///< [in] surface flags
    UINT_32             mipLevel,       ///< [in] mip level
    UINT_32             numSamples,     ///< [in] number of samples
    UINT_32*            pBaseAlign,     ///< [out] base address alignment in bytes
    UINT_32*            pPitchAlign,    ///< [out] pitch alignment in pixels
    UINT_32*            pHeightAlign    ///< [out] height alignment in pixels
    ) const
{
    BOOL_32 valid = TRUE;

    //
    // The required alignment for base is the pipe interleave size.
    //
    *pBaseAlign = m_pipeInterleaveBytes;

    *pPitchAlign = HwlGetPitchAlignmentMicroTiled(tileMode, bpp, flags, numSamples);

    *pHeightAlign = MicroTileHeight;

    AdjustPitchAlignment(flags, pPitchAlign);

    if (flags.czDispCompatible && (mipLevel == 0))
    {
        *pBaseAlign = PowTwoAlign(*pBaseAlign, 4096);                         //Base address MOD 4096 = 0
        *pPitchAlign = PowTwoAlign(*pPitchAlign, 512 / (BITS_TO_BYTES(bpp))); //(8 lines * pitch * bytes per pixel) MOD 4096 = 0
    }
    // end Carrizo workaround for 1D tiling

    return valid;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlReduceBankWidthHeight
*
*   @brief
*       Additional checks: reduce bankHeight/bankWidth if needed and possible, so that
*       tileSize*BANK_WIDTH*BANK_HEIGHT <= ROW_SIZE
*
*   @return
*       TRUE if no error occurs
****************************************************************************************************
*/
BOOL_32 EgBasedLib::HwlReduceBankWidthHeight(
    UINT_32             tileSize,           ///< [in] tile size
    UINT_32             bpp,                ///< [in] bits per pixel
    ADDR_SURFACE_FLAGS  flags,              ///< [in] surface flags
    UINT_32             numSamples,         ///< [in] number of samples
    UINT_32             bankHeightAlign,    ///< [in] bank height alignment
    UINT_32             pipes,              ///< [in] pipes
    ADDR_TILEINFO*      pTileInfo           ///< [in,out] bank structure.
    ) const
{
    UINT_32 macroAspectAlign;
    BOOL_32 valid = TRUE;

    if (tileSize * pTileInfo->bankWidth * pTileInfo->bankHeight > m_rowSize)
    {
        BOOL_32 stillGreater = TRUE;

        // Try reducing bankWidth first
        if (stillGreater && pTileInfo->bankWidth > 1)
        {
            while (stillGreater && pTileInfo->bankWidth > 0)
            {
                pTileInfo->bankWidth >>= 1;

                if (pTileInfo->bankWidth == 0)
                {
                    pTileInfo->bankWidth = 1;
                    break;
                }

                stillGreater =
                    tileSize * pTileInfo->bankWidth * pTileInfo->bankHeight > m_rowSize;
            }

            // bankWidth is reduced above, so we need to recalculate bankHeight and ratio
            bankHeightAlign = Max(1u,
                                  m_pipeInterleaveBytes * m_bankInterleave /
                                  (tileSize * pTileInfo->bankWidth)
                                  );

            // We cannot increase bankHeight so just assert this case.
            ADDR_ASSERT((pTileInfo->bankHeight % bankHeightAlign) == 0);

            if (numSamples == 1)
            {
                macroAspectAlign = Max(1u,
                                       m_pipeInterleaveBytes * m_bankInterleave /
                                       (tileSize * pipes * pTileInfo->bankWidth)
                                       );
                pTileInfo->macroAspectRatio = PowTwoAlign(pTileInfo->macroAspectRatio,
                                                          macroAspectAlign);
            }
        }

        // Skip bank_height degradation for 64-bit (and wider) depth buffers
        if (flags.depth && bpp >= 64)
        {
            stillGreater = FALSE;
        }

        // Then try reducing bankHeight
        if (stillGreater && pTileInfo->bankHeight > bankHeightAlign)
        {
            while (stillGreater && pTileInfo->bankHeight > bankHeightAlign)
            {
                pTileInfo->bankHeight >>= 1;

                if (pTileInfo->bankHeight < bankHeightAlign)
                {
                    pTileInfo->bankHeight = bankHeightAlign;
                    break;
                }

                stillGreater =
                    tileSize * pTileInfo->bankWidth * pTileInfo->bankHeight > m_rowSize;
            }
        }

        valid = !stillGreater;

        // Generate a warning if we still fail to meet this constraint
        if (valid == FALSE)
        {
            ADDR_WARN(
                0, ("TILE_SIZE(%d)*BANK_WIDTH(%d)*BANK_HEIGHT(%d) <= ROW_SIZE(%d)",
                tileSize, pTileInfo->bankWidth, pTileInfo->bankHeight, m_rowSize));
        }
    }

    return valid;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceAlignmentsMacroTiled
*
*   @brief
*       Compute 2D tiled surface alignment; calculation results are returned through
*       output parameters.
*
*   @return
*       TRUE if no error occurs
****************************************************************************************************
*/
BOOL_32 EgBasedLib::ComputeSurfaceAlignmentsMacroTiled(
    AddrTileMode                      tileMode,     ///< [in] tile mode
    UINT_32                           bpp,          ///< [in] bits per pixel
    ADDR_SURFACE_FLAGS                flags,        ///< [in] surface flags
    UINT_32                           mipLevel,     ///< [in] mip level
    UINT_32                           numSamples,   ///< [in] number of samples
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut          ///< [in,out] Surface output
    ) const
{
    ADDR_TILEINFO* pTileInfo = pOut->pTileInfo;

    BOOL_32 valid = SanityCheckMacroTiled(pTileInfo);

    if (valid)
    {
        UINT_32 macroTileWidth;
        UINT_32 macroTileHeight;

        UINT_32 tileSize;
        UINT_32 bankHeightAlign;
        UINT_32 macroAspectAlign;

        UINT_32 thickness = Thickness(tileMode);
        UINT_32 pipes = HwlGetPipes(pTileInfo);

        //
        // Align bank height first according to latest h/w spec
        //

        // tile_size = MIN(tile_split, 64 * tile_thickness * element_bytes * num_samples)
        tileSize = Min(pTileInfo->tileSplitBytes,
                       BITS_TO_BYTES(64 * thickness * bpp * numSamples));

        // bank_height_align =
        // MAX(1, (pipe_interleave_bytes * bank_interleave)/(tile_size*bank_width))
        bankHeightAlign = Max(1u,
                              m_pipeInterleaveBytes * m_bankInterleave /
                              (tileSize * pTileInfo->bankWidth)
                              );

        pTileInfo->bankHeight = PowTwoAlign(pTileInfo->bankHeight, bankHeightAlign);

        // num_pipes * bank_width * macro_tile_aspect >=
        // (pipe_interleave_size * bank_interleave) / tile_size
        if (numSamples == 1)
        {
            // this restriction is only for mipmap (mipmap's numSamples must be 1)
            macroAspectAlign = Max(1u,
                                   m_pipeInterleaveBytes * m_bankInterleave /
                                   (tileSize * pipes * pTileInfo->bankWidth)
                                   );

            pTileInfo->macroAspectRatio = PowTwoAlign(pTileInfo->macroAspectRatio, macroAspectAlign);
        }

        valid = HwlReduceBankWidthHeight(tileSize,
                                         bpp,
                                         flags,
                                         numSamples,
                                         bankHeightAlign,
                                         pipes,
                                         pTileInfo);

        //
        // The required granularity for pitch is the macro tile width.
        //
        macroTileWidth = MicroTileWidth * pTileInfo->bankWidth * pipes *
            pTileInfo->macroAspectRatio;

        pOut->pitchAlign = macroTileWidth;
        pOut->blockWidth = macroTileWidth;

        AdjustPitchAlignment(flags, &pOut->pitchAlign);

        //
        // The required granularity for height is the macro tile height.
        //
        macroTileHeight = MicroTileHeight * pTileInfo->bankHeight * pTileInfo->banks /
            pTileInfo->macroAspectRatio;

        pOut->heightAlign = macroTileHeight;
        pOut->blockHeight = macroTileHeight;

        //
        // Compute base alignment
        //
        pOut->baseAlign =
            pipes * pTileInfo->bankWidth * pTileInfo->banks * pTileInfo->bankHeight * tileSize;

        HwlComputeSurfaceAlignmentsMacroTiled(tileMode, bpp, flags, mipLevel, numSamples, pOut);
    }

    return valid;
}

/**
****************************************************************************************************
*   EgBasedLib::SanityCheckMacroTiled
*
*   @brief
*       Check if macro-tiled parameters are valid
*   @return
*       TRUE if valid
****************************************************************************************************
*/
BOOL_32 EgBasedLib::SanityCheckMacroTiled(
    ADDR_TILEINFO* pTileInfo   ///< [in] macro-tiled parameters
    ) const
{
    BOOL_32 valid = TRUE;
    ASSERTED UINT_32 numPipes = HwlGetPipes(pTileInfo);

    switch (pTileInfo->banks)
    {
        case 2: //fall through
        case 4: //fall through
        case 8: //fall through
        case 16:
            break;
        default:
            valid = FALSE;
            break;
    }

    if (valid)
    {
        switch (pTileInfo->bankWidth)
        {
            case 1: //fall through
            case 2: //fall through
            case 4: //fall through
            case 8:
                break;
            default:
                valid = FALSE;
                break;
        }
    }

    if (valid)
    {
        switch (pTileInfo->bankHeight)
        {
            case 1: //fall through
            case 2: //fall through
            case 4: //fall through
            case 8:
                break;
            default:
                valid = FALSE;
                break;
        }
    }

    if (valid)
    {
        switch (pTileInfo->macroAspectRatio)
        {
            case 1: //fall through
            case 2: //fall through
            case 4: //fall through
            case 8:
                break;
            default:
                valid = FALSE;
                break;
        }
    }

    if (valid)
    {
        if (pTileInfo->banks < pTileInfo->macroAspectRatio)
        {
            // This will generate macro tile height <= 1
            valid = FALSE;
        }
    }

    if (valid)
    {
        if (pTileInfo->tileSplitBytes > m_rowSize)
        {
            ADDR_WARN(0, ("tileSplitBytes is bigger than row size"));
        }
    }

    if (valid)
    {
        valid = HwlSanityCheckMacroTiled(pTileInfo);
    }

    ADDR_ASSERT(valid == TRUE);

    // Add this assert for guidance
    ADDR_ASSERT(numPipes * pTileInfo->banks >= 4);

    return valid;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceMipLevelTileMode
*
*   @brief
*       Compute valid tile mode for surface mipmap sub-levels
*
*   @return
*       Suitable tile mode
****************************************************************************************************
*/
AddrTileMode EgBasedLib::ComputeSurfaceMipLevelTileMode(
    AddrTileMode        baseTileMode,   ///< [in] base tile mode
    UINT_32             bpp,            ///< [in] bits per pixel
    UINT_32             pitch,          ///< [in] current level pitch
    UINT_32             height,         ///< [in] current level height
    UINT_32             numSlices,      ///< [in] current number of slices
    UINT_32             numSamples,     ///< [in] number of samples
    UINT_32             pitchAlign,     ///< [in] pitch alignment
    UINT_32             heightAlign,    ///< [in] height alignment
    ADDR_TILEINFO*      pTileInfo       ///< [in] ptr to bank structure
    ) const
{
    UINT_64 bytesPerSlice;
    (void)bytesPerSlice;
    UINT_32 bytesPerTile;

    AddrTileMode expTileMode = baseTileMode;
    UINT_32 microTileThickness = Thickness(expTileMode);
    UINT_32 interleaveSize = m_pipeInterleaveBytes * m_bankInterleave;

    //
    // Compute the size of a slice.
    //
    bytesPerSlice = BITS_TO_BYTES(static_cast<UINT_64>(pitch) * height * bpp * numSamples);
    bytesPerTile = BITS_TO_BYTES(MicroTilePixels * microTileThickness * NextPow2(bpp) * numSamples);

    //
    // Reduce tiling mode from thick to thin if the number of slices is less than the
    // micro tile thickness.
// if (numSlices < microTileThickness) { expTileMode = HwlDegradeThickTileMode(expTileMode, numSlices, &bytesPerTile); } if (bytesPerTile > pTileInfo->tileSplitBytes) { bytesPerTile = pTileInfo->tileSplitBytes; } UINT_32 threshold1 = bytesPerTile * HwlGetPipes(pTileInfo) * pTileInfo->bankWidth * pTileInfo->macroAspectRatio; UINT_32 threshold2 = bytesPerTile * pTileInfo->bankWidth * pTileInfo->bankHeight; // // Reduce the tile mode from 2D/3D to 1D in following conditions // switch (expTileMode) { case ADDR_TM_2D_TILED_THIN1: //fall through case ADDR_TM_3D_TILED_THIN1: case ADDR_TM_PRT_TILED_THIN1: case ADDR_TM_PRT_2D_TILED_THIN1: case ADDR_TM_PRT_3D_TILED_THIN1: if ((pitch < pitchAlign) || (height < heightAlign) || (interleaveSize > threshold1) || (interleaveSize > threshold2)) { expTileMode = ADDR_TM_1D_TILED_THIN1; } break; case ADDR_TM_2D_TILED_THICK: //fall through case ADDR_TM_3D_TILED_THICK: case ADDR_TM_2D_TILED_XTHICK: case ADDR_TM_3D_TILED_XTHICK: case ADDR_TM_PRT_TILED_THICK: case ADDR_TM_PRT_2D_TILED_THICK: case ADDR_TM_PRT_3D_TILED_THICK: if ((pitch < pitchAlign) || (height < heightAlign)) { expTileMode = ADDR_TM_1D_TILED_THICK; } break; default: break; } return expTileMode; } /** **************************************************************************************************** * EgBasedLib::HwlGetAlignmentInfoMacroTiled * @brief * Get alignment info for giving tile mode * @return * TRUE if getting alignment is OK **************************************************************************************************** */ BOOL_32 EgBasedLib::HwlGetAlignmentInfoMacroTiled( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] create surface info UINT_32* pPitchAlign, ///< [out] pitch alignment UINT_32* pHeightAlign, ///< [out] height alignment UINT_32* pSizeAlign ///< [out] size alignment ) const { BOOL_32 valid = TRUE; ADDR_ASSERT(IsMacroTiled(pIn->tileMode)); UINT_32 numSamples = (pIn->numFrags == 0) ? 
pIn->numSamples : pIn->numFrags; ADDR_ASSERT(pIn->pTileInfo); ADDR_TILEINFO tileInfo = *pIn->pTileInfo; ADDR_COMPUTE_SURFACE_INFO_OUTPUT out = {0}; out.pTileInfo = &tileInfo; if (UseTileIndex(pIn->tileIndex)) { out.tileIndex = pIn->tileIndex; out.macroModeIndex = TileIndexInvalid; } HwlSetupTileInfo(pIn->tileMode, pIn->flags, pIn->bpp, pIn->width, pIn->height, numSamples, &tileInfo, &tileInfo, pIn->tileType, &out); valid = ComputeSurfaceAlignmentsMacroTiled(pIn->tileMode, pIn->bpp, pIn->flags, pIn->mipLevel, numSamples, &out); if (valid) { *pPitchAlign = out.pitchAlign; *pHeightAlign = out.heightAlign; *pSizeAlign = out.baseAlign; } return valid; } /** **************************************************************************************************** * EgBasedLib::HwlDegradeThickTileMode * * @brief * Degrades valid tile mode for thick modes if needed * * @return * Suitable tile mode **************************************************************************************************** */ AddrTileMode EgBasedLib::HwlDegradeThickTileMode( AddrTileMode baseTileMode, ///< [in] base tile mode UINT_32 numSlices, ///< [in] current number of slices UINT_32* pBytesPerTile ///< [in,out] pointer to bytes per slice ) const { ADDR_ASSERT(numSlices < Thickness(baseTileMode)); // if pBytesPerTile is NULL, this is a don't-care.... UINT_32 bytesPerTile = pBytesPerTile != NULL ? 
*pBytesPerTile : 64; AddrTileMode expTileMode = baseTileMode; switch (baseTileMode) { case ADDR_TM_1D_TILED_THICK: expTileMode = ADDR_TM_1D_TILED_THIN1; bytesPerTile >>= 2; break; case ADDR_TM_2D_TILED_THICK: expTileMode = ADDR_TM_2D_TILED_THIN1; bytesPerTile >>= 2; break; case ADDR_TM_3D_TILED_THICK: expTileMode = ADDR_TM_3D_TILED_THIN1; bytesPerTile >>= 2; break; case ADDR_TM_2D_TILED_XTHICK: if (numSlices < ThickTileThickness) { expTileMode = ADDR_TM_2D_TILED_THIN1; bytesPerTile >>= 3; } else { expTileMode = ADDR_TM_2D_TILED_THICK; bytesPerTile >>= 1; } break; case ADDR_TM_3D_TILED_XTHICK: if (numSlices < ThickTileThickness) { expTileMode = ADDR_TM_3D_TILED_THIN1; bytesPerTile >>= 3; } else { expTileMode = ADDR_TM_3D_TILED_THICK; bytesPerTile >>= 1; } break; default: ADDR_ASSERT_ALWAYS(); break; } if (pBytesPerTile != NULL) { *pBytesPerTile = bytesPerTile; } return expTileMode; } /** **************************************************************************************************** * EgBasedLib::DispatchComputeSurfaceAddrFromCoord * * @brief * Compute surface address from given coord (x, y, slice,sample) * * @return * Address in bytes **************************************************************************************************** */ UINT_64 EgBasedLib::DispatchComputeSurfaceAddrFromCoord( const ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ///< [in] input structure ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut ///< [out] output structure ) const { UINT_32 x = pIn->x; UINT_32 y = pIn->y; UINT_32 slice = pIn->slice; UINT_32 sample = pIn->sample; UINT_32 bpp = pIn->bpp; UINT_32 pitch = pIn->pitch; UINT_32 height = pIn->height; UINT_32 numSlices = pIn->numSlices; UINT_32 numSamples = ((pIn->numSamples == 0) ? 1 : pIn->numSamples); UINT_32 numFrags = ((pIn->numFrags == 0) ? 
numSamples : pIn->numFrags);
    AddrTileMode   tileMode           = pIn->tileMode;
    AddrTileType   microTileType      = pIn->tileType;
    BOOL_32        ignoreSE           = pIn->ignoreSE;
    BOOL_32        isDepthSampleOrder = pIn->isDepth;
    ADDR_TILEINFO* pTileInfo          = pIn->pTileInfo;

    UINT_32* pBitPosition = &pOut->bitPosition;

    UINT_64 addr;

    // ADDR_DEPTH_SAMPLE_ORDER = non-disp + depth-sample-order
    if (microTileType == ADDR_DEPTH_SAMPLE_ORDER)
    {
        isDepthSampleOrder = TRUE;
    }

    if (m_chipFamily >= ADDR_CHIP_FAMILY_NI)
    {
        if (numFrags != numSamples)
        {
            numSamples = numFrags;
            ADDR_ASSERT(sample < numSamples);
        }

        /// @note
        /// 128 bit/thick tiled surface doesn't support display tiling and
        /// mipmap chain must have the same tileType, so please fill tileType correctly
        if (IsLinear(pIn->tileMode) == FALSE)
        {
            if (bpp >= 128 || Thickness(tileMode) > 1)
            {
                ADDR_ASSERT(microTileType != ADDR_DISPLAYABLE);
            }
        }
    }

    switch (tileMode)
    {
        case ADDR_TM_LINEAR_GENERAL: //fall through
        case ADDR_TM_LINEAR_ALIGNED:
            addr = ComputeSurfaceAddrFromCoordLinear(x, y, slice, sample, bpp, pitch, height,
                                                     numSlices, pBitPosition);
            break;
        case ADDR_TM_1D_TILED_THIN1: //fall through
        case ADDR_TM_1D_TILED_THICK:
            addr = ComputeSurfaceAddrFromCoordMicroTiled(x, y, slice, sample, bpp, pitch, height,
                                                         numSamples, tileMode, microTileType,
                                                         isDepthSampleOrder, pBitPosition);
            break;
        case ADDR_TM_2D_TILED_THIN1:     //fall through
        case ADDR_TM_2D_TILED_THICK:     //fall through
        case ADDR_TM_3D_TILED_THIN1:     //fall through
        case ADDR_TM_3D_TILED_THICK:     //fall through
        case ADDR_TM_2D_TILED_XTHICK:    //fall through
        case ADDR_TM_3D_TILED_XTHICK:    //fall through
        case ADDR_TM_PRT_TILED_THIN1:    //fall through
        case ADDR_TM_PRT_2D_TILED_THIN1: //fall through
        case ADDR_TM_PRT_3D_TILED_THIN1: //fall through
        case ADDR_TM_PRT_TILED_THICK:    //fall through
        case ADDR_TM_PRT_2D_TILED_THICK: //fall through
        case ADDR_TM_PRT_3D_TILED_THICK:
            UINT_32 pipeSwizzle;
            UINT_32 bankSwizzle;

            if (m_configFlags.useCombinedSwizzle)
            {
                ExtractBankPipeSwizzle(pIn->tileSwizzle, pIn->pTileInfo,
                                       &bankSwizzle, &pipeSwizzle);
            }
else
            {
                pipeSwizzle = pIn->pipeSwizzle;
                bankSwizzle = pIn->bankSwizzle;
            }

            addr = ComputeSurfaceAddrFromCoordMacroTiled(x, y, slice, sample, bpp, pitch, height,
                                                         numSamples, tileMode, microTileType,
                                                         ignoreSE, isDepthSampleOrder,
                                                         pipeSwizzle, bankSwizzle,
                                                         pTileInfo, pBitPosition);
            break;
        default:
            addr = 0;
            ADDR_ASSERT_ALWAYS();
            break;
    }

    return addr;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeMacroTileEquation
*
*   @brief
*       Computes the address equation in macro tile
*   @return
*       ADDR_OK if the equation can be computed
****************************************************************************************************
*/
ADDR_E_RETURNCODE EgBasedLib::ComputeMacroTileEquation(
    UINT_32        log2BytesPP,    ///< [in] log2 of bytes per pixel
    AddrTileMode   tileMode,       ///< [in] tile mode
    AddrTileType   microTileType,  ///< [in] micro tiling type
    ADDR_TILEINFO* pTileInfo,      ///< [in] bank structure
    ADDR_EQUATION* pEquation       ///< [out] Equation for addressing in macro tile
    ) const
{
    ADDR_E_RETURNCODE retCode;

    // Element equation within a tile
    retCode = ComputeMicroTileEquation(log2BytesPP, tileMode, microTileType, pEquation);

    if (retCode == ADDR_OK)
    {
        // Tile equation with a single pipe and bank
        UINT_32 numPipes    = HwlGetPipes(pTileInfo);
        UINT_32 numPipeBits = Log2(numPipes);

        for (UINT_32 i = 0; i < Log2(pTileInfo->bankWidth); i++)
        {
            pEquation->addr[pEquation->numBits].valid   = 1;
            pEquation->addr[pEquation->numBits].channel = 0;
            pEquation->addr[pEquation->numBits].index   = i + log2BytesPP + 3 + numPipeBits;
            pEquation->numBits++;
        }

        for (UINT_32 i = 0; i < Log2(pTileInfo->bankHeight); i++)
        {
            pEquation->addr[pEquation->numBits].valid   = 1;
            pEquation->addr[pEquation->numBits].channel = 1;
            pEquation->addr[pEquation->numBits].index   = i + 3;
            pEquation->numBits++;
        }

        ADDR_EQUATION equation;
        memset(&equation, 0, sizeof(ADDR_EQUATION));

        UINT_32 thresholdX = 32;
        UINT_32 thresholdY = 32;

        if (IsPrtNoRotationTileMode(tileMode))
        {
            UINT_32
macroTilePitch  =
                (MicroTileWidth * pTileInfo->bankWidth * numPipes) * pTileInfo->macroAspectRatio;
            UINT_32 macroTileHeight =
                (MicroTileHeight * pTileInfo->bankHeight * pTileInfo->banks) / pTileInfo->macroAspectRatio;
            thresholdX = Log2(macroTilePitch);
            thresholdY = Log2(macroTileHeight);
        }

        // Pipe equation
        retCode = ComputePipeEquation(log2BytesPP, thresholdX, thresholdY, pTileInfo, &equation);

        if (retCode == ADDR_OK)
        {
            UINT_32 pipeBitStart = Log2(m_pipeInterleaveBytes);

            if (pEquation->numBits > pipeBitStart)
            {
                UINT_32 numLeftShift = pEquation->numBits - pipeBitStart;

                for (UINT_32 i = 0; i < numLeftShift; i++)
                {
                    pEquation->addr[pEquation->numBits + equation.numBits - i - 1] =
                        pEquation->addr[pEquation->numBits - i - 1];
                    pEquation->xor1[pEquation->numBits + equation.numBits - i - 1] =
                        pEquation->xor1[pEquation->numBits - i - 1];
                    pEquation->xor2[pEquation->numBits + equation.numBits - i - 1] =
                        pEquation->xor2[pEquation->numBits - i - 1];
                }
            }

            for (UINT_32 i = 0; i < equation.numBits; i++)
            {
                pEquation->addr[pipeBitStart + i] = equation.addr[i];
                pEquation->xor1[pipeBitStart + i] = equation.xor1[i];
                pEquation->xor2[pipeBitStart + i] = equation.xor2[i];
                pEquation->numBits++;
            }

            // Bank equation
            memset(&equation, 0, sizeof(ADDR_EQUATION));

            retCode = ComputeBankEquation(log2BytesPP, thresholdX, thresholdY, pTileInfo, &equation);

            if (retCode == ADDR_OK)
            {
                UINT_32 bankBitStart = pipeBitStart + numPipeBits + Log2(m_bankInterleave);

                if (pEquation->numBits > bankBitStart)
                {
                    UINT_32 numLeftShift = pEquation->numBits - bankBitStart;

                    for (UINT_32 i = 0; i < numLeftShift; i++)
                    {
                        pEquation->addr[pEquation->numBits + equation.numBits - i - 1] =
                            pEquation->addr[pEquation->numBits - i - 1];
                        pEquation->xor1[pEquation->numBits + equation.numBits - i - 1] =
                            pEquation->xor1[pEquation->numBits - i - 1];
                        pEquation->xor2[pEquation->numBits + equation.numBits - i - 1] =
                            pEquation->xor2[pEquation->numBits - i - 1];
                    }
                }

                for (UINT_32 i = 0; i < equation.numBits; i++)
                {
                    pEquation->addr[bankBitStart +
i] = equation.addr[i];
                    pEquation->xor1[bankBitStart + i] = equation.xor1[i];
                    pEquation->xor2[bankBitStart + i] = equation.xor2[i];
                    pEquation->numBits++;
                }
            }
        }
    }

    return retCode;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceAddrFromCoordMacroTiled
*
*   @brief
*       Computes the surface address and bit position from a
*       coordinate for 2D tiled (macro tiled)
*   @return
*       The byte address
****************************************************************************************************
*/
UINT_64 EgBasedLib::ComputeSurfaceAddrFromCoordMacroTiled(
    UINT_32        x,                  ///< [in] x coordinate
    UINT_32        y,                  ///< [in] y coordinate
    UINT_32        slice,              ///< [in] slice index
    UINT_32        sample,             ///< [in] sample index
    UINT_32        bpp,                ///< [in] bits per pixel
    UINT_32        pitch,              ///< [in] surface pitch, in pixels
    UINT_32        height,             ///< [in] surface height, in pixels
    UINT_32        numSamples,         ///< [in] number of samples
    AddrTileMode   tileMode,           ///< [in] tile mode
    AddrTileType   microTileType,      ///< [in] micro tiling type
    BOOL_32        ignoreSE,           ///< [in] TRUE if shader engines can be ignored
    BOOL_32        isDepthSampleOrder, ///< [in] TRUE if depth sample ordering is used
    UINT_32        pipeSwizzle,        ///< [in] pipe swizzle
    UINT_32        bankSwizzle,        ///< [in] bank swizzle
    ADDR_TILEINFO* pTileInfo,          ///< [in] bank structure
                                       ///  **All fields to be valid on entry**
    UINT_32*       pBitPosition        ///< [out] bit position, e.g. FMT_1 will use this
    ) const
{
    UINT_64 addr;

    UINT_32 microTileBytes;
    UINT_32 microTileBits;
    UINT_32 sampleOffset;
    UINT_32 pixelIndex;
    UINT_32 pixelOffset;
    UINT_32 elementOffset;
    UINT_32 tileSplitSlice;
    UINT_32 pipe;
    UINT_32 bank;
    UINT_64 sliceBytes;
    UINT_64 sliceOffset;
    UINT_32 macroTilePitch;
    UINT_32 macroTileHeight;
    UINT_32 macroTilesPerRow;
    UINT_32 macroTilesPerSlice;
    UINT_64 macroTileBytes;
    UINT_32 macroTileIndexX;
    UINT_32 macroTileIndexY;
    UINT_64 macroTileOffset;
    UINT_64 totalOffset;
    UINT_64 pipeInterleaveMask;
    UINT_64 bankInterleaveMask;
    UINT_64 pipeInterleaveOffset;
    UINT_32 bankInterleaveOffset;
    UINT_64 offset;
    UINT_32 tileRowIndex;
    UINT_32 tileColumnIndex;
    UINT_32 tileIndex;
    UINT_32 tileOffset;

    UINT_32 microTileThickness = Thickness(tileMode);

    //
    // Compute the number of group, pipe, and bank bits.
    //
    UINT_32 numPipes              = HwlGetPipes(pTileInfo);
    UINT_32 numPipeInterleaveBits = Log2(m_pipeInterleaveBytes);
    UINT_32 numPipeBits           = Log2(numPipes);
    UINT_32 numBankInterleaveBits = Log2(m_bankInterleave);
    UINT_32 numBankBits           = Log2(pTileInfo->banks);

    //
    // Compute the micro tile size.
    //
    microTileBits = MicroTilePixels * microTileThickness * bpp * numSamples;

    microTileBytes = microTileBits / 8;
    //
    // Compute the pixel index within the micro tile.
    //
    pixelIndex = ComputePixelIndexWithinMicroTile(x, y, slice, bpp, tileMode, microTileType);

    //
    // Compute the sample offset and pixel offset.
    //
    if (isDepthSampleOrder)
    {
        //
        // For depth surfaces, samples are stored contiguously for each element, so the sample
        // offset is the sample number times the element size.
        //
        sampleOffset = sample * bpp;
        pixelOffset  = pixelIndex * bpp * numSamples;
    }
    else
    {
        //
        // For color surfaces, all elements for a particular sample are stored contiguously, so
        // the sample offset is the sample number times the micro tile size divided by the number
        // of samples.
        //
        sampleOffset = sample * (microTileBits / numSamples);
        pixelOffset  = pixelIndex * bpp;
    }

    //
    // Compute the element offset.
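The depth and color branches above differ only in how the pixel and sample terms combine. The standalone sketch below returns the bit offset of one sample inside a micro tile. It assumes 64-pixel (8x8) micro tiles, and its names are hypothetical (`SampleBitOffset` is not an addrlib function). The byte offset and bit position then follow from `/ 8` and `% 8`, as in the code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bit offset of one sample of one pixel from the start of a
// micro tile. Assumes MicroTilePixels == 64 (8x8 micro tiles).
uint32_t SampleBitOffset(uint32_t pixelIndex, uint32_t sample, uint32_t bpp,
                         uint32_t numSamples, uint32_t thickness, bool depthSampleOrder)
{
    const uint32_t microTilePixels = 64;
    const uint32_t microTileBits   = microTilePixels * thickness * bpp * numSamples;

    if (depthSampleOrder)
    {
        // Depth order: the samples of a pixel are adjacent, so each pixel spans
        // bpp * numSamples bits and the sample selects a bpp-sized slot within it.
        return pixelIndex * bpp * numSamples + sample * bpp;
    }

    // Color order: each sample owns a contiguous 1/numSamples share of the micro tile.
    return pixelIndex * bpp + sample * (microTileBits / numSamples);
}
```

Take 32 bpp with 4 samples. Sample 2 of pixel 1 sits 192 bits (24 bytes) into the tile in depth order, but 4128 bits in color order, because each color sample plane spans 64 x 32 = 2048 bits.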
//
    elementOffset = pixelOffset + sampleOffset;

    *pBitPosition = static_cast<UINT_32>(elementOffset % 8);

    elementOffset /= 8; //bit-to-byte

    //
    // Determine if tiles need to be split across slices.
    //
    // If the size of the micro tile is larger than the tile split size, then the tile will be
    // split across multiple slices.
    //
    UINT_32 slicesPerTile = 1;

    if ((microTileBytes > pTileInfo->tileSplitBytes) && (microTileThickness == 1))
    {   // tile splitting is not supported for thick modes
        //
        // Compute the number of slices per tile.
        //
        slicesPerTile = microTileBytes / pTileInfo->tileSplitBytes;

        //
        // Compute the tile split slice number for use in rotating the bank.
        //
        tileSplitSlice = elementOffset / pTileInfo->tileSplitBytes;

        //
        // Adjust the element offset to account for the portion of the tile that is being moved to
        // a new slice.
        //
        elementOffset %= pTileInfo->tileSplitBytes;

        //
        // Adjust microTileBytes to tileSplitBytes, since each split portion of the tile occupies
        // its own slice.
        //
        microTileBytes = pTileInfo->tileSplitBytes;
    }
    else
    {
        tileSplitSlice = 0;
    }

    //
    // Compute macro tile pitch and height.
    //
    macroTilePitch =
        (MicroTileWidth * pTileInfo->bankWidth * numPipes) * pTileInfo->macroAspectRatio;
    macroTileHeight =
        (MicroTileHeight * pTileInfo->bankHeight * pTileInfo->banks) / pTileInfo->macroAspectRatio;

    //
    // Compute the number of bytes per macro tile. Note: this counts only the bytes that land in
    // a single bank/pipe, hence the division by (numPipes * banks).
    //
    macroTileBytes =
        static_cast<UINT_64>(microTileBytes) *
        (macroTilePitch / MicroTileWidth) * (macroTileHeight / MicroTileHeight) /
        (numPipes * pTileInfo->banks);

    //
    // Compute the number of macro tiles per row.
    //
    macroTilesPerRow = pitch / macroTilePitch;

    //
    // Compute the offset to the macro tile containing the specified coordinate.
    //
    macroTileIndexX = x / macroTilePitch;
    macroTileIndexY = y / macroTileHeight;
    macroTileOffset = ((macroTileIndexY * macroTilesPerRow) + macroTileIndexX) * macroTileBytes;

    //
    // Compute the number of macro tiles per slice.
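The tile-split and macro-tile sizing steps above can be exercised in isolation. This is a minimal sketch assuming 8x8 micro tiles. `MacroTileGeometry` and `ComputeMacroTileGeometry` are hypothetical names, not addrlib API, and the formulas simply mirror the ones above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the macro tile sizing done above.
struct MacroTileGeometry
{
    uint32_t pitch;           // macro tile width in pixels
    uint32_t height;          // macro tile height in pixels
    uint32_t slicesPerTile;   // > 1 when a thin micro tile is split across slices
    uint32_t microTileBytes;  // micro tile size after any split
    uint64_t macroTileBytes;  // bytes of one macro tile that land in a single bank/pipe
};

MacroTileGeometry ComputeMacroTileGeometry(uint32_t bpp, uint32_t numSamples, uint32_t thickness,
                                           uint32_t numPipes, uint32_t banks,
                                           uint32_t bankWidth, uint32_t bankHeight,
                                           uint32_t macroAspectRatio, uint32_t tileSplitBytes)
{
    const uint32_t kMicroTileWidth  = 8;
    const uint32_t kMicroTileHeight = 8;

    MacroTileGeometry g = {};
    g.microTileBytes = kMicroTileWidth * kMicroTileHeight * thickness * bpp * numSamples / 8;
    g.slicesPerTile  = 1;

    // Only thin tiles are split; each split portion is tileSplitBytes large.
    if ((g.microTileBytes > tileSplitBytes) && (thickness == 1))
    {
        g.slicesPerTile  = g.microTileBytes / tileSplitBytes;
        g.microTileBytes = tileSplitBytes;
    }

    g.pitch  = (kMicroTileWidth * bankWidth * numPipes) * macroAspectRatio;
    g.height = (kMicroTileHeight * bankHeight * banks) / macroAspectRatio;

    // Divide by (numPipes * banks): only the share held by one bank/pipe is counted.
    g.macroTileBytes = static_cast<uint64_t>(g.microTileBytes) *
                       (g.pitch / kMicroTileWidth) * (g.height / kMicroTileHeight) /
                       (numPipes * banks);
    return g;
}
```

In the first test, a 2048-byte, 8-sample micro tile exceeds the 1024-byte tile split, so it is split across two slices. Because (pitch / 8) x (height / 8) equals bankWidth x numPipes x bankHeight x banks, the per-bank/pipe size reduces to microTileBytes x bankWidth x bankHeight. This holds whenever macroAspectRatio divides bankHeight x banks evenly.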
//
    macroTilesPerSlice = macroTilesPerRow * (height / macroTileHeight);

    //
    // Compute the slice size.
    //
    sliceBytes = macroTilesPerSlice * macroTileBytes;

    //
    // Compute the slice offset.
    //
    sliceOffset = sliceBytes * (tileSplitSlice + slicesPerTile * (slice / microTileThickness));

    //
    // Compute the tile offset.
    //
    tileRowIndex    = (y / MicroTileHeight) % pTileInfo->bankHeight;
    tileColumnIndex = ((x / MicroTileWidth) / numPipes) % pTileInfo->bankWidth;
    tileIndex       = (tileRowIndex * pTileInfo->bankWidth) + tileColumnIndex;
    tileOffset      = tileIndex * microTileBytes;

    //
    // Combine the slice offset and macro tile offset with the pixel and sample offsets, accounting
    // for the pipe and bank bits in the middle of the address.
    //
    totalOffset = sliceOffset + macroTileOffset + elementOffset + tileOffset;

    //
    // Get the pipe and bank.
    //

    // When the tileMode is a PRT type, adjust the x and y coordinates.
    if (IsPrtNoRotationTileMode(tileMode))
    {
        x = x % macroTilePitch;
        y = y % macroTileHeight;
    }

    pipe = ComputePipeFromCoord(x, y, slice, tileMode, pipeSwizzle, ignoreSE, pTileInfo);
    bank = ComputeBankFromCoord(x, y, slice, tileMode, bankSwizzle, tileSplitSlice, pTileInfo);

    //
    // Split the offset to put some bits below the pipe+bank bits and some above.
    //
    pipeInterleaveMask   = (1 << numPipeInterleaveBits) - 1;
    bankInterleaveMask   = (1 << numBankInterleaveBits) - 1;
    pipeInterleaveOffset = totalOffset & pipeInterleaveMask;
    bankInterleaveOffset = static_cast<UINT_32>((totalOffset >> numPipeInterleaveBits) &
                                                bankInterleaveMask);
    offset               = totalOffset >> (numPipeInterleaveBits + numBankInterleaveBits);

    //
    // Assemble the address from its components.
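The split-and-assemble step above places the pipe and bank numbers between slices of the linear offset. A minimal sketch of the resulting bit layout, from least to most significant: pipe-interleave offset, pipe, bank-interleave offset, bank, remaining offset. It assumes every size is a power of two. The function and parameter names are hypothetical, and `Log2u` stands in for addrlib's `Log2`.

```cpp
#include <cassert>
#include <cstdint>

// Integer log2 of a power of two (stand-in for addrlib's Log2).
static uint32_t Log2u(uint32_t v)
{
    uint32_t r = 0;
    while (v > 1) { v >>= 1; ++r; }
    return r;
}

// Hypothetical sketch: insert pipe and bank numbers into a linear byte offset,
// giving the layout (LSB first)
//   pipe-interleave offset | pipe | bank-interleave offset | bank | remaining offset
uint64_t AssembleMacroTiledAddr(uint64_t totalOffset, uint32_t pipe, uint32_t bank,
                                uint32_t pipeInterleaveBytes, uint32_t numPipes,
                                uint32_t bankInterleave, uint32_t numBanks)
{
    const uint32_t pi = Log2u(pipeInterleaveBytes);
    const uint32_t p  = Log2u(numPipes);
    const uint32_t bi = Log2u(bankInterleave);
    const uint32_t b  = Log2u(numBanks);

    // Split the offset around the pipe and bank fields.
    const uint64_t pipeInterleaveOffset = totalOffset & ((1ull << pi) - 1);
    const uint64_t bankInterleaveOffset = (totalOffset >> pi) & ((1ull << bi) - 1);
    const uint64_t offset               = totalOffset >> (pi + bi);

    // Reassemble with pipe and bank inserted.
    return pipeInterleaveOffset |
           (static_cast<uint64_t>(pipe) << pi) |
           (bankInterleaveOffset << (pi + p)) |
           (static_cast<uint64_t>(bank) << (pi + p + bi)) |
           (offset << (pi + p + bi + b));
}
```

Take a 256-byte pipe interleave, 4 pipes, a bank interleave of 2 and 8 banks. Offset 0x1334 on pipe 1, bank 2 becomes byte address 0x25534.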
//
    addr = pipeInterleaveOffset;
    // This is to remove /analyze warnings
    UINT_32 pipeBits           = pipe << numPipeInterleaveBits;
    UINT_32 bankInterleaveBits = bankInterleaveOffset << (numPipeInterleaveBits + numPipeBits);
    UINT_32 bankBits           = bank << (numPipeInterleaveBits + numPipeBits +
                                          numBankInterleaveBits);
    UINT_64 offsetBits         = offset << (numPipeInterleaveBits + numPipeBits +
                                            numBankInterleaveBits + numBankBits);

    addr |= pipeBits;
    addr |= bankInterleaveBits;
    addr |= bankBits;
    addr |= offsetBits;

    return addr;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceAddrFromCoordMicroTiled
*
*   @brief
*       Computes the surface address and bit position from a coordinate for 1D tiled
*       (micro tiled)
*   @return
*       The byte address
****************************************************************************************************
*/
UINT_64 EgBasedLib::ComputeSurfaceAddrFromCoordMicroTiled(
    UINT_32      x,                  ///< [in] x coordinate
    UINT_32      y,                  ///< [in] y coordinate
    UINT_32      slice,              ///< [in] slice index
    UINT_32      sample,             ///< [in] sample index
    UINT_32      bpp,                ///< [in] bits per pixel
    UINT_32      pitch,              ///< [in] pitch, in pixels
    UINT_32      height,             ///< [in] height, in pixels
    UINT_32      numSamples,         ///< [in] number of samples
    AddrTileMode tileMode,           ///< [in] tile mode
    AddrTileType microTileType,      ///< [in] micro tiling type
    BOOL_32      isDepthSampleOrder, ///< [in] TRUE if depth sample ordering is used
    UINT_32*     pBitPosition        ///< [out] bit position, e.g. FMT_1 will use this
    ) const
{
    UINT_64 addr = 0;

    UINT_32 microTileBytes;
    UINT_64 sliceBytes;
    UINT_32 microTilesPerRow;
    UINT_32 microTileIndexX;
    UINT_32 microTileIndexY;
    UINT_32 microTileIndexZ;
    UINT_64 sliceOffset;
    UINT_64 microTileOffset;
    UINT_32 sampleOffset;
    UINT_32 pixelIndex;
    UINT_32 pixelOffset;

    UINT_32 microTileThickness = Thickness(tileMode);

    //
    // Compute the micro tile size.
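Both tiled paths locate an element inside its micro tile through `ComputePixelIndexWithinMicroTile`, whose body is not part of this section. For the non-displayable and depth orderings, `HwlComputePixelCoordFromOffset` further down decodes the index by reading x from bits 4, 2, 0 and y from bits 5, 3, 1. The sketch below pairs that decoder with an encoder derived from it. The encoder is an assumption about what the index computation does for these tile types, all names are hypothetical, and `Pack3` assumes `Bits2Number` treats its first bit argument as most significant.

```cpp
#include <cassert>
#include <cstdint>

static uint32_t BitAt(uint32_t v, uint32_t n) { return (v >> n) & 1u; }

// Three-bit pack, first argument most significant (assumed Bits2Number(3, ...) semantics).
static uint32_t Pack3(uint32_t b2, uint32_t b1, uint32_t b0) { return (b2 << 2) | (b1 << 1) | b0; }

// Assumed non-displayable ordering inside an 8x8 micro tile, derived from the decoder:
//   index[5:0] = { y[2], x[2], y[1], x[1], y[0], x[0] }
uint32_t EncodeNonDisplayable(uint32_t x, uint32_t y)
{
    return (BitAt(y, 2) << 5) | (BitAt(x, 2) << 4) | (BitAt(y, 1) << 3) |
           (BitAt(x, 1) << 2) | (BitAt(y, 0) << 1) | BitAt(x, 0);
}

// Same bit selection as the ADDR_NON_DISPLAYABLE branch of HwlComputePixelCoordFromOffset.
void DecodeNonDisplayable(uint32_t pixelIndex, uint32_t* pX, uint32_t* pY)
{
    *pX = Pack3(BitAt(pixelIndex, 4), BitAt(pixelIndex, 2), BitAt(pixelIndex, 0));
    *pY = Pack3(BitAt(pixelIndex, 5), BitAt(pixelIndex, 3), BitAt(pixelIndex, 1));
}
```

Interleaving the x and y bits this way keeps each 2x2 quad (indices 0 to 3) and then each 4x4 block (indices 0 to 15) contiguous.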
//
    microTileBytes = BITS_TO_BYTES(MicroTilePixels * microTileThickness * bpp * numSamples);

    //
    // Compute the slice size.
    //
    sliceBytes =
        BITS_TO_BYTES(static_cast<UINT_64>(pitch) * height * microTileThickness * bpp * numSamples);

    //
    // Compute the number of micro tiles per row.
    //
    microTilesPerRow = pitch / MicroTileWidth;

    //
    // Compute the micro tile index.
    //
    microTileIndexX = x / MicroTileWidth;
    microTileIndexY = y / MicroTileHeight;
    microTileIndexZ = slice / microTileThickness;

    //
    // Compute the slice offset.
    //
    sliceOffset = static_cast<UINT_64>(microTileIndexZ) * sliceBytes;

    //
    // Compute the offset to the micro tile containing the specified coordinate.
    //
    microTileOffset = (static_cast<UINT_64>(microTileIndexY) * microTilesPerRow + microTileIndexX) *
                      microTileBytes;

    //
    // Compute the pixel index within the micro tile.
    //
    pixelIndex = ComputePixelIndexWithinMicroTile(x, y, slice, bpp, tileMode, microTileType);

    //
    // Compute the sample offset.
    //
    if (isDepthSampleOrder)
    {
        //
        // For depth surfaces, samples are stored contiguously for each element, so the sample
        // offset is the sample number times the element size.
        //
        sampleOffset = sample * bpp;
        pixelOffset  = pixelIndex * bpp * numSamples;
    }
    else
    {
        //
        // For color surfaces, all elements for a particular sample are stored contiguously, so
        // the sample offset is the sample number times the micro tile size divided by the number
        // of samples.
        //
        sampleOffset = sample * (microTileBytes * 8 / numSamples);
        pixelOffset  = pixelIndex * bpp;
    }

    //
    // Compute the bit position of the element within its byte; it is non-zero only for
    // formats smaller than 8 bits, such as FMT_1.
    //
    UINT_32 elemOffset = sampleOffset + pixelOffset;

    *pBitPosition = elemOffset % 8;
    elemOffset /= 8;

    //
    // Combine the slice offset, micro tile offset, sample offset, and pixel offsets.
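Micro-tiled (1D) addressing involves no pipe or bank fields. Micro tiles are laid out row-major across the pitch, and whole slices follow one another. The minimal sketch below mirrors the steps above, assuming 8x8 micro tiles and taking the in-tile pixel index as an input rather than computing it. Names are hypothetical, and byte sizes are rounded up, as `BITS_TO_BYTES` is assumed to do.

```cpp
#include <cassert>
#include <cstdint>

// Round a bit count up to whole bytes (assumed to match BITS_TO_BYTES).
static uint64_t BitsToBytes(uint64_t bits) { return (bits + 7) / 8; }

// Hypothetical sketch of 1D (micro-tiled) addressing with 8x8 micro tiles.
// pixelIndex is the element's position inside its micro tile.
uint64_t MicroTiledByteAddr(uint32_t x, uint32_t y, uint32_t slice, uint32_t pixelIndex,
                            uint32_t sample, uint32_t bpp, uint32_t pitch, uint32_t height,
                            uint32_t numSamples, uint32_t thickness, bool depthSampleOrder,
                            uint32_t* pBitPosition)
{
    const uint32_t kMicroTileWidth  = 8;
    const uint32_t kMicroTileHeight = 8;

    const uint64_t microTileBytes =
        BitsToBytes(uint64_t(kMicroTileWidth) * kMicroTileHeight * thickness * bpp * numSamples);
    const uint64_t sliceBytes =
        BitsToBytes(uint64_t(pitch) * height * thickness * bpp * numSamples);

    // Whole slices come first; `thickness` slices share one micro tile.
    const uint64_t sliceOffset = uint64_t(slice / thickness) * sliceBytes;

    // Then the micro tile containing (x, y), in row-major order across the pitch.
    const uint64_t microTileOffset =
        (uint64_t(y / kMicroTileHeight) * (pitch / kMicroTileWidth) + x / kMicroTileWidth) *
        microTileBytes;

    // Bit offset of the sample inside the micro tile (depth vs. color ordering).
    const uint64_t elemBits = depthSampleOrder
        ? uint64_t(pixelIndex) * bpp * numSamples + uint64_t(sample) * bpp
        : uint64_t(pixelIndex) * bpp + uint64_t(sample) * (microTileBytes * 8 / numSamples);

    *pBitPosition = static_cast<uint32_t>(elemBits % 8);
    return sliceOffset + microTileOffset + elemBits / 8;
}
```

Take a 64x64, 32 bpp surface. The element at (9, 17) in slice 1, with in-tile index 5, lands at byte 16384 + 17 x 256 + 20 = 20756.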
//
    addr = sliceOffset + microTileOffset + elemOffset;

    return addr;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlComputePixelCoordFromOffset
*
*   @brief
*       Compute pixel coordinate from offset inside a micro tile
*   @return
*       N/A
****************************************************************************************************
*/
VOID EgBasedLib::HwlComputePixelCoordFromOffset(
    UINT_32      offset,             ///< [in] offset inside micro tile in bits
    UINT_32      bpp,                ///< [in] bits per pixel
    UINT_32      numSamples,         ///< [in] number of samples
    AddrTileMode tileMode,           ///< [in] tile mode
    UINT_32      tileBase,           ///< [in] base offset within a tile
    UINT_32      compBits,           ///< [in] component bits actually needed (for planar surface)
    UINT_32*     pX,                 ///< [out] x coordinate
    UINT_32*     pY,                 ///< [out] y coordinate
    UINT_32*     pSlice,             ///< [out] slice index
    UINT_32*     pSample,            ///< [out] sample index
    AddrTileType microTileType,      ///< [in] micro tiling type
    BOOL_32      isDepthSampleOrder  ///< [in] TRUE if depth sample order in microtile is used
    ) const
{
    UINT_32 x = 0;
    UINT_32 y = 0;
    UINT_32 z = 0;
    UINT_32 thickness = Thickness(tileMode);

    // For planar surfaces, adjust the offset according to the tile base
    if ((bpp != compBits) && (compBits != 0) && isDepthSampleOrder)
    {
        offset -= tileBase;

        ADDR_ASSERT(microTileType == ADDR_NON_DISPLAYABLE ||
                    microTileType == ADDR_DEPTH_SAMPLE_ORDER);

        bpp = compBits;
    }

    UINT_32 sampleTileBits;
    UINT_32 samplePixelBits;
    UINT_32 pixelIndex;

    if (isDepthSampleOrder)
    {
        samplePixelBits = bpp * numSamples;
        pixelIndex = offset / samplePixelBits;
        *pSample = (offset % samplePixelBits) / bpp;
    }
    else
    {
        sampleTileBits = MicroTilePixels * bpp * thickness;
        *pSample = offset / sampleTileBits;
        pixelIndex = (offset % sampleTileBits) / bpp;
    }

    if (microTileType != ADDR_THICK)
    {
        if (microTileType == ADDR_DISPLAYABLE) // displayable
        {
            switch (bpp)
            {
                case 8:
                    x = pixelIndex & 0x7;
                    y = Bits2Number(3,
_BIT(pixelIndex,5),_BIT(pixelIndex,3),_BIT(pixelIndex,4));
                    break;
                case 16:
                    x = pixelIndex & 0x7;
                    y = Bits2Number(3, _BIT(pixelIndex,5),_BIT(pixelIndex,4),_BIT(pixelIndex,3));
                    break;
                case 32:
                    x = Bits2Number(3, _BIT(pixelIndex,3),_BIT(pixelIndex,1),_BIT(pixelIndex,0));
                    y = Bits2Number(3, _BIT(pixelIndex,5),_BIT(pixelIndex,4),_BIT(pixelIndex,2));
                    break;
                case 64:
                    x = Bits2Number(3, _BIT(pixelIndex,3),_BIT(pixelIndex,2),_BIT(pixelIndex,0));
                    y = Bits2Number(3, _BIT(pixelIndex,5),_BIT(pixelIndex,4),_BIT(pixelIndex,1));
                    break;
                case 128:
                    x = Bits2Number(3, _BIT(pixelIndex,3),_BIT(pixelIndex,2),_BIT(pixelIndex,1));
                    y = Bits2Number(3, _BIT(pixelIndex,5),_BIT(pixelIndex,4),_BIT(pixelIndex,0));
                    break;
                default:
                    break;
            }
        }
        else if (microTileType == ADDR_NON_DISPLAYABLE || microTileType == ADDR_DEPTH_SAMPLE_ORDER)
        {
            x = Bits2Number(3, _BIT(pixelIndex,4),_BIT(pixelIndex,2),_BIT(pixelIndex,0));
            y = Bits2Number(3, _BIT(pixelIndex,5),_BIT(pixelIndex,3),_BIT(pixelIndex,1));
        }
        else if (microTileType == ADDR_ROTATED)
        {
            /*
                8-Bit Elements
                element_index[5:0] = { x[2], x[0], x[1], y[2], y[1], y[0] }

                16-Bit Elements
                element_index[5:0] = { x[2], x[1], x[0], y[2], y[1], y[0] }

                32-Bit Elements
                element_index[5:0] = { x[2], x[1], y[2], x[0], y[1], y[0] }

                64-Bit Elements
                element_index[5:0] = { y[2], x[2], x[1], y[1], x[0], y[0] }
            */
            switch(bpp)
            {
                case 8:
                    x = Bits2Number(3, _BIT(pixelIndex,5),_BIT(pixelIndex,3),_BIT(pixelIndex,4));
                    y = pixelIndex & 0x7;
                    break;
                case 16:
                    x = Bits2Number(3, _BIT(pixelIndex,5),_BIT(pixelIndex,4),_BIT(pixelIndex,3));
                    y = pixelIndex & 0x7;
                    break;
                case 32:
                    x = Bits2Number(3, _BIT(pixelIndex,5),_BIT(pixelIndex,4),_BIT(pixelIndex,2));
                    y = Bits2Number(3, _BIT(pixelIndex,3),_BIT(pixelIndex,1),_BIT(pixelIndex,0));
                    break;
                case 64:
                    x = Bits2Number(3, _BIT(pixelIndex,4),_BIT(pixelIndex,3),_BIT(pixelIndex,1));
                    y = Bits2Number(3, _BIT(pixelIndex,5),_BIT(pixelIndex,2),_BIT(pixelIndex,0));
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    break;
            }
        }

        if (thickness > 1) // thick
        {
            z =
Bits2Number(3, _BIT(pixelIndex,8),_BIT(pixelIndex,7),_BIT(pixelIndex,6));
        }
    }
    else
    {
        ADDR_ASSERT((m_chipFamily >= ADDR_CHIP_FAMILY_CI) && (thickness > 1));
        /*
            8-Bit Elements and 16-Bit Elements
            element_index[7:0] = { y[2], x[2], z[1], z[0], y[1], x[1], y[0], x[0] }

            32-Bit Elements
            element_index[7:0] = { y[2], x[2], z[1], y[1], z[0], x[1], y[0], x[0] }

            64-Bit Elements and 128-Bit Elements
            element_index[7:0] = { y[2], x[2], z[1], y[1], x[1], z[0], y[0], x[0] }

            The equation to compute the element index for the extra thick tile:
            element_index[8] = z[2]
        */
        switch (bpp)
        {
            case 8:
            case 16: // fall-through
                x = Bits2Number(3, _BIT(pixelIndex,6),_BIT(pixelIndex,2),_BIT(pixelIndex,0));
                y = Bits2Number(3, _BIT(pixelIndex,7),_BIT(pixelIndex,3),_BIT(pixelIndex,1));
                z = Bits2Number(2, _BIT(pixelIndex,5),_BIT(pixelIndex,4));
                break;
            case 32:
                x = Bits2Number(3, _BIT(pixelIndex,6),_BIT(pixelIndex,2),_BIT(pixelIndex,0));
                y = Bits2Number(3, _BIT(pixelIndex,7),_BIT(pixelIndex,4),_BIT(pixelIndex,1));
                z = Bits2Number(2, _BIT(pixelIndex,5),_BIT(pixelIndex,3));
                break;
            case 64:
            case 128: // fall-through
                x = Bits2Number(3, _BIT(pixelIndex,6),_BIT(pixelIndex,3),_BIT(pixelIndex,0));
                y = Bits2Number(3, _BIT(pixelIndex,7),_BIT(pixelIndex,4),_BIT(pixelIndex,1));
                z = Bits2Number(2, _BIT(pixelIndex,5),_BIT(pixelIndex,2));
                break;
            default:
                ADDR_ASSERT_ALWAYS();
                break;
        }

        if (thickness == 8)
        {
            z += Bits2Number(3,_BIT(pixelIndex,8),0,0);
        }
    }

    *pX = x;
    *pY = y;

    *pSlice += z;
}

/**
****************************************************************************************************
*   EgBasedLib::DispatchComputeSurfaceCoordFromAddr
*
*   @brief
*       Compute (x, y, slice, sample) coordinates from surface address
*   @return
*       N/A
****************************************************************************************************
*/
VOID EgBasedLib::DispatchComputeSurfaceCoordFromAddr(
    const ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn,    ///< [in] input structure
    ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT*      pOut
///< [out] output structure
    ) const
{
    UINT_64 addr        = pIn->addr;
    UINT_32 bitPosition = pIn->bitPosition;
    UINT_32 bpp         = pIn->bpp;
    UINT_32 pitch       = pIn->pitch;
    UINT_32 height      = pIn->height;
    UINT_32 numSlices   = pIn->numSlices;
    UINT_32 numSamples  = ((pIn->numSamples == 0) ? 1 : pIn->numSamples);
    UINT_32 numFrags    = ((pIn->numFrags == 0) ? numSamples : pIn->numFrags);

    AddrTileMode   tileMode           = pIn->tileMode;
    UINT_32        tileBase           = pIn->tileBase;
    UINT_32        compBits           = pIn->compBits;
    AddrTileType   microTileType      = pIn->tileType;
    BOOL_32        ignoreSE           = pIn->ignoreSE;
    BOOL_32        isDepthSampleOrder = pIn->isDepth;
    ADDR_TILEINFO* pTileInfo          = pIn->pTileInfo;

    UINT_32* pX      = &pOut->x;
    UINT_32* pY      = &pOut->y;
    UINT_32* pSlice  = &pOut->slice;
    UINT_32* pSample = &pOut->sample;

    if (microTileType == ADDR_DEPTH_SAMPLE_ORDER)
    {
        isDepthSampleOrder = TRUE;
    }

    if (m_chipFamily >= ADDR_CHIP_FAMILY_NI)
    {
        if (numFrags != numSamples)
        {
            numSamples = numFrags;
        }

        /// @note
        /// 128 bit/thick tiled surface doesn't support display tiling and
        /// mipmap chain must have the same tileType, so please fill tileType correctly
        if (IsLinear(pIn->tileMode) == FALSE)
        {
            if (bpp >= 128 || Thickness(tileMode) > 1)
            {
                ADDR_ASSERT(microTileType != ADDR_DISPLAYABLE);
            }
        }
    }

    switch (tileMode)
    {
        case ADDR_TM_LINEAR_GENERAL: //fall through
        case ADDR_TM_LINEAR_ALIGNED:
            ComputeSurfaceCoordFromAddrLinear(addr, bitPosition, bpp, pitch, height, numSlices,
                                              pX, pY, pSlice, pSample);
            break;
        case ADDR_TM_1D_TILED_THIN1: //fall through
        case ADDR_TM_1D_TILED_THICK:
            ComputeSurfaceCoordFromAddrMicroTiled(addr, bitPosition, bpp, pitch, height, numSamples,
                                                  tileMode, tileBase, compBits,
                                                  pX, pY, pSlice, pSample,
                                                  microTileType, isDepthSampleOrder);
            break;
        case ADDR_TM_2D_TILED_THIN1:     //fall through
        case ADDR_TM_2D_TILED_THICK:     //fall through
        case ADDR_TM_3D_TILED_THIN1:     //fall through
        case ADDR_TM_3D_TILED_THICK:     //fall through
        case ADDR_TM_2D_TILED_XTHICK:    //fall through
        case ADDR_TM_3D_TILED_XTHICK:    //fall through
        case ADDR_TM_PRT_TILED_THIN1:    //fall through
case ADDR_TM_PRT_2D_TILED_THIN1: //fall through
        case ADDR_TM_PRT_3D_TILED_THIN1: //fall through
        case ADDR_TM_PRT_TILED_THICK:    //fall through
        case ADDR_TM_PRT_2D_TILED_THICK: //fall through
        case ADDR_TM_PRT_3D_TILED_THICK:
            UINT_32 pipeSwizzle;
            UINT_32 bankSwizzle;

            if (m_configFlags.useCombinedSwizzle)
            {
                ExtractBankPipeSwizzle(pIn->tileSwizzle, pIn->pTileInfo,
                                       &bankSwizzle, &pipeSwizzle);
            }
            else
            {
                pipeSwizzle = pIn->pipeSwizzle;
                bankSwizzle = pIn->bankSwizzle;
            }

            ComputeSurfaceCoordFromAddrMacroTiled(addr, bitPosition, bpp, pitch, height, numSamples,
                                                  tileMode, tileBase, compBits, microTileType,
                                                  ignoreSE, isDepthSampleOrder,
                                                  pipeSwizzle, bankSwizzle, pTileInfo,
                                                  pX, pY, pSlice, pSample);
            break;
        default:
            ADDR_ASSERT_ALWAYS();
    }
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceCoordFromAddrMacroTiled
*
*   @brief
*       Compute surface coordinates from address for macro tiled surface
*   @return
*       N/A
****************************************************************************************************
*/
VOID EgBasedLib::ComputeSurfaceCoordFromAddrMacroTiled(
    UINT_64        addr,               ///< [in] byte address
    UINT_32        bitPosition,        ///< [in] bit position
    UINT_32        bpp,                ///< [in] bits per pixel
    UINT_32        pitch,              ///< [in] pitch in pixels
    UINT_32        height,             ///< [in] height in pixels
    UINT_32        numSamples,         ///< [in] number of samples
    AddrTileMode   tileMode,           ///< [in] tile mode
    UINT_32        tileBase,           ///< [in] tile base offset
    UINT_32        compBits,           ///< [in] component bits (for planar surface)
    AddrTileType   microTileType,      ///< [in] micro tiling type
    BOOL_32        ignoreSE,           ///< [in] TRUE if shader engines can be ignored
    BOOL_32        isDepthSampleOrder, ///< [in] TRUE if depth sample order is used
    UINT_32        pipeSwizzle,        ///< [in] pipe swizzle
    UINT_32        bankSwizzle,        ///< [in] bank swizzle
    ADDR_TILEINFO* pTileInfo,          ///< [in] bank structure.
                                       ///  **All fields to be valid on entry**
    UINT_32*       pX,                 ///< [out] X coord
    UINT_32*       pY,                 ///< [out] Y coord
    UINT_32*       pSlice,             ///< [out] slice index
    UINT_32*       pSample             ///< [out] sample index
    ) const
{
    UINT_32 mx;
    UINT_32 my;
    UINT_64 tileBits;
    UINT_64 macroTileBits;
    UINT_32 slices;
    UINT_32 tileSlices;
    UINT_64 elementOffset;
    UINT_64 macroTileIndex;
    UINT_32 tileIndex;
    UINT_64 totalOffset;

    UINT_32 bank;
    UINT_32 pipe;
    UINT_32 groupBits = m_pipeInterleaveBytes << 3;
    UINT_32 pipes     = HwlGetPipes(pTileInfo);
    UINT_32 banks     = pTileInfo->banks;

    UINT_32 bankInterleave = m_bankInterleave;

    UINT_64 addrBits = BYTES_TO_BITS(addr) + bitPosition;

    //
    // Remove the bank and pipe bits.
    //
    totalOffset = (addrBits % groupBits) +
                  (((addrBits / groupBits / pipes) % bankInterleave) * groupBits) +
                  (((addrBits / groupBits / pipes) / bankInterleave) / banks) * groupBits * bankInterleave;

    UINT_32 microTileThickness = Thickness(tileMode);

    UINT_32 microTileBits = bpp * microTileThickness * MicroTilePixels * numSamples;

    UINT_32 microTileBytes = BITS_TO_BYTES(microTileBits);
    //
    // Determine if tiles need to be split across slices.
    //
    // If the size of the micro tile is larger than the tile split size, then the tile will be
    // split across multiple slices.
    //
    UINT_32 slicesPerTile = 1; //_State->TileSlices

    if ((microTileBytes > pTileInfo->tileSplitBytes) && (microTileThickness == 1))
    {   // tile splitting is not supported for thick modes
        //
        // Compute the number of slices per tile.
        //
        slicesPerTile = microTileBytes / pTileInfo->tileSplitBytes;
    }

    tileBits = microTileBits / slicesPerTile; // micro tile bits

    // in units of micro tiles, i.e. not multiplied by MicroTileWidth
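The "remove the bank and pipe bits" expression above undoes the address assembly done in `ComputeSurfaceAddrFromCoordMacroTiled`, working in bits rather than bytes. The sketch below reproduces it. `ComputeBankFromAddr` and `ComputePipeFromAddr` are not shown in this section, so the two field-extraction helpers are assumptions consistent with the layout used when the address is assembled. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: drop the pipe and bank fields from a bit address and return the
// linear bit offset within one pipe/bank. groupBits is the pipe interleave size in bits;
// pipes, bankInterleave and banks are powers of two.
uint64_t StripPipeBankBits(uint64_t addrBits, uint64_t groupBits, uint64_t pipes,
                           uint64_t bankInterleave, uint64_t banks)
{
    const uint64_t aboveGroup = addrBits / groupBits / pipes;  // pipe field removed

    return (addrBits % groupBits) +                                         // pipe-interleave part
           (aboveGroup % bankInterleave) * groupBits +                      // bank-interleave part
           (aboveGroup / bankInterleave / banks) * groupBits * bankInterleave;  // bank field removed
}

// Assumed field extraction, consistent with the layout (MSB first)
//   offset | bank | bank interleave | pipe | pipe interleave
uint64_t PipeFromAddrBits(uint64_t addrBits, uint64_t groupBits, uint64_t pipes)
{
    return (addrBits / groupBits) % pipes;
}

uint64_t BankFromAddrBits(uint64_t addrBits, uint64_t groupBits, uint64_t pipes,
                          uint64_t bankInterleave, uint64_t banks)
{
    return (addrBits / groupBits / pipes / bankInterleave) % banks;
}
```

Take a 256-byte pipe interleave (2048 bits), 4 pipes, a bank interleave of 2 and 8 banks. Byte address 0x25534 carries pipe 1 and bank 2, and stripping those fields recovers a linear offset of 0x1334 bytes. A trailing bit position passes through unchanged.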
    UINT_32 macroWidth = pTileInfo->bankWidth * pipes * pTileInfo->macroAspectRatio;

    // in units of micro tiles as well
    UINT_32 macroHeight = pTileInfo->bankHeight * banks / pTileInfo->macroAspectRatio;

    UINT_32 pitchInMacroTiles = pitch / MicroTileWidth / macroWidth;

    macroTileBits = (macroWidth * macroHeight) * tileBits / (banks * pipes);

    macroTileIndex = totalOffset / macroTileBits;

    // pitchMacros * height / heightMacros;  macroTilesPerSlice == _State->SliceMacros
    UINT_32 macroTilesPerSlice = (pitch / (macroWidth * MicroTileWidth)) * height /
                                 (macroHeight * MicroTileWidth);

    slices = static_cast<UINT_32>(macroTileIndex / macroTilesPerSlice);

    *pSlice = static_cast<UINT_32>(slices / slicesPerTile * microTileThickness);

    //
    // Calculate the element offset and x[2:0], y[2:0], z[1:0] for thick modes
    //
    tileSlices = slices % slicesPerTile;

    elementOffset  = tileSlices * tileBits;
    elementOffset += totalOffset % tileBits;

    UINT_32 coordZ = 0;

    HwlComputePixelCoordFromOffset(static_cast<UINT_32>(elementOffset),
                                   bpp,
                                   numSamples,
                                   tileMode,
                                   tileBase,
                                   compBits,
                                   pX,
                                   pY,
                                   &coordZ,
                                   pSample,
                                   microTileType,
                                   isDepthSampleOrder);

    macroTileIndex = macroTileIndex % macroTilesPerSlice;
    *pY += static_cast<UINT_32>(macroTileIndex / pitchInMacroTiles * macroHeight * MicroTileHeight);
    *pX += static_cast<UINT_32>(macroTileIndex % pitchInMacroTiles * macroWidth * MicroTileWidth);

    *pSlice += coordZ;

    tileIndex = static_cast<UINT_32>((totalOffset % macroTileBits) / tileBits);

    my = (tileIndex / pTileInfo->bankWidth) % pTileInfo->bankHeight * MicroTileHeight;
    mx = (tileIndex % pTileInfo->bankWidth) * pipes * MicroTileWidth;

    *pY += my;
    *pX += mx;

    bank = ComputeBankFromAddr(addr, banks, pipes);
    pipe = ComputePipeFromAddr(addr, pipes);

    HwlComputeSurfaceCoord2DFromBankPipe(tileMode,
                                         pX,
                                         pY,
                                         *pSlice,
                                         bank,
                                         pipe,
                                         bankSwizzle,
                                         pipeSwizzle,
                                         tileSlices,
                                         ignoreSE,
                                         pTileInfo);
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSurfaceCoord2DFromBankPipe
*
*   @brief
*       Compute surface x,y coordinates from bank/pipe info
*   @return
*       N/A
****************************************************************************************************
*/
VOID EgBasedLib::ComputeSurfaceCoord2DFromBankPipe(
    AddrTileMode       tileMode,    ///< [in] tile mode
    UINT_32            x,           ///< [in] x coordinate
    UINT_32            y,           ///< [in] y coordinate
    UINT_32            slice,       ///< [in] slice index
    UINT_32            bank,        ///< [in] bank number
    UINT_32            pipe,        ///< [in] pipe number
    UINT_32            bankSwizzle, ///< [in] bank swizzle
    UINT_32            pipeSwizzle, ///< [in] pipe swizzle
    UINT_32            tileSlices,  ///< [in] slices in a micro tile
    ADDR_TILEINFO*     pTileInfo,   ///< [in] bank structure. **All fields to be valid on entry**
    CoordFromBankPipe* pOutput      ///< [out] pointer to extracted x/y bits
    ) const
{
    UINT_32 yBit3 = 0;
    UINT_32 yBit4 = 0;
    UINT_32 yBit5 = 0;
    UINT_32 yBit6 = 0;

    UINT_32 xBit3 = 0;
    UINT_32 xBit4 = 0;
    UINT_32 xBit5 = 0;

    UINT_32 tileSplitRotation;

    UINT_32 numPipes = HwlGetPipes(pTileInfo);

    UINT_32 bankRotation = ComputeBankRotation(tileMode, pTileInfo->banks, numPipes);

    UINT_32 pipeRotation = ComputePipeRotation(tileMode, numPipes);

    UINT_32 xBit = x / (MicroTileWidth * pTileInfo->bankWidth * numPipes);
    UINT_32 yBit = y / (MicroTileHeight * pTileInfo->bankHeight);

    // Calculate the bank and pipe before rotation and swizzle

    switch (tileMode)
    {
        case ADDR_TM_2D_TILED_THIN1:  //fall through
        case ADDR_TM_2D_TILED_THICK:  //fall through
        case ADDR_TM_2D_TILED_XTHICK: //fall through
        case ADDR_TM_3D_TILED_THIN1:  //fall through
        case ADDR_TM_3D_TILED_THICK:  //fall through
        case ADDR_TM_3D_TILED_XTHICK:
            tileSplitRotation = ((pTileInfo->banks / 2) + 1);
            break;
        default:
            tileSplitRotation = 0;
            break;
    }

    UINT_32 microTileThickness = Thickness(tileMode);

    bank ^= tileSplitRotation * tileSlices;
    if (pipeRotation == 0)
    {
        bank ^= bankRotation * (slice / microTileThickness) + bankSwizzle;
        bank %= pTileInfo->banks;
        pipe ^= pipeSwizzle;
    }
    else
    {
        bank ^= bankRotation * (slice / microTileThickness) / numPipes + bankSwizzle;
        bank %= pTileInfo->banks;
        pipe ^= pipeRotation * (slice /
microTileThickness) + pipeSwizzle; } if (pTileInfo->macroAspectRatio == 1) { switch (pTileInfo->banks) { case 2: yBit3 = _BIT(bank, 0) ^ _BIT(xBit,0); break; case 4: yBit4 = _BIT(bank, 0) ^ _BIT(xBit,0); yBit3 = _BIT(bank, 1) ^ _BIT(xBit,1); break; case 8: yBit3 = _BIT(bank, 2) ^ _BIT(xBit,2); yBit5 = _BIT(bank, 0) ^ _BIT(xBit,0); yBit4 = _BIT(bank, 1) ^ _BIT(xBit,1) ^ yBit5; break; case 16: yBit3 = _BIT(bank, 3) ^ _BIT(xBit, 3); yBit4 = _BIT(bank, 2) ^ _BIT(xBit, 2); yBit6 = _BIT(bank, 0) ^ _BIT(xBit, 0); yBit5 = _BIT(bank, 1) ^ _BIT(xBit, 1) ^ yBit6; break; default: break; } } else if (pTileInfo->macroAspectRatio == 2) { switch (pTileInfo->banks) { case 2: //xBit3 = yBit3^b0 xBit3 = _BIT(bank, 0) ^ _BIT(yBit,0); break; case 4: //xBit3=yBit4^b0; yBit3=xBit4^b1 xBit3 = _BIT(bank, 0) ^ _BIT(yBit,1); yBit3 = _BIT(bank, 1) ^ _BIT(xBit,1); break; case 8: //xBit4, xBit5, yBit5 are known xBit3 = _BIT(bank, 0) ^ _BIT(yBit,2); yBit3 = _BIT(bank, 2) ^ _BIT(xBit,2); yBit4 = _BIT(bank, 1) ^ _BIT(xBit,1) ^ _BIT(yBit, 2); break; case 16://x4,x5,x6,y6 are known xBit3 = _BIT(bank, 0) ^ _BIT(yBit, 3); //x3 = y6 ^ b0 yBit3 = _BIT(bank, 3) ^ _BIT(xBit, 3); //y3 = x6 ^ b3 yBit4 = _BIT(bank, 2) ^ _BIT(xBit, 2); //y4 = x5 ^ b2 yBit5 = _BIT(bank, 1) ^ _BIT(xBit, 1) ^ _BIT(yBit, 3); //y5=x4^y6^b1 break; default: break; } } else if (pTileInfo->macroAspectRatio == 4) { switch (pTileInfo->banks) { case 4: //yBit3, yBit4 xBit3 = _BIT(bank, 0) ^ _BIT(yBit,1); xBit4 = _BIT(bank, 1) ^ _BIT(yBit,0); break; case 8: //xBit5, yBit4, yBit5 xBit3 = _BIT(bank, 0) ^ _BIT(yBit,2); yBit3 = _BIT(bank, 2) ^ _BIT(xBit,2); xBit4 = _BIT(bank, 1) ^ _BIT(yBit,1) ^ _BIT(yBit,2); break; case 16: //xBit5, xBit6, yBit5, yBit6 xBit3 = _BIT(bank, 0) ^ _BIT(yBit, 3);//x3 = b0 ^ y6 xBit4 = _BIT(bank, 1) ^ _BIT(yBit, 2) ^ _BIT(yBit, 3);//x4 = b1 ^ y5 ^ y6; yBit3 = _BIT(bank, 3) ^ _BIT(xBit, 3); //y3 = b3 ^ x6; yBit4 = _BIT(bank, 2) ^ _BIT(xBit, 2); //y4 = b2 ^ x5; break; default: break; } } else if 
(pTileInfo->macroAspectRatio == 8) { switch (pTileInfo->banks) { case 8: //yBit3, yBit4, yBit5 xBit3 = _BIT(bank, 0) ^ _BIT(yBit,2); //x3 = b0 ^ y5; xBit4 = _BIT(bank, 1) ^ _BIT(yBit,1) ^ _BIT(yBit, 2);//x4 = b1 ^ y4 ^ y5; xBit5 = _BIT(bank, 2) ^ _BIT(yBit,0); break; case 16: //xBit6, yBit4, yBit5, yBit6 xBit3 = _BIT(bank, 0) ^ _BIT(yBit, 3);//x3 = y6 ^ b0 xBit4 = _BIT(bank, 1) ^ _BIT(yBit, 2) ^ _BIT(yBit, 3);//x4 = y5 ^ y6 ^ b1 xBit5 = _BIT(bank, 2) ^ _BIT(yBit, 1);//x5 = y4 ^ b2 yBit3 = _BIT(bank, 3) ^ _BIT(xBit, 3); //y3 = x6 ^ b3 break; default: break; } } pOutput->xBits = xBit; pOutput->yBits = yBit; pOutput->xBit3 = xBit3; pOutput->xBit4 = xBit4; pOutput->xBit5 = xBit5; pOutput->yBit3 = yBit3; pOutput->yBit4 = yBit4; pOutput->yBit5 = yBit5; pOutput->yBit6 = yBit6; } /** **************************************************************************************************** * EgBasedLib::HwlExtractBankPipeSwizzle * @brief * Entry of EgBasedLib ExtractBankPipeSwizzle * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE EgBasedLib::HwlExtractBankPipeSwizzle( const ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT* pIn, ///< [in] input structure ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT* pOut ///< [out] output structure ) const { ExtractBankPipeSwizzle(pIn->base256b, pIn->pTileInfo, &pOut->bankSwizzle, &pOut->pipeSwizzle); return ADDR_OK; } /** **************************************************************************************************** * EgBasedLib::HwlCombineBankPipeSwizzle * @brief * Combine bank/pipe swizzle * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE EgBasedLib::HwlCombineBankPipeSwizzle( UINT_32 bankSwizzle, ///< [in] bank swizzle UINT_32 pipeSwizzle, ///< [in] pipe swizzle ADDR_TILEINFO* pTileInfo, ///< [in] tile info UINT_64 baseAddr, ///< [in] base address UINT_32* 
pTileSwizzle ///< [out] combined swizzle ) const { ADDR_E_RETURNCODE retCode = ADDR_OK; if (pTileSwizzle) { *pTileSwizzle = GetBankPipeSwizzle(bankSwizzle, pipeSwizzle, baseAddr, pTileInfo); } else { retCode = ADDR_INVALIDPARAMS; } return retCode; } /** **************************************************************************************************** * EgBasedLib::HwlComputeBaseSwizzle * @brief * Compute base swizzle * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE EgBasedLib::HwlComputeBaseSwizzle( const ADDR_COMPUTE_BASE_SWIZZLE_INPUT* pIn, ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT* pOut ) const { UINT_32 bankSwizzle = 0; UINT_32 pipeSwizzle = 0; ADDR_TILEINFO* pTileInfo = pIn->pTileInfo; ADDR_ASSERT(IsMacroTiled(pIn->tileMode)); ADDR_ASSERT(pIn->pTileInfo); /// This is a legacy misreading of h/w doc, use it as it doesn't hurt. static const UINT_8 bankRotationArray[4][16] = { { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, // ADDR_SURF_2_BANK { 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, // ADDR_SURF_4_BANK { 0, 3, 6, 1, 4, 7, 2, 5, 0, 0, 0, 0, 0, 0, 0, 0 }, // ADDR_SURF_8_BANK { 0, 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9 }, // ADDR_SURF_16_BANK }; UINT_32 pipes = HwlGetPipes(pTileInfo); (void)pipes; UINT_32 banks = pTileInfo ? 
        pTileInfo->banks : 2;
    UINT_32 hwNumBanks;

    // Uses fewer bank swizzle bits
    if (pIn->option.reduceBankBit && banks > 2)
    {
        banks >>= 1;
    }

    switch (banks)
    {
        case 2:
            hwNumBanks = 0;
            break;
        case 4:
            hwNumBanks = 1;
            break;
        case 8:
            hwNumBanks = 2;
            break;
        case 16:
            hwNumBanks = 3;
            break;
        default:
            ADDR_ASSERT_ALWAYS();
            hwNumBanks = 0;
            break;
    }

    if (pIn->option.genOption == ADDR_SWIZZLE_GEN_LINEAR)
    {
        bankSwizzle = pIn->surfIndex & (banks - 1);
    }
    else // (pIn->option.genOption == ADDR_SWIZZLE_GEN_DEFAULT)
    {
        bankSwizzle = bankRotationArray[hwNumBanks][pIn->surfIndex & (banks - 1)];
    }

    if (IsMacro3dTiled(pIn->tileMode))
    {
        pipeSwizzle = pIn->surfIndex & (HwlGetPipes(pTileInfo) - 1);
    }

    return HwlCombineBankPipeSwizzle(bankSwizzle, pipeSwizzle, pTileInfo, 0, &pOut->tileSwizzle);
}

/**
****************************************************************************************************
*   EgBasedLib::ExtractBankPipeSwizzle
*   @brief
*       Extract bank/pipe swizzle from base256b
*   @return
*       N/A
****************************************************************************************************
*/
VOID EgBasedLib::ExtractBankPipeSwizzle(
    UINT_32         base256b,       ///< [in] input base256b register value
    ADDR_TILEINFO*  pTileInfo,      ///< [in] 2D tile parameters. Client must provide all data
    UINT_32*        pBankSwizzle,   ///< [out] bank swizzle
    UINT_32*        pPipeSwizzle    ///< [out] pipe swizzle
    ) const
{
    UINT_32 bankSwizzle = 0;
    UINT_32 pipeSwizzle = 0;

    if (base256b != 0)
    {
        UINT_32 numPipes        = HwlGetPipes(pTileInfo);
        UINT_32 bankBits        = QLog2(pTileInfo->banks);
        UINT_32 pipeBits        = QLog2(numPipes);
        UINT_32 groupBytes      = m_pipeInterleaveBytes;
        UINT_32 bankInterleave  = m_bankInterleave;

        pipeSwizzle =
            (base256b / (groupBytes >> 8)) & ((1 << pipeBits) - 1);

        bankSwizzle =
            (base256b / (groupBytes >> 8) / numPipes / bankInterleave) & ((1 << bankBits) - 1);
    }

    *pPipeSwizzle = pipeSwizzle;
    *pBankSwizzle = bankSwizzle;
}

/**
****************************************************************************************************
*   EgBasedLib::GetBankPipeSwizzle
*   @brief
*       Combine bank/pipe swizzle
*   @return
*       Base256b bits (only filled bank/pipe bits)
****************************************************************************************************
*/
UINT_32 EgBasedLib::GetBankPipeSwizzle(
    UINT_32         bankSwizzle,    ///< [in] bank swizzle
    UINT_32         pipeSwizzle,    ///< [in] pipe swizzle
    UINT_64         baseAddr,       ///< [in] base address
    ADDR_TILEINFO*  pTileInfo       ///< [in] tile info
    ) const
{
    UINT_32 pipeBits = QLog2(HwlGetPipes(pTileInfo));
    UINT_32 bankInterleaveBits = QLog2(m_bankInterleave);
    UINT_32 tileSwizzle = pipeSwizzle + ((bankSwizzle << bankInterleaveBits) << pipeBits);

    baseAddr ^= tileSwizzle * m_pipeInterleaveBytes;
    baseAddr >>= 8;

    return static_cast<UINT_32>(baseAddr);
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeSliceTileSwizzle
*   @brief
*       Compute cubemap/3d texture faces/slices tile swizzle
*   @return
*       Tile swizzle
****************************************************************************************************
*/
UINT_32 EgBasedLib::ComputeSliceTileSwizzle(
    AddrTileMode        tileMode,       ///< [in] Tile mode
    UINT_32             baseSwizzle,    ///< [in] Base swizzle
    UINT_32             slice,          ///< [in] Slice index, Cubemap face index, 0 means +X
    UINT_64             baseAddr,       ///< [in] Base address
    ADDR_TILEINFO*      pTileInfo       ///< [in] Bank structure
    ) const
{
    UINT_32 tileSwizzle = 0;

    if (IsMacroTiled(tileMode)) // Swizzle only for macro tile mode
    {
        UINT_32 firstSlice = slice / Thickness(tileMode);

        UINT_32 numPipes = HwlGetPipes(pTileInfo);
        UINT_32 numBanks = pTileInfo->banks;

        UINT_32 pipeRotation;
        UINT_32 bankRotation;

        UINT_32 bankSwizzle = 0;
        UINT_32 pipeSwizzle = 0;

        pipeRotation = ComputePipeRotation(tileMode, numPipes);
        bankRotation = ComputeBankRotation(tileMode, numBanks, numPipes);

        if (baseSwizzle != 0)
        {
            ExtractBankPipeSwizzle(baseSwizzle, pTileInfo, &bankSwizzle, &pipeSwizzle);
        }

        if (pipeRotation == 0) //2D mode
        {
            bankSwizzle += firstSlice * bankRotation;
            bankSwizzle %= numBanks;
        }
        else //3D mode
        {
            pipeSwizzle += firstSlice * pipeRotation;
            pipeSwizzle %= numPipes;
            bankSwizzle += firstSlice * bankRotation / numPipes;
            bankSwizzle %= numBanks;
        }

        tileSwizzle = GetBankPipeSwizzle(bankSwizzle, pipeSwizzle, baseAddr, pTileInfo);
    }

    return tileSwizzle;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlComputeQbStereoRightSwizzle
*
*   @brief
*       Compute right eye swizzle
*   @return
*       swizzle
****************************************************************************************************
*/
UINT_32 EgBasedLib::HwlComputeQbStereoRightSwizzle(
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pInfo  ///< [in] Surface info, must be valid
    ) const
{
    UINT_32 bankBits = 0;
    UINT_32 swizzle  = 0;

    // The assumption is default swizzle for left eye is 0
    if (IsMacroTiled(pInfo->tileMode) && pInfo->pStereoInfo && pInfo->pTileInfo)
    {
        bankBits = ComputeBankFromCoord(0, pInfo->height, 0,
                                        pInfo->tileMode, 0, 0, pInfo->pTileInfo);

        if (bankBits)
        {
            HwlCombineBankPipeSwizzle(bankBits, 0, pInfo->pTileInfo, 0, &swizzle);
        }
    }

    return swizzle;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeBankFromCoord
*
*   @brief
*       Compute bank number from coordinates
*   @return
*       Bank number
****************************************************************************************************
*/
UINT_32 EgBasedLib::ComputeBankFromCoord(
    UINT_32         x,              ///< [in] x coordinate
    UINT_32         y,              ///< [in] y coordinate
    UINT_32         slice,          ///< [in] slice index
    AddrTileMode    tileMode,       ///< [in] tile mode
    UINT_32         bankSwizzle,    ///< [in] bank swizzle
    UINT_32         tileSplitSlice, ///< [in] If the size of the pixel offset is larger than the
                                    ///  tile split size, then the pixel will be moved to a separate
                                    ///  slice. This value equals pixelOffset / tileSplitBytes
                                    ///  in this case. Otherwise this is 0.
    ADDR_TILEINFO*  pTileInfo       ///< [in] tile info
    ) const
{
    UINT_32 pipes = HwlGetPipes(pTileInfo);
    UINT_32 bankBit0 = 0;
    UINT_32 bankBit1 = 0;
    UINT_32 bankBit2 = 0;
    UINT_32 bankBit3 = 0;
    UINT_32 sliceRotation;
    UINT_32 tileSplitRotation;
    UINT_32 bank;
    UINT_32 numBanks    = pTileInfo->banks;
    UINT_32 bankWidth   = pTileInfo->bankWidth;
    UINT_32 bankHeight  = pTileInfo->bankHeight;

    UINT_32 tx = x / MicroTileWidth / (bankWidth * pipes);
    UINT_32 ty = y / MicroTileHeight / bankHeight;

    UINT_32 x3 = _BIT(tx,0);
    UINT_32 x4 = _BIT(tx,1);
    UINT_32 x5 = _BIT(tx,2);
    UINT_32 x6 = _BIT(tx,3);
    UINT_32 y3 = _BIT(ty,0);
    UINT_32 y4 = _BIT(ty,1);
    UINT_32 y5 = _BIT(ty,2);
    UINT_32 y6 = _BIT(ty,3);

    switch (numBanks)
    {
        case 16:
            bankBit0 = x3 ^ y6;
            bankBit1 = x4 ^ y5 ^ y6;
            bankBit2 = x5 ^ y4;
            bankBit3 = x6 ^ y3;
            break;
        case 8:
            bankBit0 = x3 ^ y5;
            bankBit1 = x4 ^ y4 ^ y5;
            bankBit2 = x5 ^ y3;
            break;
        case 4:
            bankBit0 = x3 ^ y4;
            bankBit1 = x4 ^ y3;
            break;
        case 2:
            bankBit0 = x3 ^ y3;
            break;
        default:
            ADDR_ASSERT_ALWAYS();
            break;
    }

    bank = bankBit0 | (bankBit1 << 1) | (bankBit2 << 2) | (bankBit3 << 3); //Bits2Number(4, bankBit3, bankBit2, bankBit1, bankBit0);

    bank = HwlPreAdjustBank((x / MicroTileWidth), bank, pTileInfo);

    //
    // Compute bank rotation for the slice.
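    // Worked example (illustrative values only, not tied to a particular ASIC): for a 2D
    // tile mode with 8 banks the rotation below is ((8 / 2) - 1) = 3 banks per micro-tile-thick
    // slice, so THIN1 slice 5 gives sliceRotation = 3 * 5 = 15. For a 3D tile mode with 8 pipes,
    // Max(1u, (8 / 2) - 1) = 3, so THIN1 slice 16 gives sliceRotation = 3 * 16 / 8 = 6.
    // Only the low log2(numBanks) bits matter, because the result is later masked with
    // (numBanks - 1).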
    //
    UINT_32 microTileThickness = Thickness(tileMode);

    switch (tileMode)
    {
        case ADDR_TM_2D_TILED_THIN1:  // fall through
        case ADDR_TM_2D_TILED_THICK:  // fall through
        case ADDR_TM_2D_TILED_XTHICK:
            sliceRotation = ((numBanks / 2) - 1) * (slice / microTileThickness);
            break;
        case ADDR_TM_3D_TILED_THIN1:  // fall through
        case ADDR_TM_3D_TILED_THICK:  // fall through
        case ADDR_TM_3D_TILED_XTHICK:
            sliceRotation =
                Max(1u, (pipes / 2) - 1) * (slice / microTileThickness) / pipes;
            break;
        default:
            sliceRotation = 0;
            break;
    }

    //
    // Compute bank rotation for the tile split slice.
    //
    // The sample slice will be non-zero if samples must be split across multiple slices.
    // This situation arises when the micro tile size multiplied by the number of samples exceeds
    // the split size (set in GB_ADDR_CONFIG).
    //
    switch (tileMode)
    {
        case ADDR_TM_2D_TILED_THIN1:      //fall through
        case ADDR_TM_3D_TILED_THIN1:      //fall through
        case ADDR_TM_PRT_2D_TILED_THIN1:  //fall through
        case ADDR_TM_PRT_3D_TILED_THIN1:  //fall through
            tileSplitRotation = ((numBanks / 2) + 1) * tileSplitSlice;
            break;
        default:
            tileSplitRotation = 0;
            break;
    }

    //
    // Apply bank rotation for the slice and tile split slice.
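    // Worked example (illustrative values only): with 8 banks in a THIN1 2D mode, suppose
    // bank = 2 after HwlPreAdjustBank, bankSwizzle = 0, slice = 1 (sliceRotation = 3) and
    // tileSplitSlice = 1 (tileSplitRotation = ((8 / 2) + 1) * 1 = 5). The statements below then
    // compute bank = 2 ^ (0 + 3) = 1, then bank = 1 ^ 5 = 4, then bank = 4 & 7 = 4.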
    //
    bank ^= bankSwizzle + sliceRotation;
    bank ^= tileSplitRotation;

    bank &= (numBanks - 1);

    return bank;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeBankFromAddr
*
*   @brief
*       Compute the bank number from an address
*   @return
*       Bank number
****************************************************************************************************
*/
UINT_32 EgBasedLib::ComputeBankFromAddr(
    UINT_64 addr,       ///< [in] address
    UINT_32 numBanks,   ///< [in] number of banks
    UINT_32 numPipes    ///< [in] number of pipes
    ) const
{
    UINT_32 bank;

    //
    // The LSBs of the address are arranged as follows:
    //   bank | bankInterleave | pipe | pipeInterleave
    //
    // To get the bank number, shift off the pipe interleave, pipe, and bank interleave bits and
    // mask the bank bits.
    //
    bank = static_cast<UINT_32>(
        (addr >> Log2(m_pipeInterleaveBytes * numPipes * m_bankInterleave)) &
        (numBanks - 1)
        );

    return bank;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputePipeRotation
*
*   @brief
*       Compute pipe rotation value
*   @return
*       Pipe rotation
****************************************************************************************************
*/
UINT_32 EgBasedLib::ComputePipeRotation(
    AddrTileMode tileMode,  ///< [in] tile mode
    UINT_32      numPipes   ///< [in] number of pipes
    ) const
{
    UINT_32 rotation;

    switch (tileMode)
    {
        case ADDR_TM_3D_TILED_THIN1:      //fall through
        case ADDR_TM_3D_TILED_THICK:      //fall through
        case ADDR_TM_3D_TILED_XTHICK:     //fall through
        case ADDR_TM_PRT_3D_TILED_THIN1:  //fall through
        case ADDR_TM_PRT_3D_TILED_THICK:
            rotation = (numPipes < 4) ? 1 : (numPipes / 2 - 1);
            break;
        default:
            rotation = 0;
    }

    return rotation;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeBankRotation
*
*   @brief
*       Compute bank rotation value
*   @return
*       Bank rotation
****************************************************************************************************
*/
UINT_32 EgBasedLib::ComputeBankRotation(
    AddrTileMode tileMode,  ///< [in] tile mode
    UINT_32      numBanks,  ///< [in] number of banks
    UINT_32      numPipes   ///< [in] number of pipes
    ) const
{
    UINT_32 rotation;

    switch (tileMode)
    {
        case ADDR_TM_2D_TILED_THIN1: // fall through
        case ADDR_TM_2D_TILED_THICK: // fall through
        case ADDR_TM_2D_TILED_XTHICK:
        case ADDR_TM_PRT_2D_TILED_THIN1:
        case ADDR_TM_PRT_2D_TILED_THICK:
            // Rotate banks per Z-slice by 1 for 4-bank or 3 for 8-bank
            rotation = numBanks / 2 - 1;
            break;
        case ADDR_TM_3D_TILED_THIN1: // fall through
        case ADDR_TM_3D_TILED_THICK: // fall through
        case ADDR_TM_3D_TILED_XTHICK:
        case ADDR_TM_PRT_3D_TILED_THIN1:
        case ADDR_TM_PRT_3D_TILED_THICK:
            rotation = (numPipes < 4) ? 1 : (numPipes / 2 - 1); // rotate pipes & banks
            break;
        default:
            rotation = 0;
    }

    return rotation;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeHtileBytes
*
*   @brief
*       Compute htile size in bytes
*
*   @return
*       Htile size in bytes
****************************************************************************************************
*/
UINT_64 EgBasedLib::ComputeHtileBytes(
    UINT_32     pitch,        ///< [in] pitch
    UINT_32     height,       ///< [in] height
    UINT_32     bpp,          ///< [in] bits per pixel
    BOOL_32     isLinear,     ///< [in] if it is linear mode
    UINT_32     numSlices,    ///< [in] number of slices
    UINT_64*    sliceBytes,   ///< [out] bytes per slice
    UINT_32     baseAlign     ///< [in] base alignments
    ) const
{
    UINT_64 surfBytes;

    const UINT_64 HtileCacheLineSize = BITS_TO_BYTES(HtileCacheBits);

    *sliceBytes = BITS_TO_BYTES(static_cast<UINT_64>(pitch) * height * bpp / 64);

    if (m_configFlags.useHtileSliceAlign)
    {
        // Align the sliceSize to htilecachelinesize * pipes at first
        *sliceBytes = PowTwoAlign(*sliceBytes, HtileCacheLineSize * m_pipes);
        surfBytes   = *sliceBytes * numSlices;
    }
    else
    {
        // Align the surfSize to htilecachelinesize * pipes at last
        surfBytes = *sliceBytes * numSlices;
        surfBytes = PowTwoAlign(surfBytes, HtileCacheLineSize * m_pipes);
    }

    return surfBytes;
}

/**
****************************************************************************************************
*   EgBasedLib::DispatchComputeFmaskInfo
*
*   @brief
*       Compute fmask sizes, including padded pitch, height, slices and total size in bytes,
*       and also output a suitable tile mode and alignments. Results are returned
*       through output parameters.
*
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE EgBasedLib::DispatchComputeFmaskInfo(
    const ADDR_COMPUTE_FMASK_INFO_INPUT*    pIn,   ///< [in] input structure
    ADDR_COMPUTE_FMASK_INFO_OUTPUT*         pOut)  ///< [out] output structure
{
    ADDR_E_RETURNCODE retCode = ADDR_OK;

    ADDR_COMPUTE_SURFACE_INFO_INPUT  surfIn  = {0};
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT surfOut = {0};

    // Setup input structure
    surfIn.tileMode    = pIn->tileMode;
    surfIn.width       = pIn->pitch;
    surfIn.height      = pIn->height;
    surfIn.numSlices   = pIn->numSlices;
    surfIn.pTileInfo   = pIn->pTileInfo;
    surfIn.tileType    = ADDR_NON_DISPLAYABLE;
    surfIn.flags.fmask = 1;

    // Setup output structure
    surfOut.pTileInfo = pOut->pTileInfo;

    // Setup hwl specific fields
    HwlFmaskPreThunkSurfInfo(pIn, pOut, &surfIn, &surfOut);

    surfIn.bpp = HwlComputeFmaskBits(pIn, &surfIn.numSamples);

    // ComputeSurfaceInfo needs numSamples in surfOut as surface routines need adjusted numSamples
    surfOut.numSamples = surfIn.numSamples;

    retCode = HwlComputeSurfaceInfo(&surfIn, &surfOut);

    // Save bpp field for surface dump support
    surfOut.bpp = surfIn.bpp;

    if (retCode == ADDR_OK)
    {
        pOut->bpp         = surfOut.bpp;
        pOut->pitch       = surfOut.pitch;
        pOut->height      = surfOut.height;
        pOut->numSlices   = surfOut.depth;
        pOut->fmaskBytes  = surfOut.surfSize;
        pOut->baseAlign   = surfOut.baseAlign;
        pOut->pitchAlign  = surfOut.pitchAlign;
        pOut->heightAlign = surfOut.heightAlign;

        if (surfOut.depth > 1)
        {
            // For fmask, expNumSlices is stored in depth.
            pOut->sliceSize = surfOut.surfSize / surfOut.depth;
        }
        else
        {
            pOut->sliceSize = surfOut.surfSize;
        }

        // Save numSamples field for surface dump support
        pOut->numSamples = surfOut.numSamples;

        HwlFmaskPostThunkSurfInfo(&surfOut, pOut);
    }

    return retCode;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlComputeFmaskInfo
*   @brief
*       Entry of EgBasedLib ComputeFmaskInfo
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE EgBasedLib::HwlComputeFmaskInfo(
    const ADDR_COMPUTE_FMASK_INFO_INPUT*    pIn,   ///< [in] input structure
    ADDR_COMPUTE_FMASK_INFO_OUTPUT*         pOut   ///< [out] output structure
    )
{
    ADDR_E_RETURNCODE retCode = ADDR_OK;

    ADDR_TILEINFO tileInfo = {0};

    // Use internal tile info if pOut does not have a valid pTileInfo
    if (pOut->pTileInfo == NULL)
    {
        pOut->pTileInfo = &tileInfo;
    }

    retCode = DispatchComputeFmaskInfo(pIn, pOut);

    if (retCode == ADDR_OK)
    {
        pOut->tileIndex =
            HwlPostCheckTileIndex(pOut->pTileInfo, pIn->tileMode, ADDR_NON_DISPLAYABLE,
                                  pOut->tileIndex);
    }

    // Resets pTileInfo to NULL if the internal tile info is used
    if (pOut->pTileInfo == &tileInfo)
    {
        pOut->pTileInfo = NULL;
    }

    return retCode;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlComputeFmaskAddrFromCoord
*   @brief
*       Entry of EgBasedLib ComputeFmaskAddrFromCoord
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE EgBasedLib::HwlComputeFmaskAddrFromCoord(
    const ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT*   pIn,    ///< [in] input structure
    ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT*        pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE retCode = ADDR_OK;

    return retCode;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlComputeFmaskCoordFromAddr
*   @brief
*       Entry of EgBasedLib ComputeFmaskCoordFromAddr
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE EgBasedLib::HwlComputeFmaskCoordFromAddr(
    const ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT*   pIn,    ///< [in] input structure
    ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT*        pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE retCode = ADDR_OK;

    return retCode;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeFmaskNumPlanesFromNumSamples
*
*   @brief
*       Compute fmask number of planes from number of samples
*
*   @return
*       Number of planes
****************************************************************************************************
*/
UINT_32 EgBasedLib::ComputeFmaskNumPlanesFromNumSamples(
    UINT_32 numSamples)     ///< [in] number of samples
{
    UINT_32 numPlanes;

    //
    // FMASK is stored such that each micro tile is composed of elements containing N bits, where
    // N is the number of samples. There is a micro tile for each bit in the FMASK address, and
    // micro tiles for each address bit, sometimes referred to as a plane, are stored sequentially.
    // The FMASK for a 2-sample surface looks like a general surface with 2 bits per element.
    // The FMASK for a 4-sample surface looks like a general surface with 4 bits per element and
    // 2 samples. The FMASK for an 8-sample surface looks like a general surface with 8 bits per
    // element and 4 samples. R6xx and R7xx only stored 3 planes for 8-sample FMASK surfaces.
    // This was changed for R8xx to simplify the logic in the CB.
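    // Summary of the switch below: numSamples 2, 4 and 8 map to 1, 2 and 4 planes, i.e.
    // numPlanes == numSamples / 2 for every supported sample count; any other count hits
    // ADDR_UNHANDLED_CASE() and yields 0.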
    //
    switch (numSamples)
    {
        case 2:
            numPlanes = 1;
            break;
        case 4:
            numPlanes = 2;
            break;
        case 8:
            numPlanes = 4;
            break;
        default:
            ADDR_UNHANDLED_CASE();
            numPlanes = 0;
            break;
    }
    return numPlanes;
}

/**
****************************************************************************************************
*   EgBasedLib::ComputeFmaskResolvedBppFromNumSamples
*
*   @brief
*       Compute resolved fmask effective bpp based on number of samples
*
*   @return
*       bpp
****************************************************************************************************
*/
UINT_32 EgBasedLib::ComputeFmaskResolvedBppFromNumSamples(
    UINT_32 numSamples)     ///< number of samples
{
    UINT_32 bpp;

    //
    // Resolved FMASK surfaces are generated by the CB and read by the texture unit
    // so that the texture unit can read compressed multi-sample color data.
    // These surfaces store each index value packed per element.
    // Each element contains at least num_samples * log2(num_samples) bits.
    // Resolved FMASK surfaces are addressed as follows:
    // 2-sample Addressed similarly to a color surface with 8 bits per element and 1 sample.
    // 4-sample Addressed similarly to a color surface with 8 bits per element and 1 sample.
    // 8-sample Addressed similarly to a color surface with 32 bits per element and 1 sample.
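    // Worked check of the sizes above against the num_samples * log2(num_samples) minimum:
    //   2 samples: 2 * 1 =  2 bits, padded to  8 bpp
    //   4 samples: 4 * 2 =  8 bits, exactly    8 bpp
    //   8 samples: 8 * 3 = 24 bits, padded to 32 bpp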
    switch (numSamples)
    {
        case 2:
            bpp = 8;
            break;
        case 4:
            bpp = 8;
            break;
        case 8:
            bpp = 32;
            break;
        default:
            ADDR_UNHANDLED_CASE();
            bpp = 0;
            break;
    }
    return bpp;
}

/**
****************************************************************************************************
*   EgBasedLib::IsTileInfoAllZero
*
*   @brief
*       Return TRUE if all fields are zero
*   @note
*       A NULL input is considered to be all zero
****************************************************************************************************
*/
BOOL_32 EgBasedLib::IsTileInfoAllZero(
    const ADDR_TILEINFO* pTileInfo)
{
    BOOL_32 allZero = TRUE;

    if (pTileInfo)
    {
        if ((pTileInfo->banks            != 0) ||
            (pTileInfo->bankWidth        != 0) ||
            (pTileInfo->bankHeight       != 0) ||
            (pTileInfo->macroAspectRatio != 0) ||
            (pTileInfo->tileSplitBytes   != 0) ||
            (pTileInfo->pipeConfig       != 0)
            )
        {
            allZero = FALSE;
        }
    }

    return allZero;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlTileInfoEqual
*
*   @brief
*       Return TRUE if all fields are equal
*   @note
*       Only takes care of current HWL's data
****************************************************************************************************
*/
BOOL_32 EgBasedLib::HwlTileInfoEqual(
    const ADDR_TILEINFO* pLeft, ///<[in] Left compare operand
    const ADDR_TILEINFO* pRight ///<[in] Right compare operand
    ) const
{
    BOOL_32 equal = FALSE;

    if (pLeft->banks == pRight->banks &&
        pLeft->bankWidth == pRight->bankWidth &&
        pLeft->bankHeight == pRight->bankHeight &&
        pLeft->macroAspectRatio == pRight->macroAspectRatio &&
        pLeft->tileSplitBytes == pRight->tileSplitBytes)
    {
        equal = TRUE;
    }

    return equal;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlConvertTileInfoToHW
*   @brief
*       Entry of EgBasedLib ConvertTileInfoToHW
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE
EgBasedLib::HwlConvertTileInfoToHW(
    const ADDR_CONVERT_TILEINFOTOHW_INPUT* pIn, ///< [in] input structure
    ADDR_CONVERT_TILEINFOTOHW_OUTPUT* pOut      ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE retCode = ADDR_OK;

    ADDR_TILEINFO *pTileInfoIn  = pIn->pTileInfo;
    ADDR_TILEINFO *pTileInfoOut = pOut->pTileInfo;

    if ((pTileInfoIn != NULL) && (pTileInfoOut != NULL))
    {
        if (pIn->reverse == FALSE)
        {
            switch (pTileInfoIn->banks)
            {
                case 2:
                    pTileInfoOut->banks = 0;
                    break;
                case 4:
                    pTileInfoOut->banks = 1;
                    break;
                case 8:
                    pTileInfoOut->banks = 2;
                    break;
                case 16:
                    pTileInfoOut->banks = 3;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->banks = 0;
                    break;
            }

            switch (pTileInfoIn->bankWidth)
            {
                case 1:
                    pTileInfoOut->bankWidth = 0;
                    break;
                case 2:
                    pTileInfoOut->bankWidth = 1;
                    break;
                case 4:
                    pTileInfoOut->bankWidth = 2;
                    break;
                case 8:
                    pTileInfoOut->bankWidth = 3;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->bankWidth = 0;
                    break;
            }

            switch (pTileInfoIn->bankHeight)
            {
                case 1:
                    pTileInfoOut->bankHeight = 0;
                    break;
                case 2:
                    pTileInfoOut->bankHeight = 1;
                    break;
                case 4:
                    pTileInfoOut->bankHeight = 2;
                    break;
                case 8:
                    pTileInfoOut->bankHeight = 3;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->bankHeight = 0;
                    break;
            }

            switch (pTileInfoIn->macroAspectRatio)
            {
                case 1:
                    pTileInfoOut->macroAspectRatio = 0;
                    break;
                case 2:
                    pTileInfoOut->macroAspectRatio = 1;
                    break;
                case 4:
                    pTileInfoOut->macroAspectRatio = 2;
                    break;
                case 8:
                    pTileInfoOut->macroAspectRatio = 3;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->macroAspectRatio = 0;
                    break;
            }

            switch (pTileInfoIn->tileSplitBytes)
            {
                case 64:
                    pTileInfoOut->tileSplitBytes = 0;
                    break;
                case 128:
                    pTileInfoOut->tileSplitBytes = 1;
                    break;
                case 256:
                    pTileInfoOut->tileSplitBytes = 2;
                    break;
                case 512:
                    pTileInfoOut->tileSplitBytes = 3;
                    break;
                case 1024:
                    pTileInfoOut->tileSplitBytes = 4;
                    break;
                case 2048:
                    pTileInfoOut->tileSplitBytes = 5;
                    break;
                case 4096:
                    pTileInfoOut->tileSplitBytes = 6;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->tileSplitBytes = 0;
                    break;
            }
        }
        else
        {
            switch (pTileInfoIn->banks)
            {
                case 0:
                    pTileInfoOut->banks = 2;
                    break;
                case 1:
                    pTileInfoOut->banks = 4;
                    break;
                case 2:
                    pTileInfoOut->banks = 8;
                    break;
                case 3:
                    pTileInfoOut->banks = 16;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->banks = 2;
                    break;
            }

            switch (pTileInfoIn->bankWidth)
            {
                case 0:
                    pTileInfoOut->bankWidth = 1;
                    break;
                case 1:
                    pTileInfoOut->bankWidth = 2;
                    break;
                case 2:
                    pTileInfoOut->bankWidth = 4;
                    break;
                case 3:
                    pTileInfoOut->bankWidth = 8;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->bankWidth = 1;
                    break;
            }

            switch (pTileInfoIn->bankHeight)
            {
                case 0:
                    pTileInfoOut->bankHeight = 1;
                    break;
                case 1:
                    pTileInfoOut->bankHeight = 2;
                    break;
                case 2:
                    pTileInfoOut->bankHeight = 4;
                    break;
                case 3:
                    pTileInfoOut->bankHeight = 8;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->bankHeight = 1;
                    break;
            }

            switch (pTileInfoIn->macroAspectRatio)
            {
                case 0:
                    pTileInfoOut->macroAspectRatio = 1;
                    break;
                case 1:
                    pTileInfoOut->macroAspectRatio = 2;
                    break;
                case 2:
                    pTileInfoOut->macroAspectRatio = 4;
                    break;
                case 3:
                    pTileInfoOut->macroAspectRatio = 8;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->macroAspectRatio = 1;
                    break;
            }

            switch (pTileInfoIn->tileSplitBytes)
            {
                case 0:
                    pTileInfoOut->tileSplitBytes = 64;
                    break;
                case 1:
                    pTileInfoOut->tileSplitBytes = 128;
                    break;
                case 2:
                    pTileInfoOut->tileSplitBytes = 256;
                    break;
                case 3:
                    pTileInfoOut->tileSplitBytes = 512;
                    break;
                case 4:
                    pTileInfoOut->tileSplitBytes = 1024;
                    break;
                case 5:
                    pTileInfoOut->tileSplitBytes = 2048;
                    break;
                case 6:
                    pTileInfoOut->tileSplitBytes = 4096;
                    break;
                default:
                    ADDR_ASSERT_ALWAYS();
                    retCode = ADDR_INVALIDPARAMS;
                    pTileInfoOut->tileSplitBytes = 64;
                    break;
            }
        }

        if (pTileInfoIn != pTileInfoOut)
        {
            pTileInfoOut->pipeConfig =
                pTileInfoIn->pipeConfig;
        }
    }
    else
    {
        ADDR_ASSERT_ALWAYS();
        retCode = ADDR_INVALIDPARAMS;
    }

    return retCode;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlComputeSurfaceInfo
*   @brief
*       Entry of EgBasedLib ComputeSurfaceInfo
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE EgBasedLib::HwlComputeSurfaceInfo(
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,    ///< [in] input structure
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT*       pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE retCode = ADDR_OK;

    if (pIn->numSamples < pIn->numFrags)
    {
        retCode = ADDR_INVALIDPARAMS;
    }

    ADDR_TILEINFO tileInfo = {0};

    if (retCode == ADDR_OK)
    {
        // Uses internal tile info if pOut does not have a valid pTileInfo
        if (pOut->pTileInfo == NULL)
        {
            pOut->pTileInfo = &tileInfo;
        }

        if (DispatchComputeSurfaceInfo(pIn, pOut) == FALSE)
        {
            retCode = ADDR_INVALIDPARAMS;
        }

        // Skipped when the client uses tile info as input and wants a correct size and
        // alignment, together with tile info as output, when the tile info is not supposed to
        // have any matching indices in the tile mode tables.
        if (pIn->flags.skipIndicesOutput == FALSE)
        {
            // Returns an index
            pOut->tileIndex = HwlPostCheckTileIndex(pOut->pTileInfo,
                                                    pOut->tileMode,
                                                    pOut->tileType,
                                                    pOut->tileIndex);

            if (IsMacroTiled(pOut->tileMode) && (pOut->macroModeIndex == TileIndexInvalid))
            {
                pOut->macroModeIndex = HwlComputeMacroModeIndex(pOut->tileIndex,
                                                                pIn->flags,
                                                                pIn->bpp,
                                                                pIn->numSamples,
                                                                pOut->pTileInfo);
            }
        }

        // Resets pTileInfo to NULL if the internal tile info is used
        if (pOut->pTileInfo == &tileInfo)
        {
#if DEBUG
            // Client does not pass in a valid pTileInfo
            if (IsMacroTiled(pOut->tileMode))
            {
                // If a valid index is returned, then no pTileInfo is okay
                ADDR_ASSERT((m_configFlags.useTileIndex == FALSE) ||
                            (pOut->tileIndex != TileIndexInvalid));

                if (IsTileInfoAllZero(pIn->pTileInfo) == FALSE)
                {
                    // The initial value of pIn->pTileInfo is copied to tileInfo.
                    // We do not expect any of these values to be changed, nor any of the inputs to be 0.
                    ADDR_ASSERT(tileInfo.banks == pIn->pTileInfo->banks);
                    ADDR_ASSERT(tileInfo.bankWidth == pIn->pTileInfo->bankWidth);
                    ADDR_ASSERT(tileInfo.bankHeight == pIn->pTileInfo->bankHeight);
                    ADDR_ASSERT(tileInfo.macroAspectRatio == pIn->pTileInfo->macroAspectRatio);
                    ADDR_ASSERT(tileInfo.tileSplitBytes == pIn->pTileInfo->tileSplitBytes);
                }
            }
#endif
            pOut->pTileInfo = NULL;
        }
    }

    return retCode;
}

/**
****************************************************************************************************
*   EgBasedLib::HwlComputeSurfaceAddrFromCoord
*   @brief
*       Entry of EgBasedLib ComputeSurfaceAddrFromCoord
*   @return
*       ADDR_E_RETURNCODE
****************************************************************************************************
*/
ADDR_E_RETURNCODE EgBasedLib::HwlComputeSurfaceAddrFromCoord(
    const ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn,    ///< [in] input structure
    ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT*      pOut    ///< [out] output structure
    ) const
{
    ADDR_E_RETURNCODE retCode = ADDR_OK;

    if (
#if !ALT_TEST // Overflow test needs this out-of-boundary coord
        (pIn->x > pIn->pitch) ||
        (pIn->y
> pIn->height) || #endif (pIn->numSamples > m_maxSamples)) { retCode = ADDR_INVALIDPARAMS; } else { pOut->addr = DispatchComputeSurfaceAddrFromCoord(pIn, pOut); } return retCode; } /** **************************************************************************************************** * EgBasedLib::HwlComputeSurfaceCoordFromAddr * @brief * Entry of EgBasedLib ComputeSurfaceCoordFromAddr * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE EgBasedLib::HwlComputeSurfaceCoordFromAddr( const ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ///< [in] input structure ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE retCode = ADDR_OK; if ((pIn->bitPosition >= 8) || (pIn->numSamples > m_maxSamples)) { retCode = ADDR_INVALIDPARAMS; } else { DispatchComputeSurfaceCoordFromAddr(pIn, pOut); } return retCode; } /** **************************************************************************************************** * EgBasedLib::HwlComputeSliceTileSwizzle * @brief * Entry of EgBasedLib ComputeSliceTileSwizzle * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE EgBasedLib::HwlComputeSliceTileSwizzle( const ADDR_COMPUTE_SLICESWIZZLE_INPUT* pIn, ///< [in] input structure ADDR_COMPUTE_SLICESWIZZLE_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE retCode = ADDR_OK; if (pIn->pTileInfo && (pIn->pTileInfo->banks > 0)) { pOut->tileSwizzle = ComputeSliceTileSwizzle(pIn->tileMode, pIn->baseSwizzle, pIn->slice, pIn->baseAddr, pIn->pTileInfo); } else { retCode = ADDR_INVALIDPARAMS; } return retCode; } /** **************************************************************************************************** * EgBasedLib::HwlComputeHtileBpp * * @brief * Compute htile bpp * * @return * Htile bpp
**************************************************************************************************** */ UINT_32 EgBasedLib::HwlComputeHtileBpp( BOOL_32 isWidth8, ///< [in] TRUE if block width is 8 BOOL_32 isHeight8 ///< [in] TRUE if block height is 8 ) const { // Only 8x8 mode is supported ADDR_ASSERT(isWidth8 && isHeight8); return 32; } /** **************************************************************************************************** * EgBasedLib::HwlComputeHtileBaseAlign * * @brief * Compute htile base alignment * * @return * Htile base alignment **************************************************************************************************** */ UINT_32 EgBasedLib::HwlComputeHtileBaseAlign( BOOL_32 isTcCompatible, ///< [in] TRUE if TC compatible BOOL_32 isLinear, ///< [in] TRUE if it is linear mode ADDR_TILEINFO* pTileInfo ///< [in] Tile info ) const { UINT_32 baseAlign = m_pipeInterleaveBytes * HwlGetPipes(pTileInfo); if (isTcCompatible) { ADDR_ASSERT(pTileInfo != NULL); if (pTileInfo) { baseAlign *= pTileInfo->banks; } } return baseAlign; } /** **************************************************************************************************** * EgBasedLib::HwlGetPitchAlignmentMicroTiled * * @brief * Compute 1D tiled surface pitch alignment; the result is returned as the function's * return value.
* * @return * pitch alignment **************************************************************************************************** */ UINT_32 EgBasedLib::HwlGetPitchAlignmentMicroTiled( AddrTileMode tileMode, ///< [in] tile mode UINT_32 bpp, ///< [in] bits per pixel ADDR_SURFACE_FLAGS flags, ///< [in] surface flags UINT_32 numSamples ///< [in] number of samples ) const { UINT_32 pitchAlign; UINT_32 microTileThickness = Thickness(tileMode); UINT_32 pixelsPerMicroTile; UINT_32 pixelsPerPipeInterleave; UINT_32 microTilesPerPipeInterleave; // // Special workaround for depth/stencil buffer, use 8 bpp to meet larger requirement for // stencil buffer since pitch alignment is related to bpp. // For a depth only buffer do not set this. // // Note: this actually does not work for mipmap but mipmap depth texture is not really // sampled with mipmap. // if (flags.depth && (flags.noStencil == FALSE)) { bpp = 8; } pixelsPerMicroTile = MicroTilePixels * microTileThickness; pixelsPerPipeInterleave = BYTES_TO_BITS(m_pipeInterleaveBytes) / (bpp * numSamples); microTilesPerPipeInterleave = pixelsPerPipeInterleave / pixelsPerMicroTile; pitchAlign = Max(MicroTileWidth, microTilesPerPipeInterleave * MicroTileWidth); return pitchAlign; } /** **************************************************************************************************** * EgBasedLib::HwlGetSizeAdjustmentMicroTiled * * @brief * Adjust 1D tiled surface pitch and slice size * * @return * Logical slice size in bytes **************************************************************************************************** */ UINT_64 EgBasedLib::HwlGetSizeAdjustmentMicroTiled( UINT_32 thickness, ///< [in] thickness UINT_32 bpp, ///< [in] bits per pixel ADDR_SURFACE_FLAGS flags, ///< [in] surface flags UINT_32 numSamples, ///< [in] number of samples UINT_32 baseAlign, ///< [in] base alignment UINT_32 pitchAlign, ///< [in] pitch alignment UINT_32* pPitch, ///< [in,out] pointer to pitch UINT_32* pHeight ///< [in,out] pointer to 
height ) const { UINT_64 logicalSliceSize; ASSERTED UINT_64 physicalSliceSize; UINT_32 pitch = *pPitch; UINT_32 height = *pHeight; // Logical slice: pitch * height * bpp * numSamples (no 1D MSAA so actually numSamples == 1) logicalSliceSize = BITS_TO_BYTES(static_cast<UINT_64>(pitch) * height * bpp * numSamples); // Physical slice: multiplied by thickness physicalSliceSize = logicalSliceSize * thickness; // // R800 will always pad physical slice size to baseAlign which is pipe_interleave_bytes // ADDR_ASSERT((physicalSliceSize % baseAlign) == 0); return logicalSliceSize; } /** **************************************************************************************************** * EgBasedLib::HwlStereoCheckRightOffsetPadding * * @brief * Check whether the height needs extra padding for the stereo right eye offset, to avoid swizzling * * @return * Height alignment required for the right eye offset, or 0 if no extra padding is needed * **************************************************************************************************** */ UINT_32 EgBasedLib::HwlStereoCheckRightOffsetPadding( ADDR_TILEINFO* pTileInfo ///< Tiling info ) const { UINT_32 stereoHeightAlign = 0; if (pTileInfo->macroAspectRatio > 2) { // Since 3D rendering treats the right eye surface as starting at y == "eye height" while // the display engine treats it as starting at 0, the bank bits may differ. // Additional padding in height is required to make sure it's possible // to achieve synonym by adjusting bank swizzle of right eye surface. static const UINT_32 StereoAspectRatio = 2; stereoHeightAlign = pTileInfo->banks * pTileInfo->bankHeight * MicroTileHeight / StereoAspectRatio; } return stereoHeightAlign; } } // V1 } // Addr } // rocr ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/r800/egbaddrlib.h000066400000000000000000000413201420110115200236050ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved.
* * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. */ /** **************************************************************************************************** * @file egbaddrlib.h * @brief Contains the EgBasedLib class definition. 
**************************************************************************************************** */ #ifndef __EG_BASED_ADDR_LIB_H__ #define __EG_BASED_ADDR_LIB_H__ #include "addrlib1.h" namespace rocr { namespace Addr { namespace V1 { /// Structures for functions struct CoordFromBankPipe { UINT_32 xBits : 3; UINT_32 yBits : 4; UINT_32 xBit3 : 1; UINT_32 xBit4 : 1; UINT_32 xBit5 : 1; UINT_32 yBit3 : 1; UINT_32 yBit4 : 1; UINT_32 yBit5 : 1; UINT_32 yBit6 : 1; }; /** **************************************************************************************************** * @brief This class is the Evergreen based address library * @note Abstract class **************************************************************************************************** */ class EgBasedLib : public Lib { protected: EgBasedLib(const Client* pClient); virtual ~EgBasedLib(); public: /// Surface info functions // NOTE: DispatchComputeSurfaceInfo using TileInfo takes both an input and an output. // On input: // One or more fields may be 0 to be calculated/defaulted - pre-SI h/w. // H/W using tile mode index only accepts none or all 0's - SI and newer h/w. // It then returns the actual tiling configuration used. 
// Other methods' TileInfo must be valid on entry BOOL_32 DispatchComputeSurfaceInfo( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; ADDR_E_RETURNCODE DispatchComputeFmaskInfo( const ADDR_COMPUTE_FMASK_INFO_INPUT* pIn, ADDR_COMPUTE_FMASK_INFO_OUTPUT* pOut); protected: // Hwl interface virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfo( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeSurfaceAddrFromCoord( const ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeSurfaceCoordFromAddr( const ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeSliceTileSwizzle( const ADDR_COMPUTE_SLICESWIZZLE_INPUT* pIn, ADDR_COMPUTE_SLICESWIZZLE_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlExtractBankPipeSwizzle( const ADDR_EXTRACT_BANKPIPE_SWIZZLE_INPUT* pIn, ADDR_EXTRACT_BANKPIPE_SWIZZLE_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlCombineBankPipeSwizzle( UINT_32 bankSwizzle, UINT_32 pipeSwizzle, ADDR_TILEINFO* pTileInfo, UINT_64 baseAddr, UINT_32* pTileSwizzle) const; virtual ADDR_E_RETURNCODE HwlComputeBaseSwizzle( const ADDR_COMPUTE_BASE_SWIZZLE_INPUT* pIn, ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlConvertTileInfoToHW( const ADDR_CONVERT_TILEINFOTOHW_INPUT* pIn, ADDR_CONVERT_TILEINFOTOHW_OUTPUT* pOut) const; virtual UINT_32 HwlComputeHtileBpp( BOOL_32 isWidth8, BOOL_32 isHeight8) const; virtual UINT_32 HwlComputeHtileBaseAlign( BOOL_32 isTcCompatible, BOOL_32 isLinear, ADDR_TILEINFO* pTileInfo) const; virtual ADDR_E_RETURNCODE HwlComputeFmaskInfo( const ADDR_COMPUTE_FMASK_INFO_INPUT* pIn, ADDR_COMPUTE_FMASK_INFO_OUTPUT* pOut); virtual ADDR_E_RETURNCODE HwlComputeFmaskAddrFromCoord( const ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT* pIn, 
ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT* pOut) const; virtual ADDR_E_RETURNCODE HwlComputeFmaskCoordFromAddr( const ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT* pOut) const; virtual BOOL_32 HwlGetAlignmentInfoMacroTiled( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, UINT_32* pPitchAlign, UINT_32* pHeightAlign, UINT_32* pSizeAlign) const; virtual UINT_32 HwlComputeQbStereoRightSwizzle( ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pInfo) const; virtual VOID HwlComputePixelCoordFromOffset( UINT_32 offset, UINT_32 bpp, UINT_32 numSamples, AddrTileMode tileMode, UINT_32 tileBase, UINT_32 compBits, UINT_32* pX, UINT_32* pY, UINT_32* pSlice, UINT_32* pSample, AddrTileType microTileType, BOOL_32 isDepthSampleOrder) const; /// Return Cmask block max virtual BOOL_32 HwlGetMaxCmaskBlockMax() const { return 0x3FFF; // 14 bits, 0n16383 } // Sub-hwl interface /// Pure virtual function to setup tile info (indices) if client requests to do so virtual VOID HwlSetupTileInfo( AddrTileMode tileMode, ADDR_SURFACE_FLAGS flags, UINT_32 bpp, UINT_32 pitch, UINT_32 height, UINT_32 numSamples, ADDR_TILEINFO* inputTileInfo, ADDR_TILEINFO* outputTileInfo, AddrTileType inTileType, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const = 0; /// Pure virtual function to get pitch alignment for linear modes virtual UINT_32 HwlGetPitchAlignmentLinear(UINT_32 bpp, ADDR_SURFACE_FLAGS flags) const = 0; /// Pure virtual function to get size adjustment for linear modes virtual UINT_64 HwlGetSizeAdjustmentLinear( AddrTileMode tileMode, UINT_32 bpp, UINT_32 numSamples, UINT_32 baseAlign, UINT_32 pitchAlign, UINT_32 *pPitch, UINT_32 *pHeight, UINT_32 *pHeightAlign) const = 0; virtual UINT_32 HwlGetPitchAlignmentMicroTiled( AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 numSamples) const; virtual UINT_64 HwlGetSizeAdjustmentMicroTiled( UINT_32 thickness, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 numSamples, UINT_32 baseAlign, UINT_32 pitchAlign, UINT_32 
*pPitch, UINT_32 *pHeight) const; /// Pure virtual function to do extra sanity check virtual BOOL_32 HwlSanityCheckMacroTiled( ADDR_TILEINFO* pTileInfo) const = 0; /// Pure virtual function to check whether the current level is the last macro-tiled one virtual VOID HwlCheckLastMacroTiledLvl( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const = 0; /// Adjusts bank before bank is modified by rotation virtual UINT_32 HwlPreAdjustBank( UINT_32 tileX, UINT_32 bank, ADDR_TILEINFO* pTileInfo) const = 0; virtual VOID HwlComputeSurfaceCoord2DFromBankPipe( AddrTileMode tileMode, UINT_32* pX, UINT_32* pY, UINT_32 slice, UINT_32 bank, UINT_32 pipe, UINT_32 bankSwizzle, UINT_32 pipeSwizzle, UINT_32 tileSlices, BOOL_32 ignoreSE, ADDR_TILEINFO* pTileInfo) const = 0; virtual BOOL_32 HwlTileInfoEqual( const ADDR_TILEINFO* pLeft, const ADDR_TILEINFO* pRight) const; virtual AddrTileMode HwlDegradeThickTileMode( AddrTileMode baseTileMode, UINT_32 numSlices, UINT_32* pBytesPerTile) const; virtual INT_32 HwlPostCheckTileIndex( const ADDR_TILEINFO* pInfo, AddrTileMode mode, AddrTileType type, INT curIndex = TileIndexInvalid) const { return TileIndexInvalid; } virtual VOID HwlFmaskPreThunkSurfInfo( const ADDR_COMPUTE_FMASK_INFO_INPUT* pFmaskIn, const ADDR_COMPUTE_FMASK_INFO_OUTPUT* pFmaskOut, ADDR_COMPUTE_SURFACE_INFO_INPUT* pSurfIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pSurfOut) const { } virtual VOID HwlFmaskPostThunkSurfInfo( const ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pSurfOut, ADDR_COMPUTE_FMASK_INFO_OUTPUT* pFmaskOut) const { } virtual UINT_32 HwlStereoCheckRightOffsetPadding(ADDR_TILEINFO* pTileInfo) const; virtual BOOL_32 HwlReduceBankWidthHeight( UINT_32 tileSize, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 numSamples, UINT_32 bankHeightAlign, UINT_32 pipes, ADDR_TILEINFO* pTileInfo) const; // Protected non-virtual functions /// Mip level functions AddrTileMode ComputeSurfaceMipLevelTileMode( AddrTileMode baseTileMode, UINT_32 bpp, UINT_32 pitch, UINT_32
height, UINT_32 numSlices, UINT_32 numSamples, UINT_32 pitchAlign, UINT_32 heightAlign, ADDR_TILEINFO* pTileInfo) const; /// Swizzle functions VOID ExtractBankPipeSwizzle( UINT_32 base256b, ADDR_TILEINFO* pTileInfo, UINT_32* pBankSwizzle, UINT_32* pPipeSwizzle) const; UINT_32 GetBankPipeSwizzle( UINT_32 bankSwizzle, UINT_32 pipeSwizzle, UINT_64 baseAddr, ADDR_TILEINFO* pTileInfo) const; UINT_32 ComputeSliceTileSwizzle( AddrTileMode tileMode, UINT_32 baseSwizzle, UINT_32 slice, UINT_64 baseAddr, ADDR_TILEINFO* pTileInfo) const; /// Addressing functions virtual ADDR_E_RETURNCODE ComputeBankEquation( UINT_32 log2BytesPP, UINT_32 threshX, UINT_32 threshY, ADDR_TILEINFO* pTileInfo, ADDR_EQUATION* pEquation) const { return ADDR_NOTSUPPORTED; } UINT_32 ComputeBankFromCoord( UINT_32 x, UINT_32 y, UINT_32 slice, AddrTileMode tileMode, UINT_32 bankSwizzle, UINT_32 tileSpitSlice, ADDR_TILEINFO* pTileInfo) const; UINT_32 ComputeBankFromAddr( UINT_64 addr, UINT_32 numBanks, UINT_32 numPipes) const; UINT_32 ComputePipeRotation( AddrTileMode tileMode, UINT_32 numPipes) const; UINT_32 ComputeBankRotation( AddrTileMode tileMode, UINT_32 numBanks, UINT_32 numPipes) const; VOID ComputeSurfaceCoord2DFromBankPipe( AddrTileMode tileMode, UINT_32 x, UINT_32 y, UINT_32 slice, UINT_32 bank, UINT_32 pipe, UINT_32 bankSwizzle, UINT_32 pipeSwizzle, UINT_32 tileSlices, ADDR_TILEINFO* pTileInfo, CoordFromBankPipe *pOutput) const; /// Htile/Cmask functions UINT_64 ComputeHtileBytes( UINT_32 pitch, UINT_32 height, UINT_32 bpp, BOOL_32 isLinear, UINT_32 numSlices, UINT_64* sliceBytes, UINT_32 baseAlign) const; ADDR_E_RETURNCODE ComputeMacroTileEquation( UINT_32 log2BytesPP, AddrTileMode tileMode, AddrTileType microTileType, ADDR_TILEINFO* pTileInfo, ADDR_EQUATION* pEquation) const; // Static functions static BOOL_32 IsTileInfoAllZero(const ADDR_TILEINFO* pTileInfo); static UINT_32 ComputeFmaskNumPlanesFromNumSamples(UINT_32 numSamples); static UINT_32 ComputeFmaskResolvedBppFromNumSamples(UINT_32 
numSamples); virtual VOID HwlComputeSurfaceAlignmentsMacroTiled( AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 mipLevel, UINT_32 numSamples, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const { } private: BOOL_32 ComputeSurfaceInfoLinear( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut, UINT_32 padDims) const; BOOL_32 ComputeSurfaceInfoMicroTiled( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut, UINT_32 padDims, AddrTileMode expTileMode) const; BOOL_32 ComputeSurfaceInfoMacroTiled( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut, UINT_32 padDims, AddrTileMode expTileMode) const; BOOL_32 ComputeSurfaceAlignmentsLinear( AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32* pBaseAlign, UINT_32* pPitchAlign, UINT_32* pHeightAlign) const; BOOL_32 ComputeSurfaceAlignmentsMicroTiled( AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 mipLevel, UINT_32 numSamples, UINT_32* pBaseAlign, UINT_32* pPitchAlign, UINT_32* pHeightAlign) const; BOOL_32 ComputeSurfaceAlignmentsMacroTiled( AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 mipLevel, UINT_32 numSamples, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const; /// Surface addressing functions UINT_64 DispatchComputeSurfaceAddrFromCoord( const ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_SURFACE_ADDRFROMCOORD_OUTPUT* pOut) const; VOID DispatchComputeSurfaceCoordFromAddr( const ADDR_COMPUTE_SURFACE_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_SURFACE_COORDFROMADDR_OUTPUT* pOut) const; UINT_64 ComputeSurfaceAddrFromCoordMicroTiled( UINT_32 x, UINT_32 y, UINT_32 slice, UINT_32 sample, UINT_32 bpp, UINT_32 pitch, UINT_32 height, UINT_32 numSamples, AddrTileMode tileMode, AddrTileType microTileType, BOOL_32 isDepthSampleOrder, UINT_32* pBitPosition) const; UINT_64 ComputeSurfaceAddrFromCoordMacroTiled( UINT_32 x, UINT_32 y, UINT_32 slice, UINT_32 
sample, UINT_32 bpp, UINT_32 pitch, UINT_32 height, UINT_32 numSamples, AddrTileMode tileMode, AddrTileType microTileType, BOOL_32 ignoreSE, BOOL_32 isDepthSampleOrder, UINT_32 pipeSwizzle, UINT_32 bankSwizzle, ADDR_TILEINFO* pTileInfo, UINT_32* pBitPosition) const; VOID ComputeSurfaceCoordFromAddrMacroTiled( UINT_64 addr, UINT_32 bitPosition, UINT_32 bpp, UINT_32 pitch, UINT_32 height, UINT_32 numSamples, AddrTileMode tileMode, UINT_32 tileBase, UINT_32 compBits, AddrTileType microTileType, BOOL_32 ignoreSE, BOOL_32 isDepthSampleOrder, UINT_32 pipeSwizzle, UINT_32 bankSwizzle, ADDR_TILEINFO* pTileInfo, UINT_32* pX, UINT_32* pY, UINT_32* pSlice, UINT_32* pSample) const; /// Fmask functions UINT_64 DispatchComputeFmaskAddrFromCoord( const ADDR_COMPUTE_FMASK_ADDRFROMCOORD_INPUT* pIn, ADDR_COMPUTE_FMASK_ADDRFROMCOORD_OUTPUT* pOut) const; VOID DispatchComputeFmaskCoordFromAddr( const ADDR_COMPUTE_FMASK_COORDFROMADDR_INPUT* pIn, ADDR_COMPUTE_FMASK_COORDFROMADDR_OUTPUT* pOut) const; // FMASK related methods - private UINT_64 ComputeFmaskAddrFromCoordMicroTiled( UINT_32 x, UINT_32 y, UINT_32 slice, UINT_32 sample, UINT_32 plane, UINT_32 pitch, UINT_32 height, UINT_32 numSamples, AddrTileMode tileMode, BOOL_32 resolved, UINT_32* pBitPosition) const; VOID ComputeFmaskCoordFromAddrMicroTiled( UINT_64 addr, UINT_32 bitPosition, UINT_32 pitch, UINT_32 height, UINT_32 numSamples, AddrTileMode tileMode, BOOL_32 resolved, UINT_32* pX, UINT_32* pY, UINT_32* pSlice, UINT_32* pSample, UINT_32* pPlane) const; VOID ComputeFmaskCoordFromAddrMacroTiled( UINT_64 addr, UINT_32 bitPosition, UINT_32 pitch, UINT_32 height, UINT_32 numSamples, AddrTileMode tileMode, UINT_32 pipeSwizzle, UINT_32 bankSwizzle, BOOL_32 ignoreSE, ADDR_TILEINFO* pTileInfo, BOOL_32 resolved, UINT_32* pX, UINT_32* pY, UINT_32* pSlice, UINT_32* pSample, UINT_32* pPlane) const; UINT_64 ComputeFmaskAddrFromCoordMacroTiled( UINT_32 x, UINT_32 y, UINT_32 slice, UINT_32 sample, UINT_32 plane, UINT_32 pitch, UINT_32 height, 
UINT_32 numSamples, AddrTileMode tileMode, UINT_32 pipeSwizzle, UINT_32 bankSwizzle, BOOL_32 ignoreSE, ADDR_TILEINFO* pTileInfo, BOOL_32 resolved, UINT_32* pBitPosition) const; /// Sanity check functions BOOL_32 SanityCheckMacroTiled( ADDR_TILEINFO* pTileInfo) const; protected: UINT_32 m_ranks; ///< Number of ranks - MC_ARB_RAMCFG.NOOFRANK UINT_32 m_logicalBanks; ///< Logical banks = m_banks * m_ranks if m_banks != 16 UINT_32 m_bankInterleave; ///< Bank interleave, as a multiple of pipe interleave size }; } // V1 } // Addr } // rocr #endif ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/r800/siaddrlib.cpp000066400000000000000000003665701420110115200240370ustar00rootroot00000000000000/* * Copyright © 2007-2019 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. 
*/ /** **************************************************************************************************** * @file siaddrlib.cpp * @brief Contains the implementation for the SiLib class. **************************************************************************************************** */ #include "siaddrlib.h" #include "si_gb_reg.h" #include "amdgpu_asic_addr.h" //////////////////////////////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////////////////////////////// namespace rocr { namespace Addr { /** **************************************************************************************************** * SiHwlInit * * @brief * Creates an SiLib object. * * @return * Returns an SiLib object pointer. **************************************************************************************************** */ Lib* SiHwlInit(const Client* pClient) { return V1::SiLib::CreateObj(pClient); } namespace V1 { // We don't support MSAA for equation const BOOL_32 SiLib::m_EquationSupport[SiLib::TileTableSize][SiLib::MaxNumElementBytes] = { {TRUE, TRUE, TRUE, FALSE, FALSE}, // 0, non-AA compressed depth or any stencil {FALSE, FALSE, FALSE, FALSE, FALSE}, // 1, 2xAA/4xAA compressed depth with or without stencil {FALSE, FALSE, FALSE, FALSE, FALSE}, // 2, 8xAA compressed depth with or without stencil {FALSE, TRUE, FALSE, FALSE, FALSE}, // 3, 16 bpp depth PRT (non-MSAA), don't support uncompressed depth {TRUE, TRUE, TRUE, FALSE, FALSE}, // 4, 1D depth {FALSE, FALSE, FALSE, FALSE, FALSE}, // 5, 16 bpp depth PRT (4xMSAA) {FALSE, FALSE, TRUE, FALSE, FALSE}, // 6, 32 bpp depth PRT (non-MSAA) {FALSE, FALSE, FALSE, FALSE, FALSE}, // 7, 32 bpp depth PRT (4xMSAA) {TRUE, TRUE, TRUE, TRUE, TRUE }, // 8, Linear {TRUE, TRUE, TRUE, TRUE, TRUE }, // 9, 1D display {TRUE, FALSE, FALSE, FALSE, FALSE}, // 10, 8 bpp color (displayable) {FALSE, TRUE, FALSE, FALSE, FALSE}, // 11, 16 bpp color (displayable) 
{FALSE, FALSE, TRUE, TRUE, FALSE}, // 12, 32/64 bpp color (displayable) {TRUE, TRUE, TRUE, TRUE, TRUE }, // 13, 1D thin {TRUE, FALSE, FALSE, FALSE, FALSE}, // 14, 8 bpp color non-displayable {FALSE, TRUE, FALSE, FALSE, FALSE}, // 15, 16 bpp color non-displayable {FALSE, FALSE, TRUE, FALSE, FALSE}, // 16, 32 bpp color non-displayable {FALSE, FALSE, FALSE, TRUE, TRUE }, // 17, 64/128 bpp color non-displayable {TRUE, TRUE, TRUE, TRUE, TRUE }, // 18, 1D THICK {FALSE, FALSE, FALSE, FALSE, FALSE}, // 19, 2D XTHICK {FALSE, FALSE, FALSE, FALSE, FALSE}, // 20, 2D THICK {TRUE, FALSE, FALSE, FALSE, FALSE}, // 21, 8 bpp 2D PRTs (non-MSAA) {FALSE, TRUE, FALSE, FALSE, FALSE}, // 22, 16 bpp 2D PRTs (non-MSAA) {FALSE, FALSE, TRUE, FALSE, FALSE}, // 23, 32 bpp 2D PRTs (non-MSAA) {FALSE, FALSE, FALSE, TRUE, FALSE}, // 24, 64 bpp 2D PRTs (non-MSAA) {FALSE, FALSE, FALSE, FALSE, TRUE }, // 25, 128bpp 2D PRTs (non-MSAA) {FALSE, FALSE, FALSE, FALSE, FALSE}, // 26, none {FALSE, FALSE, FALSE, FALSE, FALSE}, // 27, none {FALSE, FALSE, FALSE, FALSE, FALSE}, // 28, none {FALSE, FALSE, FALSE, FALSE, FALSE}, // 29, none {FALSE, FALSE, FALSE, FALSE, FALSE}, // 30, 64bpp 2D PRTs (4xMSAA) {FALSE, FALSE, FALSE, FALSE, FALSE}, // 31, none }; /** **************************************************************************************************** * SiLib::SiLib * * @brief * Constructor * **************************************************************************************************** */ SiLib::SiLib(const Client* pClient) : EgBasedLib(pClient), m_noOfEntries(0), m_numEquations(0) { m_class = SI_ADDRLIB; memset(&m_settings, 0, sizeof(m_settings)); } /** **************************************************************************************************** * SiLib::~SiLib * * @brief * Destructor **************************************************************************************************** */ SiLib::~SiLib() { } /** 
**************************************************************************************************** * SiLib::HwlGetPipes * * @brief * Get number of pipes * @return * Number of pipes **************************************************************************************************** */ UINT_32 SiLib::HwlGetPipes( const ADDR_TILEINFO* pTileInfo ///< [in] Tile info ) const { UINT_32 numPipes; if (pTileInfo) { numPipes = GetPipePerSurf(pTileInfo->pipeConfig); } else { ADDR_ASSERT_ALWAYS(); numPipes = m_pipes; // Fall back to the global pipe count } return numPipes; } /** **************************************************************************************************** * SiLib::GetPipePerSurf * @brief * Get the number of pipes based on the input tile info's pipeConfig * @return * Number of pipes **************************************************************************************************** */ UINT_32 SiLib::GetPipePerSurf( AddrPipeCfg pipeConfig ///< [in] pipe config ) const { UINT_32 numPipes = 0; switch (pipeConfig) { case ADDR_PIPECFG_P2: numPipes = 2; break; case ADDR_PIPECFG_P4_8x16: case ADDR_PIPECFG_P4_16x16: case ADDR_PIPECFG_P4_16x32: case ADDR_PIPECFG_P4_32x32: numPipes = 4; break; case ADDR_PIPECFG_P8_16x16_8x16: case ADDR_PIPECFG_P8_16x32_8x16: case ADDR_PIPECFG_P8_32x32_8x16: case ADDR_PIPECFG_P8_16x32_16x16: case ADDR_PIPECFG_P8_32x32_16x16: case ADDR_PIPECFG_P8_32x32_16x32: case ADDR_PIPECFG_P8_32x64_32x32: numPipes = 8; break; case ADDR_PIPECFG_P16_32x32_8x16: case ADDR_PIPECFG_P16_32x32_16x16: numPipes = 16; break; default: ADDR_ASSERT(!"Invalid pipe config"); numPipes = m_pipes; } return numPipes; } /** **************************************************************************************************** * SiLib::ComputeBankEquation * * @brief * Compute bank equation * * @return * If equation can be computed **************************************************************************************************** */ ADDR_E_RETURNCODE SiLib::ComputeBankEquation( UINT_32
log2BytesPP, ///< [in] log2 of bytes per pixel UINT_32 threshX, ///< [in] threshold for x channel UINT_32 threshY, ///< [in] threshold for y channel ADDR_TILEINFO* pTileInfo, ///< [in] tile info ADDR_EQUATION* pEquation ///< [out] bank equation ) const { ADDR_E_RETURNCODE retCode = ADDR_OK; UINT_32 pipes = HwlGetPipes(pTileInfo); UINT_32 bankXStart = 3 + Log2(pipes) + Log2(pTileInfo->bankWidth); UINT_32 bankYStart = 3 + Log2(pTileInfo->bankHeight); ADDR_CHANNEL_SETTING x3 = InitChannel(1, 0, log2BytesPP + bankXStart); ADDR_CHANNEL_SETTING x4 = InitChannel(1, 0, log2BytesPP + bankXStart + 1); ADDR_CHANNEL_SETTING x5 = InitChannel(1, 0, log2BytesPP + bankXStart + 2); ADDR_CHANNEL_SETTING x6 = InitChannel(1, 0, log2BytesPP + bankXStart + 3); ADDR_CHANNEL_SETTING y3 = InitChannel(1, 1, bankYStart); ADDR_CHANNEL_SETTING y4 = InitChannel(1, 1, bankYStart + 1); ADDR_CHANNEL_SETTING y5 = InitChannel(1, 1, bankYStart + 2); ADDR_CHANNEL_SETTING y6 = InitChannel(1, 1, bankYStart + 3); x3.value = (threshX > bankXStart) ? x3.value : 0; x4.value = (threshX > bankXStart + 1) ? x4.value : 0; x5.value = (threshX > bankXStart + 2) ? x5.value : 0; x6.value = (threshX > bankXStart + 3) ? x6.value : 0; y3.value = (threshY > bankYStart) ? y3.value : 0; y4.value = (threshY > bankYStart + 1) ? y4.value : 0; y5.value = (threshY > bankYStart + 2) ? y5.value : 0; y6.value = (threshY > bankYStart + 3) ? 
y6.value : 0; switch (pTileInfo->banks) { case 16: if (pTileInfo->macroAspectRatio == 1) { pEquation->addr[0] = y6; pEquation->xor1[0] = x3; pEquation->addr[1] = y5; pEquation->xor1[1] = y6; pEquation->xor2[1] = x4; pEquation->addr[2] = y4; pEquation->xor1[2] = x5; pEquation->addr[3] = y3; pEquation->xor1[3] = x6; } else if (pTileInfo->macroAspectRatio == 2) { pEquation->addr[0] = x3; pEquation->xor1[0] = y6; pEquation->addr[1] = y5; pEquation->xor1[1] = y6; pEquation->xor2[1] = x4; pEquation->addr[2] = y4; pEquation->xor1[2] = x5; pEquation->addr[3] = y3; pEquation->xor1[3] = x6; } else if (pTileInfo->macroAspectRatio == 4) { pEquation->addr[0] = x3; pEquation->xor1[0] = y6; pEquation->addr[1] = x4; pEquation->xor1[1] = y5; pEquation->xor2[1] = y6; pEquation->addr[2] = y4; pEquation->xor1[2] = x5; pEquation->addr[3] = y3; pEquation->xor1[3] = x6; } else if (pTileInfo->macroAspectRatio == 8) { pEquation->addr[0] = x3; pEquation->xor1[0] = y6; pEquation->addr[1] = x4; pEquation->xor1[1] = y5; pEquation->xor2[1] = y6; pEquation->addr[2] = x5; pEquation->xor1[2] = y4; pEquation->addr[3] = y3; pEquation->xor1[3] = x6; } else { ADDR_ASSERT_ALWAYS(); } pEquation->numBits = 4; break; case 8: if (pTileInfo->macroAspectRatio == 1) { pEquation->addr[0] = y5; pEquation->xor1[0] = x3; pEquation->addr[1] = y4; pEquation->xor1[1] = y5; pEquation->xor2[1] = x4; pEquation->addr[2] = y3; pEquation->xor1[2] = x5; } else if (pTileInfo->macroAspectRatio == 2) { pEquation->addr[0] = x3; pEquation->xor1[0] = y5; pEquation->addr[1] = y4; pEquation->xor1[1] = y5; pEquation->xor2[1] = x4; pEquation->addr[2] = y3; pEquation->xor1[2] = x5; } else if (pTileInfo->macroAspectRatio == 4) { pEquation->addr[0] = x3; pEquation->xor1[0] = y5; pEquation->addr[1] = x4; pEquation->xor1[1] = y4; pEquation->xor2[1] = y5; pEquation->addr[2] = y3; pEquation->xor1[2] = x5; } else { ADDR_ASSERT_ALWAYS(); } pEquation->numBits = 3; break; case 4: if (pTileInfo->macroAspectRatio == 1) { pEquation->addr[0] = y4; 
pEquation->xor1[0] = x3; pEquation->addr[1] = y3; pEquation->xor1[1] = x4; } else if (pTileInfo->macroAspectRatio == 2) { pEquation->addr[0] = x3; pEquation->xor1[0] = y4; pEquation->addr[1] = y3; pEquation->xor1[1] = x4; } else { pEquation->addr[0] = x3; pEquation->xor1[0] = y4; pEquation->addr[1] = x4; pEquation->xor1[1] = y3; } pEquation->numBits = 2; break; case 2: if (pTileInfo->macroAspectRatio == 1) { pEquation->addr[0] = y3; pEquation->xor1[0] = x3; } else { pEquation->addr[0] = x3; pEquation->xor1[0] = y3; } pEquation->numBits = 1; break; default: pEquation->numBits = 0; retCode = ADDR_NOTSUPPORTED; ADDR_ASSERT_ALWAYS(); break; } for (UINT_32 i = 0; i < pEquation->numBits; i++) { if (pEquation->addr[i].value == 0) { if (pEquation->xor1[i].value == 0) { // 00X -> X00 pEquation->addr[i].value = pEquation->xor2[i].value; pEquation->xor2[i].value = 0; } else { pEquation->addr[i].value = pEquation->xor1[i].value; if (pEquation->xor2[i].value != 0) { // 0XY -> XY0 pEquation->xor1[i].value = pEquation->xor2[i].value; pEquation->xor2[i].value = 0; } else { // 0X0 -> X00 pEquation->xor1[i].value = 0; } } } else if (pEquation->xor1[i].value == 0) { if (pEquation->xor2[i].value != 0) { // X0Y -> XY0 pEquation->xor1[i].value = pEquation->xor2[i].value; pEquation->xor2[i].value = 0; } } } if ((pTileInfo->bankWidth == 1) && ((pTileInfo->pipeConfig == ADDR_PIPECFG_P4_32x32) || (pTileInfo->pipeConfig == ADDR_PIPECFG_P8_32x64_32x32))) { retCode = ADDR_NOTSUPPORTED; } return retCode; } /** **************************************************************************************************** * SiLib::ComputePipeEquation * * @brief * Compute pipe equation * * @return * If equation can be computed **************************************************************************************************** */ ADDR_E_RETURNCODE SiLib::ComputePipeEquation( UINT_32 log2BytesPP, ///< [in] Log2 of bytes per pixel UINT_32 threshX, ///< [in] Threshold for X channel UINT_32 threshY, ///< [in] 
Threshold for Y channel ADDR_TILEINFO* pTileInfo, ///< [in] Tile info ADDR_EQUATION* pEquation ///< [out] Pipe configure ) const { ADDR_E_RETURNCODE retCode = ADDR_OK; ADDR_CHANNEL_SETTING* pAddr = pEquation->addr; ADDR_CHANNEL_SETTING* pXor1 = pEquation->xor1; ADDR_CHANNEL_SETTING* pXor2 = pEquation->xor2; ADDR_CHANNEL_SETTING x3 = InitChannel(1, 0, 3 + log2BytesPP); ADDR_CHANNEL_SETTING x4 = InitChannel(1, 0, 4 + log2BytesPP); ADDR_CHANNEL_SETTING x5 = InitChannel(1, 0, 5 + log2BytesPP); ADDR_CHANNEL_SETTING x6 = InitChannel(1, 0, 6 + log2BytesPP); ADDR_CHANNEL_SETTING y3 = InitChannel(1, 1, 3); ADDR_CHANNEL_SETTING y4 = InitChannel(1, 1, 4); ADDR_CHANNEL_SETTING y5 = InitChannel(1, 1, 5); ADDR_CHANNEL_SETTING y6 = InitChannel(1, 1, 6); x3.value = (threshX > 3) ? x3.value : 0; x4.value = (threshX > 4) ? x4.value : 0; x5.value = (threshX > 5) ? x5.value : 0; x6.value = (threshX > 6) ? x6.value : 0; y3.value = (threshY > 3) ? y3.value : 0; y4.value = (threshY > 4) ? y4.value : 0; y5.value = (threshY > 5) ? y5.value : 0; y6.value = (threshY > 6) ? 
y6.value : 0; switch (pTileInfo->pipeConfig) { case ADDR_PIPECFG_P2: pAddr[0] = x3; pXor1[0] = y3; pEquation->numBits = 1; break; case ADDR_PIPECFG_P4_8x16: pAddr[0] = x4; pXor1[0] = y3; pAddr[1] = x3; pXor1[1] = y4; pEquation->numBits = 2; break; case ADDR_PIPECFG_P4_16x16: pAddr[0] = x3; pXor1[0] = y3; pXor2[0] = x4; pAddr[1] = x4; pXor1[1] = y4; pEquation->numBits = 2; break; case ADDR_PIPECFG_P4_16x32: pAddr[0] = x3; pXor1[0] = y3; pXor2[0] = x4; pAddr[1] = x4; pXor1[1] = y5; pEquation->numBits = 2; break; case ADDR_PIPECFG_P4_32x32: pAddr[0] = x3; pXor1[0] = y3; pXor2[0] = x5; pAddr[1] = x5; pXor1[1] = y5; pEquation->numBits = 2; break; case ADDR_PIPECFG_P8_16x16_8x16: pAddr[0] = x4; pXor1[0] = y3; pXor2[0] = x5; pAddr[1] = x3; pXor1[1] = y5; pEquation->numBits = 3; break; case ADDR_PIPECFG_P8_16x32_8x16: pAddr[0] = x4; pXor1[0] = y3; pXor2[0] = x5; pAddr[1] = x3; pXor1[1] = y4; pAddr[2] = x4; pXor1[2] = y5; pEquation->numBits = 3; break; case ADDR_PIPECFG_P8_16x32_16x16: pAddr[0] = x3; pXor1[0] = y3; pXor2[0] = x4; pAddr[1] = x5; pXor1[1] = y4; pAddr[2] = x4; pXor1[2] = y5; pEquation->numBits = 3; break; case ADDR_PIPECFG_P8_32x32_8x16: pAddr[0] = x4; pXor1[0] = y3; pXor2[0] = x5; pAddr[1] = x3; pXor1[1] = y4; pAddr[2] = x5; pXor1[2] = y5; pEquation->numBits = 3; break; case ADDR_PIPECFG_P8_32x32_16x16: pAddr[0] = x3; pXor1[0] = y3; pXor2[0] = x4; pAddr[1] = x4; pXor1[1] = y4; pAddr[2] = x5; pXor1[2] = y5; pEquation->numBits = 3; break; case ADDR_PIPECFG_P8_32x32_16x32: pAddr[0] = x3; pXor1[0] = y3; pXor2[0] = x4; pAddr[1] = x4; pXor1[1] = y6; pAddr[2] = x5; pXor1[2] = y5; pEquation->numBits = 3; break; case ADDR_PIPECFG_P8_32x64_32x32: pAddr[0] = x3; pXor1[0] = y3; pXor2[0] = x5; pAddr[1] = x6; pXor1[1] = y5; pAddr[2] = x5; pXor1[2] = y6; pEquation->numBits = 3; break; case ADDR_PIPECFG_P16_32x32_8x16: pAddr[0] = x4; pXor1[0] = y3; pAddr[1] = x3; pXor1[1] = y4; pAddr[2] = x5; pXor1[2] = y6; pAddr[3] = x6; pXor1[3] = y5; pEquation->numBits = 4; break; case 
ADDR_PIPECFG_P16_32x32_16x16: pAddr[0] = x3; pXor1[0] = y3; pXor2[0] = x4; pAddr[1] = x4; pXor1[1] = y4; pAddr[2] = x5; pXor1[2] = y6; pAddr[3] = x6; pXor1[3] = y5; pEquation->numBits = 4; break; default: ADDR_UNHANDLED_CASE(); pEquation->numBits = 0; retCode = ADDR_NOTSUPPORTED; break; } if (m_settings.isVegaM && (pEquation->numBits == 4)) { ADDR_CHANNEL_SETTING addeMsb = pAddr[0]; ADDR_CHANNEL_SETTING xor1Msb = pXor1[0]; ADDR_CHANNEL_SETTING xor2Msb = pXor2[0]; pAddr[0] = pAddr[1]; pXor1[0] = pXor1[1]; pXor2[0] = pXor2[1]; pAddr[1] = pAddr[2]; pXor1[1] = pXor1[2]; pXor2[1] = pXor2[2]; pAddr[2] = pAddr[3]; pXor1[2] = pXor1[3]; pXor2[2] = pXor2[3]; pAddr[3] = addeMsb; pXor1[3] = xor1Msb; pXor2[3] = xor2Msb; } for (UINT_32 i = 0; i < pEquation->numBits; i++) { if (pAddr[i].value == 0) { if (pXor1[i].value == 0) { pAddr[i].value = pXor2[i].value; } else { pAddr[i].value = pXor1[i].value; pXor1[i].value = 0; } } } return retCode; } /** **************************************************************************************************** * SiLib::ComputePipeFromCoord * * @brief * Compute pipe number from coordinates * @return * Pipe number **************************************************************************************************** */ UINT_32 SiLib::ComputePipeFromCoord( UINT_32 x, ///< [in] x coordinate UINT_32 y, ///< [in] y coordinate UINT_32 slice, ///< [in] slice index AddrTileMode tileMode, ///< [in] tile mode UINT_32 pipeSwizzle, ///< [in] pipe swizzle BOOL_32 ignoreSE, ///< [in] TRUE if shader engines are ignored ADDR_TILEINFO* pTileInfo ///< [in] Tile info ) const { UINT_32 pipe; UINT_32 pipeBit0 = 0; UINT_32 pipeBit1 = 0; UINT_32 pipeBit2 = 0; UINT_32 pipeBit3 = 0; UINT_32 sliceRotation; UINT_32 numPipes = 0; UINT_32 tx = x / MicroTileWidth; UINT_32 ty = y / MicroTileHeight; UINT_32 x3 = _BIT(tx,0); UINT_32 x4 = _BIT(tx,1); UINT_32 x5 = _BIT(tx,2); UINT_32 x6 = _BIT(tx,3); UINT_32 y3 = _BIT(ty,0); UINT_32 y4 = _BIT(ty,1); UINT_32 y5 = _BIT(ty,2); UINT_32 
y6 = _BIT(ty,3); switch (pTileInfo->pipeConfig) { case ADDR_PIPECFG_P2: pipeBit0 = x3 ^ y3; numPipes = 2; break; case ADDR_PIPECFG_P4_8x16: pipeBit0 = x4 ^ y3; pipeBit1 = x3 ^ y4; numPipes = 4; break; case ADDR_PIPECFG_P4_16x16: pipeBit0 = x3 ^ y3 ^ x4; pipeBit1 = x4 ^ y4; numPipes = 4; break; case ADDR_PIPECFG_P4_16x32: pipeBit0 = x3 ^ y3 ^ x4; pipeBit1 = x4 ^ y5; numPipes = 4; break; case ADDR_PIPECFG_P4_32x32: pipeBit0 = x3 ^ y3 ^ x5; pipeBit1 = x5 ^ y5; numPipes = 4; break; case ADDR_PIPECFG_P8_16x16_8x16: pipeBit0 = x4 ^ y3 ^ x5; pipeBit1 = x3 ^ y5; numPipes = 8; break; case ADDR_PIPECFG_P8_16x32_8x16: pipeBit0 = x4 ^ y3 ^ x5; pipeBit1 = x3 ^ y4; pipeBit2 = x4 ^ y5; numPipes = 8; break; case ADDR_PIPECFG_P8_16x32_16x16: pipeBit0 = x3 ^ y3 ^ x4; pipeBit1 = x5 ^ y4; pipeBit2 = x4 ^ y5; numPipes = 8; break; case ADDR_PIPECFG_P8_32x32_8x16: pipeBit0 = x4 ^ y3 ^ x5; pipeBit1 = x3 ^ y4; pipeBit2 = x5 ^ y5; numPipes = 8; break; case ADDR_PIPECFG_P8_32x32_16x16: pipeBit0 = x3 ^ y3 ^ x4; pipeBit1 = x4 ^ y4; pipeBit2 = x5 ^ y5; numPipes = 8; break; case ADDR_PIPECFG_P8_32x32_16x32: pipeBit0 = x3 ^ y3 ^ x4; pipeBit1 = x4 ^ y6; pipeBit2 = x5 ^ y5; numPipes = 8; break; case ADDR_PIPECFG_P8_32x64_32x32: pipeBit0 = x3 ^ y3 ^ x5; pipeBit1 = x6 ^ y5; pipeBit2 = x5 ^ y6; numPipes = 8; break; case ADDR_PIPECFG_P16_32x32_8x16: pipeBit0 = x4 ^ y3; pipeBit1 = x3 ^ y4; pipeBit2 = x5 ^ y6; pipeBit3 = x6 ^ y5; numPipes = 16; break; case ADDR_PIPECFG_P16_32x32_16x16: pipeBit0 = x3 ^ y3 ^ x4; pipeBit1 = x4 ^ y4; pipeBit2 = x5 ^ y6; pipeBit3 = x6 ^ y5; numPipes = 16; break; default: ADDR_UNHANDLED_CASE(); break; } if (m_settings.isVegaM && (numPipes == 16)) { UINT_32 pipeMsb = pipeBit0; pipeBit0 = pipeBit1; pipeBit1 = pipeBit2; pipeBit2 = pipeBit3; pipeBit3 = pipeMsb; } pipe = pipeBit0 | (pipeBit1 << 1) | (pipeBit2 << 2) | (pipeBit3 << 3); UINT_32 microTileThickness = Thickness(tileMode); // // Apply pipe rotation for the slice. 
    //
    switch (tileMode)
    {
        case ADDR_TM_3D_TILED_THIN1: //fall through thin
        case ADDR_TM_3D_TILED_THICK: //fall through thick
        case ADDR_TM_3D_TILED_XTHICK:
            sliceRotation =
                Max(1, static_cast<INT_32>(numPipes / 2) - 1) * (slice / microTileThickness);
            break;
        default:
            sliceRotation = 0;
            break;
    }
    pipeSwizzle += sliceRotation;
    pipeSwizzle &= (numPipes - 1);

    pipe = pipe ^ pipeSwizzle;

    return pipe;
}

/**
****************************************************************************************************
*   SiLib::ComputeTileCoordFromPipeAndElemIdx
*
*   @brief
*       Compute (x,y) of a tile within a macro tile from its pipe and per-pipe element index
*   @return
*       N/A
****************************************************************************************************
*/
VOID SiLib::ComputeTileCoordFromPipeAndElemIdx(
    UINT_32         elemIdx,          ///< [in] per pipe element index within a macro tile
    UINT_32         pipe,             ///< [in] pipe index
    AddrPipeCfg     pipeCfg,          ///< [in] pipe config
    UINT_32         pitchInMacroTile, ///< [in] surface pitch in macro tile
    UINT_32         x,                ///< [in] x coordinate of the (0,0) tile in a macro tile
    UINT_32         y,                ///< [in] y coordinate of the (0,0) tile in a macro tile
    UINT_32*        pX,               ///< [out] x coordinate
    UINT_32*        pY                ///< [out] y coordinate
    ) const
{
    UINT_32 pipebit0 = _BIT(pipe,0);
    UINT_32 pipebit1 = _BIT(pipe,1);
    UINT_32 pipebit2 = _BIT(pipe,2);
    UINT_32 pipebit3 = _BIT(pipe,3);
    UINT_32 elemIdx0 = _BIT(elemIdx,0);
    UINT_32 elemIdx1 = _BIT(elemIdx,1);
    UINT_32 elemIdx2 = _BIT(elemIdx,2);
    UINT_32 x3 = 0;
    UINT_32 x4 = 0;
    UINT_32 x5 = 0;
    UINT_32 x6 = 0;
    UINT_32 y3 = 0;
    UINT_32 y4 = 0;
    UINT_32 y5 = 0;
    UINT_32 y6 = 0;

    switch(pipeCfg)
    {
        case ADDR_PIPECFG_P2:
            x4 = elemIdx2;
            y4 = elemIdx1 ^ x4;
            y3 = elemIdx0 ^ x4;
            x3 = pipebit0 ^ y3;
            *pY = Bits2Number(2, y4, y3);
            *pX = Bits2Number(2, x4, x3);
            break;
        case ADDR_PIPECFG_P4_8x16:
            x4 = elemIdx1;
            y4 = elemIdx0 ^ x4;
            x3 = pipebit1 ^ y4;
            y3 = pipebit0 ^ x4;
            *pY = Bits2Number(2, y4, y3);
            *pX = Bits2Number(2, x4, x3);
            break;
        case ADDR_PIPECFG_P4_16x16:
            x4 = elemIdx1;
            y3 = elemIdx0 ^ x4;
            y4 =
pipebit1 ^ x4; x3 = pipebit0 ^ y3 ^ x4; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(2, x4, x3); break; case ADDR_PIPECFG_P4_16x32: x3 = elemIdx0 ^ pipebit0; y5 = _BIT(y,5); x4 = pipebit1 ^ y5; y3 = pipebit0 ^ x3 ^ x4; y4 = elemIdx1 ^ x4; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(2, x4, x3); break; case ADDR_PIPECFG_P4_32x32: x4 = elemIdx2; y3 = elemIdx0 ^ x4; y4 = elemIdx1 ^ x4; if((pitchInMacroTile % 2) == 0) { //even y5 = _BIT(y,5); x5 = pipebit1 ^ y5; x3 = pipebit0 ^ y3 ^ x5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(3, x5, x4, x3); } else { //odd x5 = _BIT(x,5); x3 = pipebit0 ^ y3 ^ x5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(2, x4, x3); } break; case ADDR_PIPECFG_P8_16x16_8x16: x4 = elemIdx0; y5 = _BIT(y,5); x5 = _BIT(x,5); x3 = pipebit1 ^ y5; y4 = pipebit2 ^ x4; y3 = pipebit0 ^ x5 ^ x4; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(2, x4, x3); break; case ADDR_PIPECFG_P8_16x32_8x16: x3 = elemIdx0; y4 = pipebit1 ^ x3; y5 = _BIT(y,5); x5 = _BIT(x,5); x4 = pipebit2 ^ y5; y3 = pipebit0 ^ x4 ^ x5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(2, x4, x3); break; case ADDR_PIPECFG_P8_32x32_8x16: x4 = elemIdx1; y4 = elemIdx0 ^ x4; x3 = pipebit1 ^ y4; if((pitchInMacroTile % 2) == 0) { //even y5 = _BIT(y,5); x5 = _BIT(x,5); x5 = pipebit2 ^ y5; y3 = pipebit0 ^ x4 ^ x5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(3, x5, x4, x3); } else { //odd x5 = _BIT(x,5); y3 = pipebit0 ^ x4 ^ x5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(2, x4, x3); } break; case ADDR_PIPECFG_P8_16x32_16x16: x3 = elemIdx0; x5 = _BIT(x,5); y5 = _BIT(y,5); x4 = pipebit2 ^ y5; y4 = pipebit1 ^ x5; y3 = pipebit0 ^ x3 ^ x4; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(2, x4, x3); break; case ADDR_PIPECFG_P8_32x32_16x16: x4 = elemIdx1; y3 = elemIdx0 ^ x4; x3 = y3^x4^pipebit0; y4 = pipebit1 ^ x4; if((pitchInMacroTile % 2) == 0) { //even y5 = _BIT(y,5); x5 = pipebit2 ^ y5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(3, x5, x4, x3); } else { //odd *pY = 
Bits2Number(2, y4, y3); *pX = Bits2Number(2, x4, x3); } break; case ADDR_PIPECFG_P8_32x32_16x32: if((pitchInMacroTile % 2) == 0) { //even y5 = _BIT(y,5); y6 = _BIT(y,6); x4 = pipebit1 ^ y6; y3 = elemIdx0 ^ x4; y4 = elemIdx1 ^ x4; x3 = pipebit0 ^ y3 ^ x4; x5 = pipebit2 ^ y5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(3, x5, x4, x3); } else { //odd y6 = _BIT(y,6); x4 = pipebit1 ^ y6; y3 = elemIdx0 ^ x4; y4 = elemIdx1 ^ x4; x3 = pipebit0 ^ y3 ^ x4; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(2, x4, x3); } break; case ADDR_PIPECFG_P8_32x64_32x32: x4 = elemIdx2; y3 = elemIdx0 ^ x4; y4 = elemIdx1 ^ x4; if((pitchInMacroTile % 4) == 0) { //multiple of 4 y5 = _BIT(y,5); y6 = _BIT(y,6); x5 = pipebit2 ^ y6; x6 = pipebit1 ^ y5; x3 = pipebit0 ^ y3 ^ x5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(4, x6, x5, x4, x3); } else { y6 = _BIT(y,6); x5 = pipebit2 ^ y6; x3 = pipebit0 ^ y3 ^ x5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(3, x5, x4, x3); } break; case ADDR_PIPECFG_P16_32x32_8x16: x4 = elemIdx1; y4 = elemIdx0 ^ x4; y3 = pipebit0 ^ x4; x3 = pipebit1 ^ y4; if((pitchInMacroTile % 4) == 0) { //multiple of 4 y5 = _BIT(y,5); y6 = _BIT(y,6); x5 = pipebit2 ^ y6; x6 = pipebit3 ^ y5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(4, x6, x5,x4, x3); } else { y6 = _BIT(y,6); x5 = pipebit2 ^ y6; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(3, x5, x4, x3); } break; case ADDR_PIPECFG_P16_32x32_16x16: x4 = elemIdx1; y3 = elemIdx0 ^ x4; y4 = pipebit1 ^ x4; x3 = pipebit0 ^ y3 ^ x4; if((pitchInMacroTile % 4) == 0) { //multiple of 4 y5 = _BIT(y,5); y6 = _BIT(y,6); x5 = pipebit2 ^ y6; x6 = pipebit3 ^ y5; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(4, x6, x5, x4, x3); } else { y6 = _BIT(y,6); x5 = pipebit2 ^ y6; *pY = Bits2Number(2, y4, y3); *pX = Bits2Number(3, x5, x4, x3); } break; default: ADDR_UNHANDLED_CASE(); } } /** **************************************************************************************************** * SiLib::TileCoordToMaskElementIndex * * @brief 
* Compute element index from coordinates in tiles * @return * Element index **************************************************************************************************** */ UINT_32 SiLib::TileCoordToMaskElementIndex( UINT_32 tx, ///< [in] x coord, in Tiles UINT_32 ty, ///< [in] y coord, in Tiles AddrPipeCfg pipeConfig, ///< [in] pipe config UINT_32* macroShift, ///< [out] macro shift UINT_32* elemIdxBits ///< [out] tile offset bits ) const { UINT_32 elemIdx = 0; UINT_32 elemIdx0, elemIdx1, elemIdx2; UINT_32 tx0, tx1; UINT_32 ty0, ty1; tx0 = _BIT(tx,0); tx1 = _BIT(tx,1); ty0 = _BIT(ty,0); ty1 = _BIT(ty,1); switch(pipeConfig) { case ADDR_PIPECFG_P2: *macroShift = 3; *elemIdxBits =3; elemIdx2 = tx1; elemIdx1 = tx1 ^ ty1; elemIdx0 = tx1 ^ ty0; elemIdx = Bits2Number(3,elemIdx2,elemIdx1,elemIdx0); break; case ADDR_PIPECFG_P4_8x16: *macroShift = 2; *elemIdxBits =2; elemIdx1 = tx1; elemIdx0 = tx1 ^ ty1; elemIdx = Bits2Number(2,elemIdx1,elemIdx0); break; case ADDR_PIPECFG_P4_16x16: *macroShift = 2; *elemIdxBits =2; elemIdx0 = tx1^ty0; elemIdx1 = tx1; elemIdx = Bits2Number(2, elemIdx1, elemIdx0); break; case ADDR_PIPECFG_P4_16x32: *macroShift = 2; *elemIdxBits =2; elemIdx0 = tx1^ty0; elemIdx1 = tx1^ty1; elemIdx = Bits2Number(2, elemIdx1, elemIdx0); break; case ADDR_PIPECFG_P4_32x32: *macroShift = 2; *elemIdxBits =3; elemIdx0 = tx1^ty0; elemIdx1 = tx1^ty1; elemIdx2 = tx1; elemIdx = Bits2Number(3, elemIdx2, elemIdx1, elemIdx0); break; case ADDR_PIPECFG_P8_16x16_8x16: *macroShift = 1; *elemIdxBits =1; elemIdx0 = tx1; elemIdx = elemIdx0; break; case ADDR_PIPECFG_P8_16x32_8x16: *macroShift = 1; *elemIdxBits =1; elemIdx0 = tx0; elemIdx = elemIdx0; break; case ADDR_PIPECFG_P8_32x32_8x16: *macroShift = 1; *elemIdxBits =2; elemIdx1 = tx1; elemIdx0 = tx1^ty1; elemIdx = Bits2Number(2, elemIdx1, elemIdx0); break; case ADDR_PIPECFG_P8_16x32_16x16: *macroShift = 1; *elemIdxBits =1; elemIdx0 = tx0; elemIdx = elemIdx0; break; case ADDR_PIPECFG_P8_32x32_16x16: *macroShift = 1; 
*elemIdxBits =2; elemIdx0 = tx1^ty0; elemIdx1 = tx1; elemIdx = Bits2Number(2, elemIdx1, elemIdx0); break; case ADDR_PIPECFG_P8_32x32_16x32: *macroShift = 1; *elemIdxBits =2; elemIdx0 = tx1^ty0; elemIdx1 = tx1^ty1; elemIdx = Bits2Number(2, elemIdx1, elemIdx0); break; case ADDR_PIPECFG_P8_32x64_32x32: *macroShift = 1; *elemIdxBits =3; elemIdx0 = tx1^ty0; elemIdx1 = tx1^ty1; elemIdx2 = tx1; elemIdx = Bits2Number(3, elemIdx2, elemIdx1, elemIdx0); break; case ADDR_PIPECFG_P16_32x32_8x16: *macroShift = 0; *elemIdxBits =2; elemIdx0 = tx1^ty1; elemIdx1 = tx1; elemIdx = Bits2Number(2, elemIdx1, elemIdx0); break; case ADDR_PIPECFG_P16_32x32_16x16: *macroShift = 0; *elemIdxBits =2; elemIdx0 = tx1^ty0; elemIdx1 = tx1; elemIdx = Bits2Number(2, elemIdx1, elemIdx0); break; default: ADDR_UNHANDLED_CASE(); break; } return elemIdx; } /** **************************************************************************************************** * SiLib::HwlComputeTileDataWidthAndHeightLinear * * @brief * Compute the squared cache shape for per-tile data (CMASK and HTILE) for linear layout * * @return * N/A * * @note * MacroWidth and macroHeight are measured in pixels **************************************************************************************************** */ VOID SiLib::HwlComputeTileDataWidthAndHeightLinear( UINT_32* pMacroWidth, ///< [out] macro tile width UINT_32* pMacroHeight, ///< [out] macro tile height UINT_32 bpp, ///< [in] bits per pixel ADDR_TILEINFO* pTileInfo ///< [in] tile info ) const { ADDR_ASSERT(pTileInfo != NULL); UINT_32 macroWidth; UINT_32 macroHeight; /// In linear mode, the htile or cmask buffer must be padded out to 4 tiles /// but for P8_32x64_32x32, it must be padded out to 8 tiles /// Actually there are more pipe configs which need 8-tile padding but SI family /// has a bug which is fixed in CI family if ((pTileInfo->pipeConfig == ADDR_PIPECFG_P8_32x64_32x32) || (pTileInfo->pipeConfig == ADDR_PIPECFG_P16_32x32_8x16) || (pTileInfo->pipeConfig == 
        ADDR_PIPECFG_P8_32x32_16x16))
    {
        macroWidth  = 8*MicroTileWidth;
        macroHeight = 8*MicroTileHeight;
    }
    else
    {
        macroWidth  = 4*MicroTileWidth;
        macroHeight = 4*MicroTileHeight;
    }

    *pMacroWidth    = macroWidth;
    *pMacroHeight   = macroHeight;
}

/**
****************************************************************************************************
*   SiLib::HwlComputeHtileBytes
*
*   @brief
*       Compute htile size in bytes
*
*   @return
*       Htile size in bytes
****************************************************************************************************
*/
UINT_64 SiLib::HwlComputeHtileBytes(
    UINT_32     pitch,          ///< [in] pitch
    UINT_32     height,         ///< [in] height
    UINT_32     bpp,            ///< [in] bits per pixel
    BOOL_32     isLinear,       ///< [in] if it is linear mode
    UINT_32     numSlices,      ///< [in] number of slices
    UINT_64*    pSliceBytes,    ///< [out] bytes per slice
    UINT_32     baseAlign       ///< [in] base alignment
    ) const
{
    return ComputeHtileBytes(pitch, height, bpp, isLinear, numSlices, pSliceBytes, baseAlign);
}

/**
****************************************************************************************************
*   SiLib::HwlComputeXmaskAddrFromCoord
*
*   @brief
*       Compute address from coordinates for htile/cmask
*   @return
*       Byte address
****************************************************************************************************
*/
UINT_64 SiLib::HwlComputeXmaskAddrFromCoord(
    UINT_32        pitch,          ///< [in] pitch
    UINT_32        height,         ///< [in] height
    UINT_32        x,              ///< [in] x coord
    UINT_32        y,              ///< [in] y coord
    UINT_32        slice,          ///< [in] slice/depth index
    UINT_32        numSlices,      ///< [in] number of slices
    UINT_32        factor,         ///< [in] factor that indicates cmask(2) or htile(1)
    BOOL_32        isLinear,       ///< [in] linear or tiled HTILE layout
    BOOL_32        isWidth8,       ///< [in] TRUE if width is 8, FALSE means 4. It's register value
    BOOL_32        isHeight8,      ///< [in] TRUE if height is 8, FALSE means 4. It's register value
    ADDR_TILEINFO* pTileInfo,      ///< [in] Tile info
    UINT_32*       pBitPosition    ///< [out] bit position inside a byte
    ) const
{
    UINT_32 tx = x / MicroTileWidth;
    UINT_32 ty = y / MicroTileHeight;
    UINT_32 newPitch;
    UINT_32 newHeight;
    UINT_64 totalBytes;
    UINT_32 macroWidth;
    UINT_32 macroHeight;
    UINT_64 pSliceBytes;
    UINT_32 pBaseAlign;
    UINT_32 tileNumPerPipe;
    UINT_32 elemBits;

    if (factor == 2) //CMASK
    {
        ADDR_CMASK_FLAGS flags = {{0}};

        tileNumPerPipe = 256;

        ComputeCmaskInfo(flags, pitch, height, numSlices, isLinear, pTileInfo,
                         &newPitch, &newHeight, &totalBytes, &macroWidth, &macroHeight);
        elemBits = CmaskElemBits;
    }
    else //HTile
    {
        ADDR_HTILE_FLAGS flags = {{0}};

        tileNumPerPipe = 512;

        ComputeHtileInfo(flags, pitch, height, numSlices, isLinear, TRUE, TRUE, pTileInfo,
                         &newPitch, &newHeight, &totalBytes, &macroWidth, &macroHeight,
                         &pSliceBytes, &pBaseAlign);
        elemBits = 32;
    }

    const UINT_32 pitchInTile  = newPitch / MicroTileWidth;
    const UINT_32 heightInTile = newHeight / MicroTileHeight;
    UINT_64 macroOffset; // Per pipe starting offset of the macro tile in which this tile lies.
    UINT_64 microNumber; // Per pipe starting offset of the macro tile in which this tile lies.
    UINT_32 microX;
    UINT_32 microY;
    UINT_64 microOffset;
    UINT_32 microShift;
    UINT_64 totalOffset;
    UINT_32 elemIdxBits;
    UINT_32 elemIdx =
        TileCoordToMaskElementIndex(tx, ty, pTileInfo->pipeConfig, &microShift, &elemIdxBits);

    UINT_32 numPipes = HwlGetPipes(pTileInfo);

    if (isLinear)
    {   //linear addressing
        // Linear addressing is extremely wasteful of memory if slice > 1, since each pipe holds
        // the full slice memory footprint instead of the footprint divided by numPipes.
        microX = tx / 4; // Macro Tile is 4x4
        microY = ty / 4;
        microNumber = static_cast<UINT_64>(microX + microY * (pitchInTile / 4)) << microShift;

        UINT_32 sliceBits = pitchInTile * heightInTile;

        // do htile single slice alignment if the flag is true
        if (m_configFlags.useHtileSliceAlign && (factor == 1)) //Htile
        {
            sliceBits = PowTwoAlign(sliceBits, BITS_TO_BYTES(HtileCacheBits) * numPipes / elemBits);
        }
        macroOffset = slice * (sliceBits / numPipes) * elemBits;
    }
    else
    {   //tiled addressing
        const UINT_32 macroWidthInTile = macroWidth / MicroTileWidth; // Now in unit of Tiles
        const UINT_32 macroHeightInTile = macroHeight / MicroTileHeight;
        const UINT_32 pitchInCL = pitchInTile / macroWidthInTile;
        const UINT_32 heightInCL = heightInTile / macroHeightInTile;

        const UINT_32 macroX = x / macroWidth;
        const UINT_32 macroY = y / macroHeight;
        const UINT_32 macroNumber = macroX + macroY * pitchInCL + slice * pitchInCL * heightInCL;

        // Per pipe starting offset of the cache line in which this tile lies.
        microX = (x % macroWidth) / MicroTileWidth / 4; // Macro Tile is 4x4
        microY = (y % macroHeight) / MicroTileHeight / 4;
        microNumber =
            static_cast<UINT_64>(microX + microY * (macroWidth / MicroTileWidth / 4)) << microShift;

        macroOffset = macroNumber * tileNumPerPipe * elemBits;
    }

    if (elemIdxBits == microShift)
    {
        microNumber += elemIdx;
    }
    else
    {
        microNumber >>= elemIdxBits;
        microNumber <<= elemIdxBits;
        microNumber += elemIdx;
    }

    microOffset = elemBits * microNumber;
    totalOffset = microOffset + macroOffset;

    UINT_32 pipe = ComputePipeFromCoord(x, y, 0, ADDR_TM_2D_TILED_THIN1, 0, FALSE, pTileInfo);
    UINT_64 addrInBits = totalOffset % (m_pipeInterleaveBytes * 8) +
                         pipe * (m_pipeInterleaveBytes * 8) +
                         totalOffset / (m_pipeInterleaveBytes * 8) * (m_pipeInterleaveBytes * 8) * numPipes;
    *pBitPosition = static_cast<UINT_32>(addrInBits) % 8;
    UINT_64 addr = addrInBits / 8;

    return addr;
}

/**
****************************************************************************************************
*   SiLib::HwlComputeXmaskCoordFromAddr
*
*   @brief
*       Compute the coord from an address of a cmask/htile
*
*   @return
*       N/A
*
*   @note
*       This method is shared by cmask and htile, hence the name Xmask
****************************************************************************************************
*/
VOID SiLib::HwlComputeXmaskCoordFromAddr(
    UINT_64         addr,           ///< [in] address
    UINT_32         bitPosition,    ///< [in] bitPosition in a byte
    UINT_32         pitch,          ///< [in] pitch
    UINT_32         height,         ///< [in] height
    UINT_32         numSlices,      ///< [in] number of slices
    UINT_32         factor,         ///< [in] factor that indicates cmask or htile
    BOOL_32         isLinear,       ///< [in] linear or tiled HTILE layout
    BOOL_32         isWidth8,       ///< [in] Not used by SI
    BOOL_32         isHeight8,      ///< [in] Not used by SI
    ADDR_TILEINFO*  pTileInfo,      ///< [in] Tile info
    UINT_32*        pX,             ///< [out] x coord
    UINT_32*        pY,             ///< [out] y coord
    UINT_32*        pSlice          ///< [out] slice index
    ) const
{
    UINT_32 newPitch;
    UINT_32 newHeight;
    UINT_64 totalBytes;
    UINT_32 clWidth;
    UINT_32 clHeight;
    UINT_32 tileNumPerPipe;
    UINT_64 sliceBytes;

    *pX = 0;
    *pY = 0;
    *pSlice = 0;

    if (factor == 2) //CMASK
    {
        ADDR_CMASK_FLAGS flags = {{0}};

        tileNumPerPipe = 256;

        ComputeCmaskInfo(flags, pitch, height, numSlices, isLinear, pTileInfo,
                         &newPitch, &newHeight, &totalBytes, &clWidth, &clHeight);
    }
    else //HTile
    {
        ADDR_HTILE_FLAGS flags = {{0}};

        tileNumPerPipe = 512;

        ComputeHtileInfo(flags, pitch, height, numSlices, isLinear, TRUE, TRUE, pTileInfo,
                         &newPitch, &newHeight, &totalBytes, &clWidth, &clHeight, &sliceBytes);
    }

    const UINT_32 pitchInTile = newPitch / MicroTileWidth;
    const UINT_32 heightInTile = newHeight / MicroTileHeight;
    const UINT_32 pitchInMacroTile = pitchInTile / 4;
    UINT_32 macroShift;
    UINT_32 elemIdxBits;
    // get macroShift and elemIdxBits
    TileCoordToMaskElementIndex(0, 0, pTileInfo->pipeConfig, &macroShift, &elemIdxBits);

    const UINT_32 numPipes = HwlGetPipes(pTileInfo);
    const UINT_32 pipe = (UINT_32)((addr / m_pipeInterleaveBytes) % numPipes);
    // per pipe
    UINT_64 localOffset = (addr % m_pipeInterleaveBytes) +
        (addr / m_pipeInterleaveBytes /
        numPipes)* m_pipeInterleaveBytes;

    UINT_32 tileIndex;
    if (factor == 2) //CMASK
    {
        tileIndex = (UINT_32)(localOffset * 2 + (bitPosition != 0));
    }
    else
    {
        tileIndex = (UINT_32)(localOffset / 4);
    }

    UINT_32 macroOffset;
    if (isLinear)
    {
        UINT_32 sliceSizeInTile = pitchInTile * heightInTile;

        // do htile single slice alignment if the flag is true
        if (m_configFlags.useHtileSliceAlign && (factor == 1)) //Htile
        {
            sliceSizeInTile = PowTwoAlign(sliceSizeInTile, static_cast<UINT_32>(sliceBytes) / 64);
        }
        *pSlice = tileIndex / (sliceSizeInTile / numPipes);
        macroOffset = tileIndex % (sliceSizeInTile / numPipes);
    }
    else
    {
        const UINT_32 clWidthInTile = clWidth / MicroTileWidth; // Now in unit of Tiles
        const UINT_32 clHeightInTile = clHeight / MicroTileHeight;
        const UINT_32 pitchInCL = pitchInTile / clWidthInTile;
        const UINT_32 heightInCL = heightInTile / clHeightInTile;
        const UINT_32 clIndex = tileIndex / tileNumPerPipe;

        UINT_32 clX = clIndex % pitchInCL;
        UINT_32 clY = (clIndex % (heightInCL * pitchInCL)) / pitchInCL;

        *pX = clX * clWidthInTile * MicroTileWidth;
        *pY = clY * clHeightInTile * MicroTileHeight;
        *pSlice = clIndex / (heightInCL * pitchInCL);

        macroOffset = tileIndex % tileNumPerPipe;
    }

    UINT_32 elemIdx = macroOffset & 7;
    macroOffset >>= elemIdxBits;

    if (elemIdxBits != macroShift)
    {
        macroOffset <<= (elemIdxBits - macroShift);

        UINT_32 pipebit1 = _BIT(pipe,1);
        UINT_32 pipebit2 = _BIT(pipe,2);
        UINT_32 pipebit3 = _BIT(pipe,3);
        if (pitchInMacroTile % 2)
        {   //odd
            switch (pTileInfo->pipeConfig)
            {
                case ADDR_PIPECFG_P4_32x32:
                    macroOffset |= pipebit1;
                    break;
                case ADDR_PIPECFG_P8_32x32_8x16:
                case ADDR_PIPECFG_P8_32x32_16x16:
                case ADDR_PIPECFG_P8_32x32_16x32:
                    macroOffset |= pipebit2;
                    break;
                default:
                    break;
            }
        }

        if (pitchInMacroTile % 4)
        {
            if (pTileInfo->pipeConfig == ADDR_PIPECFG_P8_32x64_32x32)
            {
                macroOffset |= (pipebit1<<1);
            }
            if ((pTileInfo->pipeConfig == ADDR_PIPECFG_P16_32x32_8x16) ||
                (pTileInfo->pipeConfig == ADDR_PIPECFG_P16_32x32_16x16))
            {
                macroOffset |= (pipebit3<<1);
            }
        }
    }

    UINT_32 macroX;
    UINT_32 macroY;
    if (isLinear)
    {
        macroX = macroOffset % pitchInMacroTile;
        macroY = macroOffset / pitchInMacroTile;
    }
    else
    {
        const UINT_32 clWidthInMacroTile = clWidth / (MicroTileWidth * 4);
        macroX = macroOffset % clWidthInMacroTile;
        macroY = macroOffset / clWidthInMacroTile;
    }

    *pX += macroX * 4 * MicroTileWidth;
    *pY += macroY * 4 * MicroTileHeight;

    UINT_32 microX;
    UINT_32 microY;
    ComputeTileCoordFromPipeAndElemIdx(elemIdx, pipe, pTileInfo->pipeConfig, pitchInMacroTile,
                                       *pX, *pY, &microX, &microY);

    *pX += microX * MicroTileWidth;
    *pY += microY * MicroTileHeight;
}

/**
****************************************************************************************************
*   SiLib::HwlGetPitchAlignmentLinear
*   @brief
*       Get pitch alignment
*   @return
*       pitch alignment
****************************************************************************************************
*/
UINT_32 SiLib::HwlGetPitchAlignmentLinear(
    UINT_32             bpp,    ///< [in] bits per pixel
    ADDR_SURFACE_FLAGS  flags   ///< [in] surface flags
    ) const
{
    UINT_32 pitchAlign;

    // Interleaved access requires a 256B aligned pitch, so fall back to pre-SI alignment
    if (flags.interleaved)
    {
        pitchAlign = Max(64u, m_pipeInterleaveBytes / BITS_TO_BYTES(bpp));
    }
    else
    {
        pitchAlign = Max(8u, 64 / BITS_TO_BYTES(bpp));
    }

    return pitchAlign;
}

/**
****************************************************************************************************
*   SiLib::HwlGetSizeAdjustmentLinear
*
*   @brief
*       Adjust linear surface pitch and slice size
*
*   @return
*       Logical slice size in bytes
****************************************************************************************************
*/
UINT_64 SiLib::HwlGetSizeAdjustmentLinear(
    AddrTileMode        tileMode,       ///< [in] tile mode
    UINT_32             bpp,            ///< [in] bits per pixel
    UINT_32             numSamples,     ///< [in] number of samples
    UINT_32             baseAlign,      ///< [in] base alignment
    UINT_32             pitchAlign,     ///< [in] pitch alignment
    UINT_32*            pPitch,         ///< [in,out] pointer to pitch
    UINT_32*            pHeight,        ///< [in,out] pointer to height
    UINT_32*            pHeightAlign    ///< [in,out] pointer to height align
    ) const
{
    UINT_64 sliceSize;
    if (tileMode == ADDR_TM_LINEAR_GENERAL)
    {
        sliceSize = BITS_TO_BYTES(static_cast<UINT_64>(*pPitch) * (*pHeight) * bpp * numSamples);
    }
    else
    {
        UINT_32 pitch   = *pPitch;
        UINT_32 height  = *pHeight;

        UINT_32 pixelsPerPipeInterleave = m_pipeInterleaveBytes / BITS_TO_BYTES(bpp);
        UINT_32 sliceAlignInPixel = pixelsPerPipeInterleave < 64 ? 64 : pixelsPerPipeInterleave;

        // numSamples should be 1 in real cases (no MSAA for linear, but TGL may pass a value other than 1)
        UINT_64 pixelPerSlice = static_cast<UINT_64>(pitch) * height * numSamples;

        while (pixelPerSlice % sliceAlignInPixel)
        {
            pitch += pitchAlign;
            pixelPerSlice = static_cast<UINT_64>(pitch) * height * numSamples;
        }

        *pPitch = pitch;

        UINT_32 heightAlign = 1;

        while ((pitch * heightAlign) % sliceAlignInPixel)
        {
            heightAlign++;
        }

        *pHeightAlign = heightAlign;

        sliceSize = BITS_TO_BYTES(pixelPerSlice * bpp);
    }

    return sliceSize;
}

/**
****************************************************************************************************
*   SiLib::HwlPreHandleBaseLvl3xPitch
*
*   @brief
*       Pre-handler of 3x pitch (96 bit) adjustment
*
*   @return
*       Expected pitch
****************************************************************************************************
*/
UINT_32 SiLib::HwlPreHandleBaseLvl3xPitch(
    const ADDR_COMPUTE_SURFACE_INFO_INPUT*  pIn,        ///< [in] input
    UINT_32                                 expPitch    ///< [in] pitch
    ) const
{
    ADDR_ASSERT(pIn->width == expPitch);

    // From SI, if pow2Pad is 1 the pitch is expanded 3x first, then padded to pow2, so nothing to
    // do here
    if (pIn->flags.pow2Pad == FALSE)
    {
        Addr::V1::Lib::HwlPreHandleBaseLvl3xPitch(pIn, expPitch);
    }
    else
    {
        ADDR_ASSERT(IsPow2(expPitch));
    }

    return expPitch;
}

/**
****************************************************************************************************
*   SiLib::HwlPostHandleBaseLvl3xPitch
*
*   @brief
*       Post-handler of 3x pitch adjustment
*
*   @return
*       Expected pitch
**************************************************************************************************** */ UINT_32 SiLib::HwlPostHandleBaseLvl3xPitch( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input UINT_32 expPitch ///< [in] pitch ) const { /** * @note The pitch will be divided by 3 in the end so the value will look odd but h/w should * be able to compute a correct pitch from it as h/w address library is doing the job. */ // From SI, the pitch is expanded 3x first, then padded to pow2, so no special handler here if (pIn->flags.pow2Pad == FALSE) { Addr::V1::Lib::HwlPostHandleBaseLvl3xPitch(pIn, expPitch); } return expPitch; } /** **************************************************************************************************** * SiLib::HwlGetPitchAlignmentMicroTiled * * @brief * Compute 1D tiled surface pitch alignment * * @return * pitch alignment **************************************************************************************************** */ UINT_32 SiLib::HwlGetPitchAlignmentMicroTiled( AddrTileMode tileMode, ///< [in] tile mode UINT_32 bpp, ///< [in] bits per pixel ADDR_SURFACE_FLAGS flags, ///< [in] surface flags UINT_32 numSamples ///< [in] number of samples ) const { UINT_32 pitchAlign; if (flags.qbStereo) { pitchAlign = EgBasedLib::HwlGetPitchAlignmentMicroTiled(tileMode,bpp,flags,numSamples); } else { pitchAlign = 8; } return pitchAlign; } /** **************************************************************************************************** * SiLib::HwlGetSizeAdjustmentMicroTiled * * @brief * Adjust 1D tiled surface pitch and slice size * * @return * Logical slice size in bytes **************************************************************************************************** */ UINT_64 SiLib::HwlGetSizeAdjustmentMicroTiled( UINT_32 thickness, ///< [in] thickness UINT_32 bpp, ///< [in] bits per pixel ADDR_SURFACE_FLAGS flags, ///< [in] surface flags UINT_32 numSamples, ///< [in] number of samples UINT_32 baseAlign, ///< [in] base 
alignment UINT_32 pitchAlign, ///< [in] pitch alignment UINT_32* pPitch, ///< [in,out] pointer to pitch UINT_32* pHeight ///< [in,out] pointer to height ) const { UINT_64 logicalSliceSize; UINT_64 physicalSliceSize; UINT_32 pitch = *pPitch; UINT_32 height = *pHeight; // Logical slice: pitch * height * bpp * numSamples (no 1D MSAA so actually numSamples == 1) logicalSliceSize = BITS_TO_BYTES(static_cast<UINT_64>(pitch) * height * bpp * numSamples); // Physical slice: multiplied by thickness physicalSliceSize = logicalSliceSize * thickness; // Pitch alignment is always 8, so if slice size is not padded to base alignment // (pipe_interleave_size), we need to increase pitch while ((physicalSliceSize % baseAlign) != 0) { pitch += pitchAlign; logicalSliceSize = BITS_TO_BYTES(static_cast<UINT_64>(pitch) * height * bpp * numSamples); physicalSliceSize = logicalSliceSize * thickness; } #if !ALT_TEST // // Special workaround for depth/stencil buffer, use 8 bpp to align depth buffer again since // the stencil plane may have larger pitch if the slice size is smaller than base alignment. // // Note: this actually does not work for mipmap but mipmap depth texture is not really // sampled with mipmap. // if (flags.depth && (flags.noStencil == FALSE)) { ADDR_ASSERT(numSamples == 1); UINT_64 logicalSiceSizeStencil = static_cast<UINT_64>(pitch) * height; // 1 byte stencil while ((logicalSiceSizeStencil % baseAlign) != 0) { pitch += pitchAlign; // Stencil plane's pitch alignment is the same as depth plane's logicalSiceSizeStencil = static_cast<UINT_64>(pitch) * height; } if (pitch != *pPitch) { // If this is a mipmap, this padded one cannot be sampled as a whole mipmap!
logicalSliceSize = logicalSiceSizeStencil * BITS_TO_BYTES(bpp); } } #endif *pPitch = pitch; // No adjustment for pHeight return logicalSliceSize; } /** **************************************************************************************************** * SiLib::HwlConvertChipFamily * * @brief * Convert familyID defined in atiid.h to ChipFamily and set m_chipFamily/m_chipRevision * @return * ChipFamily **************************************************************************************************** */ ChipFamily SiLib::HwlConvertChipFamily( UINT_32 uChipFamily, ///< [in] chip family defined in atiid.h UINT_32 uChipRevision) ///< [in] chip revision defined in "asic_family"_id.h { ChipFamily family = ADDR_CHIP_FAMILY_SI; switch (uChipFamily) { case FAMILY_SI: m_settings.isSouthernIsland = 1; m_settings.isTahiti = ASICREV_IS_TAHITI_P(uChipRevision); m_settings.isPitCairn = ASICREV_IS_PITCAIRN_PM(uChipRevision); m_settings.isCapeVerde = ASICREV_IS_CAPEVERDE_M(uChipRevision); m_settings.isOland = ASICREV_IS_OLAND_M(uChipRevision); m_settings.isHainan = ASICREV_IS_HAINAN_V(uChipRevision); break; default: ADDR_ASSERT(!"This should be a Fusion"); break; } return family; } /** **************************************************************************************************** * SiLib::HwlSetupTileInfo * * @brief * Set up default tile info values for SI **************************************************************************************************** */ VOID SiLib::HwlSetupTileInfo( AddrTileMode tileMode, ///< [in] Tile mode ADDR_SURFACE_FLAGS flags, ///< [in] Surface type flags UINT_32 bpp, ///< [in] Bits per pixel UINT_32 pitch, ///< [in] Pitch in pixels UINT_32 height, ///< [in] Height in pixels UINT_32 numSamples, ///< [in] Number of samples ADDR_TILEINFO* pTileInfoIn, ///< [in] Tile info input: NULL for default ADDR_TILEINFO* pTileInfoOut, ///< [out] Tile info output AddrTileType inTileType, ///< [in] Tile type ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] Output )
const { UINT_32 thickness = Thickness(tileMode); ADDR_TILEINFO* pTileInfo = pTileInfoOut; INT index = TileIndexInvalid; // Fail-safe code if (IsLinear(tileMode) == FALSE) { // 128 bpp/thick tiling must be non-displayable. // Fmask reuse color buffer's entry but bank-height field can be from another entry // To simplify the logic, fmask entry should be picked from non-displayable ones if (bpp == 128 || thickness > 1 || flags.fmask || flags.prt) { inTileType = ADDR_NON_DISPLAYABLE; } if (flags.depth || flags.stencil) { inTileType = ADDR_DEPTH_SAMPLE_ORDER; } } // Partial valid fields are not allowed for SI. if (IsTileInfoAllZero(pTileInfo)) { if (IsMacroTiled(tileMode)) { if (flags.prt) { if (numSamples == 1) { if (flags.depth) { switch (bpp) { case 16: index = 3; break; case 32: index = 6; break; default: ADDR_ASSERT_ALWAYS(); break; } } else { switch (bpp) { case 8: index = 21; break; case 16: index = 22; break; case 32: index = 23; break; case 64: index = 24; break; case 128: index = 25; break; default: break; } if (thickness > 1) { ADDR_ASSERT(bpp != 128); index += 5; } } } else { ADDR_ASSERT(numSamples == 4); if (flags.depth) { switch (bpp) { case 16: index = 5; break; case 32: index = 7; break; default: ADDR_ASSERT_ALWAYS(); break; } } else { switch (bpp) { case 8: index = 23; break; case 16: index = 24; break; case 32: index = 25; break; case 64: index = 30; break; default: ADDR_ASSERT_ALWAYS(); break; } } } }//end of PRT part // See table entries 0-7 else if (flags.depth || flags.stencil) { if (flags.compressZ) { if (flags.stencil) { index = 0; } else { // optimal tile index for compressed depth/stencil. 
switch (numSamples) { case 1: index = 0; break; case 2: case 4: index = 1; break; case 8: index = 2; break; default: break; } } } else // unCompressZ { index = 3; } } else //non PRT & non Depth & non Stencil { // See table entries 9-12 if (inTileType == ADDR_DISPLAYABLE) { switch (bpp) { case 8: index = 10; break; case 16: index = 11; break; case 32: index = 12; break; case 64: index = 12; break; default: break; } } else { // See table entries 13-17 if (thickness == 1) { if (flags.fmask) { UINT_32 fmaskPixelSize = bpp * numSamples; switch (fmaskPixelSize) { case 8: index = 14; break; case 16: index = 15; break; case 32: index = 16; break; case 64: index = 17; break; default: ADDR_ASSERT_ALWAYS(); } } else { switch (bpp) { case 8: index = 14; break; case 16: index = 15; break; case 32: index = 16; break; case 64: index = 17; break; case 128: index = 17; break; default: break; } } } else // thick tiling - entries 18-20 { switch (thickness) { case 4: index = 20; break; case 8: index = 19; break; default: break; } } } } } else { if (tileMode == ADDR_TM_LINEAR_ALIGNED) { index = 8; } else if (tileMode == ADDR_TM_LINEAR_GENERAL) { index = TileIndexLinearGeneral; } else { if (flags.depth || flags.stencil) { index = 4; } else if (inTileType == ADDR_DISPLAYABLE) { index = 9; } else if (thickness == 1) { index = 13; } else { index = 18; } } } if (index >= 0 && index <= 31) { *pTileInfo = m_tileTable[index].info; pOut->tileType = m_tileTable[index].type; } if (index == TileIndexLinearGeneral) { *pTileInfo = m_tileTable[8].info; pOut->tileType = m_tileTable[8].type; } } else { if (pTileInfoIn) { if (flags.stencil && pTileInfoIn->tileSplitBytes == 0) { // Stencil always uses index 0 *pTileInfo = m_tileTable[0].info; } } // Pass through tile type pOut->tileType = inTileType; } pOut->tileIndex = index; pOut->prtTileIndex = flags.prt; } /** **************************************************************************************************** * SiLib::DecodeGbRegs * * @brief * 
Decodes GB_ADDR_CONFIG and noOfBanks/noOfRanks * * @return * TRUE if all settings are valid * **************************************************************************************************** */ BOOL_32 SiLib::DecodeGbRegs( const ADDR_REGISTER_VALUE* pRegValue) ///< [in] create input { GB_ADDR_CONFIG reg; BOOL_32 valid = TRUE; reg.val = pRegValue->gbAddrConfig; switch (reg.f.pipe_interleave_size) { case ADDR_CONFIG_PIPE_INTERLEAVE_256B: m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_256B; break; case ADDR_CONFIG_PIPE_INTERLEAVE_512B: m_pipeInterleaveBytes = ADDR_PIPEINTERLEAVE_512B; break; default: valid = FALSE; ADDR_UNHANDLED_CASE(); break; } switch (reg.f.row_size) { case ADDR_CONFIG_1KB_ROW: m_rowSize = ADDR_ROWSIZE_1KB; break; case ADDR_CONFIG_2KB_ROW: m_rowSize = ADDR_ROWSIZE_2KB; break; case ADDR_CONFIG_4KB_ROW: m_rowSize = ADDR_ROWSIZE_4KB; break; default: valid = FALSE; ADDR_UNHANDLED_CASE(); break; } switch (pRegValue->noOfBanks) { case 0: m_banks = 4; break; case 1: m_banks = 8; break; case 2: m_banks = 16; break; default: valid = FALSE; ADDR_UNHANDLED_CASE(); break; } switch (pRegValue->noOfRanks) { case 0: m_ranks = 1; break; case 1: m_ranks = 2; break; default: valid = FALSE; ADDR_UNHANDLED_CASE(); break; } m_logicalBanks = m_banks * m_ranks; ADDR_ASSERT(m_logicalBanks <= 16); return valid; } /** **************************************************************************************************** * SiLib::HwlInitGlobalParams * * @brief * Initializes global parameters * * @return * TRUE if all settings are valid * **************************************************************************************************** */ BOOL_32 SiLib::HwlInitGlobalParams( const ADDR_CREATE_INPUT* pCreateIn) ///< [in] create input { BOOL_32 valid = TRUE; const ADDR_REGISTER_VALUE* pRegValue = &pCreateIn->regValue; valid = DecodeGbRegs(pRegValue); if (valid) { if (m_settings.isTahiti || m_settings.isPitCairn) { m_pipes = 8; } else if (m_settings.isCapeVerde || 
m_settings.isOland) { m_pipes = 4; } else { // Hainan is 2-pipe (m_settings.isHainan == 1) m_pipes = 2; } valid = InitTileSettingTable(pRegValue->pTileConfig, pRegValue->noOfEntries); if (valid) { InitEquationTable(); } m_maxSamples = 16; } return valid; } /** **************************************************************************************************** * SiLib::HwlConvertTileInfoToHW * @brief * Entry of si's ConvertTileInfoToHW * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE SiLib::HwlConvertTileInfoToHW( const ADDR_CONVERT_TILEINFOTOHW_INPUT* pIn, ///< [in] input structure ADDR_CONVERT_TILEINFOTOHW_OUTPUT* pOut ///< [out] output structure ) const { ADDR_E_RETURNCODE retCode = ADDR_OK; retCode = EgBasedLib::HwlConvertTileInfoToHW(pIn, pOut); if (retCode == ADDR_OK) { if (pIn->reverse == FALSE) { if (pIn->pTileInfo->pipeConfig == ADDR_PIPECFG_INVALID) { retCode = ADDR_INVALIDPARAMS; } else { pOut->pTileInfo->pipeConfig = static_cast<AddrPipeCfg>(pIn->pTileInfo->pipeConfig - 1); } } else { pOut->pTileInfo->pipeConfig = static_cast<AddrPipeCfg>(pIn->pTileInfo->pipeConfig + 1); } } return retCode; } /** **************************************************************************************************** * SiLib::HwlComputeXmaskCoordYFrom8Pipe * * @brief * Compute the Y coord which will be added to Xmask Y * coord. * @return * Y coord **************************************************************************************************** */ UINT_32 SiLib::HwlComputeXmaskCoordYFrom8Pipe( UINT_32 pipe, ///< [in] pipe id UINT_32 x ///< [in] tile coord x, which is original x coord / 8 ) const { // This function should never be called since it is 6xx/8xx specific. // Keep this empty implementation to avoid any misuse.
ADDR_ASSERT_ALWAYS(); return 0; } /** **************************************************************************************************** * SiLib::HwlComputeSurfaceCoord2DFromBankPipe * * @brief * Compute surface x,y coordinates from bank/pipe info * @return * N/A **************************************************************************************************** */ VOID SiLib::HwlComputeSurfaceCoord2DFromBankPipe( AddrTileMode tileMode, ///< [in] tile mode UINT_32* pX, ///< [in,out] x coordinate UINT_32* pY, ///< [in,out] y coordinate UINT_32 slice, ///< [in] slice index UINT_32 bank, ///< [in] bank number UINT_32 pipe, ///< [in] pipe number UINT_32 bankSwizzle,///< [in] bank swizzle UINT_32 pipeSwizzle,///< [in] pipe swizzle UINT_32 tileSlices, ///< [in] slices in a micro tile BOOL_32 ignoreSE, ///< [in] TRUE if shader engines are ignored ADDR_TILEINFO* pTileInfo ///< [in] bank structure. **All fields to be valid on entry** ) const { UINT_32 xBit; UINT_32 yBit; UINT_32 yBit3 = 0; UINT_32 yBit4 = 0; UINT_32 yBit5 = 0; UINT_32 yBit6 = 0; UINT_32 xBit3 = 0; UINT_32 xBit4 = 0; UINT_32 xBit5 = 0; UINT_32 numPipes = GetPipePerSurf(pTileInfo->pipeConfig); CoordFromBankPipe xyBits = {0}; ComputeSurfaceCoord2DFromBankPipe(tileMode, *pX, *pY, slice, bank, pipe, bankSwizzle, pipeSwizzle, tileSlices, pTileInfo, &xyBits); yBit3 = xyBits.yBit3; yBit4 = xyBits.yBit4; yBit5 = xyBits.yBit5; yBit6 = xyBits.yBit6; xBit3 = xyBits.xBit3; xBit4 = xyBits.xBit4; xBit5 = xyBits.xBit5; yBit = xyBits.yBits; UINT_32 yBitTemp = 0; if ((pTileInfo->pipeConfig == ADDR_PIPECFG_P4_32x32) || (pTileInfo->pipeConfig == ADDR_PIPECFG_P8_32x64_32x32)) { ADDR_ASSERT(pTileInfo->bankWidth == 1 && pTileInfo->macroAspectRatio > 1); UINT_32 yBitToCheck = QLog2(pTileInfo->banks) - 1; ADDR_ASSERT(yBitToCheck <= 3); yBitTemp = _BIT(yBit, yBitToCheck); xBit3 = 0; } yBit = Bits2Number(4, yBit6, yBit5, yBit4, yBit3); xBit = Bits2Number(3, xBit5, xBit4, xBit3); *pY += yBit * pTileInfo->bankHeight * 
MicroTileHeight; *pX += xBit * numPipes * pTileInfo->bankWidth * MicroTileWidth; //calculate the bank and pipe bits in x, y UINT_32 xTile; //x in micro tile UINT_32 x3 = 0; UINT_32 x4 = 0; UINT_32 x5 = 0; UINT_32 x6 = 0; UINT_32 y = *pY; UINT_32 pipeBit0 = _BIT(pipe,0); UINT_32 pipeBit1 = _BIT(pipe,1); UINT_32 pipeBit2 = _BIT(pipe,2); UINT_32 y3 = _BIT(y, 3); UINT_32 y4 = _BIT(y, 4); UINT_32 y5 = _BIT(y, 5); UINT_32 y6 = _BIT(y, 6); // bankbit0 after ^x4^x5 UINT_32 bankBit00 = _BIT(bank,0); UINT_32 bankBit0 = 0; switch (pTileInfo->pipeConfig) { case ADDR_PIPECFG_P2: x3 = pipeBit0 ^ y3; break; case ADDR_PIPECFG_P4_8x16: x4 = pipeBit0 ^ y3; x3 = pipeBit0 ^ y4; break; case ADDR_PIPECFG_P4_16x16: x4 = pipeBit1 ^ y4; x3 = pipeBit0 ^ y3 ^ x4; break; case ADDR_PIPECFG_P4_16x32: x4 = pipeBit1 ^ y4; x3 = pipeBit0 ^ y3 ^ x4; break; case ADDR_PIPECFG_P4_32x32: x5 = pipeBit1 ^ y5; x3 = pipeBit0 ^ y3 ^ x5; bankBit0 = yBitTemp ^ x5; x4 = bankBit00 ^ x5 ^ bankBit0; *pX += x5 * 4 * 1 * 8; // x5 * num_pipes * bank_width * 8; break; case ADDR_PIPECFG_P8_16x16_8x16: x3 = pipeBit1 ^ y5; x4 = pipeBit2 ^ y4; x5 = pipeBit0 ^ y3 ^ x4; break; case ADDR_PIPECFG_P8_16x32_8x16: x3 = pipeBit1 ^ y4; x4 = pipeBit2 ^ y5; x5 = pipeBit0 ^ y3 ^ x4; break; case ADDR_PIPECFG_P8_32x32_8x16: x3 = pipeBit1 ^ y4; x5 = pipeBit2 ^ y5; x4 = pipeBit0 ^ y3 ^ x5; break; case ADDR_PIPECFG_P8_16x32_16x16: x4 = pipeBit2 ^ y5; x5 = pipeBit1 ^ y4; x3 = pipeBit0 ^ y3 ^ x4; break; case ADDR_PIPECFG_P8_32x32_16x16: x5 = pipeBit2 ^ y5; x4 = pipeBit1 ^ y4; x3 = pipeBit0 ^ y3 ^ x4; break; case ADDR_PIPECFG_P8_32x32_16x32: x5 = pipeBit2 ^ y5; x4 = pipeBit1 ^ y6; x3 = pipeBit0 ^ y3 ^ x4; break; case ADDR_PIPECFG_P8_32x64_32x32: x6 = pipeBit1 ^ y5; x5 = pipeBit2 ^ y6; x3 = pipeBit0 ^ y3 ^ x5; bankBit0 = yBitTemp ^ x6; x4 = bankBit00 ^ x5 ^ bankBit0; *pX += x6 * 8 * 1 * 8; // x6 * num_pipes * bank_width * 8; break; default: ADDR_ASSERT_ALWAYS(); } xTile = Bits2Number(3, x5, x4, x3); *pX += xTile << 3; } /** 
**************************************************************************************************** * SiLib::HwlPreAdjustBank * * @brief * Adjust bank before calculating address according to bank/pipe * @return * Adjusted bank **************************************************************************************************** */ UINT_32 SiLib::HwlPreAdjustBank( UINT_32 tileX, ///< [in] x coordinate in unit of tile UINT_32 bank, ///< [in] bank ADDR_TILEINFO* pTileInfo ///< [in] tile info ) const { if (((pTileInfo->pipeConfig == ADDR_PIPECFG_P4_32x32) || (pTileInfo->pipeConfig == ADDR_PIPECFG_P8_32x64_32x32)) && (pTileInfo->bankWidth == 1)) { UINT_32 bankBit0 = _BIT(bank, 0); UINT_32 x4 = _BIT(tileX, 1); UINT_32 x5 = _BIT(tileX, 2); bankBit0 = bankBit0 ^ x4 ^ x5; bank |= bankBit0; ADDR_ASSERT(pTileInfo->macroAspectRatio > 1); } return bank; } /** **************************************************************************************************** * SiLib::HwlComputeSurfaceInfo * * @brief * Entry of si's ComputeSurfaceInfo * @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE SiLib::HwlComputeSurfaceInfo( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] input structure ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [out] output structure ) const { pOut->tileIndex = pIn->tileIndex; ADDR_E_RETURNCODE retCode = EgBasedLib::HwlComputeSurfaceInfo(pIn, pOut); UINT_32 tileIndex = static_cast<UINT_32>(pOut->tileIndex); if (((pIn->flags.needEquation == TRUE) || (pIn->flags.preferEquation == TRUE)) && (pIn->numSamples <= 1) && (tileIndex < TileTableSize)) { static const UINT_32 SiUncompressDepthTileIndex = 3; if ((pIn->numSlices > 1) && (IsMacroTiled(pOut->tileMode) == TRUE) && ((m_chipFamily == ADDR_CHIP_FAMILY_SI) || (IsPrtTileMode(pOut->tileMode) == FALSE))) { pOut->equationIndex = ADDR_INVALID_EQUATION_INDEX; } else if ((pIn->flags.prt == FALSE) && (m_uncompressDepthEqIndex != 0) &&
(tileIndex == SiUncompressDepthTileIndex)) { pOut->equationIndex = m_uncompressDepthEqIndex + Log2(pIn->bpp >> 3); } else { pOut->equationIndex = m_equationLookupTable[Log2(pIn->bpp >> 3)][tileIndex]; } if (pOut->equationIndex != ADDR_INVALID_EQUATION_INDEX) { pOut->blockWidth = m_blockWidth[pOut->equationIndex]; pOut->blockHeight = m_blockHeight[pOut->equationIndex]; pOut->blockSlices = m_blockSlices[pOut->equationIndex]; } } else { pOut->equationIndex = ADDR_INVALID_EQUATION_INDEX; } return retCode; } /** **************************************************************************************************** * SiLib::HwlComputeMipLevel * @brief * Compute MipLevel info (including level 0) * @return * TRUE if handled by the HWL **************************************************************************************************** */ BOOL_32 SiLib::HwlComputeMipLevel( ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn ///< [in,out] Input structure ) const { // basePitch is calculated from level 0 so we only check this for mipLevel > 0 if (pIn->mipLevel > 0) { // Note: Don't check expand 3x formats (96 bit) as the basePitch is not pow2 even if // we explicitly set the pow2Pad flag. The 3x base pitch is padded to pow2, but after being // divided by the expandX factor (3) to program the texture pitch, the basePitch is never pow2.
if (ElemLib::IsExpand3x(pIn->format) == FALSE) { // Sublevel pitches are generated from base level pitch instead of width on SI // If pow2Pad is 0, we don't assert, as this is not really used for a mip chain ADDR_ASSERT((pIn->flags.pow2Pad == FALSE) || ((pIn->basePitch != 0) && IsPow2(pIn->basePitch))); } if (pIn->basePitch != 0) { pIn->width = Max(1u, pIn->basePitch >> pIn->mipLevel); } } // pow2Pad is done in PostComputeMipLevel return TRUE; } /** **************************************************************************************************** * SiLib::HwlCheckLastMacroTiledLvl * * @brief * Sets pOut->last2DLevel to TRUE if the next mip level is no longer macro tiled * @note * **************************************************************************************************** */ VOID SiLib::HwlCheckLastMacroTiledLvl( const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ///< [in] Input structure ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut ///< [in,out] Output structure (used as input, too) ) const { // pow2Pad covers all mipmap cases if (pIn->flags.pow2Pad) { ADDR_ASSERT(IsMacroTiled(pIn->tileMode)); UINT_32 nextPitch; UINT_32 nextHeight; UINT_32 nextSlices; AddrTileMode nextTileMode; if (pIn->mipLevel == 0 || pIn->basePitch == 0) { // Base level or fail-safe case (basePitch == 0) nextPitch = pOut->pitch >> 1; } else { // Sub levels nextPitch = pIn->basePitch >> (pIn->mipLevel + 1); } // nextHeight must be shifted from this level's original height rather than a pow2 padded // one but this requires original height stored somewhere (pOut->height) ADDR_ASSERT(pOut->height != 0); // next level's height is just current level's >> 1 in pixels nextHeight = pOut->height >> 1; // Special formats such as FMT_1 and FMT_32_32_32 can only be linear, so only block // compressed formats are considered here if (ElemLib::IsBlockCompressed(pIn->format)) { nextHeight = (nextHeight + 3) / 4; } nextHeight = NextPow2(nextHeight); // nextSlices may be 0 if this level's is 1 if (pIn->flags.volume) { nextSlices = Max(1u, pIn->numSlices >> 1); } else
{ nextSlices = pIn->numSlices; } nextTileMode = ComputeSurfaceMipLevelTileMode(pIn->tileMode, pIn->bpp, nextPitch, nextHeight, nextSlices, pIn->numSamples, pOut->blockWidth, pOut->blockHeight, pOut->pTileInfo); pOut->last2DLevel = IsMicroTiled(nextTileMode); } } /** **************************************************************************************************** * SiLib::HwlDegradeThickTileMode * * @brief * Degrades valid tile mode for thick modes if needed * * @return * Suitable tile mode **************************************************************************************************** */ AddrTileMode SiLib::HwlDegradeThickTileMode( AddrTileMode baseTileMode, ///< [in] base tile mode UINT_32 numSlices, ///< [in] current number of slices UINT_32* pBytesPerTile ///< [in,out] pointer to bytes per slice ) const { return EgBasedLib::HwlDegradeThickTileMode(baseTileMode, numSlices, pBytesPerTile); } /** **************************************************************************************************** * SiLib::HwlTileInfoEqual * * @brief * Return TRUE if all fields are equal * @note * Only takes care of current HWL's data **************************************************************************************************** */ BOOL_32 SiLib::HwlTileInfoEqual( const ADDR_TILEINFO* pLeft, ///<[in] Left compare operand const ADDR_TILEINFO* pRight ///<[in] Right compare operand ) const { BOOL_32 equal = FALSE; if (pLeft->pipeConfig == pRight->pipeConfig) { equal = EgBasedLib::HwlTileInfoEqual(pLeft, pRight); } return equal; } /** **************************************************************************************************** * SiLib::GetTileSetting * * @brief * Get tile setting info by index. * @return * Tile setting info.
**************************************************************************************************** */ const TileConfig* SiLib::GetTileSetting( UINT_32 index ///< [in] Tile index ) const { ADDR_ASSERT(index < m_noOfEntries); return &m_tileTable[index]; } /** **************************************************************************************************** * SiLib::HwlPostCheckTileIndex * * @brief * Map a tile setting to index if curIndex is invalid, otherwise check if curIndex matches * tile mode/type/info and change the index if needed * @return * Tile index. **************************************************************************************************** */ INT_32 SiLib::HwlPostCheckTileIndex( const ADDR_TILEINFO* pInfo, ///< [in] Tile Info AddrTileMode mode, ///< [in] Tile mode AddrTileType type, ///< [in] Tile type INT curIndex ///< [in] Current index assigned in HwlSetupTileInfo ) const { INT_32 index = curIndex; if (mode == ADDR_TM_LINEAR_GENERAL) { index = TileIndexLinearGeneral; } else { BOOL_32 macroTiled = IsMacroTiled(mode); // We need to find a new index if any of the following is true // 1. curIndex is invalid // 2. tile mode is changed // 3.
tile info does not match for macro tiled if ((index == TileIndexInvalid || (mode != m_tileTable[index].mode) || (macroTiled && (HwlTileInfoEqual(pInfo, &m_tileTable[index].info) == FALSE)))) { for (index = 0; index < static_cast<INT_32>(m_noOfEntries); index++) { if (macroTiled) { // macro tile modes need all to match if (HwlTileInfoEqual(pInfo, &m_tileTable[index].info) && (mode == m_tileTable[index].mode) && (type == m_tileTable[index].type)) { break; } } else if (mode == ADDR_TM_LINEAR_ALIGNED) { // linear mode only needs tile mode to match if (mode == m_tileTable[index].mode) { break; } } else { // micro tile modes only need tile mode and tile type to match if (mode == m_tileTable[index].mode && type == m_tileTable[index].type) { break; } } } } } ADDR_ASSERT(index < static_cast<INT_32>(m_noOfEntries)); if (index >= static_cast<INT_32>(m_noOfEntries)) { index = TileIndexInvalid; } return index; } /** **************************************************************************************************** * SiLib::HwlSetupTileCfg * * @brief * Map tile index to tile setting.
* @return * ADDR_E_RETURNCODE **************************************************************************************************** */ ADDR_E_RETURNCODE SiLib::HwlSetupTileCfg( UINT_32 bpp, ///< [in] Bits per pixel INT_32 index, ///< [in] Tile index INT_32 macroModeIndex, ///< [in] Index in macro tile mode table (CI) ADDR_TILEINFO* pInfo, ///< [out] Tile Info AddrTileMode* pMode, ///< [out] Tile mode AddrTileType* pType ///< [out] Tile type ) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; // Global flag to control usage of tileIndex if (UseTileIndex(index)) { if (index == TileIndexLinearGeneral) { if (pMode) { *pMode = ADDR_TM_LINEAR_GENERAL; } if (pType) { *pType = ADDR_DISPLAYABLE; } if (pInfo) { pInfo->banks = 2; pInfo->bankWidth = 1; pInfo->bankHeight = 1; pInfo->macroAspectRatio = 1; pInfo->tileSplitBytes = 64; pInfo->pipeConfig = ADDR_PIPECFG_P2; } } else if (static_cast<UINT_32>(index) >= m_noOfEntries) { returnCode = ADDR_INVALIDPARAMS; } else { const TileConfig* pCfgTable = GetTileSetting(index); if (pInfo) { *pInfo = pCfgTable->info; } else { if (IsMacroTiled(pCfgTable->mode)) { returnCode = ADDR_INVALIDPARAMS; } } if (pMode) { *pMode = pCfgTable->mode; } if (pType) { *pType = pCfgTable->type; } } } return returnCode; } /** **************************************************************************************************** * SiLib::ReadGbTileMode * * @brief * Convert GB_TILE_MODE HW value to TileConfig. * @return * N/A
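Illustrative sketch (not part of addrlib): the C++ below mirrors the field arithmetic that ReadGbTileMode applies to the GB_TILE_MODE value. GbTileModeFields, DecodedTileInfo and DecodeTileInfo are hypothetical names; the sketch assumes the bitfields have already been extracted and does not model the register's bit positions or the array_mode remapping.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>

// Hypothetical container for GB_TILE_MODE fields that have already been
// extracted from the register (bit positions are not modeled here).
struct GbTileModeFields {
    std::uint32_t bankHeight;      // encoded as log2(bank height)
    std::uint32_t bankWidth;       // encoded as log2(bank width)
    std::uint32_t numBanks;        // encoded as log2(banks) - 1
    std::uint32_t macroTileAspect; // encoded as log2(macro aspect ratio)
    std::uint32_t tileSplit;       // encoded as log2(tile split bytes / 64)
    std::uint32_t pipeConfig;      // HW pipe config value
};

// Decoded values, named after the ADDR_TILEINFO members they feed.
struct DecodedTileInfo {
    std::uint32_t bankHeight;
    std::uint32_t bankWidth;
    std::uint32_t banks;
    std::uint32_t macroAspectRatio;
    std::uint32_t tileSplitBytes;
    std::uint32_t pipeConfig;
};

// Same shifts and offsets as SiLib::ReadGbTileMode.
DecodedTileInfo DecodeTileInfo(const GbTileModeFields& f) {
    DecodedTileInfo info{};
    info.bankHeight       = 1u << f.bankHeight;
    info.bankWidth        = 1u << f.bankWidth;
    info.banks            = 1u << (f.numBanks + 1);
    info.macroAspectRatio = 1u << f.macroTileAspect;
    info.tileSplitBytes   = 64u << f.tileSplit;
    // +1 as in ReadGbTileMode; HwlConvertTileInfoToHW undoes it with -1.
    info.pipeConfig       = f.pipeConfig + 1;
    return info;
}
```

For example, fields {1, 0, 2, 1, 3, 4} decode to bankHeight 2, bankWidth 1, banks 8, macroAspectRatio 2, tileSplitBytes 512 and pipeConfig 5.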
**************************************************************************************************** */ VOID SiLib::ReadGbTileMode( UINT_32 regValue, ///< [in] GB_TILE_MODE register TileConfig* pCfg ///< [out] output structure ) const { GB_TILE_MODE gbTileMode; gbTileMode.val = regValue; pCfg->type = static_cast<AddrTileType>(gbTileMode.f.micro_tile_mode); pCfg->info.bankHeight = 1 << gbTileMode.f.bank_height; pCfg->info.bankWidth = 1 << gbTileMode.f.bank_width; pCfg->info.banks = 1 << (gbTileMode.f.num_banks + 1); pCfg->info.macroAspectRatio = 1 << gbTileMode.f.macro_tile_aspect; pCfg->info.tileSplitBytes = 64 << gbTileMode.f.tile_split; pCfg->info.pipeConfig = static_cast<AddrPipeCfg>(gbTileMode.f.pipe_config + 1); UINT_32 regArrayMode = gbTileMode.f.array_mode; pCfg->mode = static_cast<AddrTileMode>(regArrayMode); if (regArrayMode == 8) //ARRAY_2D_TILED_XTHICK { pCfg->mode = ADDR_TM_2D_TILED_XTHICK; } else if (regArrayMode >= 14) //ARRAY_3D_TILED_XTHICK { pCfg->mode = static_cast<AddrTileMode>(pCfg->mode + 3); } } /** **************************************************************************************************** * SiLib::InitTileSettingTable * * @brief * Initialize the ADDR_TILE_CONFIG table.
* @return * TRUE if tile table is correctly initialized **************************************************************************************************** */ BOOL_32 SiLib::InitTileSettingTable( const UINT_32* pCfg, ///< [in] Pointer to table of tile configs UINT_32 noOfEntries ///< [in] Number of entries in the table above ) { BOOL_32 initOk = TRUE; ADDR_ASSERT(noOfEntries <= TileTableSize); memset(m_tileTable, 0, sizeof(m_tileTable)); if (noOfEntries != 0) { m_noOfEntries = noOfEntries; } else { m_noOfEntries = TileTableSize; } if (pCfg) // From Client { for (UINT_32 i = 0; i < m_noOfEntries; i++) { ReadGbTileMode(*(pCfg + i), &m_tileTable[i]); } } else { ADDR_ASSERT_ALWAYS(); initOk = FALSE; } if (initOk) { ADDR_ASSERT(m_tileTable[TILEINDEX_LINEAR_ALIGNED].mode == ADDR_TM_LINEAR_ALIGNED); } return initOk; } /** **************************************************************************************************** * SiLib::HwlGetTileIndex * * @brief * Return the virtual/real index for given mode/type/info * @return * ADDR_OK if successful.
**************************************************************************************************** */ ADDR_E_RETURNCODE SiLib::HwlGetTileIndex( const ADDR_GET_TILEINDEX_INPUT* pIn, ADDR_GET_TILEINDEX_OUTPUT* pOut) const { ADDR_E_RETURNCODE returnCode = ADDR_OK; pOut->index = HwlPostCheckTileIndex(pIn->pTileInfo, pIn->tileMode, pIn->tileType); return returnCode; } /** **************************************************************************************************** * SiLib::HwlFmaskPreThunkSurfInfo * * @brief * Some preparation before thunking a ComputeSurfaceInfo call for Fmask * @return * N/A **************************************************************************************************** */ VOID SiLib::HwlFmaskPreThunkSurfInfo( const ADDR_COMPUTE_FMASK_INFO_INPUT* pFmaskIn, ///< [in] Input of fmask info const ADDR_COMPUTE_FMASK_INFO_OUTPUT* pFmaskOut, ///< [in] Output of fmask info ADDR_COMPUTE_SURFACE_INFO_INPUT* pSurfIn, ///< [out] Input of thunked surface info ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pSurfOut ///< [out] Output of thunked surface info ) const { pSurfIn->tileIndex = pFmaskIn->tileIndex; } /** **************************************************************************************************** * SiLib::HwlFmaskPostThunkSurfInfo * * @brief * Copy HWL extra fields after calling thunked ComputeSurfaceInfo * @return * N/A **************************************************************************************************** */ VOID SiLib::HwlFmaskPostThunkSurfInfo( const ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pSurfOut, ///< [in] Output of surface info ADDR_COMPUTE_FMASK_INFO_OUTPUT* pFmaskOut ///< [out] Output of fmask info ) const { pFmaskOut->macroModeIndex = TileIndexInvalid; pFmaskOut->tileIndex = pSurfOut->tileIndex; } /** **************************************************************************************************** * SiLib::HwlComputeFmaskBits * @brief * Computes fmask bits * @return * Fmask bits
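Illustrative sketch (not part of addrlib): the EQAA branch (numFrags != numSamples) of HwlComputeFmaskBits reduces to the case table below. FmaskBits and EqaaFmaskBits are hypothetical names; the normal-AA branch is omitted because it calls ComputeFmaskNumPlanesFromNumSamples and ComputeFmaskResolvedBppFromNumSamples, which are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Result of the EQAA branch.
struct FmaskBits {
    std::uint32_t bpp;        // fmask bits per pixel
    std::uint32_t numSamples; // sample count reported for the fmask surface
};

// Mirrors the EQAA branch of SiLib::HwlComputeFmaskBits.
FmaskBits EqaaFmaskBits(std::uint32_t numSamples, std::uint32_t numFrags, bool resolved) {
    assert(numFrags != numSamples && numFrags <= 8);
    FmaskBits out{};
    if (!resolved) {
        switch (numFrags) {
        case 1:  out = {1, numSamples == 16 ? 16u : 8u}; break;
        case 2:  out = {2, numSamples}; break;
        case 4:  out = {4, numSamples}; break;
        default: out = {4, numSamples}; break; // numFrags == 8
        }
    } else {
        switch (numFrags) {
        case 1:  out = {numSamples == 16 ? 16u : 8u, 1}; break;
        case 2:  out = {numSamples * 2, 1}; break;
        case 4:  out = {numSamples * 4, 1}; break;
        default: out = {16 * 4, 1}; break;     // numFrags == 8
        }
    }
    return out;
}
```

For example, 8 samples with 4 fragments, resolved, gives 32 bpp with a single sample, while 16 samples with 8 fragments, unresolved, gives 4 bpp across 16 samples.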
**************************************************************************************************** */ UINT_32 SiLib::HwlComputeFmaskBits( const ADDR_COMPUTE_FMASK_INFO_INPUT* pIn, UINT_32* pNumSamples ) const { UINT_32 numSamples = pIn->numSamples; UINT_32 numFrags = GetNumFragments(numSamples, pIn->numFrags); UINT_32 bpp; if (numFrags != numSamples) // EQAA { ADDR_ASSERT(numFrags <= 8); if (pIn->resolved == FALSE) { if (numFrags == 1) { bpp = 1; numSamples = numSamples == 16 ? 16 : 8; } else if (numFrags == 2) { ADDR_ASSERT(numSamples >= 4); bpp = 2; numSamples = numSamples; } else if (numFrags == 4) { ADDR_ASSERT(numSamples >= 4); bpp = 4; numSamples = numSamples; } else // numFrags == 8 { ADDR_ASSERT(numSamples == 16); bpp = 4; numSamples = numSamples; } } else { if (numFrags == 1) { bpp = (numSamples == 16) ? 16 : 8; numSamples = 1; } else if (numFrags == 2) { ADDR_ASSERT(numSamples >= 4); bpp = numSamples*2; numSamples = 1; } else if (numFrags == 4) { ADDR_ASSERT(numSamples >= 4); bpp = numSamples*4; numSamples = 1; } else // numFrags == 8 { ADDR_ASSERT(numSamples >= 16); bpp = 16*4; numSamples = 1; } } } else // Normal AA { if (pIn->resolved == FALSE) { bpp = ComputeFmaskNumPlanesFromNumSamples(numSamples); numSamples = numSamples == 2 ? 
                8 : numSamples;
        }
        else
        {
            // The same as 8XX
            bpp = ComputeFmaskResolvedBppFromNumSamples(numSamples);
            numSamples = 1; // 1x sample
        }
    }

    SafeAssign(pNumSamples, numSamples);

    return bpp;
}

/**
****************************************************************************************************
*   SiLib::HwlOptimizeTileMode
*
*   @brief
*       Optimize tile mode on SI
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID SiLib::HwlOptimizeTileMode(
    ADDR_COMPUTE_SURFACE_INFO_INPUT*    pInOut      ///< [in,out] input output structure
    ) const
{
    AddrTileMode tileMode = pInOut->tileMode;

    if ((pInOut->flags.needEquation == TRUE) &&
        (IsMacroTiled(tileMode) == TRUE) &&
        (pInOut->numSamples <= 1))
    {
        UINT_32 thickness = Thickness(tileMode);

        if (thickness > 1)
        {
            tileMode = ADDR_TM_1D_TILED_THICK;
        }
        else if (pInOut->numSlices > 1)
        {
            tileMode = ADDR_TM_1D_TILED_THIN1;
        }
        else
        {
            tileMode = ADDR_TM_2D_TILED_THIN1;
        }
    }

    if (tileMode != pInOut->tileMode)
    {
        pInOut->tileMode = tileMode;
    }
}

/**
****************************************************************************************************
*   SiLib::HwlOverrideTileMode
*
*   @brief
*       Override tile modes (PRT only; prevents the client from passing in a PRT mode that is
*       invalid on SI).
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID SiLib::HwlOverrideTileMode(
    ADDR_COMPUTE_SURFACE_INFO_INPUT*    pInOut      ///< [in,out] input output structure
    ) const
{
    AddrTileMode tileMode = pInOut->tileMode;

    switch (tileMode)
    {
        case ADDR_TM_PRT_TILED_THIN1:
            tileMode = ADDR_TM_2D_TILED_THIN1;
            break;

        case ADDR_TM_PRT_TILED_THICK:
            tileMode = ADDR_TM_2D_TILED_THICK;
            break;

        case ADDR_TM_PRT_2D_TILED_THICK:
            tileMode = ADDR_TM_2D_TILED_THICK;
            break;

        case ADDR_TM_PRT_3D_TILED_THICK:
            tileMode = ADDR_TM_3D_TILED_THICK;
            break;

        default:
            break;
    }

    if (tileMode != pInOut->tileMode)
    {
        pInOut->tileMode = tileMode;
        // Only PRT tile modes are overridden for now.
        // Revisit this once new modes are added above.
        pInOut->flags.prt = TRUE;
    }
}

/**
****************************************************************************************************
*   SiLib::HwlSetPrtTileMode
*
*   @brief
*       Set prt tile modes.
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID SiLib::HwlSetPrtTileMode(
    ADDR_COMPUTE_SURFACE_INFO_INPUT*    pInOut      ///< [in,out] input output structure
    ) const
{
    pInOut->tileMode = ADDR_TM_2D_TILED_THIN1;
    pInOut->tileType = (pInOut->tileType == ADDR_DEPTH_SAMPLE_ORDER) ?
                       ADDR_DEPTH_SAMPLE_ORDER : ADDR_NON_DISPLAYABLE;
    pInOut->flags.prt = TRUE;
}

/**
****************************************************************************************************
*   SiLib::HwlSelectTileMode
*
*   @brief
*       Select tile modes.
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID SiLib::HwlSelectTileMode(
    ADDR_COMPUTE_SURFACE_INFO_INPUT*    pInOut      ///< [in,out] input output structure
    ) const
{
    AddrTileMode tileMode;
    AddrTileType tileType;

    if (pInOut->flags.volume)
    {
        if (pInOut->numSlices >= 8)
        {
            tileMode = ADDR_TM_2D_TILED_XTHICK;
        }
        else if (pInOut->numSlices >= 4)
        {
            tileMode = ADDR_TM_2D_TILED_THICK;
        }
        else
        {
            tileMode = ADDR_TM_2D_TILED_THIN1;
        }
        tileType = ADDR_NON_DISPLAYABLE;
    }
    else
    {
        tileMode = ADDR_TM_2D_TILED_THIN1;

        if (pInOut->flags.depth || pInOut->flags.stencil)
        {
            tileType = ADDR_DEPTH_SAMPLE_ORDER;
        }
        else if ((pInOut->bpp <= 32) ||
                 (pInOut->flags.display == TRUE) ||
                 (pInOut->flags.overlay == TRUE))
        {
            tileType = ADDR_DISPLAYABLE;
        }
        else
        {
            tileType = ADDR_NON_DISPLAYABLE;
        }
    }

    if (pInOut->flags.prt)
    {
        tileMode = ADDR_TM_2D_TILED_THIN1;
        tileType = (tileType == ADDR_DISPLAYABLE) ?
                   ADDR_NON_DISPLAYABLE : tileType;
    }

    pInOut->tileMode = tileMode;
    pInOut->tileType = tileType;

    // Optimize for space
    pInOut->flags.opt4Space = TRUE;

    // Optimize tile mode if possible
    OptimizeTileMode(pInOut);

    HwlOverrideTileMode(pInOut);
}

/**
****************************************************************************************************
*   SiLib::HwlComputeMaxBaseAlignments
*
*   @brief
*       Gets maximum alignments
*   @return
*       maximum alignments
****************************************************************************************************
*/
UINT_32 SiLib::HwlComputeMaxBaseAlignments() const
{
    const UINT_32 pipes = HwlGetPipes(&m_tileTable[0].info);

    // Initial size is 64 KiB for PRT.
    UINT_32 maxBaseAlign = 64 * 1024;

    for (UINT_32 i = 0; i < m_noOfEntries; i++)
    {
        if ((IsMacroTiled(m_tileTable[i].mode) == TRUE) &&
            (IsPrtTileMode(m_tileTable[i].mode) == FALSE))
        {
            // The maximum tile size is 16 bytes per pixel with either 8 samples or 8 slices.
            UINT_32 tileSize = Min(m_tileTable[i].info.tileSplitBytes,
                                   MicroTilePixels * 8 * 16);

            UINT_32 baseAlign = tileSize * pipes * m_tileTable[i].info.banks *
                                m_tileTable[i].info.bankWidth * m_tileTable[i].info.bankHeight;

            if (baseAlign > maxBaseAlign)
            {
                maxBaseAlign = baseAlign;
            }
        }
    }

    return maxBaseAlign;
}

/**
****************************************************************************************************
*   SiLib::HwlComputeMaxMetaBaseAlignments
*
*   @brief
*       Gets maximum alignments for metadata
*   @return
*       maximum alignments for metadata
****************************************************************************************************
*/
UINT_32 SiLib::HwlComputeMaxMetaBaseAlignments() const
{
    UINT_32 maxPipe = 1;

    for (UINT_32 i = 0; i < m_noOfEntries; i++)
    {
        maxPipe = Max(maxPipe, HwlGetPipes(&m_tileTable[i].info));
    }

    return m_pipeInterleaveBytes * maxPipe;
}

/**
****************************************************************************************************
*   SiLib::HwlComputeSurfaceAlignmentsMacroTiled
*
*   @brief
*       Hardware layer function to compute the alignment requirements for macro tile modes
*
*   @return
*       N/A
*
****************************************************************************************************
*/
VOID SiLib::HwlComputeSurfaceAlignmentsMacroTiled(
    AddrTileMode                      tileMode,           ///< [in] tile mode
    UINT_32                           bpp,                ///< [in] bits per pixel
    ADDR_SURFACE_FLAGS                flags,              ///< [in] surface flags
    UINT_32                           mipLevel,           ///< [in] mip level
    UINT_32                           numSamples,         ///< [in] number of samples
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut                ///< [in,out] Surface output
    ) const
{
    if ((mipLevel == 0) && (flags.prt))
    {
        UINT_32 macroTileSize = pOut->blockWidth * pOut->blockHeight * numSamples * bpp / 8;

        if (macroTileSize < PrtTileSize)
        {
            UINT_32 numMacroTiles = PrtTileSize / macroTileSize;

            ADDR_ASSERT((PrtTileSize % macroTileSize) == 0);

            pOut->pitchAlign *= numMacroTiles;
            pOut->baseAlign  *= numMacroTiles;
        }
    }
}

/**
****************************************************************************************************
*   SiLib::InitEquationTable
*
*   @brief
*       Initialize Equation table.
*
*   @return
*       N/A
****************************************************************************************************
*/
VOID SiLib::InitEquationTable()
{
    ADDR_EQUATION_KEY equationKeyTable[EquationTableSize];
    memset(equationKeyTable, 0, sizeof(equationKeyTable));

    memset(m_equationTable, 0, sizeof(m_equationTable));

    memset(m_blockWidth, 0, sizeof(m_blockWidth));

    memset(m_blockHeight, 0, sizeof(m_blockHeight));

    memset(m_blockSlices, 0, sizeof(m_blockSlices));

    // Loop over all possible bpp
    for (UINT_32 log2ElementBytes = 0; log2ElementBytes < MaxNumElementBytes; log2ElementBytes++)
    {
        // Get bits per pixel
        UINT_32 bpp = 1 << (log2ElementBytes + 3);

        // Loop over all possible tile indices
        for (INT_32 tileIndex = 0; tileIndex < static_cast<INT_32>(m_noOfEntries); tileIndex++)
        {
            UINT_32 equationIndex = ADDR_INVALID_EQUATION_INDEX;

            TileConfig tileConfig = m_tileTable[tileIndex];

            ADDR_SURFACE_FLAGS flags = {{0}};

            // Compute tile info, hardcode numSamples to 1 because MSAA is not supported
            // in swizzle pattern equation
            HwlComputeMacroModeIndex(tileIndex, flags, bpp, 1, &tileConfig.info, NULL, NULL);

            // Check if the input is supported
            if (IsEquationSupported(bpp, tileConfig, tileIndex, log2ElementBytes) == TRUE)
            {
                ADDR_EQUATION_KEY key = {{0}};

                // Generate swizzle equation key from bpp and tile config
                key.fields.log2ElementBytes = log2ElementBytes;
                key.fields.tileMode         = tileConfig.mode;
                // Treat depth micro tile type and non-display micro tile type as the same key
                // because they actually share the same equation
                key.fields.microTileType    = (tileConfig.type == ADDR_DEPTH_SAMPLE_ORDER) ?
                                              ADDR_NON_DISPLAYABLE : tileConfig.type;
                key.fields.pipeConfig       = tileConfig.info.pipeConfig;
                key.fields.numBanksLog2     = Log2(tileConfig.info.banks);
                key.fields.bankWidth        = tileConfig.info.bankWidth;
                key.fields.bankHeight       = tileConfig.info.bankHeight;
                key.fields.macroAspectRatio = tileConfig.info.macroAspectRatio;
                key.fields.prt              =
                    ((m_chipFamily == ADDR_CHIP_FAMILY_SI) &&
                     ((1 << tileIndex) & SiPrtTileIndexMask)) ?
                    1 : 0;

                // Search the table to see whether an equation has already been built for this key
                for (UINT_32 i = 0; i < m_numEquations; i++)
                {
                    if (key.value == equationKeyTable[i].value)
                    {
                        equationIndex = i;
                        break;
                    }
                }

                // If found, only the index needs to go into the lookup table; there is no need
                // to generate the equation again. Otherwise, generate the equation.
                if (equationIndex == ADDR_INVALID_EQUATION_INDEX)
                {
                    ADDR_EQUATION equation;
                    ADDR_E_RETURNCODE retCode;

                    memset(&equation, 0, sizeof(ADDR_EQUATION));

                    // Generate the equation
                    if (IsMicroTiled(tileConfig.mode))
                    {
                        retCode = ComputeMicroTileEquation(log2ElementBytes,
                                                           tileConfig.mode,
                                                           tileConfig.type,
                                                           &equation);
                    }
                    else
                    {
                        retCode = ComputeMacroTileEquation(log2ElementBytes,
                                                           tileConfig.mode,
                                                           tileConfig.type,
                                                           &tileConfig.info,
                                                           &equation);
                    }
                    // Only add the equation to the table if the return code is ADDR_OK.
                    // Any other return code indicates an invalid input; in that case do nothing
                    // except record the invalid equation index in the lookup table.
                    if (retCode == ADDR_OK)
                    {
                        equationIndex = m_numEquations;
                        ADDR_ASSERT(equationIndex < EquationTableSize);

                        m_blockSlices[equationIndex] = Thickness(tileConfig.mode);

                        if (IsMicroTiled(tileConfig.mode))
                        {
                            m_blockWidth[equationIndex]  = MicroTileWidth;
                            m_blockHeight[equationIndex] = MicroTileHeight;
                        }
                        else
                        {
                            const ADDR_TILEINFO* pTileInfo = &tileConfig.info;

                            m_blockWidth[equationIndex]  =
                                HwlGetPipes(pTileInfo) * MicroTileWidth * pTileInfo->bankWidth *
                                pTileInfo->macroAspectRatio;
                            m_blockHeight[equationIndex] =
                                MicroTileHeight * pTileInfo->bankHeight * pTileInfo->banks /
                                pTileInfo->macroAspectRatio;

                            if (key.fields.prt)
                            {
                                UINT_32 macroTileSize =
                                    m_blockWidth[equationIndex] * m_blockHeight[equationIndex] *
                                    bpp / 8;

                                if (macroTileSize < PrtTileSize)
                                {
                                    UINT_32 numMacroTiles = PrtTileSize / macroTileSize;

                                    ADDR_ASSERT(macroTileSize == (1u << equation.numBits));
                                    ADDR_ASSERT((PrtTileSize % macroTileSize) == 0);

                                    UINT_32 numBits = Log2(numMacroTiles);

                                    UINT_32 xStart = Log2(m_blockWidth[equationIndex]) +
                                                     log2ElementBytes;

                                    m_blockWidth[equationIndex] *= numMacroTiles;

                                    for (UINT_32 i = 0; i < numBits; i++)
                                    {
                                        equation.addr[equation.numBits + i].valid = 1;
                                        equation.addr[equation.numBits + i].index = xStart + i;
                                    }

                                    equation.numBits += numBits;
                                }
                            }
                        }

                        equationKeyTable[equationIndex] = key;
                        m_equationTable[equationIndex]  = equation;

                        m_numEquations++;
                    }
                }
            }

            // Record the index in the lookup table; if the combination is not supported,
            // record the invalid equation index
            m_equationLookupTable[log2ElementBytes][tileIndex] = equationIndex;
        }

        if (m_chipFamily == ADDR_CHIP_FAMILY_SI)
        {
            // For tile index 3, which is shared between PRT depth and uncompressed depth
            m_uncompressDepthEqIndex = m_numEquations;

            for (UINT_32 log2ElemBytes = 0; log2ElemBytes < MaxNumElementBytes; log2ElemBytes++)
            {
                TileConfig        tileConfig = m_tileTable[3];
                ADDR_EQUATION     equation;
                ADDR_E_RETURNCODE retCode;

                memset(&equation, 0, sizeof(ADDR_EQUATION));

                retCode = ComputeMacroTileEquation(log2ElemBytes,
                                                   tileConfig.mode,
                                                   tileConfig.type,
                                                   &tileConfig.info,
                                                   &equation);

                if (retCode == ADDR_OK)
                {
                    UINT_32 equationIndex = m_numEquations;
                    ADDR_ASSERT(equationIndex < EquationTableSize);

                    m_blockSlices[equationIndex] = 1;

                    const ADDR_TILEINFO* pTileInfo = &tileConfig.info;

                    m_blockWidth[equationIndex]  =
                        HwlGetPipes(pTileInfo) * MicroTileWidth * pTileInfo->bankWidth *
                        pTileInfo->macroAspectRatio;
                    m_blockHeight[equationIndex] =
                        MicroTileHeight * pTileInfo->bankHeight * pTileInfo->banks /
                        pTileInfo->macroAspectRatio;

                    m_equationTable[equationIndex] = equation;

                    m_numEquations++;
                }
            }
        }
    }
}

/**
****************************************************************************************************
*   SiLib::IsEquationSupported
*
*   @brief
*       Check whether an equation can be generated for the given bpp and tile config.
*
*   @return
*       TRUE if supported
****************************************************************************************************
*/
BOOL_32 SiLib::IsEquationSupported(
    UINT_32    bpp,             ///< Bits per pixel
    TileConfig tileConfig,      ///< Tile config
    INT_32     tileIndex,       ///< Tile index
    UINT_32    elementBytesLog2 ///< Log2 of element bytes
    ) const
{
    BOOL_32 supported = TRUE;

    // Linear tile mode is not supported in swizzle pattern equation
    if (IsLinear(tileConfig.mode))
    {
        supported = FALSE;
    }
    // These tile modes are used by Tex2DArray and Tex3D, which have depth (num_slice > 1);
    // they are not supported in swizzle pattern equation due to slice rotation
    else if ((tileConfig.mode == ADDR_TM_2D_TILED_THICK)  ||
             (tileConfig.mode == ADDR_TM_2D_TILED_XTHICK) ||
             (tileConfig.mode == ADDR_TM_3D_TILED_THIN1)  ||
             (tileConfig.mode == ADDR_TM_3D_TILED_THICK)  ||
             (tileConfig.mode == ADDR_TM_3D_TILED_XTHICK))
    {
        supported = FALSE;
    }
    // Only 8bpp (stencil), 16bpp and 32bpp are supported for depth
    else if ((tileConfig.type == ADDR_DEPTH_SAMPLE_ORDER) && (bpp > 32))
    {
        supported = FALSE;
    }
    // Tile split is not supported in swizzle pattern equation
    else if (IsMacroTiled(tileConfig.mode))
    {
        UINT_32 thickness =
            Thickness(tileConfig.mode);
        if (((bpp >> 3) * MicroTilePixels * thickness) > tileConfig.info.tileSplitBytes)
        {
            supported = FALSE;
        }

        if ((supported == TRUE) && (m_chipFamily == ADDR_CHIP_FAMILY_SI))
        {
            supported = m_EquationSupport[tileIndex][elementBytesLog2];
        }
    }

    return supported;
}

} // V1
} // Addr
} // rocr
ROCR-Runtime-rocm-5.0.0/src/image/addrlib/src/r800/siaddrlib.h000066400000000000000000000317521420110115200234730ustar00rootroot00000000000000
/*
 * Copyright © 2007-2019 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS, AUTHORS
 * AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 */

/**
****************************************************************************************************
* @file  siaddrlib.h
* @brief Contains the SiLib class definition.
****************************************************************************************************
*/

#ifndef __SI_ADDR_LIB_H__
#define __SI_ADDR_LIB_H__

#include "addrlib1.h"
#include "egbaddrlib.h"

namespace rocr {
namespace Addr {
namespace V1 {

/**
****************************************************************************************************
* @brief Describes the information in tile mode table
****************************************************************************************************
*/
struct TileConfig
{
    AddrTileMode  mode;
    AddrTileType  type;
    ADDR_TILEINFO info;
};

/**
****************************************************************************************************
* @brief SI specific settings structure.
****************************************************************************************************
*/
struct SiChipSettings
{
    UINT_32 isSouthernIsland  : 1;
    UINT_32 isTahiti          : 1;
    UINT_32 isPitCairn        : 1;
    UINT_32 isCapeVerde       : 1;

    // Oland/Hainan are of GFXIP 6.0, similar to SI
    UINT_32 isOland           : 1;
    UINT_32 isHainan          : 1;

    // CI
    UINT_32 isSeaIsland       : 1;
    UINT_32 isBonaire         : 1;
    UINT_32 isKaveri          : 1;
    UINT_32 isSpectre         : 1;
    UINT_32 isSpooky          : 1;
    UINT_32 isKalindi         : 1;
    UINT_32 isHawaii          : 1;

    // VI
    UINT_32 isVolcanicIslands : 1;
    UINT_32 isIceland         : 1;
    UINT_32 isTonga           : 1;
    UINT_32 isFiji            : 1;
    UINT_32 isPolaris10       : 1;
    UINT_32 isPolaris11       : 1;
    UINT_32 isPolaris12       : 1;
    UINT_32 isVegaM           : 1;
    UINT_32 isCarrizo         : 1;
};

/**
****************************************************************************************************
* @brief This class is the SI specific address library
*        function set.
****************************************************************************************************
*/
class SiLib : public EgBasedLib
{
public:
    /// Creates SiLib object
    static Addr::Lib* CreateObj(const Client* pClient)
    {
        VOID* pMem = Object::ClientAlloc(sizeof(SiLib), pClient);
        return (pMem != NULL) ?
            new (pMem) SiLib(pClient) : NULL;
    }

protected:
    SiLib(const Client* pClient);
    virtual ~SiLib();

    // Hwl interface - defined in AddrLib1
    virtual ADDR_E_RETURNCODE HwlComputeSurfaceInfo(
        const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn,
        ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    virtual ADDR_E_RETURNCODE HwlConvertTileInfoToHW(
        const ADDR_CONVERT_TILEINFOTOHW_INPUT* pIn,
        ADDR_CONVERT_TILEINFOTOHW_OUTPUT* pOut) const;

    virtual UINT_64 HwlComputeXmaskAddrFromCoord(
        UINT_32 pitch, UINT_32 height, UINT_32 x, UINT_32 y, UINT_32 slice, UINT_32 numSlices,
        UINT_32 factor, BOOL_32 isLinear, BOOL_32 isWidth8, BOOL_32 isHeight8,
        ADDR_TILEINFO* pTileInfo, UINT_32* pBitPosition) const;

    virtual VOID HwlComputeXmaskCoordFromAddr(
        UINT_64 addr, UINT_32 bitPosition, UINT_32 pitch, UINT_32 height, UINT_32 numSlices,
        UINT_32 factor, BOOL_32 isLinear, BOOL_32 isWidth8, BOOL_32 isHeight8,
        ADDR_TILEINFO* pTileInfo, UINT_32* pX, UINT_32* pY, UINT_32* pSlice) const;

    virtual ADDR_E_RETURNCODE HwlGetTileIndex(
        const ADDR_GET_TILEINDEX_INPUT* pIn,
        ADDR_GET_TILEINDEX_OUTPUT*      pOut) const;

    virtual BOOL_32 HwlComputeMipLevel(
        ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn) const;

    virtual ChipFamily HwlConvertChipFamily(
        UINT_32 uChipFamily, UINT_32 uChipRevision);

    virtual BOOL_32 HwlInitGlobalParams(
        const ADDR_CREATE_INPUT* pCreateIn);

    virtual ADDR_E_RETURNCODE HwlSetupTileCfg(
        UINT_32 bpp, INT_32 index, INT_32 macroModeIndex,
        ADDR_TILEINFO* pInfo, AddrTileMode* pMode = 0, AddrTileType* pType = 0) const;

    virtual VOID HwlComputeTileDataWidthAndHeightLinear(
        UINT_32* pMacroWidth, UINT_32* pMacroHeight,
        UINT_32 bpp, ADDR_TILEINFO* pTileInfo) const;

    virtual UINT_64 HwlComputeHtileBytes(
        UINT_32 pitch, UINT_32 height, UINT_32 bpp,
        BOOL_32 isLinear, UINT_32 numSlices, UINT_64* pSliceBytes, UINT_32 baseAlign) const;

    virtual ADDR_E_RETURNCODE ComputeBankEquation(
        UINT_32 log2BytesPP, UINT_32 threshX, UINT_32 threshY,
        ADDR_TILEINFO* pTileInfo, ADDR_EQUATION* pEquation) const;

    virtual ADDR_E_RETURNCODE
    ComputePipeEquation(
        UINT_32 log2BytesPP, UINT_32 threshX, UINT_32 threshY,
        ADDR_TILEINFO* pTileInfo, ADDR_EQUATION* pEquation) const;

    virtual UINT_32 ComputePipeFromCoord(
        UINT_32 x, UINT_32 y, UINT_32 slice,
        AddrTileMode tileMode, UINT_32 pipeSwizzle, BOOL_32 ignoreSE,
        ADDR_TILEINFO* pTileInfo) const;

    virtual UINT_32 HwlGetPipes(const ADDR_TILEINFO* pTileInfo) const;

    /// Pre-handler of 3x pitch (96 bit) adjustment
    virtual UINT_32 HwlPreHandleBaseLvl3xPitch(
        const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, UINT_32 expPitch) const;
    /// Post-handler of 3x pitch adjustment
    virtual UINT_32 HwlPostHandleBaseLvl3xPitch(
        const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, UINT_32 expPitch) const;

    /// Dummy function to finalize the inheritance
    virtual UINT_32 HwlComputeXmaskCoordYFrom8Pipe(
        UINT_32 pipe, UINT_32 x) const;

    // Sub-hwl interface - defined in EgBasedLib
    virtual VOID HwlSetupTileInfo(
        AddrTileMode tileMode, ADDR_SURFACE_FLAGS flags,
        UINT_32 bpp, UINT_32 pitch, UINT_32 height, UINT_32 numSamples,
        ADDR_TILEINFO* inputTileInfo, ADDR_TILEINFO* outputTileInfo,
        AddrTileType inTileType, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    virtual UINT_32 HwlGetPitchAlignmentMicroTiled(
        AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 numSamples) const;

    virtual UINT_64 HwlGetSizeAdjustmentMicroTiled(
        UINT_32 thickness, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 numSamples,
        UINT_32 baseAlign, UINT_32 pitchAlign,
        UINT_32 *pPitch, UINT_32 *pHeight) const;

    virtual VOID HwlCheckLastMacroTiledLvl(
        const ADDR_COMPUTE_SURFACE_INFO_INPUT* pIn, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    virtual BOOL_32 HwlTileInfoEqual(
        const ADDR_TILEINFO* pLeft, const ADDR_TILEINFO* pRight) const;

    virtual AddrTileMode HwlDegradeThickTileMode(
        AddrTileMode baseTileMode, UINT_32 numSlices, UINT_32* pBytesPerTile) const;

    virtual VOID HwlOverrideTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const;

    virtual VOID HwlOptimizeTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const;
    virtual VOID HwlSelectTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const;

    /// Overwrite tile setting to PRT
    virtual VOID HwlSetPrtTileMode(ADDR_COMPUTE_SURFACE_INFO_INPUT* pInOut) const;

    virtual BOOL_32 HwlSanityCheckMacroTiled(
        ADDR_TILEINFO* pTileInfo) const
    {
        return TRUE;
    }

    virtual UINT_32 HwlGetPitchAlignmentLinear(UINT_32 bpp, ADDR_SURFACE_FLAGS flags) const;

    virtual UINT_64 HwlGetSizeAdjustmentLinear(
        AddrTileMode tileMode,
        UINT_32 bpp, UINT_32 numSamples, UINT_32 baseAlign, UINT_32 pitchAlign,
        UINT_32 *pPitch, UINT_32 *pHeight, UINT_32 *pHeightAlign) const;

    virtual VOID HwlComputeSurfaceCoord2DFromBankPipe(
        AddrTileMode tileMode, UINT_32* pX, UINT_32* pY,
        UINT_32 slice, UINT_32 bank, UINT_32 pipe,
        UINT_32 bankSwizzle, UINT_32 pipeSwizzle, UINT_32 tileSlices,
        BOOL_32 ignoreSE,
        ADDR_TILEINFO* pTileInfo) const;

    virtual UINT_32 HwlPreAdjustBank(
        UINT_32 tileX, UINT_32 bank, ADDR_TILEINFO* pTileInfo) const;

    virtual INT_32 HwlPostCheckTileIndex(
        const ADDR_TILEINFO* pInfo, AddrTileMode mode, AddrTileType type,
        INT curIndex = TileIndexInvalid) const;

    virtual VOID HwlFmaskPreThunkSurfInfo(
        const ADDR_COMPUTE_FMASK_INFO_INPUT* pFmaskIn,
        const ADDR_COMPUTE_FMASK_INFO_OUTPUT* pFmaskOut,
        ADDR_COMPUTE_SURFACE_INFO_INPUT* pSurfIn,
        ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pSurfOut) const;

    virtual VOID HwlFmaskPostThunkSurfInfo(
        const ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pSurfOut,
        ADDR_COMPUTE_FMASK_INFO_OUTPUT* pFmaskOut) const;

    virtual UINT_32 HwlComputeFmaskBits(
        const ADDR_COMPUTE_FMASK_INFO_INPUT* pIn,
        UINT_32* pNumSamples) const;

    virtual BOOL_32 HwlReduceBankWidthHeight(
        UINT_32 tileSize, UINT_32 bpp, ADDR_SURFACE_FLAGS flags, UINT_32 numSamples,
        UINT_32 bankHeightAlign, UINT_32 pipes,
        ADDR_TILEINFO* pTileInfo) const
    {
        return TRUE;
    }

    virtual UINT_32 HwlComputeMaxBaseAlignments() const;

    virtual UINT_32 HwlComputeMaxMetaBaseAlignments() const;

    virtual VOID HwlComputeSurfaceAlignmentsMacroTiled(
        AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags,
        UINT_32 mipLevel,
        UINT_32 numSamples, ADDR_COMPUTE_SURFACE_INFO_OUTPUT* pOut) const;

    // Get equation table pointer and number of equations
    virtual UINT_32 HwlGetEquationTableInfo(const ADDR_EQUATION** ppEquationTable) const
    {
        *ppEquationTable = m_equationTable;

        return m_numEquations;
    }

    // Check whether an equation can be generated for the given bpp and tile config
    BOOL_32 IsEquationSupported(
        UINT_32 bpp, TileConfig tileConfig, INT_32 tileIndex, UINT_32 elementBytesLog2) const;

    // Protected non-virtual functions
    VOID ComputeTileCoordFromPipeAndElemIdx(
        UINT_32 elemIdx, UINT_32 pipe, AddrPipeCfg pipeCfg, UINT_32 pitchInMacroTile,
        UINT_32 x, UINT_32 y, UINT_32* pX, UINT_32* pY) const;

    UINT_32 TileCoordToMaskElementIndex(
        UINT_32 tx, UINT_32 ty, AddrPipeCfg pipeConfig,
        UINT_32 *macroShift, UINT_32 *elemIdxBits) const;

    BOOL_32 DecodeGbRegs(
        const ADDR_REGISTER_VALUE* pRegValue);

    const TileConfig* GetTileSetting(
        UINT_32 index) const;

    // Initialize equation table
    VOID InitEquationTable();

    UINT_32 GetPipePerSurf(AddrPipeCfg pipeConfig) const;

    static const UINT_32    TileTableSize = 32;
    TileConfig              m_tileTable[TileTableSize];
    UINT_32                 m_noOfEntries;

    // Max number of bpp (8bpp/16bpp/32bpp/64bpp/128bpp)
    static const UINT_32    MaxNumElementBytes = 5;

    static const BOOL_32    m_EquationSupport[TileTableSize][MaxNumElementBytes];

    // Prt tile mode index mask
    static const UINT_32    SiPrtTileIndexMask = ((1 << 3)  | (1 << 5)  | (1 << 6)  | (1 << 7)  |
                                                  (1 << 21) | (1 << 22) | (1 << 23) | (1 << 24) |
                                                  (1 << 25) | (1 << 30));

    // More than half of the slots in the tile mode table can't support an equation
    static const UINT_32    EquationTableSize = (MaxNumElementBytes * TileTableSize) / 2;
    // Equation table
    ADDR_EQUATION           m_equationTable[EquationTableSize];
    UINT_32                 m_numMacroBits[EquationTableSize];
    UINT_32                 m_blockWidth[EquationTableSize];
    UINT_32                 m_blockHeight[EquationTableSize];
    UINT_32                 m_blockSlices[EquationTableSize];
    // Number of equation entries in the table
    UINT_32                 m_numEquations;
    // Equation lookup table according to bpp and
    // tile index
    UINT_32                 m_equationLookupTable[MaxNumElementBytes][TileTableSize];

    UINT_32                 m_uncompressDepthEqIndex;

    SiChipSettings          m_settings;

private:

    VOID ReadGbTileMode(UINT_32 regValue, TileConfig* pCfg) const;
    BOOL_32 InitTileSettingTable(const UINT_32 *pSetting, UINT_32 noOfEntries);
};

} // V1
} // Addr
} // rocr

#endif

ROCR-Runtime-rocm-5.0.0/src/image/addrlib/util/000077500000000000000000000000001420110115200210525ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/addrlib/util/macros.h000066400000000000000000000233101420110115200225060ustar00rootroot00000000000000
/*
 * Copyright © 2014 Intel Corporation
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 */

#ifndef UTIL_MACROS_H
#define UTIL_MACROS_H

#include <assert.h>

/* Compute the size of an array */
#ifndef ARRAY_SIZE
#  define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif

/* For compatibility with Clang's __has_builtin() */
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

/**
 * __builtin_expect macros
 */
#if !defined(HAVE___BUILTIN_EXPECT)
#  define __builtin_expect(x, y) (x)
#endif

#ifndef likely
#  ifdef HAVE___BUILTIN_EXPECT
#    define likely(x)   __builtin_expect(!!(x), 1)
#    define unlikely(x) __builtin_expect(!!(x), 0)
#  else
#    define likely(x)   (x)
#    define unlikely(x) (x)
#  endif
#endif

/**
 * Static (compile-time) assertion.
 */
#define STATIC_ASSERT(COND) do { \
   static_assert(COND, "Addrlib legacy static_assert failure."); \
} while(false)

/**
 * Unreachable macro. Useful for suppressing "control reaches end of non-void
 * function" warnings.
 */
#if defined(HAVE___BUILTIN_UNREACHABLE) || __has_builtin(__builtin_unreachable)
#define unreachable(str)    \
do {                        \
   assert(!str);            \
   __builtin_unreachable(); \
} while (0)
#elif defined (_MSC_VER)
#define unreachable(str)    \
do {                        \
   assert(!str);            \
   __assume(0);             \
} while (0)
#else
#define unreachable(str) assert(!str)
#endif

/**
 * Assume macro. Useful for expressing our assumptions to the compiler,
 * typically for purposes of silencing warnings.
 */
#if __has_builtin(__builtin_assume)
#define assume(expr)       \
do {                       \
   assert(expr);           \
   __builtin_assume(expr); \
} while (0)
#elif defined HAVE___BUILTIN_UNREACHABLE
#define assume(expr) ((expr) ? ((void) 0) \
                             : (assert(!"assumption failed"), \
                                __builtin_unreachable()))
#elif defined (_MSC_VER)
#define assume(expr) __assume(expr)
#else
#define assume(expr) assert(expr)
#endif

/* Attribute const is used for functions that have no effects other than their
 * return value, and only rely on the argument values to compute the return
 * value. As a result, calls to it can be CSEed. Note that using memory
 * pointed to by the arguments is not allowed for const functions.
 */
#ifdef HAVE_FUNC_ATTRIBUTE_CONST
#define ATTRIBUTE_CONST __attribute__((__const__))
#else
#define ATTRIBUTE_CONST
#endif

#ifdef HAVE_FUNC_ATTRIBUTE_FLATTEN
#define FLATTEN __attribute__((__flatten__))
#else
#define FLATTEN
#endif

#ifdef HAVE_FUNC_ATTRIBUTE_FORMAT
#define PRINTFLIKE(f, a) __attribute__ ((format(__printf__, f, a)))
#else
#define PRINTFLIKE(f, a)
#endif

#ifdef HAVE_FUNC_ATTRIBUTE_MALLOC
#define MALLOCLIKE __attribute__((__malloc__))
#else
#define MALLOCLIKE
#endif

/* Forced function inlining */
/* Note: Clang also sets __GNUC__ (see other cases below) */
#ifndef ALWAYS_INLINE
#  if defined(__GNUC__)
#    define ALWAYS_INLINE inline __attribute__((always_inline))
#  elif defined(_MSC_VER)
#    define ALWAYS_INLINE __forceinline
#  else
#    define ALWAYS_INLINE inline
#  endif
#endif

/* Used to optionally mark structures with misaligned elements or size as
 * packed, to trade off performance for space.
 */
#ifdef HAVE_FUNC_ATTRIBUTE_PACKED
#define PACKED __attribute__((__packed__))
#else
#define PACKED
#endif

/* Attribute pure is used for functions that have no effects other than their
 * return value. As a result, calls to it can be dead code eliminated.
 */
#ifdef HAVE_FUNC_ATTRIBUTE_PURE
#define ATTRIBUTE_PURE __attribute__((__pure__))
#else
#define ATTRIBUTE_PURE
#endif

#ifdef HAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL
#define ATTRIBUTE_RETURNS_NONNULL __attribute__((__returns_nonnull__))
#else
#define ATTRIBUTE_RETURNS_NONNULL
#endif

#ifndef NORETURN
#  ifdef _MSC_VER
#    define NORETURN __declspec(noreturn)
#  elif defined HAVE_FUNC_ATTRIBUTE_NORETURN
#    define NORETURN __attribute__((__noreturn__))
#  else
#    define NORETURN
#  endif
#endif

#ifdef __cplusplus
/**
 * Macro function that evaluates to true if T is a trivially
 * destructible type -- that is, if its (non-virtual) destructor
 * performs no action and all member variables and base classes are
 * trivially destructible themselves.
 */
#  if (defined(__clang__) && defined(__has_feature))
#    if __has_feature(has_trivial_destructor)
#      define HAS_TRIVIAL_DESTRUCTOR(T) __has_trivial_destructor(T)
#    endif
#  elif defined(__GNUC__)
#    if ((__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 3)))
#      define HAS_TRIVIAL_DESTRUCTOR(T) __has_trivial_destructor(T)
#    endif
#  elif defined(_MSC_VER) && !defined(__INTEL_COMPILER)
#    define HAS_TRIVIAL_DESTRUCTOR(T) __has_trivial_destructor(T)
#  endif
#  ifndef HAS_TRIVIAL_DESTRUCTOR
     /* It's always safe (if inefficient) to assume that a
      * destructor is non-trivial.
      */
#    define HAS_TRIVIAL_DESTRUCTOR(T) (false)
#  endif
#endif

/**
 * PUBLIC/USED macros
 *
 * If we build the library with gcc's -fvisibility=hidden flag, we'll
 * use the PUBLIC macro to mark functions that are to be exported.
 *
 * We also need to define a USED attribute, so the optimizer doesn't
 * inline a static function that we later use in an alias. - ajax
 */
#ifndef PUBLIC
#  if defined(__GNUC__)
#    define PUBLIC __attribute__((visibility("default")))
#    define USED __attribute__((used))
#  elif defined(_MSC_VER)
#    define PUBLIC __declspec(dllexport)
#    define USED
#  else
#    define PUBLIC
#    define USED
#  endif
#endif

/**
 * UNUSED marks variables (or sometimes functions) that have to be defined,
 * but are sometimes (or always) unused beyond that. A common case is for
 * a function parameter to be used in some build configurations but not others.
 * Another case is fallback vfuncs that don't do anything with their params.
 *
 * Note that this should not be used for identifiers used in `assert()`;
 * see ASSERTED below.
 */
#ifdef HAVE_FUNC_ATTRIBUTE_UNUSED
#define UNUSED __attribute__((unused))
#else
#define UNUSED
#endif

/**
 * Use ASSERTED to indicate that an identifier is unused outside of an `assert()`,
 * so that assert-free builds don't get "unused variable" warnings.
*/ #ifdef NDEBUG #define ASSERTED UNUSED #else #define ASSERTED #endif #ifdef HAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT #define MUST_CHECK __attribute__((warn_unused_result)) #else #define MUST_CHECK #endif #if defined(__GNUC__) #define ATTRIBUTE_NOINLINE __attribute__((noinline)) #else #define ATTRIBUTE_NOINLINE #endif /** * Check that STRUCT::FIELD can hold MAXVAL. We use a lot of bitfields * in Mesa/gallium. We have to be sure they're of sufficient size to * hold the largest expected value. * Note that with MSVC, enums are signed and enum bitfields need one extra * high bit (always zero) to ensure the max value is handled correctly. * This macro will detect that with MSVC, but not GCC. */ #define ASSERT_BITFIELD_SIZE(STRUCT, FIELD, MAXVAL) \ do { \ ASSERTED STRUCT s; \ s.FIELD = (MAXVAL); \ assert((int) s.FIELD == (MAXVAL) && "Insufficient bitfield size!"); \ } while (0) /** Compute ceiling of integer quotient of A divided by B. */ #define DIV_ROUND_UP( A, B ) ( ((A) + (B) - 1) / (B) ) /** Clamp X to [MIN,MAX]. Turn NaN into MIN, arbitrarily. */ #define CLAMP( X, MIN, MAX ) ( (X)>(MIN) ? ((X)>(MAX) ? (MAX) : (X)) : (MIN) ) /** Minimum of two values: */ #define MIN2( A, B ) ( (A)<(B) ? (A) : (B) ) /** Maximum of two values: */ #define MAX2( A, B ) ( (A)>(B) ? (A) : (B) ) /** Minimum and maximum of three values: */ #define MIN3( A, B, C ) ((A) < (B) ? MIN2(A, C) : MIN2(B, C)) #define MAX3( A, B, C ) ((A) > (B) ? MAX2(A, C) : MAX2(B, C)) /** Align a value to a power of two */ #define ALIGN_POT(x, pot_align) (((x) + (pot_align) - 1) & ~((pot_align) - 1)) /** * Macro for declaring an explicit conversion operator. Defaults to an * implicit conversion if C++11 is not supported. */ #if __cplusplus >= 201103L #define EXPLICIT_CONVERSION explicit #elif defined(__cplusplus) #define EXPLICIT_CONVERSION #endif /** Set a single bit */ #define BITFIELD_BIT(b) (1u << (b)) /** Set all bits up to but excluding bit b */ #define BITFIELD_MASK(b) \ ((b) == 32 ?
(~0u) : BITFIELD_BIT((b) % 32) - 1) /** Set count bits starting from bit b */ #define BITFIELD_RANGE(b, count) \ (BITFIELD_MASK((b) + (count)) & ~BITFIELD_MASK(b)) /** Set a single bit */ #define BITFIELD64_BIT(b) (1ull << (b)) /** Set all bits up to but excluding bit b */ #define BITFIELD64_MASK(b) \ ((b) == 64 ? (~0ull) : BITFIELD64_BIT(b) - 1) /** Set count bits starting from bit b */ #define BITFIELD64_RANGE(b, count) \ (BITFIELD64_MASK((b) + (count)) & ~BITFIELD64_MASK(b)) #endif /* UTIL_MACROS_H */ ROCR-Runtime-rocm-5.0.0/src/image/blit_kernel.cpp //////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "blit_kernel.h" #if (defined(WIN32) || defined(_WIN32)) #define NOMINMAX #endif #include #include #include #include #include "image_manager.h" #include "image_runtime.h" #include "util.h" #undef HSA_ARGUMENT_ALIGN_BYTES #define HSA_ARGUMENT_ALIGN_BYTES 16 #include "core/inc/hsa_internal.h" #include "core/inc/hsa_ext_amd_impl.h" #include "core/inc/hsa_table_interface.h" namespace rocr { namespace image { extern uint8_t blit_object_gfx7xx[14608]; extern uint8_t blit_object_gfx8xx[15424]; extern uint8_t blit_object_gfx9xx[15432]; extern uint8_t ocl_blit_object_gfx700[]; extern uint8_t ocl_blit_object_gfx701[]; extern uint8_t ocl_blit_object_gfx702[]; extern uint8_t ocl_blit_object_gfx801[]; extern uint8_t ocl_blit_object_gfx802[]; extern uint8_t ocl_blit_object_gfx803[]; extern uint8_t ocl_blit_object_gfx805[]; extern uint8_t ocl_blit_object_gfx810[]; extern uint8_t ocl_blit_object_gfx900[]; extern uint8_t ocl_blit_object_gfx902[]; extern uint8_t ocl_blit_object_gfx904[]; extern uint8_t ocl_blit_object_gfx906[]; extern uint8_t ocl_blit_object_gfx908[]; extern uint8_t ocl_blit_object_gfx909[]; extern uint8_t ocl_blit_object_gfx90a[]; extern uint8_t ocl_blit_object_gfx90c[]; extern uint8_t 
ocl_blit_object_gfx1010[]; extern uint8_t ocl_blit_object_gfx1011[]; extern uint8_t ocl_blit_object_gfx1012[]; extern uint8_t ocl_blit_object_gfx1013[]; extern uint8_t ocl_blit_object_gfx1030[]; extern uint8_t ocl_blit_object_gfx1031[]; extern uint8_t ocl_blit_object_gfx1032[]; extern uint8_t ocl_blit_object_gfx1033[]; extern uint8_t ocl_blit_object_gfx1034[]; extern uint8_t ocl_blit_object_gfx1035[]; // Hidden arguments appended by the OCL compiler; all zero here. struct OCLHiddenArgs { uint64_t offset_x; uint64_t offset_y; uint64_t offset_z; void* printf_buffer; void* enqueue; void* enqueue2; void* multi_grid; }; static void* Allocate(hsa_agent_t agent, size_t size) { // Use the host-accessible kernarg pool. hsa_amd_memory_pool_t pool = ImageRuntime::instance()->kernarg_pool(); void* ptr = NULL; hsa_status_t status = AMD::hsa_amd_memory_pool_allocate(pool, size, 0, &ptr); assert(status == HSA_STATUS_SUCCESS); if (status != HSA_STATUS_SUCCESS) return NULL; status = AMD::hsa_amd_agents_allow_access(1, &agent, NULL, ptr); assert(status == HSA_STATUS_SUCCESS); if (status != HSA_STATUS_SUCCESS) { AMD::hsa_amd_memory_pool_free(ptr); return NULL; } return ptr; } BlitKernel::BlitKernel() { } BlitKernel::~BlitKernel() {} hsa_status_t BlitKernel::Initialize() { return HSA_STATUS_SUCCESS; } hsa_status_t BlitKernel::Cleanup() { for (const auto& pair : code_executable_map_) { HSA::hsa_executable_destroy(pair.second); } code_executable_map_.clear(); code_object_map_.clear(); return HSA_STATUS_SUCCESS; } hsa_status_t BlitKernel::BuildBlitCode( hsa_agent_t agent, std::vector<BlitCodeInfo>& blit_code_catalog) { // Find existing kernels in the list that have compatible ISA.
hsa_isa_t agent_isa = {0}; hsa_status_t status = HSA::hsa_agent_get_info(agent, HSA_AGENT_INFO_ISA, &agent_isa); if (HSA_STATUS_SUCCESS != status) { return status; } std::lock_guard lock(lock_); for (const auto& pair : code_executable_map_) { bool isa_compatible = false; hsa_isa_t code_isa = {pair.first}; status = HSA::hsa_isa_compatible(code_isa, agent_isa, &isa_compatible); if (HSA_STATUS_SUCCESS != status) { return status; } if (isa_compatible) { return PopulateKernelCode(agent, pair.second, blit_code_catalog); } } // No existing compatible kernels. Build new kernels. hsa_code_object_t code_object = {0}; // Get the target name char agent_name[64] = {0}; status = HSA::hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, &agent_name); if (HSA_STATUS_SUCCESS != status) { return status; } // Get the patched code object uint8_t* patched_code_object; status = BlitKernel::GetPatchedBlitObject(agent_name, &patched_code_object); if (HSA_STATUS_SUCCESS != status) { return status; } // Pass the patched code object code_object.handle = reinterpret_cast<uint64_t>(patched_code_object); code_object_map_[agent_isa.handle] = code_object; // Create executable. hsa_executable_t executable = {0}; status = HSA::hsa_executable_create(HSA_PROFILE_FULL, HSA_EXECUTABLE_STATE_UNFROZEN, "", &executable); if (HSA_STATUS_SUCCESS != status) { return status; } code_executable_map_[agent_isa.handle] = executable; // Load code object. status = HSA::hsa_executable_load_code_object(executable, agent, code_object, ""); if (HSA_STATUS_SUCCESS != status) { return status; } // Freeze executable.
status = HSA::hsa_executable_freeze(executable, ""); if (HSA_STATUS_SUCCESS != status) { return status; } return PopulateKernelCode(agent, executable, blit_code_catalog); } hsa_status_t BlitKernel::CopyBufferToImage( BlitQueue& blit_queue, const std::vector<BlitCodeInfo>& blit_code_catalog, const void* src_memory, size_t src_row_pitch, size_t src_slice_pitch, const Image& dst_image, const hsa_ext_image_region_t& image_region) { if (dst_image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) { ImageManager* manager = ImageRuntime::instance()->image_manager(dst_image.component); const uint32_t element_size = manager->GetImageProperty(dst_image.component, dst_image.desc.format, dst_image.desc.geometry).element_size; const size_t dst_origin = image_region.offset.x * element_size; char* dst_memory = reinterpret_cast<char*>(dst_image.data) + dst_origin; const size_t size = image_region.range.x * element_size; return HSA::hsa_memory_copy(dst_memory, src_memory, size); } const Image* dst_image_view = NULL; hsa_status_t status = ConvertImage(dst_image, &dst_image_view); if (HSA_STATUS_SUCCESS != status) { return status; } assert(dst_image_view != NULL); hsa_kernel_dispatch_packet_t packet = {0}; const BlitCodeInfo& blit_code = blit_code_catalog.at(KERNEL_OP_COPY_BUFFER_TO_IMAGE); packet.kernel_object = blit_code.code_handle_; packet.group_segment_size = blit_code.group_segment_size_; packet.private_segment_size = blit_code.private_segment_size_; // Setup kernel argument. /* buffer is the start of the input pixels in the source buffer. format.x is the channel count. format.y is the size of one channel in bytes. format.z is max(DWORDs per pixel, 1). format.w is the image geometry (texture type). pixelOrigin is the starting pixel coordinate.
*/ struct KernelArgs { const void* buffer; uint64_t image[5]; int32_t pixelOrigin[4]; uint32_t format[4]; uint64_t pitch; uint64_t slice_pitch; OCLHiddenArgs ocl; }; KernelArgs* args = (KernelArgs*)Allocate(dst_image_view->component, sizeof(KernelArgs)); assert(args != NULL); memset(args, 0, sizeof(KernelArgs)); args->buffer = src_memory; for(auto& img : args->image) img = dst_image_view->Convert(); args->pixelOrigin[0] = image_region.offset.x; args->pixelOrigin[1] = image_region.offset.y; args->pixelOrigin[2] = image_region.offset.z; ImageManager* manager = ImageRuntime::instance()->image_manager(dst_image_view->component); const uint32_t element_size = manager->GetImageProperty(dst_image_view->component, dst_image_view->desc.format, dst_image_view->desc.geometry).element_size; // Try to minimize the read operation to buffer by reading the buffer // up to one DWORD at a time. uint32_t buffer_read_count = element_size / sizeof(uint32_t); buffer_read_count = (buffer_read_count == 0) ? 1 : buffer_read_count; const uint32_t num_channel = GetNumChannel(*dst_image_view); const uint32_t size_per_channel = element_size / num_channel; args->format[0] = num_channel; args->format[1] = size_per_channel; args->format[2] = buffer_read_count; args->format[3] = dst_image_view->desc.geometry; unsigned long buffer_pitch[2] = {0, 0}; CalcBufferRowSlicePitchesInPixel(dst_image_view->desc.geometry, element_size, image_region.range, src_row_pitch, src_slice_pitch, buffer_pitch); args->pitch = buffer_pitch[0]; args->slice_pitch = buffer_pitch[1]; packet.kernarg_address = args; // Setup packet dimension and working size. 
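// Worked example with hypothetical values, assuming the 2D branch of
// CalcWorkingSize: for image_region.range = {100, 50, 1} the packet gets
// setup = 3, grid_size = {100, 50, 1} and workgroup_size = {8, 8, 1},
// i.e. ceil(100 / 8) x ceil(50 / 8) = 13 x 7 = 91 workgroups. The grid is
// not rounded up to a multiple of the workgroup size; our understanding of
// the HSA dispatch model is that the last workgroup in each dimension is
// launched as a partial workgroup.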
CalcWorkingSize(*dst_image_view, image_region.range, packet); status = LaunchKernel(blit_queue, packet); if (&dst_image != dst_image_view) { Image::Destroy(dst_image_view); } AMD::hsa_amd_memory_pool_free(args); return status; } hsa_status_t BlitKernel::CopyImageToBuffer( BlitQueue& blit_queue, const std::vector<BlitCodeInfo>& blit_code_catalog, const Image& src_image, void* dst_memory, size_t dst_row_pitch, size_t dst_slice_pitch, const hsa_ext_image_region_t& image_region) { if (src_image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) { ImageManager* manager = ImageRuntime::instance()->image_manager(src_image.component); const uint32_t element_size = manager->GetImageProperty(src_image.component, src_image.desc.format, src_image.desc.geometry).element_size; const size_t src_origin = image_region.offset.x * element_size; const char* src_memory = reinterpret_cast<const char*>(src_image.data) + src_origin; const size_t size = image_region.range.x * element_size; return HSA::hsa_memory_copy(dst_memory, src_memory, size); } const Image* src_image_view = NULL; hsa_status_t status = ConvertImage(src_image, &src_image_view); if (HSA_STATUS_SUCCESS != status) { return status; } assert(src_image_view != NULL); hsa_kernel_dispatch_packet_t packet = {0}; const BlitCodeInfo& blit_code = blit_code_catalog.at(KERNEL_OP_COPY_IMAGE_TO_BUFFER); packet.kernel_object = blit_code.code_handle_; packet.group_segment_size = blit_code.group_segment_size_; packet.private_segment_size = blit_code.private_segment_size_; // Setup kernel argument. /* buffer is the start of the output pixels in the destination buffer. format.x is the channel count. format.y is the size of one channel in bytes. format.z is max(DWORDs per pixel, 1). format.w is the image geometry (texture type). pixelOrigin is the starting pixel coordinate.
*/ struct KernelArgs { uint64_t image[5]; void* buffer; int32_t pixelOrigin[4]; uint32_t format[4]; uint64_t pitch; uint64_t slice_pitch; OCLHiddenArgs ocl; }; KernelArgs* args = (KernelArgs*)Allocate(src_image_view->component, sizeof(KernelArgs)); assert(args != NULL); memset(args, 0, sizeof(KernelArgs)); for(auto &img : args->image) img = src_image_view->Convert(); args->buffer = dst_memory; args->pixelOrigin[0] = image_region.offset.x; args->pixelOrigin[1] = image_region.offset.y; args->pixelOrigin[2] = image_region.offset.z; ImageManager* manager = ImageRuntime::instance()->image_manager(src_image_view->component); const uint32_t element_size = manager->GetImageProperty(src_image_view->component, src_image_view->desc.format, src_image_view->desc.geometry).element_size; // Try to minimize the write operation to buffer by writing the buffer // up to one DWORD at a time. uint32_t buffer_write_count = element_size / sizeof(uint32_t); buffer_write_count = (buffer_write_count == 0) ? 1 : buffer_write_count; const uint32_t num_channel = GetNumChannel(*src_image_view); const uint32_t size_per_channel = element_size / num_channel; args->format[0] = num_channel; args->format[1] = size_per_channel; args->format[2] = buffer_write_count; args->format[3] = src_image_view->desc.geometry; unsigned long buffer_pitch[2] = {0, 0}; CalcBufferRowSlicePitchesInPixel(src_image_view->desc.geometry, element_size, image_region.range, dst_row_pitch, dst_slice_pitch, buffer_pitch); args->pitch = buffer_pitch[0]; args->slice_pitch = buffer_pitch[1]; packet.kernarg_address = args; // Setup packet dimension and working size.
CalcWorkingSize(*src_image_view, image_region.range, packet); status = LaunchKernel(blit_queue, packet); if (&src_image != src_image_view) { Image::Destroy(src_image_view); } AMD::hsa_amd_memory_pool_free(args); return status; } hsa_status_t BlitKernel::CopyImage( BlitQueue& blit_queue, const std::vector<BlitCodeInfo>& blit_code_catalog, const Image& dst_image, const Image& src_image, const hsa_dim3_t& dst_origin, const hsa_dim3_t& src_origin, const hsa_dim3_t size, KernelOp copy_type) { assert(src_image.component.handle == dst_image.component.handle); const Image* src_image_view = &src_image; const Image* dst_image_view = &dst_image; const BlitCodeInfo* blit_code = NULL; if (copy_type == KERNEL_OP_COPY_IMAGE_DEFAULT) { // Linear to linear image copy. hsa_status_t status = ConvertImage(src_image, &src_image_view); if (HSA_STATUS_SUCCESS != status) { return status; } assert(src_image_view != NULL); status = ConvertImage(dst_image, &dst_image_view); if (HSA_STATUS_SUCCESS != status) { // Release the converted source view before bailing out. if (&src_image != src_image_view) { Image::Destroy(src_image_view); } return status; } assert(dst_image_view != NULL); const hsa_ext_image_geometry_t src_geometry = src_image_view->desc.geometry; const hsa_ext_image_geometry_t dst_geometry = dst_image_view->desc.geometry; if (src_geometry != HSA_EXT_IMAGE_GEOMETRY_1DB && dst_geometry != HSA_EXT_IMAGE_GEOMETRY_1DB) { blit_code = &blit_code_catalog.at(KERNEL_OP_COPY_IMAGE_DEFAULT); } else if (src_geometry == HSA_EXT_IMAGE_GEOMETRY_1DB && dst_geometry != HSA_EXT_IMAGE_GEOMETRY_1DB) { blit_code = &blit_code_catalog.at(KERNEL_OP_COPY_IMAGE_1DB_TO_REG); } else if (src_geometry != HSA_EXT_IMAGE_GEOMETRY_1DB && dst_geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) { blit_code = &blit_code_catalog.at(KERNEL_OP_COPY_IMAGE_REG_TO_1DB); } else { blit_code = &blit_code_catalog.at(KERNEL_OP_COPY_IMAGE_1DB); } } else { blit_code = &blit_code_catalog.at(copy_type); } hsa_kernel_dispatch_packet_t packet = {0}; packet.kernel_object = blit_code->code_handle_; packet.group_segment_size = blit_code->group_segment_size_; packet.private_segment_size =
blit_code->private_segment_size_; // Setup kernel argument. struct KernelArgs { uint64_t src[5]; uint64_t dst[5]; int32_t srcOrigin[4]; int32_t dstOrigin[4]; int32_t srcFormat; int32_t dstFormat; OCLHiddenArgs ocl; }; KernelArgs* args = (KernelArgs*)Allocate(dst_image_view->component, sizeof(KernelArgs)); assert(args != NULL); memset(args, 0, sizeof(KernelArgs)); for(auto& img : args->src) img = src_image_view->Convert(); args->srcFormat = src_image_view->desc.geometry; args->srcOrigin[0] = src_origin.x; args->srcOrigin[1] = src_origin.y; args->srcOrigin[2] = src_origin.z; for(auto& img : args->dst) img = dst_image_view->Convert(); args->dstFormat = dst_image_view->desc.geometry; args->dstOrigin[0] = dst_origin.x; args->dstOrigin[1] = dst_origin.y; args->dstOrigin[2] = dst_origin.z; packet.kernarg_address = args; // Setup packet dimension and working size. CalcWorkingSize(*src_image_view, *dst_image_view, size, packet); hsa_status_t status = LaunchKernel(blit_queue, packet); if (&src_image != src_image_view) { Image::Destroy(src_image_view); } if (&dst_image != dst_image_view) { Image::Destroy(dst_image_view); } AMD::hsa_amd_memory_pool_free(args); return status; } hsa_status_t BlitKernel::FillImage( BlitQueue& blit_queue, const std::vector<BlitCodeInfo>& blit_code_catalog, const Image& image, const void* pattern, const hsa_ext_image_region_t& region) { hsa_kernel_dispatch_packet_t packet = {0}; const BlitCodeInfo& blit_code = (image.desc.geometry != HSA_EXT_IMAGE_GEOMETRY_1DB) ? blit_code_catalog.at(KERNEL_OP_CLEAR_IMAGE) : blit_code_catalog.at(KERNEL_OP_CLEAR_IMAGE_1DB); packet.kernel_object = blit_code.code_handle_; packet.group_segment_size = blit_code.group_segment_size_; packet.private_segment_size = blit_code.private_segment_size_; // Setup kernel argument.
struct KernelArgs { uint64_t image[5]; int32_t format; uint32_t type; uint32_t data[4]; int32_t origin[4]; OCLHiddenArgs ocl; }; KernelArgs* args = (KernelArgs*)Allocate(image.component, sizeof(KernelArgs)); assert(args != NULL); memset(args, 0, sizeof(KernelArgs)); for(auto &img : args->image) img = image.Convert(); args->format = image.desc.geometry; for(int i=0; i<4; i++) args->data[i] = ((const uint32_t*)pattern)[i]; args->origin[0] = region.offset.x; args->origin[1] = region.offset.y; args->origin[2] = region.offset.z; args->type = GetImageAccessType(image); packet.kernarg_address = args; // Setup packet dimension and working size. CalcWorkingSize(image, region.range, packet); hsa_status_t status = LaunchKernel(blit_queue, packet); AMD::hsa_amd_memory_pool_free(args); return status; } const char *BlitKernel::kernel_name_[KERNEL_OP_COUNT] = { "&__copy_image_to_buffer_kernel", "&__copy_buffer_to_image_kernel", "&__copy_image_default_kernel", "&__copy_image_linear_to_standard_kernel", "&__copy_image_standard_to_linear_kernel", "&__copy_image_1db_kernel", "&__copy_image_1db_to_reg_kernel", "&__copy_image_reg_to_1db_kernel", "&__clear_image_kernel", "&__clear_image_1db_kernel"}; const char *BlitKernel::ocl_kernel_name_[KERNEL_OP_COUNT] = { "copy_image_to_buffer.kd", "copy_buffer_to_image.kd", "copy_image_default.kd", "copy_image_linear_to_standard.kd", "copy_image_standard_to_linear.kd", "copy_image_1db.kd", "copy_image_1db_to_reg.kd", "copy_image_reg_to_1db.kd", "clear_image.kd", "clear_image_1db.kd"}; hsa_status_t BlitKernel::PopulateKernelCode( hsa_agent_t agent, hsa_executable_t executable, std::vector<BlitCodeInfo>& blit_code_catalog) { blit_code_catalog.clear(); for (int i = 0; i < KERNEL_OP_COUNT; ++i) { // Get symbol handle.
hsa_executable_symbol_t kernel_symbol = {0}; hsa_status_t status = HSA::hsa_executable_get_symbol_by_name(executable, ocl_kernel_name_[i], &agent, &kernel_symbol); if (HSA_STATUS_SUCCESS != status) { blit_code_catalog.clear(); return status; } // Get code handle. BlitCodeInfo blit_code = {0}; status = HSA::hsa_executable_symbol_get_info( kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &blit_code.code_handle_); if (HSA_STATUS_SUCCESS != status) { blit_code_catalog.clear(); return status; } status = HSA::hsa_executable_symbol_get_info( kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE, &blit_code.group_segment_size_); if (HSA_STATUS_SUCCESS != status) { blit_code_catalog.clear(); return status; } status = HSA::hsa_executable_symbol_get_info( kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE, &blit_code.private_segment_size_); if (HSA_STATUS_SUCCESS != status) { blit_code_catalog.clear(); return status; } blit_code_catalog.push_back(blit_code); } assert(blit_code_catalog.size() == KERNEL_OP_COUNT); return HSA_STATUS_SUCCESS; } void BlitKernel::CalcBufferRowSlicePitchesInPixel( hsa_ext_image_geometry_t geometry, uint32_t element_size, const hsa_dim3_t& copy_size, size_t in_row_pitch_byte, size_t in_slice_pitch_byte, unsigned long* out_pitch_pixel) { const bool is_1d_array = (geometry == HSA_EXT_IMAGE_GEOMETRY_1DA) ? true : false; out_pitch_pixel[0] = std::max(static_cast<unsigned long>(copy_size.x), static_cast<unsigned long>(in_row_pitch_byte / element_size)); out_pitch_pixel[1] = (is_1d_array) ?
out_pitch_pixel[0] : (std::max( static_cast<unsigned long>(out_pitch_pixel[0] * copy_size.y), static_cast<unsigned long>(in_slice_pitch_byte / element_size))); assert((out_pitch_pixel[0] <= out_pitch_pixel[1])); } uint32_t BlitKernel::GetDimSize(const Image& image) { static const uint32_t kDimSizeTable[] = { 1, // HSA_EXT_IMAGE_GEOMETRY_1D 2, // HSA_EXT_IMAGE_GEOMETRY_2D 3, // HSA_EXT_IMAGE_GEOMETRY_3D 2, // HSA_EXT_IMAGE_GEOMETRY_1DA 3, // HSA_EXT_IMAGE_GEOMETRY_2DA 1, // HSA_EXT_IMAGE_GEOMETRY_1DB 2, // HSA_EXT_IMAGE_GEOMETRY_2DDEPTH 3, // HSA_EXT_IMAGE_GEOMETRY_2DADEPTH }; return kDimSizeTable[image.desc.geometry]; } uint32_t BlitKernel::GetNumChannel(const Image& image) { static const uint32_t kNumChannelTable[] = { 1, // HSA_EXT_IMAGE_CHANNEL_ORDER_A, 1, // HSA_EXT_IMAGE_CHANNEL_ORDER_R, 1, // HSA_EXT_IMAGE_CHANNEL_ORDER_RX, 2, // HSA_EXT_IMAGE_CHANNEL_ORDER_RG, 2, // HSA_EXT_IMAGE_CHANNEL_ORDER_RGX, 2, // HSA_EXT_IMAGE_CHANNEL_ORDER_RA, 3, // HSA_EXT_IMAGE_CHANNEL_ORDER_RGB, 3, // HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX, 4, // HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA, 4, // HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA, 4, // HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB, 4, // HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR, 3, // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB, 3, // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX, 4, // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA, 4, // HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA, 1, // HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY, 1, // HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE, 1, // HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH, 1, // HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL }; return kNumChannelTable[image.desc.format.channel_order]; } uint32_t BlitKernel::GetImageAccessType(const Image& image) { enum AccessType { ACCESS_TYPE_F = 0, ACCESS_TYPE_I = 1, ACCESS_TYPE_UI = 2, }; static const uint32_t kAccessType[] = { ACCESS_TYPE_F, // HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT8 ACCESS_TYPE_F, // HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT16 ACCESS_TYPE_F, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8 ACCESS_TYPE_F, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT16 ACCESS_TYPE_F, //
HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT24 ACCESS_TYPE_F, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555 ACCESS_TYPE_F, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565 ACCESS_TYPE_F, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010 ACCESS_TYPE_I, // HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT8 ACCESS_TYPE_I, // HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT16 ACCESS_TYPE_I, // HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT32 ACCESS_TYPE_UI, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8 ACCESS_TYPE_UI, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16 ACCESS_TYPE_UI, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32 ACCESS_TYPE_F, // HSA_EXT_IMAGE_CHANNEL_TYPE_HALF_FLOAT ACCESS_TYPE_F // HSA_EXT_IMAGE_CHANNEL_TYPE_FLOAT }; return kAccessType[image.desc.format.channel_type]; } void BlitKernel::CalcWorkingSize(const Image& image, const hsa_dim3_t& range, hsa_kernel_dispatch_packet_t& packet) { switch (image.desc.geometry) { case HSA_EXT_IMAGE_GEOMETRY_1D: case HSA_EXT_IMAGE_GEOMETRY_1DB: case HSA_EXT_IMAGE_GEOMETRY_1DA: packet.setup = 2; packet.grid_size_x = range.x; packet.grid_size_y = range.y; packet.grid_size_z = 1; packet.workgroup_size_x = 64; packet.workgroup_size_y = packet.workgroup_size_z = 1; break; case HSA_EXT_IMAGE_GEOMETRY_2D: case HSA_EXT_IMAGE_GEOMETRY_2DDEPTH: case HSA_EXT_IMAGE_GEOMETRY_2DADEPTH: case HSA_EXT_IMAGE_GEOMETRY_2DA: packet.setup = 3; packet.grid_size_x = range.x; packet.grid_size_y = range.y; packet.grid_size_z = range.z; packet.workgroup_size_x = packet.workgroup_size_y = 8; packet.workgroup_size_z = 1; break; case HSA_EXT_IMAGE_GEOMETRY_3D: packet.setup = 3; packet.grid_size_x = range.x; packet.grid_size_y = range.y; packet.grid_size_z = range.z; packet.workgroup_size_x = packet.workgroup_size_y = 4; packet.workgroup_size_z = 4; break; } } void BlitKernel::CalcWorkingSize(const Image& src_image, const Image& dst_image, const hsa_dim3_t& range, hsa_kernel_dispatch_packet_t& packet) { if (GetDimSize(src_image) < GetDimSize(dst_image)) { CalcWorkingSize(src_image, range, packet); 
} else { CalcWorkingSize(dst_image, range, packet); } } hsa_status_t BlitKernel::ConvertImage(const Image& original_image, const Image** new_image) { // To simplify the kernel, some particular image channel types are converted // to a new channel type, while preserving the actual per pixel size. // E.g.: an SNORM INT8 channel type is converted into UNSIGNED INT8. This way the // kernel can just use read_imageui on all images. static const uint32_t kTypeConvertTable[] = { HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8, // HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT8 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16, // HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT16 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT16 HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT24, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT24 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8, // HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT8 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16, // HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT16 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32, // HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT32 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32, // HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32 HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16, // HSA_EXT_IMAGE_CHANNEL_TYPE_HALF_FLOAT HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32 // HSA_EXT_IMAGE_CHANNEL_TYPE_FLOAT }; // To simplify the kernel, some particular image channel orders are converted // to a new channel order, while preserving the actual per pixel size.
// E.g.: a CHANNEL ORDER A is converted into CHANNEL ORDER R. This way the // kernel can just read the first components of vector4 on all images. static const uint32_t kOrderConvertTable[] = { HSA_EXT_IMAGE_CHANNEL_ORDER_R, // HSA_EXT_IMAGE_CHANNEL_ORDER_A HSA_EXT_IMAGE_CHANNEL_ORDER_R, // HSA_EXT_IMAGE_CHANNEL_ORDER_R HSA_EXT_IMAGE_CHANNEL_ORDER_R, // HSA_EXT_IMAGE_CHANNEL_ORDER_RX HSA_EXT_IMAGE_CHANNEL_ORDER_RG, // HSA_EXT_IMAGE_CHANNEL_ORDER_RG HSA_EXT_IMAGE_CHANNEL_ORDER_RG, // HSA_EXT_IMAGE_CHANNEL_ORDER_RGX HSA_EXT_IMAGE_CHANNEL_ORDER_RG, // HSA_EXT_IMAGE_CHANNEL_ORDER_RA HSA_EXT_IMAGE_CHANNEL_ORDER_RGB, // HSA_EXT_IMAGE_CHANNEL_ORDER_RGB HSA_EXT_IMAGE_CHANNEL_ORDER_RGB, // HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA, // HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA, // HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA, // HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA, // HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA, // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA, // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA, // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA, // HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA HSA_EXT_IMAGE_CHANNEL_ORDER_R, // HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY HSA_EXT_IMAGE_CHANNEL_ORDER_R, // HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE HSA_EXT_IMAGE_CHANNEL_ORDER_R, // HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH HSA_EXT_IMAGE_CHANNEL_ORDER_RG // HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL }; const uint32_t current_type = original_image.desc.format.channel_type; uint32_t converted_type = kTypeConvertTable[current_type]; const uint32_t current_order = original_image.desc.format.channel_order; uint32_t converted_order = kOrderConvertTable[current_order]; if ((current_type == converted_type) && (current_order == converted_order)) { *new_image = &original_image; return HSA_STATUS_SUCCESS; } // Handle formats 
that drop channels on conversion, only usable with RGB(X) if((current_type == HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555) || (current_type == HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565) || (current_type == HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010)) { converted_order = HSA_EXT_IMAGE_CHANNEL_ORDER_R; } // For internal bookkeeping, depth isn't a HW type. const hsa_ext_image_geometry_t current_geometry = original_image.desc.geometry; hsa_ext_image_geometry_t converted_geometry = current_geometry; if (converted_geometry == HSA_EXT_IMAGE_GEOMETRY_2DDEPTH) { converted_geometry = HSA_EXT_IMAGE_GEOMETRY_2D; } else if (converted_geometry == HSA_EXT_IMAGE_GEOMETRY_2DADEPTH) { converted_geometry = HSA_EXT_IMAGE_GEOMETRY_2DA; } hsa_ext_image_format_t new_format = { static_cast<hsa_ext_image_channel_type_t>(converted_type), static_cast<hsa_ext_image_channel_order_t>(converted_order)}; Image* new_image_handle = Image::Create(original_image.component); *new_image_handle=original_image; new_image_handle->desc.geometry = converted_geometry; hsa_status_t status = ImageRuntime::instance() ->image_manager(new_image_handle->component) ->ModifyImageSrd(*new_image_handle, new_format); if (status != HSA_STATUS_SUCCESS) { // Do not leak the temporary image view on failure. Image::Destroy(new_image_handle); return status; } *new_image = new_image_handle; return HSA_STATUS_SUCCESS; } hsa_status_t BlitKernel::LaunchKernel(BlitQueue& blit_queue, hsa_kernel_dispatch_packet_t& packet) { static const uint16_t kInvalidPacketHeader = HSA_PACKET_TYPE_INVALID; static const uint16_t kDispatchPacketHeader = (HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) | (0 << HSA_PACKET_HEADER_BARRIER) | (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) | (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE); // Copying the packet content to the queue buffer is not atomic, so it is // possible that the packet has a valid packet type but invalid content. // To make sure the packet processor does not read an invalid packet, we first // initialize the packet type to invalid.
  packet.header = kInvalidPacketHeader;

  // Set up the completion signal.
  hsa_signal_t kernel_signal = {0};
  hsa_status_t status = HSA::hsa_signal_create(1, 0, NULL, &kernel_signal);
  if (HSA_STATUS_SUCCESS != status) {
    return status;
  }
  packet.completion_signal = kernel_signal;

  // Populate the queue.
  hsa_queue_t* queue = blit_queue.queue_;

  const uint32_t bitmask = queue->size - 1;

  // Reserve a write index.
  uint64_t write_index = HSA::hsa_queue_add_write_index_scacq_screl(queue, 1);

  while (true) {
    // Wait until there is room in the queue.
    const uint64_t read_index = HSA::hsa_queue_load_read_index_relaxed(queue);
    if ((write_index - read_index) < queue->size) {
      break;
    }
  }

  // Populate the queue buffer with the AQL packet.
  hsa_kernel_dispatch_packet_t* queue_buffer =
      reinterpret_cast(queue->base_address);
  queue_buffer[write_index & bitmask] = packet;
  std::atomic_thread_fence(std::memory_order_release);

  // Enable the packet.
  queue_buffer[write_index & bitmask].header = kDispatchPacketHeader;

  // Update the doorbell register.
  HSA::hsa_signal_store_screlease(queue->doorbell_signal, write_index);

  // Wait for the packet to finish.
  if (HSA::hsa_signal_wait_scacquire(kernel_signal, HSA_SIGNAL_CONDITION_LT, 1,
                                     uint64_t(-1), HSA_WAIT_STATE_ACTIVE) != 0) {
    status = HSA::hsa_signal_destroy(kernel_signal);
    assert(status == HSA_STATUS_SUCCESS);
    // Signal wait returned an unexpected value.
    return HSA_STATUS_ERROR;
  }

  // Cleanup
  status = HSA::hsa_signal_destroy(kernel_signal);
  assert(status == HSA_STATUS_SUCCESS);

  return HSA_STATUS_SUCCESS;
}

hsa_status_t BlitKernel::GetPatchedBlitObject(const char* agent_name,
                                              uint8_t** blit_code_object) {
  std::string sname(agent_name);

  if (sname == "gfx700") {
    *blit_code_object = ocl_blit_object_gfx700;
  } else if (sname == "gfx701") {
    *blit_code_object = ocl_blit_object_gfx701;
  } else if (sname == "gfx702") {
    *blit_code_object = ocl_blit_object_gfx702;
  } else if (sname == "gfx801") {
    *blit_code_object = ocl_blit_object_gfx801;
  } else if (sname == "gfx802") {
    *blit_code_object = ocl_blit_object_gfx802;
  } else if (sname == "gfx803") {
    *blit_code_object = ocl_blit_object_gfx803;
  } else if (sname == "gfx805") {
    *blit_code_object = ocl_blit_object_gfx805;
  } else if (sname == "gfx810") {
    *blit_code_object = ocl_blit_object_gfx810;
  } else if (sname == "gfx900") {
    *blit_code_object = ocl_blit_object_gfx900;
  } else if (sname == "gfx902") {
    *blit_code_object = ocl_blit_object_gfx902;
  } else if (sname == "gfx904") {
    *blit_code_object = ocl_blit_object_gfx904;
  } else if (sname == "gfx906") {
    *blit_code_object = ocl_blit_object_gfx906;
  } else if (sname == "gfx908") {
    *blit_code_object = ocl_blit_object_gfx908;
  } else if (sname == "gfx909") {
    *blit_code_object = ocl_blit_object_gfx909;
  } else if (sname == "gfx90a") {
    *blit_code_object = ocl_blit_object_gfx90a;
  } else if (sname == "gfx90c") {
    *blit_code_object = ocl_blit_object_gfx90c;
  } else if (sname == "gfx1010") {
    *blit_code_object = ocl_blit_object_gfx1010;
  } else if (sname == "gfx1011") {
    *blit_code_object = ocl_blit_object_gfx1011;
  } else if (sname == "gfx1012") {
    *blit_code_object = ocl_blit_object_gfx1012;
  } else if (sname == "gfx1013") {
    *blit_code_object = ocl_blit_object_gfx1013;
  } else if (sname == "gfx1030") {
    *blit_code_object = ocl_blit_object_gfx1030;
  } else if (sname == "gfx1031") {
    *blit_code_object = ocl_blit_object_gfx1031;
  } else if (sname == "gfx1032") {
    *blit_code_object = ocl_blit_object_gfx1032;
  } else if (sname == "gfx1033") {
    *blit_code_object = ocl_blit_object_gfx1033;
  } else if (sname == "gfx1034") {
    *blit_code_object = ocl_blit_object_gfx1034;
  } else if (sname == "gfx1035") {
    *blit_code_object = ocl_blit_object_gfx1035;
  } else {
    return HSA_STATUS_ERROR_INVALID_ISA_NAME;
  }

  return HSA_STATUS_SUCCESS;
}

}  // namespace image
}  // namespace rocr

#undef HSA_ARGUMENT_ALIGN_BYTES
ROCR-Runtime-rocm-5.0.0/src/image/blit_kernel.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_EXT_IMAGE_BLIT_KERNEL_H
#define HSA_RUNTIME_EXT_IMAGE_BLIT_KERNEL_H

#include
#include
#include
#include
#include

#include "inc/hsa.h"
#include "resource.h"

namespace rocr {
namespace image {

typedef struct BlitQueue {
  hsa_queue_t* queue_;
  volatile std::atomic cached_index_;
} BlitQueue;

typedef struct BlitCodeInfo {
  uint64_t code_handle_;
  uint32_t group_segment_size_;
  uint32_t private_segment_size_;
} BlitCodeInfo;

class BlitKernel {
 public:
  typedef enum KernelOp {
    KERNEL_OP_COPY_IMAGE_TO_BUFFER = 0,
    KERNEL_OP_COPY_BUFFER_TO_IMAGE = 1,
    KERNEL_OP_COPY_IMAGE_DEFAULT = 2,
    KERNEL_OP_COPY_IMAGE_LINEAR_TO_STANDARD = 3,
    KERNEL_OP_COPY_IMAGE_STANDARD_TO_LINEAR = 4,
    KERNEL_OP_COPY_IMAGE_1DB = 5,
    KERNEL_OP_COPY_IMAGE_1DB_TO_REG = 6,
    KERNEL_OP_COPY_IMAGE_REG_TO_1DB = 7,
    KERNEL_OP_CLEAR_IMAGE = 8,
    KERNEL_OP_CLEAR_IMAGE_1DB = 9,
    KERNEL_OP_COUNT = 10
  } KernelOp;

  explicit BlitKernel();
  ~BlitKernel();

  hsa_status_t Initialize();

  hsa_status_t Cleanup();

  hsa_status_t BuildBlitCode(hsa_agent_t agent,
                             std::vector& blit_code_catalog);

  hsa_status_t CopyBufferToImage(
      BlitQueue& blit_queue, const std::vector& blit_code_catalog,
      const void* src_memory, size_t src_row_pitch, size_t src_slice_pitch,
      const Image& dst_image, const hsa_ext_image_region_t& image_region);

  hsa_status_t CopyImageToBuffer(
      BlitQueue& blit_queue, const std::vector& blit_code_catalog,
      const Image& src_image, void* dst_memory, size_t dst_row_pitch,
      size_t dst_slice_pitch, const hsa_ext_image_region_t& image_region);

  hsa_status_t CopyImage(BlitQueue& blit_queue,
                         const std::vector& blit_code_catalog,
                         const Image& dst_image, const Image& src_image,
                         const hsa_dim3_t& dst_origin,
                         const hsa_dim3_t& src_origin, const hsa_dim3_t size,
                         KernelOp copy_type);

  hsa_status_t FillImage(BlitQueue& blit_queue,
                         const std::vector& blit_code_catalog,
                         const Image& image, const void* pattern,
                         const hsa_ext_image_region_t& region);

 private:
  hsa_status_t PopulateKernelCode(hsa_agent_t agent,
                                  hsa_executable_t executable,
                                  std::vector& blit_code_catalog);

  inline void CalcBufferRowSlicePitchesInPixel(
      hsa_ext_image_geometry_t geometry, uint32_t element_size,
      const hsa_dim3_t& copy_size, size_t in_row_pitch_byte,
      size_t in_slice_pitch_byte, unsigned long* out_pitch_pixel);

  inline uint32_t GetDimSize(const Image& image);

  inline uint32_t GetNumChannel(const Image& image);

  inline uint32_t GetImageAccessType(const Image& image);

  void CalcWorkingSize(const Image& image, const hsa_dim3_t& range,
                       hsa_kernel_dispatch_packet_t& packet);

  void CalcWorkingSize(const Image& src_image, const Image& dst_image,
                       const hsa_dim3_t& range,
                       hsa_kernel_dispatch_packet_t& packet);

  hsa_status_t ConvertImage(const Image& original_image,
                            const Image** new_image);

  hsa_status_t LaunchKernel(BlitQueue& queue,
                            hsa_kernel_dispatch_packet_t& packet);

  // The kernels' names.
  static const char* kernel_name_[KERNEL_OP_COUNT];
  static const char* ocl_kernel_name_[KERNEL_OP_COUNT];

  // Mapping from ISA to kernel code object.
  std::unordered_map code_object_map_;

  // Mapping from ISA to kernel executable.
  std::unordered_map code_executable_map_;

  std::mutex lock_;

  DISALLOW_COPY_AND_ASSIGN(BlitKernel);

  // Get the patched blit code object for the given agent name.
  hsa_status_t GetPatchedBlitObject(const char* agent_name,
                                    uint8_t** code_object_handle);
};

}  // namespace image
}  // namespace rocr

#endif  // HSA_RUNTIME_EXT_IMAGE_BLIT_KERNEL_H
ROCR-Runtime-rocm-5.0.0/src/image/blit_object_gfx7xx.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
// // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include namespace rocr { namespace image { uint8_t blit_object_gfx7xx[] = {127, 69, 76, 70, 2, 1, 1, 64, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 224, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 16, 55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 56, 0, 2, 0, 64, 0, 8, 0, 1, 0, 2, 0, 0, 96, 6, 0, 0, 0, 184, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 10, 0, 0, 0, 0, 0, 0, 24, 10, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 96, 5, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 220, 37, 0, 0, 0, 0, 0, 0, 220, 37, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 46, 115, 104, 115, 116, 114, 116, 97, 98, 0, 46, 115, 116, 114, 116, 97, 98, 0, 46, 110, 111, 116, 101, 0, 46, 104, 115, 97, 100, 97, 116, 97, 95, 114, 101, 97, 100, 111, 110, 108, 121, 95, 97, 103, 101, 110, 116, 0, 46, 104, 115, 97, 116, 101, 120, 116, 0, 46, 115, 121, 109, 116, 97, 98, 0, 46, 115, 121, 109, 116, 97, 98, 0, 46, 114, 101, 108, 97, 46, 104, 115, 97, 116, 101, 120, 116, 0, 0, 0, 38, 104, 115, 97, 95, 101, 120, 116, 95, 105, 109, 97, 103, 101, 58, 58, 38, 95, 95, 111, 99, 109, 108, 116, 98, 108, 95, 77, 51, 50, 95, 69, 88, 80, 95, 69, 80, 0, 38, 104, 115, 97, 95, 101, 120, 116, 95, 105, 109, 97, 103, 101, 58, 58, 38, 95, 95, 111, 99, 109, 108, 116, 98, 108, 95, 77, 51, 50, 95, 76, 79, 71, 69, 0, 38, 104, 115, 97, 95, 101, 120, 116, 95, 105, 109, 97, 103, 101, 
58, 58, 38, 95, 95, 111, 99, 109, 108, 116, 98, 108, 95, 77, 51, 50, 95, 76, 79, 71, 95, 73, 78, 86, 95, 69, 80, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 116, 111, 95, 98, 117, 102, 102, 101, 114, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 98, 117, 102, 102, 101, 114, 95, 116, 111, 95, 105, 109, 97, 103, 101, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 100, 101, 102, 97, 117, 108, 116, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 108, 105, 110, 101, 97, 114, 95, 116, 111, 95, 115, 116, 97, 110, 100, 97, 114, 100, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 115, 116, 97, 110, 100, 97, 114, 100, 95, 116, 111, 95, 108, 105, 110, 101, 97, 114, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 49, 100, 98, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 49, 100, 98, 95, 116, 111, 95, 114, 101, 103, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 114, 101, 103, 95, 116, 111, 95, 49, 100, 98, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 108, 101, 97, 114, 95, 105, 109, 97, 103, 101, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 108, 101, 97, 114, 95, 105, 109, 97, 103, 101, 95, 49, 100, 98, 95, 107, 101, 114, 110, 101, 108, 0, 95, 95, 104, 115, 97, 95, 115, 101, 99, 116, 105, 111, 110, 46, 104, 115, 97, 100, 97, 116, 97, 95, 114, 101, 97, 100, 111, 110, 108, 121, 95, 97, 103, 101, 110, 116, 0, 95, 95, 104, 115, 97, 95, 115, 101, 99, 116, 105, 111, 110, 46, 104, 115, 97, 116, 101, 120, 116, 0, 0, 0, 0, 4, 0, 0, 0, 8, 0, 0, 0, 1, 0, 0, 0, 65, 77, 68, 0, 1, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 12, 0, 0, 0, 2, 0, 0, 0, 65, 77, 68, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 0, 4, 0, 0, 0, 26, 0, 0, 0, 3, 
0, 0, 0, 65, 77, 68, 0, 4, 0, 7, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 65, 77, 68, 0, 65, 77, 68, 71, 80, 85, 0, 0, 4, 0, 0, 0, 41, 0, 0, 0, 4, 0, 0, 0, 65, 77, 68, 0, 25, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 65, 77, 68, 32, 72, 83, 65, 32, 82, 117, 110, 116, 105, 109, 101, 32, 70, 105, 110, 97, 108, 105, 122, 101, 114, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 26, 0, 0, 0, 5, 0, 0, 0, 65, 77, 68, 0, 22, 0, 45, 104, 115, 97, 95, 99, 97, 108, 108, 95, 99, 111, 110, 118, 101, 110, 116, 105, 111, 110, 61, 48, 0, 37, 0, 0, 0, 0, 128, 63, 0, 0, 0, 0, 0, 96, 129, 63, 119, 62, 26, 57, 0, 192, 130, 63, 138, 105, 216, 57, 0, 32, 132, 63, 29, 70, 81, 58, 0, 160, 133, 63, 124, 54, 172, 57, 0, 0, 135, 63, 180, 12, 123, 58, 0, 128, 136, 63, 4, 116, 64, 58, 0, 0, 138, 63, 170, 171, 38, 58, 0, 128, 139, 63, 31, 15, 46, 58, 0, 0, 141, 63, 219, 250, 86, 58, 0, 160, 142, 63, 104, 49, 7, 57, 0, 32, 144, 63, 24, 226, 14, 58, 0, 192, 145, 63, 234, 220, 244, 56, 0, 64, 147, 63, 120, 89, 81, 58, 0, 224, 148, 63, 71, 125, 39, 58, 0, 128, 150, 63, 185, 105, 33, 58, 0, 32, 152, 63, 140, 130, 63, 58, 0, 224, 153, 63, 65, 38, 11, 55, 0, 128, 155, 63, 157, 155, 211, 57, 0, 32, 157, 63, 57, 205, 118, 58, 0, 224, 158, 63, 4, 147, 41, 58, 0, 160, 160, 63, 125, 136, 2, 58, 0, 96, 162, 63, 24, 24, 2, 58, 0, 32, 164, 63, 112, 173, 40, 58, 0, 224, 165, 63, 77, 181, 118, 58, 0, 192, 167, 63, 78, 59, 217, 57, 0, 160, 169, 63, 117, 90, 45, 56, 0, 96, 171, 63, 173, 205, 81, 58, 0, 64, 173, 63, 82, 247, 65, 58, 0, 32, 175, 63, 107, 197, 91, 58, 0, 32, 177, 63, 116, 96, 253, 56, 0, 0, 179, 63, 149, 32, 14, 58, 0, 0, 181, 63, 127, 102, 30, 57, 0, 224, 182, 63, 25, 143, 108, 58, 0, 224, 184, 63, 59, 122, 93, 58, 0, 224, 186, 63, 144, 213, 122, 58, 0, 0, 189, 63, 245, 57, 138, 57, 0, 0, 191, 63, 179, 205, 60, 58, 0, 32, 193, 63, 166, 204, 196, 57, 0, 64, 195, 63, 68, 155, 89, 57, 0, 96, 197, 63, 42, 66, 101, 57, 0, 128, 199, 63, 138, 76, 215, 57, 0, 160, 201, 63, 51, 236, 77, 58, 0, 224, 203, 63, 239, 79, 193, 57, 0, 32, 
206, 63, 163, 130, 17, 57, 0, 96, 208, 63, 187, 246, 204, 56, 0, 160, 210, 63, 31, 217, 129, 57, 0, 224, 212, 63, 94, 213, 26, 58, 0, 64, 215, 63, 90, 153, 31, 57, 0, 128, 217, 63, 19, 174, 104, 58, 0, 224, 219, 63, 190, 188, 93, 58, 0, 96, 222, 63, 94, 130, 244, 55, 0, 192, 224, 63, 194, 238, 205, 57, 0, 32, 227, 63, 149, 75, 124, 58, 0, 160, 229, 63, 59, 55, 72, 58, 0, 32, 232, 63, 129, 82, 75, 58, 0, 192, 234, 63, 221, 231, 198, 55, 0, 64, 237, 63, 237, 1, 243, 57, 0, 224, 239, 63, 123, 51, 23, 57, 0, 128, 242, 63, 44, 158, 59, 56, 0, 32, 245, 63, 164, 162, 47, 57, 0, 192, 247, 63, 152, 251, 6, 58, 0, 128, 250, 63, 220, 182, 236, 56, 0, 32, 253, 63, 103, 96, 112, 58, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 59, 65, 172, 41, 52, 0, 0, 126, 60, 252, 176, 168, 53, 0, 192, 189, 60, 234, 131, 141, 54, 0, 16, 252, 60, 120, 14, 27, 54, 0, 240, 28, 61, 254, 185, 135, 54, 0, 160, 59, 61, 101, 236, 49, 54, 0, 16, 90, 61, 25, 113, 221, 54, 0, 80, 120, 61, 69, 0, 195, 53, 0, 32, 139, 61, 81, 119, 155, 55, 0, 0, 154, 61, 13, 203, 235, 55, 0, 208, 168, 61, 131, 159, 131, 55, 0, 128, 183, 61, 229, 138, 82, 55, 0, 16, 198, 61, 24, 235, 162, 55, 0, 144, 212, 61, 149, 116, 218, 54, 0, 240, 226, 61, 183, 30, 169, 54, 0, 48, 241, 61, 21, 183, 131, 55, 0, 96, 255, 61, 219, 49, 17, 55, 0, 176, 6, 62, 104, 62, 63, 56, 0, 176, 13, 62, 151, 106, 21, 56, 0, 160, 20, 62, 15, 124, 41, 56, 0, 128, 27, 62, 15, 16, 126, 56, 0, 96, 34, 62, 101, 182, 21, 56, 0, 48, 41, 62, 161, 227, 229, 55, 0, 240, 47, 62, 83, 56, 24, 56, 0, 176, 54, 62, 157, 113, 254, 53, 0, 80, 61, 62, 8, 129, 68, 56, 0, 240, 67, 62, 144, 50, 80, 56, 0, 144, 74, 62, 232, 57, 53, 55, 0, 16, 81, 62, 241, 15, 94, 56, 0, 144, 87, 62, 64, 167, 100, 56, 0, 16, 94, 62, 45, 116, 134, 55, 0, 112, 100, 62, 205, 227, 123, 56, 0, 224, 106, 62, 62, 173, 133, 54, 0, 48, 113, 62, 21, 183, 3, 56, 0, 128, 119, 62, 220, 203, 173, 55, 0, 192, 125, 62, 175, 54, 12, 56, 0, 0, 130, 62, 211, 82, 22, 55, 0, 16, 133, 62, 57, 113, 
146, 56, 0, 32, 136, 62, 215, 252, 197, 56, 0, 48, 139, 62, 213, 85, 174, 56, 0, 64, 142, 62, 105, 193, 24, 56, 0, 64, 145, 62, 231, 253, 160, 56, 0, 64, 148, 62, 239, 9, 173, 56, 0, 64, 151, 62, 225, 186, 98, 56, 0, 48, 154, 62, 76, 205, 238, 56, 0, 48, 157, 62, 210, 170, 152, 55, 0, 32, 160, 62, 26, 26, 66, 55, 0, 0, 163, 62, 14, 225, 197, 56, 0, 240, 165, 62, 238, 42, 191, 55, 0, 208, 168, 62, 45, 135, 45, 56, 0, 176, 171, 62, 138, 46, 238, 55, 0, 128, 174, 62, 172, 223, 222, 56, 0, 96, 177, 62, 185, 242, 2, 56, 0, 48, 180, 62, 155, 30, 72, 56, 0, 0, 183, 62, 43, 170, 14, 56, 0, 192, 185, 62, 93, 251, 235, 56, 0, 144, 188, 62, 221, 95, 37, 56, 0, 80, 191, 62, 130, 59, 120, 56, 0, 16, 194, 62, 30, 218, 81, 56, 0, 208, 196, 62, 5, 27, 78, 55, 0, 128, 199, 62, 155, 67, 143, 56, 0, 48, 202, 62, 16, 14, 202, 56, 0, 224, 204, 62, 139, 192, 202, 56, 0, 144, 207, 62, 95, 246, 145, 56, 0, 64, 210, 62, 203, 33, 129, 55, 0, 224, 212, 62, 154, 154, 108, 56, 0, 128, 215, 62, 35, 153, 148, 56, 0, 32, 218, 62, 204, 123, 119, 56, 0, 192, 220, 62, 38, 45, 177, 55, 0, 80, 223, 62, 211, 206, 166, 56, 0, 224, 225, 62, 230, 211, 235, 56, 0, 112, 228, 62, 205, 227, 251, 56, 0, 0, 231, 62, 194, 133, 215, 56, 0, 144, 233, 62, 0, 126, 126, 56, 0, 16, 236, 62, 197, 146, 243, 56, 0, 160, 238, 62, 131, 9, 212, 55, 0, 32, 241, 62, 124, 26, 8, 56, 0, 160, 243, 62, 173, 195, 132, 55, 0, 16, 246, 62, 35, 233, 204, 56, 0, 144, 248, 62, 175, 95, 15, 56, 0, 0, 251, 62, 56, 253, 145, 56, 0, 112, 253, 62, 188, 71, 172, 56, 0, 224, 255, 62, 43, 4, 151, 56, 0, 32, 1, 63, 210, 82, 41, 57, 0, 80, 2, 63, 212, 206, 111, 57, 0, 144, 3, 63, 115, 112, 249, 55, 0, 192, 4, 63, 174, 158, 94, 56, 0, 240, 5, 63, 74, 200, 101, 56, 0, 32, 7, 63, 163, 11, 19, 56, 0, 64, 8, 63, 22, 207, 121, 57, 0, 112, 9, 63, 201, 202, 56, 57, 0, 160, 10, 63, 244, 210, 195, 56, 0, 192, 11, 63, 236, 93, 117, 57, 0, 240, 12, 63, 103, 180, 230, 56, 0, 16, 14, 63, 184, 15, 92, 57, 0, 64, 15, 63, 224, 188, 62, 56, 0, 96, 16, 63, 146, 
209, 220, 56, 0, 128, 17, 63, 223, 107, 24, 57, 0, 160, 18, 63, 76, 231, 45, 57, 0, 192, 19, 63, 68, 9, 47, 57, 0, 224, 20, 63, 97, 255, 27, 57, 0, 0, 22, 63, 68, 237, 233, 56, 0, 32, 23, 63, 200, 109, 104, 56, 0, 48, 24, 63, 167, 153, 107, 57, 0, 80, 25, 63, 137, 156, 9, 57, 0, 112, 26, 63, 115, 118, 162, 55, 0, 128, 27, 63, 163, 218, 11, 57, 0, 144, 28, 63, 171, 105, 112, 57, 0, 176, 29, 63, 255, 73, 132, 56, 0, 192, 30, 63, 56, 53, 1, 57, 0, 208, 31, 63, 104, 194, 45, 57, 0, 224, 32, 63, 35, 244, 71, 57, 0, 240, 33, 63, 124, 241, 79, 57, 0, 0, 35, 63, 14, 225, 69, 57, 0, 16, 36, 63, 245, 232, 41, 57, 0, 32, 37, 63, 176, 93, 248, 56, 0, 48, 38, 63, 153, 95, 115, 56, 0, 48, 39, 63, 219, 8, 108, 57, 0, 64, 40, 63, 0, 230, 9, 57, 0, 80, 41, 63, 111, 153, 180, 55, 0, 80, 42, 63, 204, 51, 18, 57, 0, 80, 43, 63, 217, 234, 124, 57, 0, 96, 44, 63, 205, 181, 173, 56, 0, 96, 45, 63, 26, 38, 32, 57, 0, 96, 46, 63, 54, 238, 88, 57, 0, 112, 47, 63, 5, 73, 170, 53, 0, 112, 48, 63, 30, 209, 203, 55, 0, 112, 49, 63, 244, 253, 5, 56, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 254, 63, 248, 3, 254, 56, 0, 0, 252, 63, 193, 15, 252, 57, 0, 0, 250, 63, 201, 179, 140, 58, 0, 0, 248, 63, 16, 62, 248, 58, 0, 0, 246, 63, 48, 123, 64, 59, 0, 0, 244, 63, 96, 141, 137, 59, 0, 0, 242, 63, 72, 214, 185, 59, 0, 0, 240, 63, 241, 240, 240, 59, 0, 0, 239, 63, 127, 220, 186, 58, 0, 0, 237, 63, 108, 7, 102, 59, 0, 0, 235, 63, 166, 178, 189, 59, 0, 0, 234, 63, 161, 14, 234, 57, 0, 0, 232, 63, 247, 88, 75, 59, 0, 0, 230, 63, 72, 180, 194, 59, 0, 0, 229, 63, 172, 96, 150, 58, 0, 0, 227, 63, 228, 56, 142, 59, 0, 0, 225, 63, 14, 120, 252, 59, 0, 0, 224, 63, 56, 112, 96, 59, 0, 0, 222, 63, 77, 92, 233, 59, 0, 0, 221, 63, 76, 145, 79, 59, 0, 0, 219, 63, 239, 97, 235, 59, 0, 0, 218, 63, 79, 27, 104, 59, 0, 0, 217, 63, 178, 1, 89, 56, 0, 0, 215, 63, 229, 53, 148, 59, 0, 0, 214, 63, 89, 3, 174, 58, 0, 0, 212, 63, 3, 123, 199, 59, 0, 0, 211, 63, 109, 26, 80, 59, 0, 0, 210, 63, 33, 13, 210, 57, 0, 0, 208, 63, 204, 159, 
182, 59, 0, 0, 207, 63, 81, 233, 72, 59, 0, 0, 206, 63, 185, 83, 52, 58, 0, 0, 204, 63, 205, 204, 204, 59, 0, 0, 203, 63, 192, 39, 135, 59, 0, 0, 202, 63, 205, 15, 11, 59, 0, 0, 201, 63, 209, 73, 123, 57, 0, 0, 199, 63, 125, 12, 206, 59, 0, 0, 198, 63, 106, 12, 152, 59, 0, 0, 197, 63, 247, 144, 75, 59, 0, 0, 196, 63, 21, 190, 220, 58, 0, 0, 195, 63, 49, 12, 195, 57, 0, 0, 193, 63, 214, 187, 228, 59, 0, 0, 192, 63, 193, 192, 192, 59, 0, 0, 191, 63, 232, 47, 160, 59, 0, 0, 190, 63, 12, 250, 130, 59, 0, 0, 189, 63, 142, 32, 82, 59, 0, 0, 188, 63, 24, 200, 36, 59, 0, 0, 187, 63, 135, 156, 251, 58, 0, 0, 186, 63, 140, 46, 186, 58, 0, 0, 185, 63, 233, 15, 133, 58, 0, 0, 184, 63, 3, 23, 56, 58, 0, 0, 183, 63, 162, 181, 251, 57, 0, 0, 182, 63, 97, 11, 182, 57, 0, 0, 181, 63, 170, 104, 158, 57, 0, 0, 180, 63, 65, 11, 180, 57, 0, 0, 179, 63, 41, 53, 246, 57, 0, 0, 178, 63, 67, 22, 50, 58, 0, 0, 177, 63, 192, 157, 126, 58, 0, 0, 176, 63, 11, 44, 176, 58, 0, 0, 175, 63, 26, 119, 235, 58, 0, 0, 174, 63, 185, 130, 24, 59, 0, 0, 173, 63, 176, 86, 64, 59, 0, 0, 172, 63, 8, 35, 109, 59, 0, 0, 171, 63, 227, 105, 143, 59, 0, 0, 170, 63, 171, 170, 170, 59, 0, 0, 169, 63, 72, 74, 200, 59, 0, 0, 168, 63, 87, 63, 232, 59, 0, 0, 168, 63, 129, 10, 168, 57, 0, 0, 167, 63, 230, 20, 188, 58, 0, 0, 166, 63, 114, 136, 43, 59, 0, 0, 165, 63, 5, 106, 125, 59, 0, 0, 164, 63, 30, 207, 169, 59, 0, 0, 163, 63, 61, 10, 215, 59, 0, 0, 163, 63, 246, 199, 75, 57, 0, 0, 162, 63, 172, 12, 223, 58, 0, 0, 161, 63, 93, 98, 86, 59, 0, 0, 160, 63, 161, 160, 160, 59, 0, 0, 159, 63, 254, 9, 216, 59, 0, 0, 159, 63, 57, 47, 11, 58, 0, 0, 158, 63, 72, 90, 25, 59, 0, 0, 157, 63, 158, 216, 137, 59, 0, 0, 156, 63, 97, 225, 200, 59, 0, 0, 156, 63, 193, 9, 156, 57, 0, 0, 155, 63, 62, 223, 24, 59, 0, 0, 154, 63, 217, 231, 144, 59, 0, 0, 153, 63, 219, 34, 215, 59, 0, 0, 153, 63, 139, 210, 120, 58, 0, 0, 152, 63, 19, 144, 81, 59, 0, 0, 151, 63, 237, 37, 180, 59, 0, 0, 151, 63, 46, 1, 23, 56, 0, 0, 150, 63, 216, 180, 31, 59, 
0, 0, 149, 63, 104, 37, 160, 59, 0, 0, 148, 63, 79, 9, 242, 59, 0, 0, 148, 63, 41, 1, 11, 59, 0, 0, 147, 63, 196, 133, 154, 59, 0, 0, 146, 63, 132, 19, 241, 59, 0, 0, 146, 63, 37, 73, 18, 59, 0, 0, 145, 63, 197, 179, 162, 59, 0, 0, 144, 63, 9, 188, 253, 59, 0, 0, 144, 63, 198, 112, 52, 59, 0, 0, 143, 63, 238, 35, 184, 59, 0, 0, 143, 63, 208, 206, 59, 58, 0, 0, 142, 63, 218, 106, 112, 59, 0, 0, 141, 63, 2, 82, 218, 59, 0, 0, 141, 63, 35, 44, 247, 58, 0, 0, 140, 63, 4, 156, 162, 59, 0, 0, 140, 63, 193, 8, 140, 57, 0, 0, 139, 63, 148, 104, 96, 59, 0, 0, 138, 63, 252, 242, 216, 59, 0, 0, 138, 63, 225, 240, 5, 59, 0, 0, 137, 63, 138, 64, 174, 59, 0, 0, 137, 63, 215, 57, 86, 58, 0, 0, 136, 63, 137, 136, 136, 59, 0, 0, 135, 63, 136, 128, 247, 59, 0, 0, 135, 63, 190, 86, 79, 59, 0, 0, 134, 63, 68, 5, 217, 59, 0, 0, 134, 63, 252, 20, 23, 59, 0, 0, 133, 63, 97, 55, 191, 59, 0, 0, 133, 63, 77, 33, 208, 58, 0, 0, 132, 63, 200, 249, 169, 59, 0, 0, 132, 63, 8, 33, 132, 58, 0, 0, 131, 63, 82, 48, 153, 59, 0, 0, 131, 63, 188, 116, 19, 58, 0, 0, 130, 63, 191, 191, 140, 59, 0, 0, 130, 63, 33, 8, 130, 57, 0, 0, 129, 63, 169, 141, 132, 59, 0, 0, 129, 63, 4, 2, 129, 56, 0, 0, 128, 63, 129, 128, 128, 59, 0, 0, 128, 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 194, 0, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 11, 0, 11, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 64, 192, 127, 0, 140, 191, 0, 255, 2, 135, 255, 255, 0, 0, 0, 255, 128, 147, 16, 0, 16, 0, 1, 255, 1, 135, 255, 255, 0, 0, 2, 8, 2, 147, 0, 9, 0, 147, 1, 10, 1, 147, 0, 7, 132, 192, 4, 7, 66, 192, 26, 7, 70, 192, 28, 135, 1, 192, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 74, 0, 2, 2, 74, 1, 4, 0, 128, 4, 0, 2, 209, 1, 27, 0, 0, 8, 0, 2, 209, 0, 25, 0, 0, 0, 4, 4, 74, 8, 4, 128, 135, 3, 4, 8, 125, 0, 106, 128, 135, 0, 36, 128, 190, 188, 0, 136, 191, 6, 7, 65, 192, 14, 7, 66, 192, 16, 7, 4, 192, 18, 7, 69, 192, 32, 135, 4, 192, 34, 7, 134, 192, 159, 0, 6, 48, 159, 2, 8, 48, 127, 0, 140, 191, 5, 0, 212, 210, 1, 25, 0, 0, 4, 0, 210, 210, 4, 25, 0, 0, 4, 11, 8, 74, 5, 0, 210, 210, 1, 27, 0, 0, 5, 9, 8, 74, 5, 0, 210, 210, 1, 25, 0, 0, 5, 106, 74, 210, 5, 1, 2, 0, 4, 7, 6, 80, 4, 0, 14, 74, 5, 2, 16, 74, 8, 4, 18, 74, 159, 4, 12, 48, 0, 3, 200, 192, 128, 2, 20, 126, 127, 0, 140, 191, 0, 95, 0, 240, 7, 7, 4, 0, 0, 0, 212, 210, 2, 29, 0, 0, 1, 0, 210, 210, 6, 29, 0, 0, 1, 1, 0, 74, 1, 0, 210, 210, 2, 31, 0, 0, 1, 1, 0, 74, 1, 0, 210, 210, 2, 29, 0, 0, 1, 106, 74, 210, 1, 11, 2, 0, 0, 7, 0, 80, 2, 0, 212, 210, 1, 19, 0, 0, 0, 0, 210, 210, 0, 19, 0, 0, 0, 5, 0, 74, 1, 0, 210, 210, 1, 19, 0, 0, 3, 106, 74, 210, 1, 21, 0, 0, 11, 2, 4, 126, 0, 5, 8, 80, 30, 7, 65, 192, 8, 7, 66, 192, 127, 0, 140, 191, 2, 132, 0, 191, 83, 0, 133, 191, 10, 7, 68, 192, 2, 130, 0, 191, 41, 0, 132, 191, 3, 132, 0, 191, 29, 0, 133, 191, 3, 130, 0, 191, 12, 0, 132, 191, 0, 0, 194, 210, 3, 5, 1, 0, 112, 15, 140, 191, 144, 16, 4, 52, 0, 106, 74, 210, 4, 0, 2, 0, 5, 2, 6, 126, 3, 3, 2, 80, 2, 15, 4, 56, 0, 0, 112, 220, 0, 2, 0, 0, 109, 0, 130, 191, 3, 129, 0, 191, 107, 0, 132, 191, 0, 0, 194, 210, 3, 3, 1, 0, 112, 15, 140, 191, 136, 16, 4, 52, 127, 0, 140, 191, 0, 106, 74, 210, 8, 0, 2, 0, 9, 2, 6, 126, 3, 3, 2, 80, 2, 15, 4, 56, 
0, 0, 104, 220, 0, 2, 0, 0, 94, 0, 130, 191, 0, 0, 194, 210, 3, 5, 1, 0, 0, 106, 74, 210, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 80, 112, 15, 140, 191, 0, 0, 116, 220, 0, 7, 0, 0, 84, 0, 130, 191, 2, 129, 0, 191, 82, 0, 132, 191, 3, 132, 0, 191, 25, 0, 133, 191, 3, 130, 0, 191, 11, 0, 132, 191, 0, 0, 194, 210, 3, 3, 1, 0, 127, 0, 140, 191, 0, 106, 74, 210, 8, 0, 2, 0, 9, 2, 4, 126, 2, 3, 2, 80, 112, 15, 140, 191, 0, 0, 104, 220, 0, 7, 0, 0, 67, 0, 130, 191, 3, 129, 0, 191, 65, 0, 132, 191, 12, 7, 65, 192, 127, 0, 140, 191, 0, 106, 74, 210, 2, 6, 2, 0, 3, 2, 4, 126, 2, 9, 2, 80, 112, 15, 140, 191, 0, 0, 96, 220, 0, 7, 0, 0, 55, 0, 130, 191, 0, 0, 194, 210, 3, 5, 1, 0, 0, 106, 74, 210, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 80, 112, 15, 140, 191, 0, 0, 112, 220, 0, 7, 0, 0, 45, 0, 130, 191, 3, 132, 0, 191, 34, 0, 133, 191, 3, 130, 0, 191, 14, 0, 132, 191, 112, 15, 140, 191, 144, 16, 0, 52, 0, 15, 10, 56, 1, 0, 194, 210, 3, 5, 1, 0, 1, 106, 74, 210, 4, 2, 2, 0, 5, 2, 6, 126, 3, 5, 4, 80, 144, 20, 6, 52, 3, 19, 12, 56, 0, 0, 116, 220, 1, 5, 0, 0, 27, 0, 130, 191, 3, 129, 0, 191, 25, 0, 132, 191, 112, 15, 140, 191, 136, 16, 0, 52, 1, 0, 194, 210, 3, 5, 1, 0, 0, 15, 0, 56, 144, 18, 6, 52, 0, 7, 0, 56, 152, 20, 6, 52, 1, 106, 74, 210, 4, 2, 2, 0, 5, 2, 8, 126, 4, 5, 4, 80, 0, 7, 0, 56, 0, 0, 112, 220, 1, 0, 0, 0, 9, 0, 130, 191, 0, 0, 194, 210, 3, 5, 1, 0, 0, 106, 74, 210, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 80, 112, 15, 140, 191, 0, 0, 120, 220, 0, 7, 0, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 132, 0, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 160, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 0, 19, 0, 19, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 64, 192, 127, 0, 140, 191, 0, 255, 2, 135, 255, 255, 0, 0, 0, 255, 128, 147, 16, 0, 16, 0, 1, 255, 1, 135, 255, 255, 0, 0, 2, 8, 2, 147, 0, 9, 0, 147, 1, 10, 1, 147, 0, 7, 132, 192, 4, 7, 66, 192, 22, 7, 70, 192, 24, 135, 1, 192, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 74, 0, 2, 2, 74, 1, 4, 0, 128, 4, 0, 2, 209, 1, 27, 0, 0, 8, 0, 2, 209, 0, 25, 0, 0, 0, 4, 4, 74, 8, 4, 128, 135, 3, 4, 8, 125, 0, 106, 128, 135, 0, 36, 128, 190, 193, 0, 136, 191, 10, 7, 65, 192, 18, 7, 66, 192, 20, 7, 4, 192, 28, 135, 4, 192, 30, 7, 134, 192, 159, 0, 6, 48, 159, 2, 8, 48, 127, 0, 140, 191, 5, 0, 212, 210, 1, 25, 0, 0, 4, 0, 210, 210, 4, 25, 0, 0, 4, 11, 8, 74, 5, 0, 210, 210, 1, 27, 0, 0, 5, 9, 8, 74, 5, 0, 210, 210, 1, 25, 0, 0, 5, 106, 74, 210, 5, 1, 2, 0, 4, 7, 6, 80, 159, 4, 8, 48, 6, 0, 212, 210, 2, 29, 0, 0, 4, 0, 210, 210, 4, 29, 0, 0, 4, 13, 8, 74, 6, 0, 210, 210, 2, 31, 0, 0, 6, 9, 8, 74, 6, 0, 210, 210, 2, 29, 0, 0, 5, 106, 74, 210, 6, 11, 2, 0, 4, 7, 6, 80, 4, 0, 212, 210, 5, 19, 0, 0, 3, 0, 210, 210, 3, 19, 0, 0, 3, 9, 6, 74, 4, 0, 210, 210, 5, 19, 0, 0, 6, 106, 74, 210, 4, 5, 0, 0, 3, 2, 10, 126, 3, 11, 14, 80, 4, 0, 30, 74, 5, 2, 32, 74, 8, 4, 34, 74, 26, 7, 65, 192, 6, 7, 68, 192, 127, 0, 140, 191, 2, 132, 0, 191, 77, 0, 133, 191, 2, 130, 0, 191, 39, 0, 132, 
191, 3, 130, 0, 191, 13, 0, 132, 191, 3, 0, 194, 210, 6, 5, 1, 0, 3, 106, 74, 210, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 80, 0, 0, 48, 220, 3, 0, 0, 3, 112, 0, 140, 191, 144, 6, 12, 44, 5, 0, 144, 210, 3, 1, 65, 2, 57, 0, 130, 191, 3, 129, 0, 191, 13, 0, 132, 191, 3, 0, 194, 210, 6, 3, 1, 0, 3, 106, 74, 210, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 80, 0, 0, 40, 220, 3, 0, 0, 3, 112, 0, 140, 191, 136, 6, 12, 44, 5, 0, 144, 210, 3, 1, 33, 2, 42, 0, 130, 191, 3, 0, 194, 210, 6, 5, 1, 0, 3, 106, 74, 210, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 80, 0, 0, 52, 220, 3, 0, 0, 5, 33, 0, 130, 191, 2, 129, 0, 191, 29, 0, 132, 191, 3, 130, 0, 191, 9, 0, 132, 191, 3, 0, 194, 210, 6, 3, 1, 0, 3, 106, 74, 210, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 80, 0, 0, 40, 220, 3, 0, 0, 5, 19, 0, 130, 191, 3, 129, 0, 191, 7, 0, 132, 191, 3, 106, 74, 210, 8, 12, 2, 0, 9, 2, 10, 126, 5, 15, 8, 80, 0, 0, 32, 220, 3, 0, 0, 5, 10, 0, 130, 191, 3, 0, 194, 210, 6, 5, 1, 0, 3, 106, 74, 210, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 80, 0, 0, 48, 220, 3, 0, 0, 5, 1, 0, 130, 191, 2, 2, 10, 126, 3, 2, 12, 126, 5, 2, 16, 126, 4, 2, 14, 126, 55, 0, 130, 191, 3, 129, 0, 191, 17, 0, 132, 191, 3, 0, 194, 210, 6, 5, 1, 0, 3, 106, 74, 210, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 80, 0, 0, 48, 220, 3, 0, 0, 3, 112, 0, 140, 191, 152, 6, 16, 44, 7, 0, 144, 210, 3, 33, 33, 2, 6, 0, 144, 210, 3, 17, 33, 2, 5, 0, 144, 210, 3, 1, 33, 2, 36, 0, 130, 191, 3, 0, 194, 210, 6, 5, 1, 0, 3, 106, 74, 210, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 80, 0, 0, 48, 220, 3, 0, 0, 5, 3, 130, 0, 191, 14, 0, 132, 191, 3, 106, 74, 210, 3, 9, 1, 0, 4, 106, 80, 210, 4, 1, 169, 1, 0, 0, 48, 220, 3, 0, 0, 3, 112, 0, 140, 191, 144, 6, 16, 44, 7, 0, 144, 210, 3, 1, 65, 2, 144, 10, 12, 44, 5, 0, 144, 210, 5, 1, 65, 2, 12, 0, 130, 191, 6, 106, 74, 210, 3, 25, 1, 0, 7, 106, 80, 210, 4, 1, 169, 1, 0, 0, 48, 220, 6, 0, 0, 8, 3, 106, 74, 210, 3, 9, 1, 0, 4, 106, 80, 210, 4, 1, 169, 1, 0, 0, 52, 220, 3, 0, 0, 6, 8, 7, 65, 192, 127, 0, 140, 191, 0, 3, 194, 192, 128, 2, 36, 
126, 112, 0, 140, 191, 0, 95, 32, 240, 15, 5, 1, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 0, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 64, 192, 127, 0, 140, 191, 0, 255, 2, 135, 255, 255, 0, 0, 0, 255, 128, 147, 16, 0, 16, 0, 1, 255, 1, 135, 255, 255, 0, 0, 2, 8, 2, 147, 0, 9, 0, 147, 1, 10, 1, 147, 0, 7, 132, 192, 4, 7, 66, 192, 18, 7, 70, 192, 20, 135, 1, 192, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 74, 0, 2, 2, 74, 1, 4, 0, 128, 4, 0, 12, 209, 1, 27, 0, 0, 8, 0, 12, 209, 0, 25, 0, 0, 0, 4, 4, 74, 8, 4, 128, 136, 3, 4, 6, 125, 0, 106, 234, 136, 126, 4, 128, 190, 0, 106, 254, 138, 22, 0, 136, 191, 6, 7, 132, 192, 10, 7, 65, 192, 12, 7, 2, 192, 127, 0, 140, 191, 0, 9, 198, 192, 2, 0, 6, 74, 3, 2, 8, 74, 4, 4, 10, 74, 128, 2, 12, 126, 127, 0, 140, 191, 0, 95, 0, 240, 3, 3, 3, 0, 14, 7, 130, 192, 0, 11, 196, 192, 127, 0, 140, 191, 4, 0, 14, 74, 5, 2, 16, 74, 6, 4, 18, 74, 128, 2, 
20, 126, 112, 15, 140, 191, 0, 95, 32, 240, 7, 3, 2, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 133, 0, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 21, 0, 21, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 64, 192, 127, 0, 140, 191, 0, 255, 2, 135, 255, 255, 0, 0, 0, 255, 128, 147, 16, 0, 16, 0, 1, 255, 1, 135, 255, 255, 0, 0, 2, 8, 2, 147, 0, 9, 0, 147, 1, 10, 1, 147, 0, 7, 132, 192, 4, 7, 66, 192, 18, 7, 70, 192, 20, 135, 1, 192, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 74, 0, 2, 2, 74, 1, 4, 0, 128, 4, 0, 12, 209, 1, 27, 0, 0, 8, 0, 12, 209, 0, 25, 0, 0, 0, 4, 4, 74, 8, 4, 128, 136, 3, 4, 6, 125, 0, 106, 234, 136, 126, 4, 128, 190, 0, 106, 254, 138, 212, 2, 136, 191, 6, 7, 65, 192, 10, 7, 66, 192, 12, 7, 4, 192, 127, 0, 140, 191, 0, 3, 198, 192, 4, 0, 6, 74, 5, 2, 8, 74, 8, 4, 10, 74, 128, 2, 12, 126, 127, 0, 140, 191, 0, 95, 0, 240, 3, 3, 3, 0, 112, 15, 140, 191, 106, 0, 16, 209, 3, 7, 1, 0, 3, 0, 0, 210, 3, 1, 169, 1, 242, 6, 28, 124, 106, 36, 130, 190, 223, 0, 136, 191, 128, 6, 8, 124, 106, 36, 132, 190, 128, 2, 6, 126, 4, 126, 254, 138, 217, 0, 136, 191, 255, 3, 136, 190, 28, 46, 77, 59, 8, 6, 8, 124, 106, 36, 136, 190, 255, 6, 6, 16, 82, 184, 78, 65, 8, 126, 254, 138, 242, 6, 6, 16, 208, 0, 136, 
191, 255, 6, 14, 54, 255, 255, 255, 127, 242, 14, 16, 8, 255, 3, 138, 190, 0, 0, 128, 61, 106, 1, 22, 208, 8, 21, 0, 0, 126, 4, 138, 190, 10, 106, 254, 138, 7, 129, 16, 126, 69, 0, 136, 191, 129, 16, 18, 52, 255, 16, 16, 74, 0, 0, 128, 0, 255, 18, 18, 74, 0, 0, 0, 1, 255, 16, 20, 54, 0, 0, 127, 0, 255, 18, 18, 54, 0, 0, 1, 0, 9, 21, 18, 74, 144, 18, 20, 44, 128, 2, 22, 126, 10, 0, 194, 210, 10, 7, 1, 0, 255, 3, 141, 190, 85, 85, 85, 85, 255, 3, 140, 190, 85, 85, 85, 85, 12, 106, 74, 210, 12, 20, 2, 0, 13, 2, 26, 126, 13, 23, 26, 80, 0, 0, 52, 220, 12, 0, 0, 12, 255, 3, 141, 190, 85, 85, 85, 85, 255, 3, 140, 190, 85, 85, 85, 85, 10, 106, 74, 210, 12, 20, 2, 0, 13, 2, 28, 126, 14, 23, 22, 80, 0, 0, 52, 220, 10, 0, 0, 10, 255, 16, 16, 54, 255, 255, 127, 0, 240, 18, 18, 56, 240, 16, 16, 56, 9, 17, 16, 8, 113, 1, 140, 191, 13, 17, 18, 16, 12, 17, 18, 62, 255, 2, 28, 126, 171, 170, 170, 62, 255, 3, 140, 190, 0, 0, 128, 62, 7, 127, 30, 126, 12, 18, 28, 62, 12, 0, 130, 210, 12, 17, 38, 132, 193, 30, 30, 74, 14, 0, 130, 210, 9, 29, 194, 3, 9, 19, 32, 16, 13, 17, 24, 62, 15, 11, 16, 126, 14, 33, 24, 62, 255, 3, 140, 190, 244, 253, 5, 56, 12, 0, 130, 210, 8, 25, 48, 132, 112, 0, 140, 191, 12, 23, 24, 6, 8, 21, 16, 64, 0, 112, 49, 63, 12, 19, 30, 8, 255, 18, 28, 58, 0, 0, 0, 128, 8, 31, 26, 6, 10, 126, 254, 138, 8, 17, 18, 16, 21, 0, 136, 191, 8, 19, 20, 16, 255, 2, 22, 126, 171, 170, 42, 62, 255, 3, 140, 190, 37, 73, 18, 62, 12, 16, 22, 62, 8, 23, 22, 66, 205, 204, 76, 62, 8, 23, 22, 66, 0, 0, 128, 62, 8, 23, 22, 66, 171, 170, 170, 62, 10, 23, 20, 16, 241, 18, 28, 16, 15, 0, 130, 210, 9, 227, 41, 132, 15, 17, 26, 8, 255, 20, 24, 58, 0, 0, 0, 128, 255, 16, 16, 58, 0, 0, 0, 128, 10, 4, 254, 190, 8, 27, 20, 8, 15, 29, 18, 8, 15, 21, 20, 6, 12, 19, 18, 8, 255, 26, 22, 54, 0, 240, 255, 255, 9, 21, 18, 6, 13, 23, 16, 8, 9, 17, 16, 6, 255, 16, 18, 16, 0, 160, 42, 56, 11, 19, 18, 64, 0, 160, 42, 56, 8, 19, 16, 64, 0, 80, 213, 62, 11, 17, 18, 64, 0, 80, 213, 62, 255, 18, 20, 16, 59, 
170, 184, 66, 10, 17, 20, 126, 191, 20, 24, 54, 131, 24, 24, 52, 255, 3, 139, 190, 85, 85, 85, 85, 255, 3, 138, 190, 85, 85, 85, 85, 12, 106, 74, 210, 10, 24, 2, 0, 11, 2, 26, 126, 13, 106, 80, 210, 13, 1, 169, 1, 0, 0, 52, 220, 12, 0, 0, 12, 255, 3, 138, 190, 0, 80, 213, 62, 10, 11, 28, 126, 11, 0, 130, 210, 10, 22, 38, 132, 14, 19, 30, 64, 0, 0, 49, 188, 8, 23, 16, 6, 14, 31, 22, 64, 239, 47, 228, 183, 8, 23, 22, 6, 255, 2, 28, 126, 171, 170, 42, 62, 255, 3, 138, 190, 171, 170, 42, 61, 10, 22, 28, 62, 14, 0, 130, 210, 14, 23, 194, 3, 11, 23, 30, 16, 14, 31, 22, 62, 255, 3, 138, 190, 8, 227, 130, 180, 255, 3, 139, 190, 24, 114, 177, 66, 112, 0, 140, 191, 13, 23, 26, 62, 12, 0, 8, 208, 8, 21, 0, 0, 11, 18, 4, 124, 12, 23, 26, 62, 106, 12, 140, 135, 11, 18, 2, 124, 134, 20, 16, 48, 12, 27, 20, 6, 106, 12, 234, 136, 10, 17, 16, 86, 255, 2, 20, 126, 0, 0, 128, 127, 255, 3, 138, 190, 208, 142, 206, 194, 8, 21, 16, 0, 10, 18, 22, 124, 128, 16, 16, 0, 3, 15, 10, 125, 242, 16, 16, 16, 255, 2, 18, 126, 0, 0, 192, 127, 255, 3, 138, 190, 0, 0, 128, 255, 8, 19, 16, 0, 128, 14, 10, 125, 10, 0, 4, 209, 3, 21, 0, 0, 128, 16, 16, 0, 255, 2, 18, 126, 0, 0, 128, 127, 8, 0, 0, 210, 8, 19, 42, 0, 3, 19, 4, 125, 8, 19, 16, 0, 7, 19, 136, 125, 8, 7, 14, 0, 242, 6, 10, 125, 242, 14, 6, 0, 255, 2, 14, 126, 174, 71, 97, 189, 255, 3, 138, 190, 61, 10, 135, 63, 3, 0, 130, 210, 3, 21, 28, 4, 4, 4, 254, 190, 2, 126, 254, 138, 242, 2, 6, 126, 2, 4, 254, 190, 106, 0, 16, 209, 4, 7, 1, 0, 4, 0, 0, 210, 4, 1, 169, 1, 242, 8, 28, 124, 2, 106, 254, 135, 223, 0, 136, 191, 128, 8, 8, 124, 106, 36, 132, 190, 128, 2, 8, 126, 4, 126, 254, 138, 217, 0, 136, 191, 255, 3, 136, 190, 28, 46, 77, 59, 8, 8, 8, 124, 106, 36, 136, 190, 255, 8, 8, 16, 82, 184, 78, 65, 8, 126, 254, 138, 242, 8, 8, 16, 208, 0, 136, 191, 255, 8, 14, 54, 255, 255, 255, 127, 242, 14, 16, 8, 255, 3, 138, 190, 0, 0, 128, 61, 106, 1, 22, 208, 8, 21, 0, 0, 126, 4, 138, 190, 10, 106, 254, 138, 7, 129, 16, 126, 69, 0, 136, 191, 129, 16, 18, 
52, 255, 16, 16, 74, 0, 0, 128, 0, 255, 18, 18, 74, 0, 0, 0, 1, 255, 16, 20, 54, 0, 0, 127, 0, 255, 18, 18, 54, 0, 0, 1, 0, 9, 21, 18, 74, 144, 18, 20, 44, 128, 2, 22, 126, 10, 0, 194, 210, 10, 7, 1, 0, 255, 3, 141, 190, 85, 85, 85, 85, 255, 3, 140, 190, 85, 85, 85, 85, 12, 106, 74, 210, 12, 20, 2, 0, 13, 2, 26, 126, 13, 23, 26, 80, 0, 0, 52, 220, 12, 0, 0, 12, 255, 3, 141, 190, 85, 85, 85, 85, 255, 3, 140, 190, 85, 85, 85, 85, 10, 106, 74, 210, 12, 20, 2, 0, 13, 2, 28, 126, 14, 23, 22, 80, 0, 0, 52, 220, 10, 0, 0, 10, 255, 16, 16, 54, 255, 255, 127, 0, 240, 18, 18, 56, 240, 16, 16, 56, 9, 17, 16, 8, 113, 1, 140, 191, 13, 17, 18, 16, 12, 17, 18, 62, 255, 2, 28, 126, 171, 170, 170, 62, 255, 3, 140, 190, 0, 0, 128, 62, 7, 127, 30, 126, 12, 18, 28, 62, 12, 0, 130, 210, 12, 17, 38, 132, 193, 30, 30, 74, 14, 0, 130, 210, 9, 29, 194, 3, 9, 19, 32, 16, 13, 17, 24, 62, 15, 11, 16, 126, 14, 33, 24, 62, 255, 3, 140, 190, 244, 253, 5, 56, 12, 0, 130, 210, 8, 25, 48, 132, 112, 0, 140, 191, 12, 23, 24, 6, 8, 21, 16, 64, 0, 112, 49, 63, 12, 19, 30, 8, 255, 18, 28, 58, 0, 0, 0, 128, 8, 31, 26, 6, 10, 126, 254, 138, 8, 17, 18, 16, 21, 0, 136, 191, 8, 19, 20, 16, 255, 2, 22, 126, 171, 170, 42, 62, 255, 3, 140, 190, 37, 73, 18, 62, 12, 16, 22, 62, 8, 23, 22, 66, 205, 204, 76, 62, 8, 23, 22, 66, 0, 0, 128, 62, 8, 23, 22, 66, 171, 170, 170, 62, 10, 23, 20, 16, 241, 18, 28, 16, 15, 0, 130, 210, 9, 227, 41, 132, 15, 17, 26, 8, 255, 20, 24, 58, 0, 0, 0, 128, 255, 16, 16, 58, 0, 0, 0, 128, 10, 4, 254, 190, 8, 27, 20, 8, 15, 29, 18, 8, 15, 21, 20, 6, 12, 19, 18, 8, 255, 26, 22, 54, 0, 240, 255, 255, 9, 21, 18, 6, 13, 23, 16, 8, 9, 17, 16, 6, 255, 16, 18, 16, 0, 160, 42, 56, 11, 19, 18, 64, 0, 160, 42, 56, 8, 19, 16, 64, 0, 80, 213, 62, 11, 17, 18, 64, 0, 80, 213, 62, 255, 18, 20, 16, 59, 170, 184, 66, 10, 17, 20, 126, 191, 20, 24, 54, 131, 24, 24, 52, 255, 3, 139, 190, 85, 85, 85, 85, 255, 3, 138, 190, 85, 85, 85, 85, 12, 106, 74, 210, 10, 24, 2, 0, 11, 2, 26, 126, 13, 106, 80, 210, 13, 1, 
169, 1, 0, 0, 52, 220, 12, 0, 0, 12, 255, 3, 138, 190, 0, 80, 213, 62, 10, 11, 28, 126, 11, 0, 130, 210, 10, 22, 38, 132, 14, 19, 30, 64, 0, 0, 49, 188, 8, 23, 16, 6, 14, 31, 22, 64, 239, 47, 228, 183, 8, 23, 22, 6, 255, 2, 28, 126, 171, 170, 42, 62, 255, 3, 138, 190, 171, 170, 42, 61, 10, 22, 28, 62, 14, 0, 130, 210, 14, 23, 194, 3, 11, 23, 30, 16, 14, 31, 22, 62, 255, 3, 138, 190, 8, 227, 130, 180, 255, 3, 139, 190, 24, 114, 177, 66, 112, 0, 140, 191, 13, 23, 26, 62, 12, 0, 8, 208, 8, 21, 0, 0, 11, 18, 4, 124, 12, 23, 26, 62, 106, 12, 140, 135, 11, 18, 2, 124, 134, 20, 16, 48, 12, 27, 20, 6, 106, 12, 234, 136, 10, 17, 16, 86, 255, 2, 20, 126, 0, 0, 128, 127, 255, 3, 138, 190, 208, 142, 206, 194, 8, 21, 16, 0, 10, 18, 22, 124, 128, 16, 16, 0, 4, 15, 10, 125, 242, 16, 16, 16, 255, 2, 18, 126, 0, 0, 192, 127, 255, 3, 138, 190, 0, 0, 128, 255, 8, 19, 16, 0, 128, 14, 10, 125, 10, 0, 4, 209, 4, 21, 0, 0, 128, 16, 16, 0, 255, 2, 18, 126, 0, 0, 128, 127, 8, 0, 0, 210, 8, 19, 42, 0, 4, 19, 4, 125, 8, 19, 16, 0, 7, 19, 136, 125, 8, 9, 14, 0, 242, 8, 10, 125, 242, 14, 8, 0, 255, 2, 14, 126, 174, 71, 97, 189, 255, 3, 138, 190, 61, 10, 135, 63, 4, 0, 130, 210, 4, 21, 28, 4, 4, 4, 254, 190, 2, 126, 254, 138, 242, 2, 8, 126, 2, 4, 254, 190, 14, 7, 132, 192, 106, 0, 16, 209, 5, 7, 1, 0, 5, 0, 0, 210, 5, 1, 169, 1, 127, 0, 140, 191, 8, 0, 34, 74, 9, 2, 36, 74, 10, 4, 38, 74, 242, 10, 28, 124, 106, 36, 130, 190, 223, 0, 136, 191, 128, 10, 8, 124, 106, 36, 132, 190, 128, 2, 10, 126, 4, 126, 254, 138, 217, 0, 136, 191, 255, 3, 136, 190, 28, 46, 77, 59, 8, 10, 8, 124, 106, 36, 136, 190, 255, 10, 10, 16, 82, 184, 78, 65, 8, 126, 254, 138, 242, 10, 10, 16, 208, 0, 136, 191, 255, 10, 14, 54, 255, 255, 255, 127, 242, 14, 16, 8, 255, 3, 138, 190, 0, 0, 128, 61, 106, 1, 22, 208, 8, 21, 0, 0, 126, 4, 138, 190, 10, 106, 254, 138, 7, 129, 16, 126, 69, 0, 136, 191, 129, 16, 18, 52, 255, 16, 16, 74, 0, 0, 128, 0, 255, 18, 18, 74, 0, 0, 0, 1, 255, 16, 20, 54, 0, 0, 127, 0, 255, 18, 18, 54, 0, 0, 
1, 0, 9, 21, 18, 74, 144, 18, 20, 44, 128, 2, 22, 126, 10, 0, 194, 210, 10, 7, 1, 0, 255, 3, 141, 190, 85, 85, 85, 85, 255, 3, 140, 190, 85, 85, 85, 85, 12, 106, 74, 210, 12, 20, 2, 0, 13, 2, 26, 126, 13, 23, 26, 80, 0, 0, 52, 220, 12, 0, 0, 12, 255, 3, 141, 190, 85, 85, 85, 85, 255, 3, 140, 190, 85, 85, 85, 85, 10, 106, 74, 210, 12, 20, 2, 0, 13, 2, 28, 126, 14, 23, 22, 80, 0, 0, 52, 220, 10, 0, 0, 10, 255, 16, 16, 54, 255, 255, 127, 0, 240, 18, 18, 56, 240, 16, 16, 56, 9, 17, 16, 8, 113, 1, 140, 191, 13, 17, 18, 16, 12, 17, 18, 62, 255, 2, 28, 126, 171, 170, 170, 62, 255, 3, 140, 190, 0, 0, 128, 62, 7, 127, 30, 126, 12, 18, 28, 62, 12, 0, 130, 210, 12, 17, 38, 132, 193, 30, 30, 74, 14, 0, 130, 210, 9, 29, 194, 3, 9, 19, 32, 16, 13, 17, 24, 62, 15, 11, 16, 126, 14, 33, 24, 62, 255, 3, 140, 190, 244, 253, 5, 56, 12, 0, 130, 210, 8, 25, 48, 132, 112, 0, 140, 191, 12, 23, 24, 6, 8, 21, 16, 64, 0, 112, 49, 63, 12, 19, 26, 8, 255, 18, 28, 58, 0, 0, 0, 128, 8, 27, 30, 6, 10, 126, 254, 138, 8, 17, 18, 16, 21, 0, 136, 191, 8, 19, 20, 16, 255, 2, 22, 126, 171, 170, 42, 62, 255, 3, 140, 190, 37, 73, 18, 62, 12, 16, 22, 62, 8, 23, 22, 66, 205, 204, 76, 62, 8, 23, 22, 66, 0, 0, 128, 62, 8, 23, 22, 66, 171, 170, 170, 62, 10, 23, 20, 16, 241, 18, 28, 16, 13, 0, 130, 210, 9, 227, 41, 132, 13, 17, 30, 8, 255, 20, 24, 58, 0, 0, 0, 128, 255, 16, 16, 58, 0, 0, 0, 128, 10, 4, 254, 190, 8, 31, 16, 8, 13, 29, 20, 8, 13, 17, 16, 6, 12, 21, 18, 8, 255, 30, 20, 54, 0, 240, 255, 255, 9, 17, 16, 6, 15, 21, 18, 8, 8, 19, 16, 6, 255, 16, 18, 16, 0, 160, 42, 56, 10, 19, 18, 64, 0, 160, 42, 56, 8, 19, 16, 64, 0, 80, 213, 62, 10, 17, 18, 64, 0, 80, 213, 62, 255, 18, 22, 16, 59, 170, 184, 66, 11, 17, 22, 126, 191, 22, 24, 54, 131, 24, 24, 52, 255, 3, 139, 190, 85, 85, 85, 85, 255, 3, 138, 190, 85, 85, 85, 85, 12, 106, 74, 210, 10, 24, 2, 0, 11, 2, 26, 126, 13, 106, 80, 210, 13, 1, 169, 1, 0, 0, 52, 220, 12, 0, 0, 12, 255, 3, 138, 190, 0, 80, 213, 62, 11, 11, 28, 126, 10, 0, 130, 210, 10, 20, 38, 
132, 14, 19, 30, 64, 0, 0, 49, 188, 8, 21, 16, 6, 14, 31, 20, 64, 239, 47, 228, 183, 8, 21, 20, 6, 255, 2, 28, 126, 171, 170, 42, 62, 255, 3, 138, 190, 171, 170, 42, 61, 10, 20, 28, 62, 14, 0, 130, 210, 14, 21, 194, 3, 10, 21, 30, 16, 14, 31, 20, 62, 255, 3, 138, 190, 8, 227, 130, 180, 255, 3, 139, 190, 24, 114, 177, 66, 112, 0, 140, 191, 13, 21, 26, 62, 12, 0, 8, 208, 8, 21, 0, 0, 11, 18, 4, 124, 12, 21, 26, 62, 106, 12, 140, 135, 11, 18, 2, 124, 134, 22, 16, 48, 12, 27, 20, 6, 106, 12, 234, 136, 10, 17, 16, 86, 255, 2, 20, 126, 0, 0, 128, 127, 255, 3, 138, 190, 208, 142, 206, 194, 8, 21, 16, 0, 10, 18, 22, 124, 128, 16, 16, 0, 5, 15, 10, 125, 242, 16, 16, 16, 255, 2, 18, 126, 0, 0, 192, 127, 255, 3, 138, 190, 0, 0, 128, 255, 8, 19, 16, 0, 128, 14, 10, 125, 10, 0, 4, 209, 5, 21, 0, 0, 128, 16, 16, 0, 255, 2, 18, 126, 0, 0, 128, 127, 8, 0, 0, 210, 8, 19, 42, 0, 5, 19, 4, 125, 8, 19, 16, 0, 7, 19, 136, 125, 8, 11, 14, 0, 242, 10, 10, 125, 242, 14, 10, 0, 255, 2, 14, 126, 174, 71, 97, 189, 255, 3, 138, 190, 61, 10, 135, 63, 5, 0, 130, 210, 5, 21, 28, 4, 4, 4, 254, 190, 2, 126, 254, 138, 242, 2, 10, 126, 2, 4, 254, 190, 8, 7, 65, 192, 127, 0, 140, 191, 0, 3, 194, 192, 128, 2, 40, 126, 127, 0, 140, 191, 0, 95, 32, 240, 17, 3, 1, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 0, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 64, 192, 127, 0, 140, 191, 0, 255, 2, 135, 255, 255, 0, 0, 0, 255, 128, 147, 16, 0, 16, 0, 1, 255, 1, 135, 255, 255, 0, 0, 2, 8, 2, 147, 0, 9, 0, 147, 1, 10, 1, 147, 0, 7, 132, 192, 4, 7, 66, 192, 18, 7, 70, 192, 20, 135, 1, 192, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 74, 0, 2, 2, 74, 1, 4, 0, 128, 4, 0, 12, 209, 1, 27, 0, 0, 8, 0, 12, 209, 0, 25, 0, 0, 0, 4, 4, 74, 8, 4, 128, 136, 3, 4, 6, 125, 0, 106, 234, 136, 126, 4, 128, 190, 0, 106, 254, 138, 22, 0, 136, 191, 6, 7, 132, 192, 10, 7, 65, 192, 12, 7, 2, 192, 127, 0, 140, 191, 0, 9, 198, 192, 2, 0, 6, 74, 3, 2, 8, 74, 4, 4, 10, 74, 128, 2, 12, 126, 127, 0, 140, 191, 0, 95, 0, 240, 3, 3, 3, 0, 14, 7, 130, 192, 0, 11, 196, 192, 127, 0, 140, 191, 4, 0, 14, 74, 5, 2, 16, 74, 6, 4, 18, 74, 128, 2, 20, 126, 112, 15, 140, 191, 0, 95, 32, 240, 7, 3, 2, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 129, 0, 172, 0, 144, 0, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 0, 5, 0, 5, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 0, 192, 127, 0, 140, 191, 0, 255, 0, 135, 255, 255, 0, 0, 0, 8, 0, 147, 0, 
7, 65, 192, 18, 135, 0, 192, 127, 0, 140, 191, 0, 2, 0, 128, 0, 0, 0, 74, 1, 0, 8, 125, 106, 36, 128, 190, 15, 0, 136, 191, 6, 7, 132, 192, 10, 7, 1, 192, 127, 0, 140, 191, 0, 9, 134, 192, 2, 0, 2, 74, 127, 0, 140, 191, 0, 32, 12, 224, 1, 1, 3, 128, 14, 7, 1, 192, 0, 11, 130, 192, 127, 0, 140, 191, 2, 0, 0, 74, 112, 15, 140, 191, 0, 32, 28, 224, 0, 1, 1, 128, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 0, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 64, 192, 127, 0, 140, 191, 0, 255, 2, 135, 255, 255, 0, 0, 0, 255, 128, 147, 16, 0, 16, 0, 1, 255, 1, 135, 255, 255, 0, 0, 2, 8, 2, 147, 0, 9, 0, 147, 1, 10, 1, 147, 0, 7, 66, 192, 18, 135, 1, 192, 127, 0, 140, 191, 2, 4, 2, 128, 2, 0, 0, 74, 3, 0, 8, 125, 106, 36, 130, 190, 22, 0, 136, 191, 2, 7, 196, 192, 10, 7, 2, 192, 127, 0, 140, 191, 0, 13, 136, 192, 4, 0, 6, 74, 127, 0, 140, 191, 0, 32, 12, 224, 3, 3, 
4, 128, 14, 7, 130, 192, 0, 15, 198, 192, 1, 10, 1, 128, 0, 8, 0, 128, 127, 0, 140, 191, 1, 6, 1, 128, 0, 5, 0, 128, 4, 0, 14, 74, 1, 4, 18, 74, 0, 2, 16, 74, 128, 2, 20, 126, 112, 15, 140, 191, 0, 95, 32, 240, 7, 3, 3, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 193, 0, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 7, 0, 7, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 64, 192, 127, 0, 140, 191, 0, 255, 2, 135, 255, 255, 0, 0, 0, 255, 128, 147, 16, 0, 16, 0, 1, 255, 1, 135, 255, 255, 0, 0, 2, 8, 2, 147, 0, 9, 0, 147, 1, 10, 1, 147, 0, 7, 66, 192, 18, 135, 1, 192, 127, 0, 140, 191, 2, 4, 2, 128, 2, 0, 0, 74, 3, 0, 8, 125, 106, 36, 130, 190, 23, 0, 136, 191, 2, 7, 196, 192, 10, 7, 66, 192, 127, 0, 140, 191, 12, 135, 4, 192, 0, 13, 200, 192, 1, 10, 1, 128, 0, 8, 0, 128, 127, 0, 140, 191, 1, 9, 1, 128, 0, 5, 0, 128, 4, 0, 6, 74, 1, 4, 10, 74, 0, 2, 8, 74, 128, 2, 12, 126, 0, 95, 0, 240, 3, 1, 4, 0, 14, 7, 0, 192, 0, 15, 130, 192, 127, 0, 140, 191, 0, 0, 0, 74, 112, 15, 140, 191, 0, 32, 28, 224, 0, 1, 1, 128, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 0, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 64, 192, 127, 0, 140, 191, 0, 255, 2, 135, 255, 255, 0, 0, 0, 255, 128, 147, 16, 0, 16, 0, 1, 255, 1, 135, 255, 255, 0, 0, 2, 8, 2, 147, 0, 9, 0, 147, 1, 10, 1, 147, 0, 7, 132, 192, 4, 7, 66, 192, 24, 7, 70, 192, 26, 135, 1, 192, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 74, 0, 2, 2, 74, 1, 4, 0, 128, 4, 0, 2, 209, 1, 27, 0, 0, 8, 0, 2, 209, 0, 25, 0, 0, 0, 4, 4, 74, 8, 4, 128, 135, 3, 4, 8, 125, 0, 106, 128, 135, 0, 36, 128, 190, 46, 0, 136, 191, 20, 7, 132, 192, 127, 0, 140, 191, 8, 0, 14, 74, 9, 2, 16, 74, 10, 4, 18, 74, 28, 7, 1, 192, 6, 7, 66, 192, 127, 0, 140, 191, 2, 130, 0, 191, 26, 0, 133, 191, 2, 129, 0, 191, 11, 0, 132, 191, 12, 7, 132, 192, 0, 5, 198, 192, 127, 0, 140, 191, 8, 2, 0, 126, 9, 2, 2, 126, 10, 2, 4, 126, 11, 2, 6, 126, 128, 2, 20, 126, 0, 95, 32, 240, 7, 0, 3, 0, 23, 0, 130, 191, 2, 128, 0, 191, 21, 0, 132, 191, 8, 7, 132, 192, 0, 5, 198, 192, 127, 0, 140, 191, 8, 2, 0, 126, 9, 2, 2, 126, 10, 2, 4, 126, 11, 2, 6, 126, 128, 2, 20, 126, 0, 95, 32, 240, 
7, 0, 3, 0, 10, 0, 130, 191, 16, 7, 132, 192, 0, 5, 198, 192, 127, 0, 140, 191, 8, 2, 0, 126, 9, 2, 2, 126, 10, 2, 4, 126, 11, 2, 6, 126, 128, 2, 20, 126, 0, 95, 32, 240, 7, 0, 3, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 65, 0, 172, 0, 144, 0, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 0, 5, 0, 5, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 0, 192, 127, 0, 140, 191, 0, 255, 0, 135, 255, 255, 0, 0, 0, 8, 0, 147, 0, 7, 65, 192, 24, 135, 0, 192, 127, 0, 140, 191, 0, 2, 0, 128, 0, 0, 0, 74, 1, 0, 8, 125, 106, 36, 128, 190, 41, 0, 136, 191, 20, 7, 1, 192, 127, 0, 140, 191, 2, 0, 0, 74, 28, 7, 1, 192, 6, 7, 66, 192, 127, 0, 140, 191, 2, 130, 0, 191, 24, 0, 133, 191, 2, 129, 0, 191, 10, 0, 132, 191, 12, 7, 132, 192, 0, 5, 130, 192, 127, 0, 140, 191, 8, 2, 
2, 126, 9, 2, 4, 126, 10, 2, 6, 126, 11, 2, 8, 126, 0, 32, 28, 224, 0, 1, 1, 128, 21, 0, 130, 191, 2, 128, 0, 191, 19, 0, 132, 191, 8, 7, 132, 192, 0, 5, 130, 192, 127, 0, 140, 191, 8, 2, 2, 126, 9, 2, 4, 126, 10, 2, 6, 126, 11, 2, 8, 126, 0, 32, 28, 224, 0, 1, 1, 128, 9, 0, 130, 191, 16, 7, 132, 192, 0, 5, 130, 192, 127, 0, 140, 191, 8, 2, 2, 126, 9, 2, 4, 126, 10, 2, 6, 126, 11, 2, 8, 126, 0, 32, 28, 224, 0, 1, 1, 128, 0, 0, 129, 191, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 2, 0, 0, 0, 0, 0, 0, 40, 0, 0, 0, 1, 0, 4, 0, 8, 2, 0, 0, 0, 0, 0, 0, 8, 4, 0, 0, 0, 0, 0, 0, 76, 0, 0, 0, 1, 0, 4, 0, 16, 6, 0, 0, 0, 0, 0, 0, 8, 4, 0, 0, 0, 0, 0, 0, 118, 0, 0, 0, 26, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 112, 4, 0, 0, 0, 0, 0, 0, 149, 0, 0, 0, 26, 0, 5, 0, 0, 5, 0, 0, 0, 0, 0, 0, 132, 4, 0, 0, 0, 0, 0, 0, 180, 0, 0, 0, 26, 0, 5, 0, 0, 10, 0, 0, 0, 0, 0, 0, 220, 1, 0, 0, 0, 0, 0, 0, 209, 0, 0, 0, 26, 0, 5, 0, 0, 12, 0, 0, 0, 0, 0, 0, 212, 12, 0, 0, 0, 0, 0, 0, 249, 0, 0, 0, 26, 0, 5, 0, 0, 25, 0, 0, 0, 0, 0, 0, 220, 1, 0, 0, 0, 0, 0, 0, 33, 1, 0, 0, 26, 0, 5, 0, 0, 27, 0, 0, 0, 0, 0, 0, 116, 1, 0, 0, 0, 0, 0, 0, 58, 1, 0, 0, 26, 0, 5, 0, 0, 29, 0, 0, 0, 0, 0, 0, 168, 1, 0, 0, 0, 0, 0, 0, 90, 1, 0, 0, 26, 0, 5, 0, 0, 31, 0, 0, 0, 0, 0, 0, 172, 1, 0, 0, 0, 0, 0, 0, 122, 1, 0, 0, 26, 0, 5, 0, 0, 33, 0, 0, 0, 0, 0, 0, 56, 2, 0, 0, 0, 0, 0, 0, 144, 1, 0, 0, 26, 0, 5, 0, 0, 36, 0, 0, 0, 0, 0, 0, 220, 1, 0, 0, 0, 0, 0, 0, 170, 1, 0, 0, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 207, 1, 0, 0, 3, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 112, 14, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 120, 14, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 152, 14, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 160, 14, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 16, 0, 0, 0, 0, 0, 0, 2, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 16, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 18, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28, 18, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 18, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 18, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 172, 19, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 180, 19, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 204, 21, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 212, 21, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 244, 21, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 252, 21, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100, 23, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 108, 23, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 0, 0, 0, 0, 88, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 3, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 1, 0, 0, 0, 0, 0, 0, 229, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 240, 2, 0, 0, 0, 0, 0, 0, 200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 0, 0, 0, 1, 0, 0, 0, 3, 0, 160, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 184, 3, 0, 0, 0, 0, 0, 0, 24, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0, 
0, 0, 1, 0, 0, 0, 7, 0, 192, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 220, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 66, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 224, 51, 0, 0, 0, 0, 0, 0, 128, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 53, 0, 0, 0, 0, 0, 0, 176, 1, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 5, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0}; } } ROCR-Runtime-rocm-5.0.0/src/image/blit_object_gfx8xx.cpp000066400000000000000000001623211420110115200230010ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. 
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include namespace rocr { namespace image { uint8_t blit_object_gfx8xx[] = {127, 69, 76, 70, 2, 1, 1, 64, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 224, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 64, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 56, 0, 2, 0, 64, 0, 8, 0, 1, 0, 2, 0, 0, 96, 6, 0, 0, 0, 184, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 10, 0, 0, 0, 0, 0, 0, 24, 10, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 96, 5, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 41, 0, 0, 0, 0, 0, 0, 12, 41, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 46, 115, 104, 115, 116, 114, 116, 97, 98, 0, 46, 115, 116, 114, 116, 97, 98, 0, 46, 110, 111, 116, 101, 0, 46, 104, 115, 97, 100, 97, 116, 97, 95, 114, 101, 97, 100, 111, 110, 108, 121, 95, 97, 103, 101, 110, 116, 0, 46, 104, 115, 97, 116, 101, 120, 116, 0, 46, 115, 121, 109, 116, 97, 98, 0, 46, 115, 121, 109, 116, 97, 98, 0, 46, 114, 101, 108, 97, 46, 104, 115, 97, 116, 101, 120, 116, 0, 0, 0, 38, 104, 115, 97, 95, 101, 120, 116, 95, 105, 109, 97, 103, 101, 58, 58, 38, 95, 95, 111, 99, 109, 108, 116, 98, 108, 95, 77, 51, 50, 95, 69, 88, 80, 95, 69, 80, 0, 38, 104, 115, 
97, 95, 101, 120, 116, 95, 105, 109, 97, 103, 101, 58, 58, 38, 95, 95, 111, 99, 109, 108, 116, 98, 108, 95, 77, 51, 50, 95, 76, 79, 71, 69, 0, 38, 104, 115, 97, 95, 101, 120, 116, 95, 105, 109, 97, 103, 101, 58, 58, 38, 95, 95, 111, 99, 109, 108, 116, 98, 108, 95, 77, 51, 50, 95, 76, 79, 71, 95, 73, 78, 86, 95, 69, 80, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 116, 111, 95, 98, 117, 102, 102, 101, 114, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 98, 117, 102, 102, 101, 114, 95, 116, 111, 95, 105, 109, 97, 103, 101, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 100, 101, 102, 97, 117, 108, 116, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 108, 105, 110, 101, 97, 114, 95, 116, 111, 95, 115, 116, 97, 110, 100, 97, 114, 100, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 115, 116, 97, 110, 100, 97, 114, 100, 95, 116, 111, 95, 108, 105, 110, 101, 97, 114, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 49, 100, 98, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 49, 100, 98, 95, 116, 111, 95, 114, 101, 103, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 114, 101, 103, 95, 116, 111, 95, 49, 100, 98, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 108, 101, 97, 114, 95, 105, 109, 97, 103, 101, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 108, 101, 97, 114, 95, 105, 109, 97, 103, 101, 95, 49, 100, 98, 95, 107, 101, 114, 110, 101, 108, 0, 95, 95, 104, 115, 97, 95, 115, 101, 99, 116, 105, 111, 110, 46, 104, 115, 97, 100, 97, 116, 97, 95, 114, 101, 97, 100, 111, 110, 108, 121, 95, 97, 103, 101, 110, 116, 0, 95, 95, 104, 115, 97, 95, 115, 101, 99, 116, 105, 111, 110, 46, 104, 115, 97, 116, 101, 120, 
116, 0, 0, 0, 0, 4, 0, 0, 0, 8, 0, 0, 0, 1, 0, 0, 0, 65, 77, 68, 0, 1, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 12, 0, 0, 0, 2, 0, 0, 0, 65, 77, 68, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 0, 4, 0, 0, 0, 26, 0, 0, 0, 3, 0, 0, 0, 65, 77, 68, 0, 4, 0, 7, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 65, 77, 68, 0, 65, 77, 68, 71, 80, 85, 0, 0, 4, 0, 0, 0, 41, 0, 0, 0, 4, 0, 0, 0, 65, 77, 68, 0, 25, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 65, 77, 68, 32, 72, 83, 65, 32, 82, 117, 110, 116, 105, 109, 101, 32, 70, 105, 110, 97, 108, 105, 122, 101, 114, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 26, 0, 0, 0, 5, 0, 0, 0, 65, 77, 68, 0, 22, 0, 45, 104, 115, 97, 95, 99, 97, 108, 108, 95, 99, 111, 110, 118, 101, 110, 116, 105, 111, 110, 61, 48, 0, 197, 0, 0, 0, 0, 128, 63, 0, 0, 0, 0, 0, 96, 129, 63, 119, 62, 26, 57, 0, 192, 130, 63, 138, 105, 216, 57, 0, 32, 132, 63, 29, 70, 81, 58, 0, 160, 133, 63, 124, 54, 172, 57, 0, 0, 135, 63, 180, 12, 123, 58, 0, 128, 136, 63, 4, 116, 64, 58, 0, 0, 138, 63, 170, 171, 38, 58, 0, 128, 139, 63, 31, 15, 46, 58, 0, 0, 141, 63, 219, 250, 86, 58, 0, 160, 142, 63, 104, 49, 7, 57, 0, 32, 144, 63, 24, 226, 14, 58, 0, 192, 145, 63, 234, 220, 244, 56, 0, 64, 147, 63, 120, 89, 81, 58, 0, 224, 148, 63, 71, 125, 39, 58, 0, 128, 150, 63, 185, 105, 33, 58, 0, 32, 152, 63, 140, 130, 63, 58, 0, 224, 153, 63, 65, 38, 11, 55, 0, 128, 155, 63, 157, 155, 211, 57, 0, 32, 157, 63, 57, 205, 118, 58, 0, 224, 158, 63, 4, 147, 41, 58, 0, 160, 160, 63, 125, 136, 2, 58, 0, 96, 162, 63, 24, 24, 2, 58, 0, 32, 164, 63, 112, 173, 40, 58, 0, 224, 165, 63, 77, 181, 118, 58, 0, 192, 167, 63, 78, 59, 217, 57, 0, 160, 169, 63, 117, 90, 45, 56, 0, 96, 171, 63, 173, 205, 81, 58, 0, 64, 173, 63, 82, 247, 65, 58, 0, 32, 175, 63, 107, 197, 91, 58, 0, 32, 177, 63, 116, 96, 253, 56, 0, 0, 179, 63, 149, 32, 14, 58, 0, 0, 181, 63, 127, 102, 30, 57, 0, 224, 182, 63, 25, 143, 108, 58, 0, 224, 184, 63, 59, 122, 93, 58, 0, 224, 186, 63, 144, 213, 122, 58, 0, 0, 189, 63, 245, 57, 138, 57, 0, 0, 191, 63, 179, 205, 60, 58, 0, 
32, 193, 63, 166, 204, 196, 57, 0, 64, 195, 63, 68, 155, 89, 57, 0, 96, 197, 63, 42, 66, 101, 57, 0, 128, 199, 63, 138, 76, 215, 57, 0, 160, 201, 63, 51, 236, 77, 58, 0, 224, 203, 63, 239, 79, 193, 57, 0, 32, 206, 63, 163, 130, 17, 57, 0, 96, 208, 63, 187, 246, 204, 56, 0, 160, 210, 63, 31, 217, 129, 57, 0, 224, 212, 63, 94, 213, 26, 58, 0, 64, 215, 63, 90, 153, 31, 57, 0, 128, 217, 63, 19, 174, 104, 58, 0, 224, 219, 63, 190, 188, 93, 58, 0, 96, 222, 63, 94, 130, 244, 55, 0, 192, 224, 63, 194, 238, 205, 57, 0, 32, 227, 63, 149, 75, 124, 58, 0, 160, 229, 63, 59, 55, 72, 58, 0, 32, 232, 63, 129, 82, 75, 58, 0, 192, 234, 63, 221, 231, 198, 55, 0, 64, 237, 63, 237, 1, 243, 57, 0, 224, 239, 63, 123, 51, 23, 57, 0, 128, 242, 63, 44, 158, 59, 56, 0, 32, 245, 63, 164, 162, 47, 57, 0, 192, 247, 63, 152, 251, 6, 58, 0, 128, 250, 63, 220, 182, 236, 56, 0, 32, 253, 63, 103, 96, 112, 58, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 59, 65, 172, 41, 52, 0, 0, 126, 60, 252, 176, 168, 53, 0, 192, 189, 60, 234, 131, 141, 54, 0, 16, 252, 60, 120, 14, 27, 54, 0, 240, 28, 61, 254, 185, 135, 54, 0, 160, 59, 61, 101, 236, 49, 54, 0, 16, 90, 61, 25, 113, 221, 54, 0, 80, 120, 61, 69, 0, 195, 53, 0, 32, 139, 61, 81, 119, 155, 55, 0, 0, 154, 61, 13, 203, 235, 55, 0, 208, 168, 61, 131, 159, 131, 55, 0, 128, 183, 61, 229, 138, 82, 55, 0, 16, 198, 61, 24, 235, 162, 55, 0, 144, 212, 61, 149, 116, 218, 54, 0, 240, 226, 61, 183, 30, 169, 54, 0, 48, 241, 61, 21, 183, 131, 55, 0, 96, 255, 61, 219, 49, 17, 55, 0, 176, 6, 62, 104, 62, 63, 56, 0, 176, 13, 62, 151, 106, 21, 56, 0, 160, 20, 62, 15, 124, 41, 56, 0, 128, 27, 62, 15, 16, 126, 56, 0, 96, 34, 62, 101, 182, 21, 56, 0, 48, 41, 62, 161, 227, 229, 55, 0, 240, 47, 62, 83, 56, 24, 56, 0, 176, 54, 62, 157, 113, 254, 53, 0, 80, 61, 62, 8, 129, 68, 56, 0, 240, 67, 62, 144, 50, 80, 56, 0, 144, 74, 62, 232, 57, 53, 55, 0, 16, 81, 62, 241, 15, 94, 56, 0, 144, 87, 62, 64, 167, 100, 56, 0, 16, 94, 62, 45, 116, 134, 55, 0, 112, 100, 62, 205, 
227, 123, 56, 0, 224, 106, 62, 62, 173, 133, 54, 0, 48, 113, 62, 21, 183, 3, 56, 0, 128, 119, 62, 220, 203, 173, 55, 0, 192, 125, 62, 175, 54, 12, 56, 0, 0, 130, 62, 211, 82, 22, 55, 0, 16, 133, 62, 57, 113, 146, 56, 0, 32, 136, 62, 215, 252, 197, 56, 0, 48, 139, 62, 213, 85, 174, 56, 0, 64, 142, 62, 105, 193, 24, 56, 0, 64, 145, 62, 231, 253, 160, 56, 0, 64, 148, 62, 239, 9, 173, 56, 0, 64, 151, 62, 225, 186, 98, 56, 0, 48, 154, 62, 76, 205, 238, 56, 0, 48, 157, 62, 210, 170, 152, 55, 0, 32, 160, 62, 26, 26, 66, 55, 0, 0, 163, 62, 14, 225, 197, 56, 0, 240, 165, 62, 238, 42, 191, 55, 0, 208, 168, 62, 45, 135, 45, 56, 0, 176, 171, 62, 138, 46, 238, 55, 0, 128, 174, 62, 172, 223, 222, 56, 0, 96, 177, 62, 185, 242, 2, 56, 0, 48, 180, 62, 155, 30, 72, 56, 0, 0, 183, 62, 43, 170, 14, 56, 0, 192, 185, 62, 93, 251, 235, 56, 0, 144, 188, 62, 221, 95, 37, 56, 0, 80, 191, 62, 130, 59, 120, 56, 0, 16, 194, 62, 30, 218, 81, 56, 0, 208, 196, 62, 5, 27, 78, 55, 0, 128, 199, 62, 155, 67, 143, 56, 0, 48, 202, 62, 16, 14, 202, 56, 0, 224, 204, 62, 139, 192, 202, 56, 0, 144, 207, 62, 95, 246, 145, 56, 0, 64, 210, 62, 203, 33, 129, 55, 0, 224, 212, 62, 154, 154, 108, 56, 0, 128, 215, 62, 35, 153, 148, 56, 0, 32, 218, 62, 204, 123, 119, 56, 0, 192, 220, 62, 38, 45, 177, 55, 0, 80, 223, 62, 211, 206, 166, 56, 0, 224, 225, 62, 230, 211, 235, 56, 0, 112, 228, 62, 205, 227, 251, 56, 0, 0, 231, 62, 194, 133, 215, 56, 0, 144, 233, 62, 0, 126, 126, 56, 0, 16, 236, 62, 197, 146, 243, 56, 0, 160, 238, 62, 131, 9, 212, 55, 0, 32, 241, 62, 124, 26, 8, 56, 0, 160, 243, 62, 173, 195, 132, 55, 0, 16, 246, 62, 35, 233, 204, 56, 0, 144, 248, 62, 175, 95, 15, 56, 0, 0, 251, 62, 56, 253, 145, 56, 0, 112, 253, 62, 188, 71, 172, 56, 0, 224, 255, 62, 43, 4, 151, 56, 0, 32, 1, 63, 210, 82, 41, 57, 0, 80, 2, 63, 212, 206, 111, 57, 0, 144, 3, 63, 115, 112, 249, 55, 0, 192, 4, 63, 174, 158, 94, 56, 0, 240, 5, 63, 74, 200, 101, 56, 0, 32, 7, 63, 163, 11, 19, 56, 0, 64, 8, 63, 22, 207, 121, 57, 0, 112, 9, 63, 
201, 202, 56, 57, 0, 160, 10, 63, 244, 210, 195, 56, 0, 192, 11, 63, 236, 93, 117, 57, 0, 240, 12, 63, 103, 180, 230, 56, 0, 16, 14, 63, 184, 15, 92, 57, 0, 64, 15, 63, 224, 188, 62, 56, 0, 96, 16, 63, 146, 209, 220, 56, 0, 128, 17, 63, 223, 107, 24, 57, 0, 160, 18, 63, 76, 231, 45, 57, 0, 192, 19, 63, 68, 9, 47, 57, 0, 224, 20, 63, 97, 255, 27, 57, 0, 0, 22, 63, 68, 237, 233, 56, 0, 32, 23, 63, 200, 109, 104, 56, 0, 48, 24, 63, 167, 153, 107, 57, 0, 80, 25, 63, 137, 156, 9, 57, 0, 112, 26, 63, 115, 118, 162, 55, 0, 128, 27, 63, 163, 218, 11, 57, 0, 144, 28, 63, 171, 105, 112, 57, 0, 176, 29, 63, 255, 73, 132, 56, 0, 192, 30, 63, 56, 53, 1, 57, 0, 208, 31, 63, 104, 194, 45, 57, 0, 224, 32, 63, 35, 244, 71, 57, 0, 240, 33, 63, 124, 241, 79, 57, 0, 0, 35, 63, 14, 225, 69, 57, 0, 16, 36, 63, 245, 232, 41, 57, 0, 32, 37, 63, 176, 93, 248, 56, 0, 48, 38, 63, 153, 95, 115, 56, 0, 48, 39, 63, 219, 8, 108, 57, 0, 64, 40, 63, 0, 230, 9, 57, 0, 80, 41, 63, 111, 153, 180, 55, 0, 80, 42, 63, 204, 51, 18, 57, 0, 80, 43, 63, 217, 234, 124, 57, 0, 96, 44, 63, 205, 181, 173, 56, 0, 96, 45, 63, 26, 38, 32, 57, 0, 96, 46, 63, 54, 238, 88, 57, 0, 112, 47, 63, 5, 73, 170, 53, 0, 112, 48, 63, 30, 209, 203, 55, 0, 112, 49, 63, 244, 253, 5, 56, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 254, 63, 248, 3, 254, 56, 0, 0, 252, 63, 193, 15, 252, 57, 0, 0, 250, 63, 201, 179, 140, 58, 0, 0, 248, 63, 16, 62, 248, 58, 0, 0, 246, 63, 48, 123, 64, 59, 0, 0, 244, 63, 96, 141, 137, 59, 0, 0, 242, 63, 72, 214, 185, 59, 0, 0, 240, 63, 241, 240, 240, 59, 0, 0, 239, 63, 127, 220, 186, 58, 0, 0, 237, 63, 108, 7, 102, 59, 0, 0, 235, 63, 166, 178, 189, 59, 0, 0, 234, 63, 161, 14, 234, 57, 0, 0, 232, 63, 247, 88, 75, 59, 0, 0, 230, 63, 72, 180, 194, 59, 0, 0, 229, 63, 172, 96, 150, 58, 0, 0, 227, 63, 228, 56, 142, 59, 0, 0, 225, 63, 14, 120, 252, 59, 0, 0, 224, 63, 56, 112, 96, 59, 0, 0, 222, 63, 77, 92, 233, 59, 0, 0, 221, 63, 76, 145, 79, 59, 0, 0, 219, 63, 239, 97, 235, 59, 0, 0, 218, 63, 79, 27, 104, 59, 0, 0, 217, 
63, 178, 1, 89, 56, 0, 0, 215, 63, 229, 53, 148, 59, 0, 0, 214, 63, 89, 3, 174, 58, 0, 0, 212, 63, 3, 123, 199, 59, 0, 0, 211, 63, 109, 26, 80, 59, 0, 0, 210, 63, 33, 13, 210, 57, 0, 0, 208, 63, 204, 159, 182, 59, 0, 0, 207, 63, 81, 233, 72, 59, 0, 0, 206, 63, 185, 83, 52, 58, 0, 0, 204, 63, 205, 204, 204, 59, 0, 0, 203, 63, 192, 39, 135, 59, 0, 0, 202, 63, 205, 15, 11, 59, 0, 0, 201, 63, 209, 73, 123, 57, 0, 0, 199, 63, 125, 12, 206, 59, 0, 0, 198, 63, 106, 12, 152, 59, 0, 0, 197, 63, 247, 144, 75, 59, 0, 0, 196, 63, 21, 190, 220, 58, 0, 0, 195, 63, 49, 12, 195, 57, 0, 0, 193, 63, 214, 187, 228, 59, 0, 0, 192, 63, 193, 192, 192, 59, 0, 0, 191, 63, 232, 47, 160, 59, 0, 0, 190, 63, 12, 250, 130, 59, 0, 0, 189, 63, 142, 32, 82, 59, 0, 0, 188, 63, 24, 200, 36, 59, 0, 0, 187, 63, 135, 156, 251, 58, 0, 0, 186, 63, 140, 46, 186, 58, 0, 0, 185, 63, 233, 15, 133, 58, 0, 0, 184, 63, 3, 23, 56, 58, 0, 0, 183, 63, 162, 181, 251, 57, 0, 0, 182, 63, 97, 11, 182, 57, 0, 0, 181, 63, 170, 104, 158, 57, 0, 0, 180, 63, 65, 11, 180, 57, 0, 0, 179, 63, 41, 53, 246, 57, 0, 0, 178, 63, 67, 22, 50, 58, 0, 0, 177, 63, 192, 157, 126, 58, 0, 0, 176, 63, 11, 44, 176, 58, 0, 0, 175, 63, 26, 119, 235, 58, 0, 0, 174, 63, 185, 130, 24, 59, 0, 0, 173, 63, 176, 86, 64, 59, 0, 0, 172, 63, 8, 35, 109, 59, 0, 0, 171, 63, 227, 105, 143, 59, 0, 0, 170, 63, 171, 170, 170, 59, 0, 0, 169, 63, 72, 74, 200, 59, 0, 0, 168, 63, 87, 63, 232, 59, 0, 0, 168, 63, 129, 10, 168, 57, 0, 0, 167, 63, 230, 20, 188, 58, 0, 0, 166, 63, 114, 136, 43, 59, 0, 0, 165, 63, 5, 106, 125, 59, 0, 0, 164, 63, 30, 207, 169, 59, 0, 0, 163, 63, 61, 10, 215, 59, 0, 0, 163, 63, 246, 199, 75, 57, 0, 0, 162, 63, 172, 12, 223, 58, 0, 0, 161, 63, 93, 98, 86, 59, 0, 0, 160, 63, 161, 160, 160, 59, 0, 0, 159, 63, 254, 9, 216, 59, 0, 0, 159, 63, 57, 47, 11, 58, 0, 0, 158, 63, 72, 90, 25, 59, 0, 0, 157, 63, 158, 216, 137, 59, 0, 0, 156, 63, 97, 225, 200, 59, 0, 0, 156, 63, 193, 9, 156, 57, 0, 0, 155, 63, 62, 223, 24, 59, 0, 0, 154, 63, 217, 
231, 144, 59, 0, 0, 153, 63, 219, 34, 215, 59, 0, 0, 153, 63, 139, 210, 120, 58, 0, 0, 152, 63, 19, 144, 81, 59, 0, 0, 151, 63, 237, 37, 180, 59, 0, 0, 151, 63, 46, 1, 23, 56, 0, 0, 150, 63, 216, 180, 31, 59, 0, 0, 149, 63, 104, 37, 160, 59, 0, 0, 148, 63, 79, 9, 242, 59, 0, 0, 148, 63, 41, 1, 11, 59, 0, 0, 147, 63, 196, 133, 154, 59, 0, 0, 146, 63, 132, 19, 241, 59, 0, 0, 146, 63, 37, 73, 18, 59, 0, 0, 145, 63, 197, 179, 162, 59, 0, 0, 144, 63, 9, 188, 253, 59, 0, 0, 144, 63, 198, 112, 52, 59, 0, 0, 143, 63, 238, 35, 184, 59, 0, 0, 143, 63, 208, 206, 59, 58, 0, 0, 142, 63, 218, 106, 112, 59, 0, 0, 141, 63, 2, 82, 218, 59, 0, 0, 141, 63, 35, 44, 247, 58, 0, 0, 140, 63, 4, 156, 162, 59, 0, 0, 140, 63, 193, 8, 140, 57, 0, 0, 139, 63, 148, 104, 96, 59, 0, 0, 138, 63, 252, 242, 216, 59, 0, 0, 138, 63, 225, 240, 5, 59, 0, 0, 137, 63, 138, 64, 174, 59, 0, 0, 137, 63, 215, 57, 86, 58, 0, 0, 136, 63, 137, 136, 136, 59, 0, 0, 135, 63, 136, 128, 247, 59, 0, 0, 135, 63, 190, 86, 79, 59, 0, 0, 134, 63, 68, 5, 217, 59, 0, 0, 134, 63, 252, 20, 23, 59, 0, 0, 133, 63, 97, 55, 191, 59, 0, 0, 133, 63, 77, 33, 208, 58, 0, 0, 132, 63, 200, 249, 169, 59, 0, 0, 132, 63, 8, 33, 132, 58, 0, 0, 131, 63, 82, 48, 153, 59, 0, 0, 131, 63, 188, 116, 19, 58, 0, 0, 130, 63, 191, 191, 140, 59, 0, 0, 130, 63, 33, 8, 130, 57, 0, 0, 129, 63, 169, 141, 132, 59, 0, 0, 129, 63, 4, 2, 129, 56, 0, 0, 128, 63, 129, 128, 128, 59, 0, 0, 128, 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 194, 2, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 11, 0, 11, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 2, 134, 255, 255, 0, 0, 0, 255, 128, 146, 16, 0, 16, 0, 1, 255, 1, 134, 255, 255, 0, 0, 2, 8, 2, 146, 0, 9, 0, 146, 1, 10, 1, 146, 3, 2, 10, 192, 0, 0, 0, 0, 3, 1, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 104, 0, 0, 0, 195, 0, 2, 192, 112, 0, 0, 0, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 50, 0, 2, 2, 50, 1, 4, 0, 128, 4, 0, 193, 208, 1, 27, 0, 0, 8, 0, 193, 208, 0, 25, 0, 0, 0, 4, 4, 50, 8, 4, 128, 134, 3, 4, 136, 125, 0, 106, 128, 134, 0, 32, 128, 190, 199, 0, 136, 191, 131, 0, 6, 192, 24, 0, 0, 0, 3, 1, 6, 192, 56, 0, 0, 0, 3, 2, 2, 192, 64, 0, 0, 0, 131, 2, 6, 192, 72, 0, 0, 0, 67, 2, 2, 192, 128, 0, 0, 0, 3, 3, 10, 192, 136, 0, 0, 0, 159, 0, 6, 34, 159, 2, 8, 34, 127, 0, 140, 191, 5, 0, 134, 210, 1, 25, 0, 0, 4, 0, 133, 210, 4, 25, 0, 0, 4, 11, 8, 50, 5, 0, 133, 210, 1, 27, 0, 0, 5, 9, 8, 50, 5, 0, 133, 210, 1, 25, 0, 0, 5, 106, 25, 209, 5, 1, 2, 0, 4, 7, 6, 56, 4, 0, 14, 50, 5, 2, 16, 50, 8, 4, 18, 50, 159, 4, 12, 34, 1, 4, 14, 192, 0, 0, 0, 0, 128, 2, 20, 126, 127, 0, 140, 191, 0, 95, 0, 240, 7, 7, 4, 0, 0, 0, 134, 210, 2, 29, 0, 0, 1, 0, 133, 210, 6, 29, 0, 0, 1, 1, 0, 50, 1, 0, 133, 210, 2, 31, 0, 0, 1, 1, 0, 50, 1, 0, 133, 210, 2, 29, 0, 0, 1, 106, 25, 209, 1, 11, 2, 0, 0, 7, 0, 56, 2, 0, 134, 210, 1, 19, 0, 0, 0, 0, 133, 210, 0, 19, 0, 0, 0, 5, 0, 50, 1, 0, 133, 210, 1, 19, 0, 0, 3, 106, 25, 209, 1, 21, 0, 0, 11, 2, 4, 126, 0, 5, 8, 56, 131, 0, 6, 192, 120, 0, 0, 0, 3, 1, 6, 192, 32, 0, 0, 0, 127, 0, 140, 191, 2, 132, 0, 191, 85, 0, 133, 191, 3, 2, 6, 192, 40, 0, 0, 0, 2, 130, 0, 191, 41, 0, 132, 191, 3, 132, 0, 191, 29, 0, 133, 191, 
3, 130, 0, 191, 12, 0, 132, 191, 0, 0, 143, 210, 130, 6, 2, 0, 112, 15, 140, 191, 144, 16, 4, 36, 0, 106, 25, 209, 4, 0, 2, 0, 5, 2, 6, 126, 3, 3, 2, 56, 2, 15, 4, 40, 0, 0, 112, 220, 0, 2, 0, 0, 110, 0, 130, 191, 3, 129, 0, 191, 108, 0, 132, 191, 0, 0, 143, 210, 129, 6, 2, 0, 112, 15, 140, 191, 136, 16, 4, 36, 127, 0, 140, 191, 0, 106, 25, 209, 8, 0, 2, 0, 9, 2, 6, 126, 3, 3, 2, 56, 2, 15, 4, 40, 0, 0, 104, 220, 0, 2, 0, 0, 95, 0, 130, 191, 0, 0, 143, 210, 130, 6, 2, 0, 0, 106, 25, 209, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 0, 0, 116, 220, 0, 7, 0, 0, 85, 0, 130, 191, 2, 129, 0, 191, 83, 0, 132, 191, 3, 132, 0, 191, 26, 0, 133, 191, 3, 130, 0, 191, 11, 0, 132, 191, 0, 0, 143, 210, 129, 6, 2, 0, 127, 0, 140, 191, 0, 106, 25, 209, 8, 0, 2, 0, 9, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 0, 0, 104, 220, 0, 7, 0, 0, 68, 0, 130, 191, 3, 129, 0, 191, 66, 0, 132, 191, 131, 0, 6, 192, 48, 0, 0, 0, 127, 0, 140, 191, 0, 106, 25, 209, 2, 6, 2, 0, 3, 2, 4, 126, 2, 9, 2, 56, 112, 15, 140, 191, 0, 0, 96, 220, 0, 7, 0, 0, 55, 0, 130, 191, 0, 0, 143, 210, 130, 6, 2, 0, 0, 106, 25, 209, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 0, 0, 112, 220, 0, 7, 0, 0, 45, 0, 130, 191, 3, 132, 0, 191, 34, 0, 133, 191, 3, 130, 0, 191, 14, 0, 132, 191, 112, 15, 140, 191, 144, 16, 0, 36, 0, 15, 10, 40, 1, 0, 143, 210, 130, 6, 2, 0, 1, 106, 25, 209, 4, 2, 2, 0, 5, 2, 6, 126, 3, 5, 4, 56, 144, 20, 6, 36, 3, 19, 12, 40, 0, 0, 116, 220, 1, 5, 0, 0, 27, 0, 130, 191, 3, 129, 0, 191, 25, 0, 132, 191, 112, 15, 140, 191, 136, 16, 0, 36, 0, 15, 0, 40, 144, 18, 2, 36, 2, 0, 143, 210, 130, 6, 2, 0, 0, 3, 0, 40, 152, 20, 2, 36, 2, 106, 25, 209, 4, 4, 2, 0, 5, 2, 8, 126, 4, 7, 6, 56, 0, 3, 0, 40, 0, 0, 112, 220, 2, 0, 0, 0, 9, 0, 130, 191, 0, 0, 143, 210, 130, 6, 2, 0, 0, 106, 25, 209, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 0, 0, 124, 220, 0, 7, 0, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 196, 2, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 160, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 19, 0, 19, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 2, 134, 255, 255, 0, 0, 0, 255, 128, 146, 16, 0, 16, 0, 1, 255, 1, 134, 255, 255, 0, 0, 2, 8, 2, 146, 0, 9, 0, 146, 1, 10, 1, 146, 3, 2, 10, 192, 0, 0, 0, 0, 3, 1, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 88, 0, 0, 0, 195, 0, 2, 192, 96, 0, 0, 0, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 50, 0, 2, 2, 50, 1, 4, 0, 128, 4, 0, 193, 208, 1, 27, 0, 0, 8, 0, 193, 208, 0, 25, 0, 0, 0, 4, 4, 50, 8, 4, 128, 134, 3, 4, 136, 125, 0, 106, 128, 134, 0, 32, 128, 190, 206, 0, 136, 191, 131, 0, 6, 192, 40, 0, 0, 0, 3, 1, 6, 192, 72, 0, 0, 0, 3, 2, 2, 192, 80, 0, 0, 0, 67, 2, 2, 192, 112, 0, 0, 0, 3, 3, 10, 192, 120, 0, 0, 0, 159, 0, 6, 34, 159, 2, 8, 34, 127, 0, 140, 191, 5, 0, 134, 210, 1, 25, 0, 0, 4, 0, 133, 210, 4, 25, 0, 0, 4, 11, 8, 50, 5, 0, 133, 210, 1, 27, 0, 0, 5, 9, 8, 50, 5, 0, 133, 210, 1, 25, 0, 0, 5, 106, 25, 209, 5, 1, 2, 0, 4, 7, 6, 56, 159, 4, 8, 34, 6, 0, 134, 210, 2, 29, 0, 0, 4, 0, 133, 210, 4, 29, 0, 0, 4, 13, 8, 50, 6, 0, 133, 210, 2, 31, 0, 0, 6, 9, 8, 50, 6, 0, 133, 210, 2, 29, 0, 
0, 5, 106, 25, 209, 6, 11, 2, 0, 4, 7, 6, 56, 4, 0, 134, 210, 5, 19, 0, 0, 3, 0, 133, 210, 3, 19, 0, 0, 3, 9, 6, 50, 4, 0, 133, 210, 5, 19, 0, 0, 6, 106, 25, 209, 4, 5, 0, 0, 3, 2, 10, 126, 3, 11, 14, 56, 4, 0, 30, 50, 5, 2, 32, 50, 8, 4, 34, 50, 131, 0, 6, 192, 104, 0, 0, 0, 3, 2, 6, 192, 24, 0, 0, 0, 127, 0, 140, 191, 2, 132, 0, 191, 78, 0, 133, 191, 2, 130, 0, 191, 40, 0, 132, 191, 3, 130, 0, 191, 14, 0, 132, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 0, 80, 220, 3, 0, 0, 3, 112, 0, 140, 191, 249, 2, 12, 126, 3, 6, 5, 0, 249, 2, 10, 126, 3, 6, 4, 0, 57, 0, 130, 191, 3, 129, 0, 191, 13, 0, 132, 191, 3, 0, 143, 210, 129, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 0, 72, 220, 3, 0, 0, 3, 112, 0, 140, 191, 136, 6, 12, 32, 249, 2, 10, 126, 3, 6, 0, 0, 42, 0, 130, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 0, 84, 220, 3, 0, 0, 5, 33, 0, 130, 191, 2, 129, 0, 191, 29, 0, 132, 191, 3, 130, 0, 191, 9, 0, 132, 191, 3, 0, 143, 210, 129, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 0, 72, 220, 3, 0, 0, 5, 19, 0, 130, 191, 3, 129, 0, 191, 7, 0, 132, 191, 3, 106, 25, 209, 8, 12, 2, 0, 9, 2, 10, 126, 5, 15, 8, 56, 0, 0, 64, 220, 3, 0, 0, 5, 10, 0, 130, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 0, 80, 220, 3, 0, 0, 5, 1, 0, 130, 191, 2, 2, 10, 126, 3, 2, 12, 126, 5, 2, 16, 126, 4, 2, 14, 126, 58, 0, 130, 191, 3, 129, 0, 191, 18, 0, 132, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 0, 80, 220, 3, 0, 0, 3, 112, 0, 140, 191, 249, 2, 16, 126, 3, 6, 3, 0, 249, 2, 14, 126, 3, 6, 2, 0, 249, 2, 12, 126, 3, 6, 1, 0, 249, 2, 10, 126, 3, 6, 0, 0, 38, 0, 130, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 0, 80, 220, 3, 0, 0, 5, 3, 130, 0, 191, 16, 0, 132, 191, 3, 106, 25, 209, 3, 9, 
1, 0, 4, 106, 28, 209, 4, 1, 169, 1, 0, 0, 80, 220, 3, 0, 0, 3, 112, 0, 140, 191, 249, 2, 16, 126, 3, 6, 5, 0, 249, 2, 14, 126, 3, 6, 4, 0, 249, 2, 12, 126, 5, 6, 5, 0, 249, 2, 10, 126, 5, 6, 4, 0, 12, 0, 130, 191, 6, 106, 25, 209, 3, 25, 1, 0, 7, 106, 28, 209, 4, 1, 169, 1, 0, 0, 80, 220, 6, 0, 0, 8, 3, 106, 25, 209, 3, 9, 1, 0, 4, 106, 28, 209, 4, 1, 169, 1, 0, 0, 84, 220, 3, 0, 0, 6, 131, 0, 6, 192, 32, 0, 0, 0, 127, 0, 140, 191, 1, 1, 14, 192, 0, 0, 0, 0, 128, 2, 36, 126, 112, 0, 140, 191, 0, 95, 32, 240, 15, 5, 1, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 194, 2, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 2, 134, 255, 255, 0, 0, 0, 255, 128, 146, 16, 0, 16, 0, 1, 255, 1, 134, 255, 255, 0, 0, 2, 8, 2, 146, 0, 9, 0, 146, 1, 10, 1, 146, 3, 2, 10, 192, 0, 0, 0, 0, 3, 1, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 72, 0, 0, 0, 195, 0, 2, 192, 80, 0, 0, 0, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 50, 0, 2, 2, 50, 1, 4, 0, 128, 4, 0, 198, 208, 1, 27, 0, 0, 8, 0, 198, 208, 0, 25, 0, 0, 0, 4, 4, 50, 8, 4, 128, 135, 3, 4, 134, 125, 0, 106, 234, 135, 126, 1, 128, 
190, 0, 106, 254, 137, 28, 0, 136, 191, 3, 2, 10, 192, 24, 0, 0, 0, 131, 0, 6, 192, 40, 0, 0, 0, 3, 1, 2, 192, 48, 0, 0, 0, 127, 0, 140, 191, 4, 3, 14, 192, 0, 0, 0, 0, 2, 0, 6, 50, 3, 2, 8, 50, 4, 4, 10, 50, 128, 2, 12, 126, 127, 0, 140, 191, 0, 95, 0, 240, 3, 3, 3, 0, 3, 1, 10, 192, 56, 0, 0, 0, 5, 2, 14, 192, 0, 0, 0, 0, 127, 0, 140, 191, 4, 0, 14, 50, 5, 2, 16, 50, 6, 4, 18, 50, 128, 2, 20, 126, 112, 15, 140, 191, 0, 95, 32, 240, 7, 3, 2, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 197, 2, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 21, 0, 21, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 
6, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 2, 134, 255, 255, 0, 0, 0, 255, 128, 146, 16, 0, 16, 0, 1, 255, 1, 134, 255, 255, 0, 0, 2, 8, 2, 146, 0, 9, 0, 146, 1, 10, 1, 146, 3, 2, 10, 192, 0, 0, 0, 0, 3, 1, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 72, 0, 0, 0, 195, 0, 2, 192, 80, 0, 0, 0, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 50, 0, 2, 2, 50, 1, 4, 0, 128, 4, 0, 198, 208, 1, 27, 0, 0, 8, 0, 198, 208, 0, 25, 0, 0, 0, 4, 4, 50, 8, 4, 128, 135, 3, 4, 134, 125, 0, 106, 234, 135, 126, 1, 128, 190, 0, 106, 254, 137, 225, 2, 136, 191, 131, 0, 6, 192, 24, 0, 0, 0, 3, 1, 6, 192, 40, 0, 0, 0, 3, 2, 2, 192, 48, 0, 0, 0, 127, 0, 140, 191, 1, 3, 14, 192, 0, 0, 0, 0, 4, 0, 6, 50, 5, 2, 8, 50, 8, 4, 10, 50, 128, 2, 12, 126, 127, 0, 140, 191, 0, 95, 0, 240, 3, 3, 3, 0, 112, 15, 140, 191, 106, 0, 16, 208, 3, 7, 1, 0, 3, 0, 0, 209, 3, 1, 169, 1, 242, 6, 156, 124, 106, 32, 130, 190, 225, 0, 136, 191, 128, 6, 136, 124, 106, 32, 132, 190, 128, 2, 6, 126, 4, 126, 254, 137, 219, 0, 136, 191, 255, 0, 136, 190, 28, 46, 77, 59, 8, 6, 136, 124, 106, 32, 136, 190, 255, 6, 6, 10, 82, 184, 78, 65, 8, 126, 254, 137, 242, 6, 6, 10, 210, 0, 136, 191, 255, 6, 14, 38, 255, 255, 255, 127, 242, 14, 16, 4, 255, 0, 138, 190, 0, 0, 128, 61, 106, 1, 75, 208, 8, 21, 0, 0, 126, 1, 138, 190, 10, 106, 254, 137, 7, 105, 16, 126, 70, 0, 136, 191, 129, 16, 18, 36, 255, 16, 16, 50, 0, 0, 128, 0, 255, 18, 18, 50, 0, 0, 0, 1, 255, 16, 20, 38, 0, 0, 127, 0, 255, 18, 18, 38, 0, 0, 1, 0, 9, 21, 18, 50, 249, 2, 20, 126, 9, 6, 5, 0, 128, 2, 22, 126, 10, 0, 143, 210, 131, 20, 2, 0, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 12, 106, 25, 209, 12, 20, 2, 0, 13, 2, 26, 126, 13, 23, 26, 56, 0, 0, 84, 220, 12, 0, 0, 12, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 10, 106, 25, 209, 12, 20, 2, 0, 13, 2, 28, 126, 14, 23, 22, 56, 0, 0, 84, 220, 10, 0, 0, 10, 255, 16, 16, 38, 255, 255, 127, 0, 240, 18, 18, 40, 240, 16, 16, 40, 9, 17, 16, 4, 113, 1, 140, 191, 13, 17, 
18, 10, 12, 17, 18, 44, 255, 2, 28, 126, 171, 170, 170, 62, 255, 0, 140, 190, 0, 0, 128, 62, 7, 103, 30, 126, 12, 18, 28, 44, 12, 0, 193, 209, 12, 17, 38, 132, 193, 30, 30, 50, 14, 0, 193, 209, 9, 29, 194, 3, 9, 19, 32, 10, 13, 17, 24, 44, 15, 11, 16, 126, 14, 33, 24, 44, 255, 0, 140, 190, 244, 253, 5, 56, 12, 0, 193, 209, 8, 25, 48, 132, 112, 0, 140, 191, 12, 23, 24, 2, 8, 21, 16, 46, 0, 112, 49, 63, 12, 19, 30, 4, 255, 18, 28, 42, 0, 0, 0, 128, 8, 31, 26, 2, 10, 126, 254, 137, 8, 17, 18, 10, 21, 0, 136, 191, 8, 19, 20, 10, 255, 2, 22, 126, 171, 170, 42, 62, 255, 0, 140, 190, 37, 73, 18, 62, 12, 16, 22, 44, 8, 23, 22, 48, 205, 204, 76, 62, 8, 23, 22, 48, 0, 0, 128, 62, 8, 23, 22, 48, 171, 170, 170, 62, 10, 23, 20, 10, 241, 18, 28, 10, 15, 0, 193, 209, 9, 227, 41, 132, 15, 17, 26, 4, 255, 20, 24, 42, 0, 0, 0, 128, 255, 16, 16, 42, 0, 0, 0, 128, 10, 1, 254, 190, 8, 27, 20, 4, 15, 29, 18, 4, 15, 21, 20, 2, 12, 19, 18, 4, 255, 26, 22, 38, 0, 240, 255, 255, 9, 21, 18, 2, 13, 23, 16, 4, 9, 17, 16, 2, 255, 16, 18, 10, 0, 160, 42, 56, 11, 19, 18, 46, 0, 160, 42, 56, 8, 19, 16, 46, 0, 80, 213, 62, 11, 17, 18, 46, 0, 80, 213, 62, 255, 18, 20, 10, 59, 170, 184, 66, 10, 17, 20, 126, 191, 20, 24, 38, 131, 24, 24, 36, 255, 0, 139, 190, 85, 85, 85, 85, 255, 0, 138, 190, 85, 85, 85, 85, 12, 106, 25, 209, 10, 24, 2, 0, 11, 2, 26, 126, 13, 106, 28, 209, 13, 1, 169, 1, 0, 0, 84, 220, 12, 0, 0, 12, 255, 0, 138, 190, 0, 80, 213, 62, 10, 11, 28, 126, 11, 0, 193, 209, 10, 22, 38, 132, 14, 19, 30, 46, 0, 0, 49, 188, 8, 23, 16, 2, 14, 31, 22, 46, 239, 47, 228, 183, 8, 23, 22, 2, 255, 2, 28, 126, 171, 170, 42, 62, 255, 0, 138, 190, 171, 170, 42, 61, 10, 22, 28, 44, 14, 0, 193, 209, 14, 23, 194, 3, 11, 23, 30, 10, 14, 31, 22, 44, 255, 0, 138, 190, 8, 227, 130, 180, 255, 0, 139, 190, 24, 114, 177, 66, 112, 0, 140, 191, 13, 23, 26, 44, 12, 0, 68, 208, 8, 21, 0, 0, 11, 18, 132, 124, 12, 23, 26, 44, 106, 12, 140, 134, 11, 18, 130, 124, 134, 20, 16, 34, 12, 27, 20, 2, 106, 12, 234, 135, 8, 0, 
136, 210, 10, 17, 2, 0, 255, 2, 20, 126, 0, 0, 128, 127, 255, 0, 138, 190, 208, 142, 206, 194, 8, 21, 16, 0, 10, 18, 150, 124, 128, 16, 16, 0, 3, 15, 138, 125, 242, 16, 16, 10, 255, 2, 18, 126, 0, 0, 192, 127, 255, 0, 138, 190, 0, 0, 128, 255, 8, 19, 16, 0, 128, 14, 138, 125, 10, 0, 194, 208, 3, 21, 0, 0, 128, 16, 16, 0, 255, 2, 18, 126, 0, 0, 128, 127, 8, 0, 0, 209, 8, 19, 42, 0, 3, 19, 132, 125, 8, 19, 16, 0, 7, 19, 152, 125, 8, 7, 14, 0, 242, 6, 138, 125, 242, 14, 6, 0, 255, 2, 14, 126, 174, 71, 97, 189, 255, 0, 138, 190, 61, 10, 135, 63, 3, 0, 193, 209, 3, 21, 28, 4, 4, 1, 254, 190, 2, 126, 254, 137, 242, 2, 6, 126, 2, 1, 254, 190, 106, 0, 16, 208, 4, 7, 1, 0, 4, 0, 0, 209, 4, 1, 169, 1, 242, 8, 156, 124, 2, 106, 254, 134, 225, 0, 136, 191, 128, 8, 136, 124, 106, 32, 132, 190, 128, 2, 8, 126, 4, 126, 254, 137, 219, 0, 136, 191, 255, 0, 136, 190, 28, 46, 77, 59, 8, 8, 136, 124, 106, 32, 136, 190, 255, 8, 8, 10, 82, 184, 78, 65, 8, 126, 254, 137, 242, 8, 8, 10, 210, 0, 136, 191, 255, 8, 14, 38, 255, 255, 255, 127, 242, 14, 16, 4, 255, 0, 138, 190, 0, 0, 128, 61, 106, 1, 75, 208, 8, 21, 0, 0, 126, 1, 138, 190, 10, 106, 254, 137, 7, 105, 16, 126, 70, 0, 136, 191, 129, 16, 18, 36, 255, 16, 16, 50, 0, 0, 128, 0, 255, 18, 18, 50, 0, 0, 0, 1, 255, 16, 20, 38, 0, 0, 127, 0, 255, 18, 18, 38, 0, 0, 1, 0, 9, 21, 18, 50, 249, 2, 20, 126, 9, 6, 5, 0, 128, 2, 22, 126, 10, 0, 143, 210, 131, 20, 2, 0, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 12, 106, 25, 209, 12, 20, 2, 0, 13, 2, 26, 126, 13, 23, 26, 56, 0, 0, 84, 220, 12, 0, 0, 12, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 10, 106, 25, 209, 12, 20, 2, 0, 13, 2, 28, 126, 14, 23, 22, 56, 0, 0, 84, 220, 10, 0, 0, 10, 255, 16, 16, 38, 255, 255, 127, 0, 240, 18, 18, 40, 240, 16, 16, 40, 9, 17, 16, 4, 113, 1, 140, 191, 13, 17, 18, 10, 12, 17, 18, 44, 255, 2, 28, 126, 171, 170, 170, 62, 255, 0, 140, 190, 0, 0, 128, 62, 7, 103, 30, 126, 12, 18, 28, 44, 12, 0, 193, 209, 12, 17, 38, 
132, 193, 30, 30, 50, 14, 0, 193, 209, 9, 29, 194, 3, 9, 19, 32, 10, 13, 17, 24, 44, 15, 11, 16, 126, 14, 33, 24, 44, 255, 0, 140, 190, 244, 253, 5, 56, 12, 0, 193, 209, 8, 25, 48, 132, 112, 0, 140, 191, 12, 23, 24, 2, 8, 21, 16, 46, 0, 112, 49, 63, 12, 19, 30, 4, 255, 18, 28, 42, 0, 0, 0, 128, 8, 31, 26, 2, 10, 126, 254, 137, 8, 17, 18, 10, 21, 0, 136, 191, 8, 19, 20, 10, 255, 2, 22, 126, 171, 170, 42, 62, 255, 0, 140, 190, 37, 73, 18, 62, 12, 16, 22, 44, 8, 23, 22, 48, 205, 204, 76, 62, 8, 23, 22, 48, 0, 0, 128, 62, 8, 23, 22, 48, 171, 170, 170, 62, 10, 23, 20, 10, 241, 18, 28, 10, 15, 0, 193, 209, 9, 227, 41, 132, 15, 17, 26, 4, 255, 20, 24, 42, 0, 0, 0, 128, 255, 16, 16, 42, 0, 0, 0, 128, 10, 1, 254, 190, 8, 27, 20, 4, 15, 29, 18, 4, 15, 21, 20, 2, 12, 19, 18, 4, 255, 26, 22, 38, 0, 240, 255, 255, 9, 21, 18, 2, 13, 23, 16, 4, 9, 17, 16, 2, 255, 16, 18, 10, 0, 160, 42, 56, 11, 19, 18, 46, 0, 160, 42, 56, 8, 19, 16, 46, 0, 80, 213, 62, 11, 17, 18, 46, 0, 80, 213, 62, 255, 18, 20, 10, 59, 170, 184, 66, 10, 17, 20, 126, 191, 20, 24, 38, 131, 24, 24, 36, 255, 0, 139, 190, 85, 85, 85, 85, 255, 0, 138, 190, 85, 85, 85, 85, 12, 106, 25, 209, 10, 24, 2, 0, 11, 2, 26, 126, 13, 106, 28, 209, 13, 1, 169, 1, 0, 0, 84, 220, 12, 0, 0, 12, 255, 0, 138, 190, 0, 80, 213, 62, 10, 11, 28, 126, 11, 0, 193, 209, 10, 22, 38, 132, 14, 19, 30, 46, 0, 0, 49, 188, 8, 23, 16, 2, 14, 31, 22, 46, 239, 47, 228, 183, 8, 23, 22, 2, 255, 2, 28, 126, 171, 170, 42, 62, 255, 0, 138, 190, 171, 170, 42, 61, 10, 22, 28, 44, 14, 0, 193, 209, 14, 23, 194, 3, 11, 23, 30, 10, 14, 31, 22, 44, 255, 0, 138, 190, 8, 227, 130, 180, 255, 0, 139, 190, 24, 114, 177, 66, 112, 0, 140, 191, 13, 23, 26, 44, 12, 0, 68, 208, 8, 21, 0, 0, 11, 18, 132, 124, 12, 23, 26, 44, 106, 12, 140, 134, 11, 18, 130, 124, 134, 20, 16, 34, 12, 27, 20, 2, 106, 12, 234, 135, 8, 0, 136, 210, 10, 17, 2, 0, 255, 2, 20, 126, 0, 0, 128, 127, 255, 0, 138, 190, 208, 142, 206, 194, 8, 21, 16, 0, 10, 18, 150, 124, 128, 16, 16, 0, 4, 15, 138, 
125, 242, 16, 16, 10, 255, 2, 18, 126, 0, 0, 192, 127, 255, 0, 138, 190, 0, 0, 128, 255, 8, 19, 16, 0, 128, 14, 138, 125, 10, 0, 194, 208, 4, 21, 0, 0, 128, 16, 16, 0, 255, 2, 18, 126, 0, 0, 128, 127, 8, 0, 0, 209, 8, 19, 42, 0, 4, 19, 132, 125, 8, 19, 16, 0, 7, 19, 152, 125, 8, 9, 14, 0, 242, 8, 138, 125, 242, 14, 8, 0, 255, 2, 14, 126, 174, 71, 97, 189, 255, 0, 138, 190, 61, 10, 135, 63, 4, 0, 193, 209, 4, 21, 28, 4, 4, 1, 254, 190, 2, 126, 254, 137, 242, 2, 8, 126, 2, 1, 254, 190, 3, 2, 10, 192, 56, 0, 0, 0, 106, 0, 16, 208, 5, 7, 1, 0, 5, 0, 0, 209, 5, 1, 169, 1, 127, 0, 140, 191, 8, 0, 34, 50, 9, 2, 36, 50, 10, 4, 38, 50, 242, 10, 156, 124, 106, 32, 130, 190, 225, 0, 136, 191, 128, 10, 136, 124, 106, 32, 132, 190, 128, 2, 10, 126, 4, 126, 254, 137, 219, 0, 136, 191, 255, 0, 136, 190, 28, 46, 77, 59, 8, 10, 136, 124, 106, 32, 136, 190, 255, 10, 10, 10, 82, 184, 78, 65, 8, 126, 254, 137, 242, 10, 10, 10, 210, 0, 136, 191, 255, 10, 14, 38, 255, 255, 255, 127, 242, 14, 16, 4, 255, 0, 138, 190, 0, 0, 128, 61, 106, 1, 75, 208, 8, 21, 0, 0, 126, 1, 138, 190, 10, 106, 254, 137, 7, 105, 16, 126, 70, 0, 136, 191, 129, 16, 18, 36, 255, 16, 16, 50, 0, 0, 128, 0, 255, 18, 18, 50, 0, 0, 0, 1, 255, 16, 20, 38, 0, 0, 127, 0, 255, 18, 18, 38, 0, 0, 1, 0, 9, 21, 18, 50, 249, 2, 20, 126, 9, 6, 5, 0, 128, 2, 22, 126, 10, 0, 143, 210, 131, 20, 2, 0, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 12, 106, 25, 209, 12, 20, 2, 0, 13, 2, 26, 126, 13, 23, 26, 56, 0, 0, 84, 220, 12, 0, 0, 12, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 10, 106, 25, 209, 12, 20, 2, 0, 13, 2, 28, 126, 14, 23, 22, 56, 0, 0, 84, 220, 10, 0, 0, 10, 255, 16, 16, 38, 255, 255, 127, 0, 240, 18, 18, 40, 240, 16, 16, 40, 9, 17, 16, 4, 113, 1, 140, 191, 13, 17, 18, 10, 12, 17, 18, 44, 255, 2, 28, 126, 171, 170, 170, 62, 255, 0, 140, 190, 0, 0, 128, 62, 7, 103, 30, 126, 12, 18, 28, 44, 12, 0, 193, 209, 12, 17, 38, 132, 193, 30, 30, 50, 14, 0, 193, 209, 9, 29, 194, 3, 9, 
19, 32, 10, 13, 17, 24, 44, 15, 11, 16, 126, 14, 33, 24, 44, 255, 0, 140, 190, 244, 253, 5, 56, 12, 0, 193, 209, 8, 25, 48, 132, 112, 0, 140, 191, 12, 23, 24, 2, 8, 21, 16, 46, 0, 112, 49, 63, 12, 19, 26, 4, 255, 18, 28, 42, 0, 0, 0, 128, 8, 27, 30, 2, 10, 126, 254, 137, 8, 17, 18, 10, 21, 0, 136, 191, 8, 19, 20, 10, 255, 2, 22, 126, 171, 170, 42, 62, 255, 0, 140, 190, 37, 73, 18, 62, 12, 16, 22, 44, 8, 23, 22, 48, 205, 204, 76, 62, 8, 23, 22, 48, 0, 0, 128, 62, 8, 23, 22, 48, 171, 170, 170, 62, 10, 23, 20, 10, 241, 18, 28, 10, 13, 0, 193, 209, 9, 227, 41, 132, 13, 17, 30, 4, 255, 20, 24, 42, 0, 0, 0, 128, 255, 16, 16, 42, 0, 0, 0, 128, 10, 1, 254, 190, 8, 31, 16, 4, 13, 29, 20, 4, 13, 17, 16, 2, 12, 21, 18, 4, 255, 30, 20, 38, 0, 240, 255, 255, 9, 17, 16, 2, 15, 21, 18, 4, 8, 19, 16, 2, 255, 16, 18, 10, 0, 160, 42, 56, 10, 19, 18, 46, 0, 160, 42, 56, 8, 19, 16, 46, 0, 80, 213, 62, 10, 17, 18, 46, 0, 80, 213, 62, 255, 18, 22, 10, 59, 170, 184, 66, 11, 17, 22, 126, 191, 22, 24, 38, 131, 24, 24, 36, 255, 0, 139, 190, 85, 85, 85, 85, 255, 0, 138, 190, 85, 85, 85, 85, 12, 106, 25, 209, 10, 24, 2, 0, 11, 2, 26, 126, 13, 106, 28, 209, 13, 1, 169, 1, 0, 0, 84, 220, 12, 0, 0, 12, 255, 0, 138, 190, 0, 80, 213, 62, 11, 11, 28, 126, 10, 0, 193, 209, 10, 20, 38, 132, 14, 19, 30, 46, 0, 0, 49, 188, 8, 21, 16, 2, 14, 31, 20, 46, 239, 47, 228, 183, 8, 21, 20, 2, 255, 2, 28, 126, 171, 170, 42, 62, 255, 0, 138, 190, 171, 170, 42, 61, 10, 20, 28, 44, 14, 0, 193, 209, 14, 21, 194, 3, 10, 21, 30, 10, 14, 31, 20, 44, 255, 0, 138, 190, 8, 227, 130, 180, 255, 0, 139, 190, 24, 114, 177, 66, 112, 0, 140, 191, 13, 21, 26, 44, 12, 0, 68, 208, 8, 21, 0, 0, 11, 18, 132, 124, 12, 21, 26, 44, 106, 12, 140, 134, 11, 18, 130, 124, 134, 22, 16, 34, 12, 27, 20, 2, 106, 12, 234, 135, 8, 0, 136, 210, 10, 17, 2, 0, 255, 2, 20, 126, 0, 0, 128, 127, 255, 0, 138, 190, 208, 142, 206, 194, 8, 21, 16, 0, 10, 18, 150, 124, 128, 16, 16, 0, 5, 15, 138, 125, 242, 16, 16, 10, 255, 2, 18, 126, 0, 0, 192, 127, 255, 
0, 138, 190, 0, 0, 128, 255, 8, 19, 16, 0, 128, 14, 138, 125, 10, 0, 194, 208, 5, 21, 0, 0, 128, 16, 16, 0, 255, 2, 18, 126, 0, 0, 128, 127, 8, 0, 0, 209, 8, 19, 42, 0, 5, 19, 132, 125, 8, 19, 16, 0, 7, 19, 152, 125, 8, 11, 14, 0, 242, 10, 138, 125, 242, 14, 10, 0, 255, 2, 14, 126, 174, 71, 97, 189, 255, 0, 138, 190, 61, 10, 135, 63, 5, 0, 193, 209, 5, 21, 28, 4, 4, 1, 254, 190, 2, 126, 254, 137, 242, 2, 10, 126, 2, 1, 254, 190, 131, 0, 6, 192, 32, 0, 0, 0, 127, 0, 140, 191, 1, 1, 14, 192, 0, 0, 0, 0, 128, 2, 40, 126, 127, 0, 140, 191, 0, 95, 32, 240, 17, 3, 1, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 194, 2, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 2, 134, 255, 255, 0, 0, 0, 255, 128, 146, 16, 0, 16, 0, 1, 255, 1, 134, 255, 255, 0, 0, 2, 8, 2, 146, 0, 9, 0, 146, 1, 10, 1, 146, 3, 2, 10, 192, 0, 0, 0, 0, 3, 1, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 72, 0, 0, 0, 195, 0, 2, 192, 80, 0, 0, 0, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 50, 0, 2, 2, 50, 1, 4, 0, 128, 4, 0, 198, 208, 1, 27, 0, 0, 8, 0, 198, 208, 0, 25, 0, 0, 0, 4, 4, 50, 8, 4, 128, 135, 3, 4, 134, 125, 0, 106, 234, 135, 126, 1, 128, 190, 0, 106, 254, 137, 28, 0, 136, 191, 3, 2, 10, 192, 24, 0, 0, 0, 131, 0, 6, 192, 40, 0, 0, 0, 3, 1, 2, 192, 48, 0, 0, 0, 127, 0, 140, 191, 4, 3, 14, 192, 0, 0, 0, 0, 2, 0, 6, 50, 3, 2, 8, 50, 4, 4, 10, 50, 128, 2, 12, 126, 127, 0, 140, 191, 0, 95, 0, 240, 3, 3, 3, 0, 3, 1, 10, 192, 56, 0, 0, 0, 5, 2, 14, 192, 0, 0, 0, 0, 127, 0, 140, 191, 4, 0, 14, 50, 5, 2, 16, 50, 6, 4, 18, 50, 128, 2, 20, 126, 112, 15, 140, 191, 0, 95, 32, 240, 7, 3, 2, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 193, 2, 172, 0, 144, 0, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 5, 0, 5, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 0, 134, 255, 255, 0, 0, 0, 8, 0, 146, 131, 0, 6, 192, 0, 0, 0, 0, 67, 0, 2, 192, 72, 0, 0, 0, 127, 0, 140, 191, 0, 2, 0, 128, 0, 0, 0, 50, 1, 0, 136, 125, 106, 32, 128, 190, 20, 0, 136, 191, 3, 2, 10, 192, 24, 0, 0, 0, 131, 0, 2, 192, 40, 0, 0, 0, 127, 0, 140, 191, 4, 3, 10, 192, 0, 0, 0, 0, 2, 0, 2, 50, 127, 0, 140, 191, 0, 32, 12, 224, 1, 1, 3, 128, 131, 0, 2, 192, 56, 0, 0, 0, 5, 1, 10, 192, 0, 0, 0, 0, 127, 0, 140, 191, 2, 0, 0, 50, 112, 15, 140, 191, 0, 32, 28, 224, 0, 1, 1, 128, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 194, 2, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 11, 0, 11, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 2, 134, 255, 255, 0, 0, 0, 255, 128, 146, 16, 0, 16, 0, 1, 255, 1, 134, 255, 255, 0, 0, 2, 8, 2, 146, 0, 9, 0, 146, 1, 10, 1, 146, 3, 1, 6, 192, 0, 0, 0, 0, 195, 0, 2, 192, 72, 0, 0, 0, 127, 0, 140, 191, 2, 4, 2, 128, 2, 0, 0, 50, 3, 0, 136, 125, 106, 32, 130, 190, 29, 0, 136, 191, 3, 2, 10, 192, 24, 0, 0, 0, 3, 1, 2, 192, 40, 0, 0, 0, 127, 0, 140, 191, 4, 3, 10, 192, 0, 0, 0, 0, 4, 0, 6, 50, 127, 0, 140, 191, 0, 32, 12, 224, 3, 3, 3, 128, 3, 3, 10, 192, 8, 0, 0, 0, 3, 1, 10, 192, 56, 0, 0, 0, 5, 4, 14, 192, 0, 0, 0, 0, 127, 0, 140, 191, 1, 14, 1, 128, 0, 12, 0, 128, 1, 6, 1, 128, 0, 5, 0, 128, 4, 0, 14, 50, 1, 4, 18, 50, 0, 2, 16, 50, 128, 2, 20, 126, 112, 15, 140, 191, 0, 95, 32, 240, 7, 3, 4, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 193, 2, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 7, 0, 7, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 2, 134, 255, 255, 0, 0, 
0, 255, 128, 146, 16, 0, 16, 0, 1, 255, 1, 134, 255, 255, 0, 0, 2, 8, 2, 146, 0, 9, 0, 146, 1, 10, 1, 146, 3, 1, 6, 192, 0, 0, 0, 0, 195, 0, 2, 192, 72, 0, 0, 0, 127, 0, 140, 191, 2, 4, 2, 128, 2, 0, 0, 50, 3, 0, 136, 125, 106, 32, 130, 190, 31, 0, 136, 191, 3, 2, 10, 192, 24, 0, 0, 0, 3, 3, 10, 192, 8, 0, 0, 0, 3, 1, 6, 192, 40, 0, 0, 0, 127, 0, 140, 191, 67, 3, 2, 192, 48, 0, 0, 0, 4, 4, 14, 192, 0, 0, 0, 0, 1, 14, 1, 128, 0, 12, 0, 128, 127, 0, 140, 191, 1, 13, 1, 128, 0, 5, 0, 128, 4, 0, 6, 50, 1, 4, 10, 50, 0, 2, 8, 50, 128, 2, 12, 126, 0, 95, 0, 240, 3, 1, 4, 0, 3, 0, 2, 192, 56, 0, 0, 0, 5, 1, 10, 192, 0, 0, 0, 0, 127, 0, 140, 191, 0, 0, 0, 50, 112, 15, 140, 191, 0, 32, 28, 224, 0, 1, 1, 128, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 194, 2, 172, 0, 144, 19, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 2, 134, 255, 255, 0, 0, 0, 255, 128, 146, 16, 0, 16, 0, 1, 255, 1, 134, 255, 255, 0, 0, 2, 8, 2, 146, 0, 9, 0, 146, 1, 10, 1, 146, 3, 2, 10, 192, 0, 0, 0, 0, 3, 1, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 96, 0, 0, 0, 195, 0, 2, 192, 104, 0, 0, 0, 127, 0, 140, 191, 2, 8, 2, 128, 0, 10, 0, 128, 2, 0, 0, 50, 0, 2, 2, 50, 1, 
4, 0, 128, 4, 0, 193, 208, 1, 27, 0, 0, 8, 0, 193, 208, 0, 25, 0, 0, 0, 4, 4, 50, 8, 4, 128, 134, 3, 4, 136, 125, 0, 106, 128, 134, 0, 32, 128, 190, 55, 0, 136, 191, 3, 2, 10, 192, 80, 0, 0, 0, 127, 0, 140, 191, 8, 0, 14, 50, 9, 2, 16, 50, 10, 4, 18, 50, 131, 0, 2, 192, 112, 0, 0, 0, 3, 1, 6, 192, 24, 0, 0, 0, 127, 0, 140, 191, 2, 130, 0, 191, 30, 0, 133, 191, 2, 129, 0, 191, 13, 0, 132, 191, 3, 2, 10, 192, 48, 0, 0, 0, 2, 3, 14, 192, 0, 0, 0, 0, 127, 0, 140, 191, 8, 2, 0, 126, 9, 2, 2, 126, 10, 2, 4, 126, 11, 2, 6, 126, 128, 2, 20, 126, 0, 95, 32, 240, 7, 0, 3, 0, 27, 0, 130, 191, 2, 128, 0, 191, 25, 0, 132, 191, 3, 2, 10, 192, 32, 0, 0, 0, 2, 3, 14, 192, 0, 0, 0, 0, 127, 0, 140, 191, 8, 2, 0, 126, 9, 2, 2, 126, 10, 2, 4, 126, 11, 2, 6, 126, 128, 2, 20, 126, 0, 95, 32, 240, 7, 0, 3, 0, 12, 0, 130, 191, 3, 2, 10, 192, 64, 0, 0, 0, 2, 3, 14, 192, 0, 0, 0, 0, 127, 0, 140, 191, 8, 2, 0, 126, 9, 2, 2, 126, 10, 2, 4, 126, 11, 2, 6, 126, 128, 2, 20, 126, 0, 95, 32, 240, 7, 0, 3, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 193, 2, 172, 0, 144, 0, 0, 0, 11, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 5, 0, 5, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 192, 4, 0, 0, 0, 127, 0, 140, 191, 0, 255, 0, 134, 255, 255, 0, 0, 0, 8, 0, 146, 131, 0, 6, 192, 0, 0, 0, 0, 67, 0, 2, 192, 96, 0, 0, 0, 127, 0, 140, 191, 0, 2, 0, 128, 0, 0, 0, 50, 1, 0, 136, 125, 106, 32, 128, 190, 50, 0, 136, 191, 131, 0, 2, 192, 80, 0, 0, 0, 127, 0, 140, 191, 2, 0, 0, 50, 131, 0, 2, 192, 112, 0, 0, 0, 3, 1, 6, 192, 24, 0, 0, 0, 127, 0, 140, 191, 2, 130, 0, 191, 28, 0, 133, 191, 2, 129, 0, 191, 12, 0, 132, 191, 3, 2, 10, 192, 48, 0, 0, 0, 2, 1, 10, 192, 0, 0, 0, 0, 127, 0, 140, 191, 8, 2, 2, 126, 9, 2, 4, 126, 10, 2, 6, 126, 11, 2, 8, 126, 0, 32, 28, 224, 0, 1, 1, 128, 25, 0, 130, 191, 2, 128, 0, 191, 23, 0, 132, 191, 3, 2, 10, 192, 32, 0, 0, 0, 2, 1, 10, 192, 0, 0, 0, 0, 127, 0, 140, 191, 8, 2, 2, 126, 9, 2, 4, 126, 10, 2, 6, 126, 11, 2, 8, 126, 0, 32, 28, 224, 0, 1, 1, 128, 11, 0, 130, 191, 3, 2, 10, 192, 64, 0, 0, 0, 2, 1, 10, 192, 0, 0, 0, 0, 127, 0, 140, 191, 8, 2, 2, 126, 9, 2, 4, 126, 10, 2, 6, 126, 11, 2, 8, 126, 0, 32, 28, 224, 0, 1, 1, 128, 0, 0, 129, 191, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 2, 0, 0, 0, 0, 0, 0, 40, 0, 0, 0, 1, 0, 4, 0, 8, 2, 0, 0, 0, 0, 0, 0, 8, 4, 0, 0, 0, 0, 0, 0, 76, 0, 0, 0, 1, 0, 4, 0, 16, 6, 0, 0, 0, 0, 0, 0, 8, 4, 0, 0, 0, 0, 0, 0, 118, 0, 0, 0, 26, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 4, 0, 0, 0, 0, 0, 0, 149, 0, 0, 0, 26, 0, 5, 0, 0, 5, 0, 0, 0, 0, 0, 0, 204, 4, 0, 0, 0, 0, 0, 0, 180, 0, 0, 0, 26, 0, 5, 0, 0, 10, 0, 0, 0, 0, 0, 0, 8, 2, 0, 0, 0, 0, 0, 0, 209, 0, 0, 0, 26, 0, 5, 0, 0, 13, 0, 0, 0, 0, 0, 0, 28, 13, 0, 0, 0, 0, 0, 0, 249, 0, 0, 0, 26, 0, 5, 0, 0, 27, 0, 0, 0, 0, 0, 0, 8, 2, 0, 0, 0, 0, 0, 0, 33, 1, 0, 0, 26, 0, 5, 0, 0, 30, 0, 0, 0, 0, 0, 0, 148, 1, 0, 0, 0, 0, 0, 0, 58, 1, 
0, 0, 26, 0, 5, 0, 0, 32, 0, 0, 0, 0, 0, 0, 208, 1, 0, 0, 0, 0, 0, 0, 90, 1, 0, 0, 26, 0, 5, 0, 0, 34, 0, 0, 0, 0, 0, 0, 216, 1, 0, 0, 0, 0, 0, 0, 122, 1, 0, 0, 26, 0, 5, 0, 0, 36, 0, 0, 0, 0, 0, 0, 112, 2, 0, 0, 0, 0, 0, 0, 144, 1, 0, 0, 26, 0, 5, 0, 0, 39, 0, 0, 0, 0, 0, 0, 12, 2, 0, 0, 0, 0, 0, 0, 170, 1, 0, 0, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 207, 1, 0, 0, 3, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 152, 15, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 160, 15, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, 15, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 200, 15, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 17, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 56, 17, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 19, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 76, 19, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 108, 19, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 116, 19, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 220, 20, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 228, 20, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 23, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 23, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 23, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 56, 23, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 160, 24, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 168, 24, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 
0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 0, 0, 0, 0, 88, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 3, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 1, 0, 0, 0, 0, 0, 0, 229, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 240, 2, 0, 0, 0, 0, 0, 0, 200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 0, 0, 0, 1, 0, 0, 0, 3, 0, 160, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 184, 3, 0, 0, 0, 0, 0, 0, 24, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0, 0, 0, 1, 0, 0, 0, 7, 0, 192, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 12, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 66, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 55, 0, 0, 0, 0, 0, 0, 128, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 144, 56, 0, 0, 0, 0, 0, 0, 176, 1, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 5, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0}; } } ROCR-Runtime-rocm-5.0.0/src/image/blit_object_gfx9xx.cpp000066400000000000000000001631521420110115200230050ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #include namespace rocr { namespace image { uint8_t blit_object_gfx9xx[] = {127, 69, 76, 70, 2, 1, 1, 64, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 224, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 72, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 56, 0, 2, 0, 64, 0, 8, 0, 1, 0, 2, 0, 0, 96, 6, 0, 0, 0, 184, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 10, 0, 0, 0, 0, 0, 0, 24, 10, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 96, 5, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 41, 0, 0, 0, 0, 0, 0, 24, 41, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 46, 115, 104, 115, 116, 114, 116, 97, 98, 0, 46, 115, 116, 114, 116, 97, 98, 0, 46, 110, 111, 116, 101, 0, 46, 104, 115, 97, 100, 97, 116, 97, 95, 114, 101, 97, 100, 111, 110, 108, 121, 95, 97, 103, 101, 110, 116, 0, 46, 104, 115, 97, 116, 101, 120, 116, 0, 46, 115, 121, 109, 116, 97, 98, 0, 46, 115, 121, 109, 116, 97, 98, 0, 46, 114, 101, 108, 97, 46, 104, 115, 97, 116, 101, 120, 116, 0, 0, 0, 38, 104, 115, 97, 95, 101, 120, 116, 95, 105, 109, 97, 103, 101, 58, 58, 38, 95, 95, 111, 99, 109, 108, 116, 98, 108, 95, 77, 51, 50, 95, 69, 88, 80, 95, 69, 80, 0, 38, 104, 115, 97, 95, 101, 120, 116, 95, 105, 109, 97, 103, 101, 58, 58, 38, 95, 95, 111, 99, 109, 108, 116, 98, 108, 95, 77, 51, 50, 95, 76, 79, 71, 69, 0, 38, 104, 115, 97, 95, 101, 120, 116, 95, 105, 109, 97, 103, 101, 58, 58, 38, 95, 95, 111, 99, 109, 108, 116, 98, 108, 95, 77, 51, 50, 95, 76, 79, 71, 95, 73, 78, 86, 95, 69, 80, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 116, 111, 95, 98, 117, 102, 102, 101, 114, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 98, 117, 102, 102, 101, 114, 95, 116, 111, 95, 105, 109, 97, 103, 101, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 100, 101, 102, 97, 117, 108, 
116, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 108, 105, 110, 101, 97, 114, 95, 116, 111, 95, 115, 116, 97, 110, 100, 97, 114, 100, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 115, 116, 97, 110, 100, 97, 114, 100, 95, 116, 111, 95, 108, 105, 110, 101, 97, 114, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 49, 100, 98, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 49, 100, 98, 95, 116, 111, 95, 114, 101, 103, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 111, 112, 121, 95, 105, 109, 97, 103, 101, 95, 114, 101, 103, 95, 116, 111, 95, 49, 100, 98, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 108, 101, 97, 114, 95, 105, 109, 97, 103, 101, 95, 107, 101, 114, 110, 101, 108, 0, 38, 95, 95, 99, 108, 101, 97, 114, 95, 105, 109, 97, 103, 101, 95, 49, 100, 98, 95, 107, 101, 114, 110, 101, 108, 0, 95, 95, 104, 115, 97, 95, 115, 101, 99, 116, 105, 111, 110, 46, 104, 115, 97, 100, 97, 116, 97, 95, 114, 101, 97, 100, 111, 110, 108, 121, 95, 97, 103, 101, 110, 116, 0, 95, 95, 104, 115, 97, 95, 115, 101, 99, 116, 105, 111, 110, 46, 104, 115, 97, 116, 101, 120, 116, 0, 0, 0, 0, 4, 0, 0, 0, 8, 0, 0, 0, 1, 0, 0, 0, 65, 77, 68, 0, 1, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 12, 0, 0, 0, 2, 0, 0, 0, 65, 77, 68, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 0, 4, 0, 0, 0, 26, 0, 0, 0, 3, 0, 0, 0, 65, 77, 68, 0, 4, 0, 7, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 65, 77, 68, 0, 65, 77, 68, 71, 80, 85, 0, 0, 4, 0, 0, 0, 41, 0, 0, 0, 4, 0, 0, 0, 65, 77, 68, 0, 25, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 65, 77, 68, 32, 72, 83, 65, 32, 82, 117, 110, 116, 105, 109, 101, 32, 70, 105, 110, 97, 108, 105, 122, 101, 114, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 26, 0, 0, 0, 5, 0, 0, 0, 65, 77, 68, 0, 22, 0, 45, 104, 115, 97, 95, 99, 97, 108, 108, 95, 99, 111, 110, 118, 101, 110, 116, 105, 111, 110, 
61, 48, 0, 5, 0, 0, 0, 0, 128, 63, 0, 0, 0, 0, 0, 96, 129, 63, 119, 62, 26, 57, 0, 192, 130, 63, 138, 105, 216, 57, 0, 32, 132, 63, 29, 70, 81, 58, 0, 160, 133, 63, 124, 54, 172, 57, 0, 0, 135, 63, 180, 12, 123, 58, 0, 128, 136, 63, 4, 116, 64, 58, 0, 0, 138, 63, 170, 171, 38, 58, 0, 128, 139, 63, 31, 15, 46, 58, 0, 0, 141, 63, 219, 250, 86, 58, 0, 160, 142, 63, 104, 49, 7, 57, 0, 32, 144, 63, 24, 226, 14, 58, 0, 192, 145, 63, 234, 220, 244, 56, 0, 64, 147, 63, 120, 89, 81, 58, 0, 224, 148, 63, 71, 125, 39, 58, 0, 128, 150, 63, 185, 105, 33, 58, 0, 32, 152, 63, 140, 130, 63, 58, 0, 224, 153, 63, 65, 38, 11, 55, 0, 128, 155, 63, 157, 155, 211, 57, 0, 32, 157, 63, 57, 205, 118, 58, 0, 224, 158, 63, 4, 147, 41, 58, 0, 160, 160, 63, 125, 136, 2, 58, 0, 96, 162, 63, 24, 24, 2, 58, 0, 32, 164, 63, 112, 173, 40, 58, 0, 224, 165, 63, 77, 181, 118, 58, 0, 192, 167, 63, 78, 59, 217, 57, 0, 160, 169, 63, 117, 90, 45, 56, 0, 96, 171, 63, 173, 205, 81, 58, 0, 64, 173, 63, 82, 247, 65, 58, 0, 32, 175, 63, 107, 197, 91, 58, 0, 32, 177, 63, 116, 96, 253, 56, 0, 0, 179, 63, 149, 32, 14, 58, 0, 0, 181, 63, 127, 102, 30, 57, 0, 224, 182, 63, 25, 143, 108, 58, 0, 224, 184, 63, 59, 122, 93, 58, 0, 224, 186, 63, 144, 213, 122, 58, 0, 0, 189, 63, 245, 57, 138, 57, 0, 0, 191, 63, 179, 205, 60, 58, 0, 32, 193, 63, 166, 204, 196, 57, 0, 64, 195, 63, 68, 155, 89, 57, 0, 96, 197, 63, 42, 66, 101, 57, 0, 128, 199, 63, 138, 76, 215, 57, 0, 160, 201, 63, 51, 236, 77, 58, 0, 224, 203, 63, 239, 79, 193, 57, 0, 32, 206, 63, 163, 130, 17, 57, 0, 96, 208, 63, 187, 246, 204, 56, 0, 160, 210, 63, 31, 217, 129, 57, 0, 224, 212, 63, 94, 213, 26, 58, 0, 64, 215, 63, 90, 153, 31, 57, 0, 128, 217, 63, 19, 174, 104, 58, 0, 224, 219, 63, 190, 188, 93, 58, 0, 96, 222, 63, 94, 130, 244, 55, 0, 192, 224, 63, 194, 238, 205, 57, 0, 32, 227, 63, 149, 75, 124, 58, 0, 160, 229, 63, 59, 55, 72, 58, 0, 32, 232, 63, 129, 82, 75, 58, 0, 192, 234, 63, 221, 231, 198, 55, 0, 64, 237, 63, 237, 1, 243, 57, 0, 224, 239, 63, 
123, 51, 23, 57, 0, 128, 242, 63, 44, 158, 59, 56, 0, 32, 245, 63, 164, 162, 47, 57, 0, 192, 247, 63, 152, 251, 6, 58, 0, 128, 250, 63, 220, 182, 236, 56, 0, 32, 253, 63, 103, 96, 112, 58, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 59, 65, 172, 41, 52, 0, 0, 126, 60, 252, 176, 168, 53, 0, 192, 189, 60, 234, 131, 141, 54, 0, 16, 252, 60, 120, 14, 27, 54, 0, 240, 28, 61, 254, 185, 135, 54, 0, 160, 59, 61, 101, 236, 49, 54, 0, 16, 90, 61, 25, 113, 221, 54, 0, 80, 120, 61, 69, 0, 195, 53, 0, 32, 139, 61, 81, 119, 155, 55, 0, 0, 154, 61, 13, 203, 235, 55, 0, 208, 168, 61, 131, 159, 131, 55, 0, 128, 183, 61, 229, 138, 82, 55, 0, 16, 198, 61, 24, 235, 162, 55, 0, 144, 212, 61, 149, 116, 218, 54, 0, 240, 226, 61, 183, 30, 169, 54, 0, 48, 241, 61, 21, 183, 131, 55, 0, 96, 255, 61, 219, 49, 17, 55, 0, 176, 6, 62, 104, 62, 63, 56, 0, 176, 13, 62, 151, 106, 21, 56, 0, 160, 20, 62, 15, 124, 41, 56, 0, 128, 27, 62, 15, 16, 126, 56, 0, 96, 34, 62, 101, 182, 21, 56, 0, 48, 41, 62, 161, 227, 229, 55, 0, 240, 47, 62, 83, 56, 24, 56, 0, 176, 54, 62, 157, 113, 254, 53, 0, 80, 61, 62, 8, 129, 68, 56, 0, 240, 67, 62, 144, 50, 80, 56, 0, 144, 74, 62, 232, 57, 53, 55, 0, 16, 81, 62, 241, 15, 94, 56, 0, 144, 87, 62, 64, 167, 100, 56, 0, 16, 94, 62, 45, 116, 134, 55, 0, 112, 100, 62, 205, 227, 123, 56, 0, 224, 106, 62, 62, 173, 133, 54, 0, 48, 113, 62, 21, 183, 3, 56, 0, 128, 119, 62, 220, 203, 173, 55, 0, 192, 125, 62, 175, 54, 12, 56, 0, 0, 130, 62, 211, 82, 22, 55, 0, 16, 133, 62, 57, 113, 146, 56, 0, 32, 136, 62, 215, 252, 197, 56, 0, 48, 139, 62, 213, 85, 174, 56, 0, 64, 142, 62, 105, 193, 24, 56, 0, 64, 145, 62, 231, 253, 160, 56, 0, 64, 148, 62, 239, 9, 173, 56, 0, 64, 151, 62, 225, 186, 98, 56, 0, 48, 154, 62, 76, 205, 238, 56, 0, 48, 157, 62, 210, 170, 152, 55, 0, 32, 160, 62, 26, 26, 66, 55, 0, 0, 163, 62, 14, 225, 197, 56, 0, 240, 165, 62, 238, 42, 191, 55, 0, 208, 168, 62, 45, 135, 45, 56, 0, 176, 171, 62, 138, 46, 238, 55, 0, 128, 174, 62, 172, 223, 222, 56, 0, 
96, 177, 62, 185, 242, 2, 56, 0, 48, 180, 62, 155, 30, 72, 56, 0, 0, 183, 62, 43, 170, 14, 56, 0, 192, 185, 62, 93, 251, 235, 56, 0, 144, 188, 62, 221, 95, 37, 56, 0, 80, 191, 62, 130, 59, 120, 56, 0, 16, 194, 62, 30, 218, 81, 56, 0, 208, 196, 62, 5, 27, 78, 55, 0, 128, 199, 62, 155, 67, 143, 56, 0, 48, 202, 62, 16, 14, 202, 56, 0, 224, 204, 62, 139, 192, 202, 56, 0, 144, 207, 62, 95, 246, 145, 56, 0, 64, 210, 62, 203, 33, 129, 55, 0, 224, 212, 62, 154, 154, 108, 56, 0, 128, 215, 62, 35, 153, 148, 56, 0, 32, 218, 62, 204, 123, 119, 56, 0, 192, 220, 62, 38, 45, 177, 55, 0, 80, 223, 62, 211, 206, 166, 56, 0, 224, 225, 62, 230, 211, 235, 56, 0, 112, 228, 62, 205, 227, 251, 56, 0, 0, 231, 62, 194, 133, 215, 56, 0, 144, 233, 62, 0, 126, 126, 56, 0, 16, 236, 62, 197, 146, 243, 56, 0, 160, 238, 62, 131, 9, 212, 55, 0, 32, 241, 62, 124, 26, 8, 56, 0, 160, 243, 62, 173, 195, 132, 55, 0, 16, 246, 62, 35, 233, 204, 56, 0, 144, 248, 62, 175, 95, 15, 56, 0, 0, 251, 62, 56, 253, 145, 56, 0, 112, 253, 62, 188, 71, 172, 56, 0, 224, 255, 62, 43, 4, 151, 56, 0, 32, 1, 63, 210, 82, 41, 57, 0, 80, 2, 63, 212, 206, 111, 57, 0, 144, 3, 63, 115, 112, 249, 55, 0, 192, 4, 63, 174, 158, 94, 56, 0, 240, 5, 63, 74, 200, 101, 56, 0, 32, 7, 63, 163, 11, 19, 56, 0, 64, 8, 63, 22, 207, 121, 57, 0, 112, 9, 63, 201, 202, 56, 57, 0, 160, 10, 63, 244, 210, 195, 56, 0, 192, 11, 63, 236, 93, 117, 57, 0, 240, 12, 63, 103, 180, 230, 56, 0, 16, 14, 63, 184, 15, 92, 57, 0, 64, 15, 63, 224, 188, 62, 56, 0, 96, 16, 63, 146, 209, 220, 56, 0, 128, 17, 63, 223, 107, 24, 57, 0, 160, 18, 63, 76, 231, 45, 57, 0, 192, 19, 63, 68, 9, 47, 57, 0, 224, 20, 63, 97, 255, 27, 57, 0, 0, 22, 63, 68, 237, 233, 56, 0, 32, 23, 63, 200, 109, 104, 56, 0, 48, 24, 63, 167, 153, 107, 57, 0, 80, 25, 63, 137, 156, 9, 57, 0, 112, 26, 63, 115, 118, 162, 55, 0, 128, 27, 63, 163, 218, 11, 57, 0, 144, 28, 63, 171, 105, 112, 57, 0, 176, 29, 63, 255, 73, 132, 56, 0, 192, 30, 63, 56, 53, 1, 57, 0, 208, 31, 63, 104, 194, 45, 57, 0, 224, 32, 
63, 35, 244, 71, 57, 0, 240, 33, 63, 124, 241, 79, 57, 0, 0, 35, 63, 14, 225, 69, 57, 0, 16, 36, 63, 245, 232, 41, 57, 0, 32, 37, 63, 176, 93, 248, 56, 0, 48, 38, 63, 153, 95, 115, 56, 0, 48, 39, 63, 219, 8, 108, 57, 0, 64, 40, 63, 0, 230, 9, 57, 0, 80, 41, 63, 111, 153, 180, 55, 0, 80, 42, 63, 204, 51, 18, 57, 0, 80, 43, 63, 217, 234, 124, 57, 0, 96, 44, 63, 205, 181, 173, 56, 0, 96, 45, 63, 26, 38, 32, 57, 0, 96, 46, 63, 54, 238, 88, 57, 0, 112, 47, 63, 5, 73, 170, 53, 0, 112, 48, 63, 30, 209, 203, 55, 0, 112, 49, 63, 244, 253, 5, 56, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 254, 63, 248, 3, 254, 56, 0, 0, 252, 63, 193, 15, 252, 57, 0, 0, 250, 63, 201, 179, 140, 58, 0, 0, 248, 63, 16, 62, 248, 58, 0, 0, 246, 63, 48, 123, 64, 59, 0, 0, 244, 63, 96, 141, 137, 59, 0, 0, 242, 63, 72, 214, 185, 59, 0, 0, 240, 63, 241, 240, 240, 59, 0, 0, 239, 63, 127, 220, 186, 58, 0, 0, 237, 63, 108, 7, 102, 59, 0, 0, 235, 63, 166, 178, 189, 59, 0, 0, 234, 63, 161, 14, 234, 57, 0, 0, 232, 63, 247, 88, 75, 59, 0, 0, 230, 63, 72, 180, 194, 59, 0, 0, 229, 63, 172, 96, 150, 58, 0, 0, 227, 63, 228, 56, 142, 59, 0, 0, 225, 63, 14, 120, 252, 59, 0, 0, 224, 63, 56, 112, 96, 59, 0, 0, 222, 63, 77, 92, 233, 59, 0, 0, 221, 63, 76, 145, 79, 59, 0, 0, 219, 63, 239, 97, 235, 59, 0, 0, 218, 63, 79, 27, 104, 59, 0, 0, 217, 63, 178, 1, 89, 56, 0, 0, 215, 63, 229, 53, 148, 59, 0, 0, 214, 63, 89, 3, 174, 58, 0, 0, 212, 63, 3, 123, 199, 59, 0, 0, 211, 63, 109, 26, 80, 59, 0, 0, 210, 63, 33, 13, 210, 57, 0, 0, 208, 63, 204, 159, 182, 59, 0, 0, 207, 63, 81, 233, 72, 59, 0, 0, 206, 63, 185, 83, 52, 58, 0, 0, 204, 63, 205, 204, 204, 59, 0, 0, 203, 63, 192, 39, 135, 59, 0, 0, 202, 63, 205, 15, 11, 59, 0, 0, 201, 63, 209, 73, 123, 57, 0, 0, 199, 63, 125, 12, 206, 59, 0, 0, 198, 63, 106, 12, 152, 59, 0, 0, 197, 63, 247, 144, 75, 59, 0, 0, 196, 63, 21, 190, 220, 58, 0, 0, 195, 63, 49, 12, 195, 57, 0, 0, 193, 63, 214, 187, 228, 59, 0, 0, 192, 63, 193, 192, 192, 59, 0, 0, 191, 63, 232, 47, 160, 59, 0, 0, 190, 63, 12, 250, 
130, 59, 0, 0, 189, 63, 142, 32, 82, 59, 0, 0, 188, 63, 24, 200, 36, 59, 0, 0, 187, 63, 135, 156, 251, 58, 0, 0, 186, 63, 140, 46, 186, 58, 0, 0, 185, 63, 233, 15, 133, 58, 0, 0, 184, 63, 3, 23, 56, 58, 0, 0, 183, 63, 162, 181, 251, 57, 0, 0, 182, 63, 97, 11, 182, 57, 0, 0, 181, 63, 170, 104, 158, 57, 0, 0, 180, 63, 65, 11, 180, 57, 0, 0, 179, 63, 41, 53, 246, 57, 0, 0, 178, 63, 67, 22, 50, 58, 0, 0, 177, 63, 192, 157, 126, 58, 0, 0, 176, 63, 11, 44, 176, 58, 0, 0, 175, 63, 26, 119, 235, 58, 0, 0, 174, 63, 185, 130, 24, 59, 0, 0, 173, 63, 176, 86, 64, 59, 0, 0, 172, 63, 8, 35, 109, 59, 0, 0, 171, 63, 227, 105, 143, 59, 0, 0, 170, 63, 171, 170, 170, 59, 0, 0, 169, 63, 72, 74, 200, 59, 0, 0, 168, 63, 87, 63, 232, 59, 0, 0, 168, 63, 129, 10, 168, 57, 0, 0, 167, 63, 230, 20, 188, 58, 0, 0, 166, 63, 114, 136, 43, 59, 0, 0, 165, 63, 5, 106, 125, 59, 0, 0, 164, 63, 30, 207, 169, 59, 0, 0, 163, 63, 61, 10, 215, 59, 0, 0, 163, 63, 246, 199, 75, 57, 0, 0, 162, 63, 172, 12, 223, 58, 0, 0, 161, 63, 93, 98, 86, 59, 0, 0, 160, 63, 161, 160, 160, 59, 0, 0, 159, 63, 254, 9, 216, 59, 0, 0, 159, 63, 57, 47, 11, 58, 0, 0, 158, 63, 72, 90, 25, 59, 0, 0, 157, 63, 158, 216, 137, 59, 0, 0, 156, 63, 97, 225, 200, 59, 0, 0, 156, 63, 193, 9, 156, 57, 0, 0, 155, 63, 62, 223, 24, 59, 0, 0, 154, 63, 217, 231, 144, 59, 0, 0, 153, 63, 219, 34, 215, 59, 0, 0, 153, 63, 139, 210, 120, 58, 0, 0, 152, 63, 19, 144, 81, 59, 0, 0, 151, 63, 237, 37, 180, 59, 0, 0, 151, 63, 46, 1, 23, 56, 0, 0, 150, 63, 216, 180, 31, 59, 0, 0, 149, 63, 104, 37, 160, 59, 0, 0, 148, 63, 79, 9, 242, 59, 0, 0, 148, 63, 41, 1, 11, 59, 0, 0, 147, 63, 196, 133, 154, 59, 0, 0, 146, 63, 132, 19, 241, 59, 0, 0, 146, 63, 37, 73, 18, 59, 0, 0, 145, 63, 197, 179, 162, 59, 0, 0, 144, 63, 9, 188, 253, 59, 0, 0, 144, 63, 198, 112, 52, 59, 0, 0, 143, 63, 238, 35, 184, 59, 0, 0, 143, 63, 208, 206, 59, 58, 0, 0, 142, 63, 218, 106, 112, 59, 0, 0, 141, 63, 2, 82, 218, 59, 0, 0, 141, 63, 35, 44, 247, 58, 0, 0, 140, 63, 4, 156, 162, 59, 0, 0, 
140, 63, 193, 8, 140, 57, 0, 0, 139, 63, 148, 104, 96, 59, 0, 0, 138, 63, 252, 242, 216, 59, 0, 0, 138, 63, 225, 240, 5, 59, 0, 0, 137, 63, 138, 64, 174, 59, 0, 0, 137, 63, 215, 57, 86, 58, 0, 0, 136, 63, 137, 136, 136, 59, 0, 0, 135, 63, 136, 128, 247, 59, 0, 0, 135, 63, 190, 86, 79, 59, 0, 0, 134, 63, 68, 5, 217, 59, 0, 0, 134, 63, 252, 20, 23, 59, 0, 0, 133, 63, 97, 55, 191, 59, 0, 0, 133, 63, 77, 33, 208, 58, 0, 0, 132, 63, 200, 249, 169, 59, 0, 0, 132, 63, 8, 33, 132, 58, 0, 0, 131, 63, 82, 48, 153, 59, 0, 0, 131, 63, 188, 116, 19, 58, 0, 0, 130, 63, 191, 191, 140, 59, 0, 0, 130, 63, 33, 8, 130, 57, 0, 0, 129, 63, 169, 141, 132, 59, 0, 0, 129, 63, 4, 2, 129, 56, 0, 0, 128, 63, 129, 128, 128, 59, 0, 0, 128, 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 194, 0, 172, 0, 148, 19, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 11, 0, 11, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 8, 134, 255, 255, 0, 0, 4, 255, 132, 146, 16, 0, 16, 0, 5, 255, 5, 134, 255, 255, 0, 0, 8, 10, 8, 146, 4, 11, 4, 146, 5, 12, 5, 146, 3, 0, 10, 192, 0, 0, 0, 0, 131, 2, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 104, 0, 0, 0, 127, 192, 140, 191, 67, 0, 2, 192, 112, 0, 0, 0, 0, 2, 6, 126, 
2, 2, 8, 126, 0, 0, 255, 209, 8, 6, 2, 4, 1, 0, 255, 209, 4, 8, 6, 4, 10, 2, 6, 126, 2, 0, 193, 208, 1, 27, 0, 0, 12, 0, 136, 125, 2, 0, 255, 209, 5, 6, 10, 4, 106, 2, 130, 134, 127, 192, 140, 191, 1, 4, 136, 125, 2, 106, 128, 134, 0, 32, 128, 190, 199, 0, 136, 191, 131, 0, 6, 192, 24, 0, 0, 0, 3, 1, 6, 192, 56, 0, 0, 0, 3, 2, 2, 192, 64, 0, 0, 0, 131, 2, 6, 192, 72, 0, 0, 0, 67, 2, 2, 192, 128, 0, 0, 0, 3, 3, 10, 192, 136, 0, 0, 0, 159, 0, 6, 34, 159, 2, 8, 34, 127, 192, 140, 191, 5, 0, 134, 210, 1, 25, 0, 0, 4, 0, 133, 210, 4, 25, 0, 0, 6, 0, 133, 210, 1, 27, 0, 0, 4, 0, 255, 209, 4, 11, 26, 4, 5, 0, 133, 210, 1, 25, 0, 0, 5, 106, 25, 209, 5, 1, 2, 0, 4, 7, 6, 56, 4, 0, 14, 104, 5, 2, 16, 104, 8, 4, 18, 104, 159, 4, 12, 34, 1, 4, 14, 192, 0, 0, 0, 0, 128, 2, 20, 126, 127, 192, 140, 191, 0, 95, 0, 240, 7, 7, 4, 0, 0, 0, 134, 210, 2, 29, 0, 0, 1, 0, 133, 210, 6, 29, 0, 0, 4, 0, 133, 210, 2, 31, 0, 0, 0, 0, 255, 209, 1, 1, 18, 4, 1, 0, 133, 210, 2, 29, 0, 0, 1, 106, 25, 209, 1, 11, 2, 0, 0, 7, 0, 56, 2, 0, 134, 210, 1, 19, 0, 0, 0, 0, 133, 210, 0, 19, 0, 0, 0, 5, 0, 104, 1, 0, 133, 210, 1, 19, 0, 0, 3, 106, 25, 209, 1, 21, 0, 0, 11, 2, 4, 126, 0, 5, 8, 56, 131, 0, 6, 192, 120, 0, 0, 0, 3, 1, 6, 192, 32, 0, 0, 0, 127, 192, 140, 191, 2, 132, 0, 191, 85, 0, 133, 191, 3, 2, 6, 192, 40, 0, 0, 0, 2, 130, 0, 191, 41, 0, 132, 191, 3, 132, 0, 191, 29, 0, 133, 191, 3, 130, 0, 191, 12, 0, 132, 191, 0, 0, 143, 210, 130, 6, 2, 0, 0, 106, 25, 209, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 2, 0, 0, 210, 8, 33, 29, 4, 0, 128, 112, 220, 0, 2, 127, 0, 110, 0, 130, 191, 3, 129, 0, 191, 108, 0, 132, 191, 0, 0, 143, 210, 129, 6, 2, 0, 127, 192, 140, 191, 0, 106, 25, 209, 8, 0, 2, 0, 9, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 2, 0, 0, 210, 8, 17, 29, 4, 0, 128, 104, 220, 0, 2, 127, 0, 95, 0, 130, 191, 0, 0, 143, 210, 130, 6, 2, 0, 0, 106, 25, 209, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 0, 128, 116, 220, 0, 7, 127, 0, 85, 0, 130, 191, 2, 129, 0, 191, 
83, 0, 132, 191, 3, 132, 0, 191, 26, 0, 133, 191, 3, 130, 0, 191, 11, 0, 132, 191, 0, 0, 143, 210, 129, 6, 2, 0, 127, 192, 140, 191, 0, 106, 25, 209, 8, 0, 2, 0, 9, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 0, 128, 104, 220, 0, 7, 127, 0, 68, 0, 130, 191, 3, 129, 0, 191, 66, 0, 132, 191, 131, 0, 6, 192, 48, 0, 0, 0, 127, 192, 140, 191, 0, 106, 25, 209, 2, 6, 2, 0, 3, 2, 4, 126, 2, 9, 2, 56, 112, 15, 140, 191, 0, 128, 96, 220, 0, 7, 127, 0, 55, 0, 130, 191, 0, 0, 143, 210, 130, 6, 2, 0, 0, 106, 25, 209, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 0, 128, 112, 220, 0, 7, 127, 0, 45, 0, 130, 191, 3, 132, 0, 191, 34, 0, 133, 191, 3, 130, 0, 191, 14, 0, 132, 191, 112, 15, 140, 191, 5, 0, 0, 210, 8, 33, 29, 4, 1, 0, 143, 210, 130, 6, 2, 0, 1, 106, 25, 209, 4, 2, 2, 0, 5, 2, 6, 126, 3, 5, 4, 56, 6, 0, 0, 210, 10, 33, 37, 4, 0, 128, 116, 220, 1, 5, 127, 0, 27, 0, 130, 191, 3, 129, 0, 191, 25, 0, 132, 191, 112, 15, 140, 191, 0, 0, 0, 210, 8, 17, 29, 4, 1, 0, 143, 210, 130, 6, 2, 0, 0, 0, 0, 210, 9, 33, 1, 4, 1, 106, 25, 209, 4, 2, 2, 0, 5, 2, 6, 126, 3, 5, 4, 56, 0, 0, 0, 210, 10, 49, 1, 4, 0, 128, 112, 220, 1, 0, 127, 0, 9, 0, 130, 191, 0, 0, 143, 210, 130, 6, 2, 0, 0, 106, 25, 209, 4, 0, 2, 0, 5, 2, 4, 126, 2, 3, 2, 56, 112, 15, 140, 191, 0, 128, 124, 220, 0, 7, 127, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 132, 0, 172, 0, 148, 19, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 160, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 0, 19, 0, 19, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 8, 134, 255, 255, 0, 0, 4, 255, 132, 146, 16, 0, 16, 0, 5, 255, 5, 134, 255, 255, 0, 0, 8, 10, 8, 146, 4, 11, 4, 146, 5, 12, 5, 146, 3, 0, 10, 192, 0, 0, 0, 0, 131, 2, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 88, 0, 0, 0, 127, 192, 140, 191, 67, 0, 2, 192, 96, 0, 0, 0, 0, 2, 6, 126, 2, 2, 8, 126, 0, 0, 255, 209, 8, 6, 2, 4, 1, 0, 255, 209, 4, 8, 6, 4, 10, 2, 6, 126, 2, 0, 193, 208, 1, 27, 0, 0, 12, 0, 136, 125, 2, 0, 255, 209, 5, 6, 10, 4, 106, 2, 130, 134, 127, 192, 140, 191, 1, 4, 136, 125, 2, 106, 128, 134, 0, 32, 128, 190, 194, 0, 136, 191, 131, 0, 6, 192, 40, 0, 0, 0, 3, 1, 6, 192, 72, 0, 0, 0, 3, 2, 2, 192, 80, 0, 0, 0, 67, 2, 2, 192, 112, 0, 0, 0, 3, 3, 10, 192, 120, 0, 0, 0, 159, 0, 6, 34, 159, 2, 8, 34, 127, 192, 140, 191, 5, 0, 134, 210, 1, 25, 0, 0, 4, 0, 133, 210, 4, 25, 0, 0, 6, 0, 133, 210, 1, 27, 0, 0, 4, 0, 255, 209, 4, 11, 26, 4, 5, 0, 133, 210, 1, 25, 0, 0, 5, 106, 25, 209, 5, 1, 2, 0, 4, 7, 6, 56, 159, 4, 8, 34, 6, 0, 134, 210, 2, 29, 0, 0, 4, 0, 133, 210, 4, 29, 0, 0, 7, 0, 133, 210, 2, 31, 0, 0, 4, 0, 255, 209, 4, 13, 30, 4, 6, 0, 133, 210, 2, 29, 0, 0, 5, 106, 25, 209, 6, 11, 2, 0, 4, 7, 6, 56, 4, 0, 134, 210, 5, 19, 0, 0, 3, 0, 133, 210, 3, 19, 0, 0, 3, 9, 6, 104, 4, 0, 133, 210, 5, 19, 0, 0, 6, 106, 25, 209, 4, 5, 0, 0, 3, 2, 10, 126, 3, 11, 14, 56, 4, 0, 30, 104, 5, 2, 32, 104, 8, 4, 34, 104, 131, 0, 6, 192, 104, 0, 0, 0, 3, 2, 6, 192, 24, 0, 0, 0, 127, 192, 140, 191, 2, 132, 0, 191, 78, 0, 133, 191, 2, 130, 0, 191, 40, 0, 132, 191, 3, 130, 0, 191, 14, 0, 132, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 128, 80, 220, 3, 0, 127, 3, 112, 15, 140, 
191, 249, 2, 12, 126, 3, 6, 5, 0, 249, 2, 10, 126, 3, 6, 4, 0, 57, 0, 130, 191, 3, 129, 0, 191, 13, 0, 132, 191, 3, 0, 143, 210, 129, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 128, 72, 220, 3, 0, 127, 3, 112, 15, 140, 191, 136, 6, 12, 32, 249, 2, 10, 126, 3, 6, 0, 0, 42, 0, 130, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 128, 84, 220, 3, 0, 127, 5, 33, 0, 130, 191, 2, 129, 0, 191, 29, 0, 132, 191, 3, 130, 0, 191, 9, 0, 132, 191, 3, 0, 143, 210, 129, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 128, 72, 220, 3, 0, 127, 5, 19, 0, 130, 191, 3, 129, 0, 191, 7, 0, 132, 191, 3, 106, 25, 209, 8, 12, 2, 0, 9, 2, 10, 126, 5, 15, 8, 56, 0, 128, 64, 220, 3, 0, 127, 5, 10, 0, 130, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 128, 80, 220, 3, 0, 127, 5, 1, 0, 130, 191, 2, 2, 10, 126, 3, 2, 12, 126, 5, 2, 16, 126, 4, 2, 14, 126, 46, 0, 130, 191, 3, 129, 0, 191, 18, 0, 132, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 128, 80, 220, 3, 0, 127, 3, 112, 15, 140, 191, 249, 2, 16, 126, 3, 6, 3, 0, 249, 2, 14, 126, 3, 6, 2, 0, 249, 2, 12, 126, 3, 6, 1, 0, 249, 2, 10, 126, 3, 6, 0, 0, 26, 0, 130, 191, 3, 0, 143, 210, 130, 12, 2, 0, 3, 106, 25, 209, 8, 6, 2, 0, 9, 2, 10, 126, 5, 9, 8, 56, 0, 128, 80, 220, 3, 0, 127, 5, 3, 130, 0, 191, 12, 0, 132, 191, 4, 128, 80, 220, 3, 0, 127, 3, 112, 15, 140, 191, 249, 2, 16, 126, 3, 6, 5, 0, 249, 2, 14, 126, 3, 6, 4, 0, 249, 2, 12, 126, 5, 6, 5, 0, 249, 2, 10, 126, 5, 6, 4, 0, 4, 0, 130, 191, 12, 128, 80, 220, 3, 0, 127, 8, 4, 128, 84, 220, 3, 0, 127, 6, 131, 0, 6, 192, 32, 0, 0, 0, 127, 192, 140, 191, 1, 1, 14, 192, 0, 0, 0, 0, 128, 2, 36, 126, 112, 0, 140, 191, 0, 95, 32, 240, 15, 5, 1, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 0, 172, 0, 148, 19, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 8, 134, 255, 255, 0, 0, 4, 255, 132, 146, 16, 0, 16, 0, 5, 255, 5, 134, 255, 255, 0, 0, 8, 10, 8, 146, 4, 11, 4, 146, 5, 12, 5, 146, 3, 0, 10, 192, 0, 0, 0, 0, 131, 2, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 72, 0, 0, 0, 127, 192, 140, 191, 67, 0, 2, 192, 80, 0, 0, 0, 0, 2, 6, 126, 2, 2, 8, 126, 0, 0, 255, 209, 8, 6, 2, 4, 1, 0, 255, 209, 4, 8, 6, 4, 10, 2, 6, 126, 2, 0, 198, 208, 1, 27, 0, 0, 12, 0, 134, 125, 2, 0, 255, 209, 5, 6, 10, 4, 106, 2, 130, 135, 127, 192, 140, 191, 1, 4, 134, 125, 2, 106, 234, 135, 126, 1, 128, 190, 0, 106, 254, 137, 28, 0, 136, 191, 3, 2, 10, 192, 24, 0, 0, 0, 131, 0, 6, 192, 40, 0, 0, 0, 3, 1, 2, 192, 48, 0, 0, 0, 127, 192, 140, 191, 4, 3, 14, 192, 0, 0, 0, 0, 2, 0, 6, 104, 3, 2, 8, 104, 4, 4, 10, 104, 128, 2, 12, 126, 127, 192, 140, 191, 0, 95, 0, 240, 3, 3, 3, 0, 3, 1, 10, 192, 56, 0, 0, 0, 5, 2, 14, 192, 0, 0, 0, 0, 127, 192, 140, 191, 4, 0, 14, 104, 5, 2, 16, 104, 6, 4, 18, 104, 128, 2, 20, 126, 112, 15, 140, 191, 0, 95, 32, 240, 7, 3, 2, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 133, 0, 172, 0, 148, 19, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 21, 0, 21, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 8, 134, 255, 255, 0, 0, 4, 255, 132, 146, 16, 0, 16, 0, 5, 255, 5, 134, 255, 255, 0, 0, 8, 10, 8, 146, 4, 11, 4, 146, 5, 12, 5, 146, 3, 0, 10, 192, 0, 0, 0, 0, 131, 2, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 72, 0, 0, 0, 127, 192, 140, 191, 67, 0, 2, 192, 80, 0, 0, 0, 0, 2, 6, 126, 2, 2, 8, 126, 0, 0, 255, 209, 8, 6, 2, 4, 1, 0, 255, 209, 4, 8, 6, 4, 10, 2, 6, 126, 2, 0, 198, 208, 1, 27, 0, 0, 12, 0, 134, 125, 2, 0, 255, 209, 5, 6, 10, 4, 106, 2, 130, 135, 127, 192, 140, 191, 1, 4, 134, 125, 2, 
106, 234, 135, 126, 1, 128, 190, 0, 106, 254, 137, 233, 2, 136, 191, 131, 0, 6, 192, 24, 0, 0, 0, 3, 1, 6, 192, 40, 0, 0, 0, 3, 2, 2, 192, 48, 0, 0, 0, 127, 192, 140, 191, 1, 3, 14, 192, 0, 0, 0, 0, 4, 0, 6, 104, 5, 2, 8, 104, 8, 4, 10, 104, 128, 2, 12, 126, 127, 192, 140, 191, 0, 95, 0, 240, 3, 3, 3, 0, 112, 15, 140, 191, 106, 0, 16, 208, 3, 7, 1, 0, 3, 0, 0, 209, 3, 1, 169, 1, 126, 1, 130, 190, 4, 0, 91, 208, 3, 229, 1, 0, 227, 0, 136, 191, 126, 1, 132, 190, 8, 0, 81, 208, 3, 1, 1, 0, 128, 2, 6, 126, 4, 126, 254, 137, 220, 0, 136, 191, 255, 0, 136, 190, 28, 46, 77, 59, 126, 1, 138, 190, 8, 0, 81, 208, 3, 17, 0, 0, 255, 6, 6, 10, 82, 184, 78, 65, 10, 126, 254, 137, 242, 6, 6, 10, 210, 0, 136, 191, 255, 6, 14, 38, 255, 255, 255, 127, 242, 14, 16, 4, 255, 0, 136, 190, 0, 0, 128, 61, 106, 1, 75, 208, 8, 17, 0, 0, 126, 1, 136, 190, 8, 106, 254, 137, 7, 105, 16, 126, 70, 0, 136, 191, 129, 16, 18, 36, 255, 16, 16, 104, 0, 0, 128, 0, 255, 18, 18, 104, 0, 0, 0, 1, 255, 16, 20, 38, 0, 0, 127, 0, 255, 18, 18, 38, 0, 0, 1, 0, 9, 21, 18, 104, 249, 2, 20, 126, 9, 6, 5, 0, 128, 2, 22, 126, 10, 0, 143, 210, 131, 20, 2, 0, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 12, 106, 25, 209, 12, 20, 2, 0, 13, 2, 26, 126, 13, 23, 26, 56, 0, 128, 84, 220, 12, 0, 127, 12, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 10, 106, 25, 209, 12, 20, 2, 0, 13, 2, 28, 126, 14, 23, 22, 56, 0, 128, 84, 220, 10, 0, 127, 10, 255, 16, 16, 38, 255, 255, 127, 0, 240, 18, 18, 40, 240, 16, 16, 40, 9, 17, 16, 4, 113, 15, 140, 191, 13, 17, 18, 10, 12, 17, 18, 44, 255, 2, 28, 126, 171, 170, 170, 62, 255, 0, 140, 190, 0, 0, 128, 62, 7, 103, 30, 126, 12, 18, 28, 44, 12, 0, 193, 209, 12, 17, 38, 132, 193, 30, 30, 104, 14, 0, 193, 209, 9, 29, 194, 3, 9, 19, 32, 10, 13, 17, 24, 44, 15, 11, 16, 126, 14, 33, 24, 44, 255, 0, 140, 190, 244, 253, 5, 56, 12, 0, 193, 209, 8, 25, 48, 132, 112, 15, 140, 191, 12, 23, 24, 2, 8, 21, 16, 46, 0, 112, 49, 63, 12, 19, 30, 4, 255, 18, 
28, 42, 0, 0, 0, 128, 8, 31, 26, 2, 8, 126, 254, 137, 8, 17, 18, 10, 21, 0, 136, 191, 8, 19, 20, 10, 255, 2, 22, 126, 171, 170, 42, 62, 255, 0, 140, 190, 37, 73, 18, 62, 12, 16, 22, 44, 8, 23, 22, 48, 205, 204, 76, 62, 8, 23, 22, 48, 0, 0, 128, 62, 8, 23, 22, 48, 171, 170, 170, 62, 10, 23, 20, 10, 241, 18, 28, 10, 15, 0, 193, 209, 9, 227, 41, 132, 15, 17, 26, 4, 255, 20, 24, 42, 0, 0, 0, 128, 255, 16, 16, 42, 0, 0, 0, 128, 8, 1, 254, 190, 8, 27, 20, 4, 15, 29, 18, 4, 15, 21, 20, 2, 12, 19, 18, 4, 255, 26, 22, 38, 0, 240, 255, 255, 9, 21, 18, 2, 13, 23, 16, 4, 9, 17, 16, 2, 255, 16, 18, 10, 0, 160, 42, 56, 11, 19, 18, 46, 0, 160, 42, 56, 8, 19, 16, 46, 0, 80, 213, 62, 11, 17, 18, 46, 0, 80, 213, 62, 255, 18, 20, 10, 59, 170, 184, 66, 10, 17, 20, 126, 191, 20, 24, 38, 131, 24, 24, 36, 255, 0, 137, 190, 85, 85, 85, 85, 255, 0, 136, 190, 85, 85, 85, 85, 12, 106, 25, 209, 8, 24, 2, 0, 9, 2, 26, 126, 13, 106, 28, 209, 13, 1, 169, 1, 0, 128, 84, 220, 12, 0, 127, 12, 255, 0, 136, 190, 0, 80, 213, 62, 10, 11, 28, 126, 11, 0, 193, 209, 8, 22, 38, 132, 14, 19, 30, 46, 0, 0, 49, 188, 8, 23, 16, 2, 14, 31, 22, 46, 239, 47, 228, 183, 8, 23, 22, 2, 255, 2, 28, 126, 171, 170, 42, 62, 255, 0, 136, 190, 171, 170, 42, 61, 8, 22, 28, 44, 14, 0, 193, 209, 14, 23, 194, 3, 11, 23, 30, 10, 14, 31, 22, 44, 255, 0, 136, 190, 8, 227, 130, 180, 255, 0, 137, 190, 24, 114, 177, 66, 112, 15, 140, 191, 13, 23, 26, 44, 12, 0, 68, 208, 8, 17, 0, 0, 9, 18, 132, 124, 12, 23, 26, 44, 106, 12, 140, 134, 9, 18, 130, 124, 134, 20, 16, 34, 12, 27, 20, 2, 106, 12, 234, 135, 8, 0, 136, 210, 10, 17, 2, 0, 255, 2, 20, 126, 0, 0, 128, 127, 255, 0, 136, 190, 208, 142, 206, 194, 8, 21, 16, 0, 8, 18, 150, 124, 128, 16, 16, 0, 3, 15, 138, 125, 242, 16, 16, 10, 255, 2, 18, 126, 0, 0, 192, 127, 255, 0, 136, 190, 0, 0, 128, 255, 8, 19, 16, 0, 128, 14, 138, 125, 8, 0, 194, 208, 3, 17, 0, 0, 128, 16, 16, 0, 255, 2, 18, 126, 0, 0, 128, 127, 8, 0, 0, 209, 8, 19, 34, 0, 3, 19, 132, 125, 8, 19, 16, 0, 7, 19, 152, 125, 8, 
7, 14, 0, 242, 6, 138, 125, 242, 14, 6, 0, 255, 2, 14, 126, 174, 71, 97, 189, 255, 0, 136, 190, 61, 10, 135, 63, 3, 0, 193, 209, 3, 17, 28, 4, 4, 1, 254, 190, 2, 126, 254, 137, 242, 2, 6, 126, 2, 1, 254, 190, 106, 0, 16, 208, 4, 7, 1, 0, 4, 0, 0, 209, 4, 1, 169, 1, 242, 8, 156, 124, 2, 106, 254, 134, 227, 0, 136, 191, 126, 1, 132, 190, 8, 0, 81, 208, 4, 1, 1, 0, 128, 2, 8, 126, 4, 126, 254, 137, 220, 0, 136, 191, 255, 0, 136, 190, 28, 46, 77, 59, 126, 1, 138, 190, 8, 0, 81, 208, 4, 17, 0, 0, 255, 8, 8, 10, 82, 184, 78, 65, 10, 126, 254, 137, 242, 8, 8, 10, 210, 0, 136, 191, 255, 8, 14, 38, 255, 255, 255, 127, 242, 14, 16, 4, 255, 0, 136, 190, 0, 0, 128, 61, 106, 1, 75, 208, 8, 17, 0, 0, 126, 1, 136, 190, 8, 106, 254, 137, 7, 105, 16, 126, 70, 0, 136, 191, 129, 16, 18, 36, 255, 16, 16, 104, 0, 0, 128, 0, 255, 18, 18, 104, 0, 0, 0, 1, 255, 16, 20, 38, 0, 0, 127, 0, 255, 18, 18, 38, 0, 0, 1, 0, 9, 21, 18, 104, 249, 2, 20, 126, 9, 6, 5, 0, 128, 2, 22, 126, 10, 0, 143, 210, 131, 20, 2, 0, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 12, 106, 25, 209, 12, 20, 2, 0, 13, 2, 26, 126, 13, 23, 26, 56, 0, 128, 84, 220, 12, 0, 127, 12, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 10, 106, 25, 209, 12, 20, 2, 0, 13, 2, 28, 126, 14, 23, 22, 56, 0, 128, 84, 220, 10, 0, 127, 10, 255, 16, 16, 38, 255, 255, 127, 0, 240, 18, 18, 40, 240, 16, 16, 40, 9, 17, 16, 4, 113, 15, 140, 191, 13, 17, 18, 10, 12, 17, 18, 44, 255, 2, 28, 126, 171, 170, 170, 62, 255, 0, 140, 190, 0, 0, 128, 62, 7, 103, 30, 126, 12, 18, 28, 44, 12, 0, 193, 209, 12, 17, 38, 132, 193, 30, 30, 104, 14, 0, 193, 209, 9, 29, 194, 3, 9, 19, 32, 10, 13, 17, 24, 44, 15, 11, 16, 126, 14, 33, 24, 44, 255, 0, 140, 190, 244, 253, 5, 56, 12, 0, 193, 209, 8, 25, 48, 132, 112, 15, 140, 191, 12, 23, 24, 2, 8, 21, 16, 46, 0, 112, 49, 63, 12, 19, 30, 4, 255, 18, 28, 42, 0, 0, 0, 128, 8, 31, 26, 2, 8, 126, 254, 137, 8, 17, 18, 10, 21, 0, 136, 191, 8, 19, 20, 10, 255, 2, 22, 126, 171, 170, 
42, 62, 255, 0, 140, 190, 37, 73, 18, 62, 12, 16, 22, 44, 8, 23, 22, 48, 205, 204, 76, 62, 8, 23, 22, 48, 0, 0, 128, 62, 8, 23, 22, 48, 171, 170, 170, 62, 10, 23, 20, 10, 241, 18, 28, 10, 15, 0, 193, 209, 9, 227, 41, 132, 15, 17, 26, 4, 255, 20, 24, 42, 0, 0, 0, 128, 255, 16, 16, 42, 0, 0, 0, 128, 8, 1, 254, 190, 8, 27, 20, 4, 15, 29, 18, 4, 15, 21, 20, 2, 12, 19, 18, 4, 255, 26, 22, 38, 0, 240, 255, 255, 9, 21, 18, 2, 13, 23, 16, 4, 9, 17, 16, 2, 255, 16, 18, 10, 0, 160, 42, 56, 11, 19, 18, 46, 0, 160, 42, 56, 8, 19, 16, 46, 0, 80, 213, 62, 11, 17, 18, 46, 0, 80, 213, 62, 255, 18, 20, 10, 59, 170, 184, 66, 10, 17, 20, 126, 191, 20, 24, 38, 131, 24, 24, 36, 255, 0, 137, 190, 85, 85, 85, 85, 255, 0, 136, 190, 85, 85, 85, 85, 12, 106, 25, 209, 8, 24, 2, 0, 9, 2, 26, 126, 13, 106, 28, 209, 13, 1, 169, 1, 0, 128, 84, 220, 12, 0, 127, 12, 255, 0, 136, 190, 0, 80, 213, 62, 10, 11, 28, 126, 11, 0, 193, 209, 8, 22, 38, 132, 14, 19, 30, 46, 0, 0, 49, 188, 8, 23, 16, 2, 14, 31, 22, 46, 239, 47, 228, 183, 8, 23, 22, 2, 255, 2, 28, 126, 171, 170, 42, 62, 255, 0, 136, 190, 171, 170, 42, 61, 8, 22, 28, 44, 14, 0, 193, 209, 14, 23, 194, 3, 11, 23, 30, 10, 14, 31, 22, 44, 255, 0, 136, 190, 8, 227, 130, 180, 255, 0, 137, 190, 24, 114, 177, 66, 112, 15, 140, 191, 13, 23, 26, 44, 12, 0, 68, 208, 8, 17, 0, 0, 9, 18, 132, 124, 12, 23, 26, 44, 106, 12, 140, 134, 9, 18, 130, 124, 134, 20, 16, 34, 12, 27, 20, 2, 106, 12, 234, 135, 8, 0, 136, 210, 10, 17, 2, 0, 255, 2, 20, 126, 0, 0, 128, 127, 255, 0, 136, 190, 208, 142, 206, 194, 8, 21, 16, 0, 8, 18, 150, 124, 128, 16, 16, 0, 4, 15, 138, 125, 242, 16, 16, 10, 255, 2, 18, 126, 0, 0, 192, 127, 255, 0, 136, 190, 0, 0, 128, 255, 8, 19, 16, 0, 128, 14, 138, 125, 8, 0, 194, 208, 4, 17, 0, 0, 128, 16, 16, 0, 255, 2, 18, 126, 0, 0, 128, 127, 8, 0, 0, 209, 8, 19, 34, 0, 4, 19, 132, 125, 8, 19, 16, 0, 7, 19, 152, 125, 8, 9, 14, 0, 242, 8, 138, 125, 242, 14, 8, 0, 255, 2, 14, 126, 174, 71, 97, 189, 255, 0, 136, 190, 61, 10, 135, 63, 4, 0, 193, 209, 
4, 17, 28, 4, 4, 1, 254, 190, 2, 126, 254, 137, 242, 2, 8, 126, 2, 1, 254, 190, 3, 2, 10, 192, 56, 0, 0, 0, 106, 0, 16, 208, 5, 7, 1, 0, 5, 0, 0, 209, 5, 1, 169, 1, 127, 192, 140, 191, 8, 0, 34, 104, 9, 2, 36, 104, 10, 4, 38, 104, 126, 1, 130, 190, 4, 0, 91, 208, 5, 229, 1, 0, 227, 0, 136, 191, 126, 1, 132, 190, 8, 0, 81, 208, 5, 1, 1, 0, 128, 2, 10, 126, 4, 126, 254, 137, 220, 0, 136, 191, 255, 0, 136, 190, 28, 46, 77, 59, 126, 1, 138, 190, 8, 0, 81, 208, 5, 17, 0, 0, 255, 10, 10, 10, 82, 184, 78, 65, 10, 126, 254, 137, 242, 10, 10, 10, 210, 0, 136, 191, 255, 10, 14, 38, 255, 255, 255, 127, 242, 14, 16, 4, 255, 0, 136, 190, 0, 0, 128, 61, 106, 1, 75, 208, 8, 17, 0, 0, 126, 1, 136, 190, 8, 106, 254, 137, 7, 105, 16, 126, 70, 0, 136, 191, 129, 16, 18, 36, 255, 16, 16, 104, 0, 0, 128, 0, 255, 18, 18, 104, 0, 0, 0, 1, 255, 16, 20, 38, 0, 0, 127, 0, 255, 18, 18, 38, 0, 0, 1, 0, 9, 21, 18, 104, 249, 2, 20, 126, 9, 6, 5, 0, 128, 2, 22, 126, 10, 0, 143, 210, 131, 20, 2, 0, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 12, 106, 25, 209, 12, 20, 2, 0, 13, 2, 26, 126, 13, 23, 26, 56, 0, 128, 84, 220, 12, 0, 127, 12, 255, 0, 141, 190, 85, 85, 85, 85, 255, 0, 140, 190, 85, 85, 85, 85, 10, 106, 25, 209, 12, 20, 2, 0, 13, 2, 28, 126, 14, 23, 22, 56, 0, 128, 84, 220, 10, 0, 127, 10, 255, 16, 16, 38, 255, 255, 127, 0, 240, 18, 18, 40, 240, 16, 16, 40, 9, 17, 16, 4, 113, 15, 140, 191, 13, 17, 18, 10, 12, 17, 18, 44, 255, 2, 28, 126, 171, 170, 170, 62, 255, 0, 140, 190, 0, 0, 128, 62, 7, 103, 30, 126, 12, 18, 28, 44, 12, 0, 193, 209, 12, 17, 38, 132, 193, 30, 30, 104, 14, 0, 193, 209, 9, 29, 194, 3, 9, 19, 32, 10, 13, 17, 24, 44, 15, 11, 16, 126, 14, 33, 24, 44, 255, 0, 140, 190, 244, 253, 5, 56, 12, 0, 193, 209, 8, 25, 48, 132, 112, 15, 140, 191, 12, 23, 24, 2, 8, 21, 16, 46, 0, 112, 49, 63, 12, 19, 26, 4, 255, 18, 28, 42, 0, 0, 0, 128, 8, 27, 30, 2, 8, 126, 254, 137, 8, 17, 18, 10, 21, 0, 136, 191, 8, 19, 20, 10, 255, 2, 22, 126, 171, 170, 42, 62, 255, 0, 
140, 190, 37, 73, 18, 62, 12, 16, 22, 44, 8, 23, 22, 48, 205, 204, 76, 62, 8, 23, 22, 48, 0, 0, 128, 62, 8, 23, 22, 48, 171, 170, 170, 62, 10, 23, 20, 10, 241, 18, 28, 10, 13, 0, 193, 209, 9, 227, 41, 132, 13, 17, 30, 4, 255, 20, 24, 42, 0, 0, 0, 128, 255, 16, 16, 42, 0, 0, 0, 128, 8, 1, 254, 190, 8, 31, 16, 4, 13, 29, 20, 4, 13, 17, 16, 2, 12, 21, 18, 4, 255, 30, 20, 38, 0, 240, 255, 255, 9, 17, 16, 2, 15, 21, 18, 4, 8, 19, 16, 2, 255, 16, 18, 10, 0, 160, 42, 56, 10, 19, 18, 46, 0, 160, 42, 56, 8, 19, 16, 46, 0, 80, 213, 62, 10, 17, 18, 46, 0, 80, 213, 62, 255, 18, 22, 10, 59, 170, 184, 66, 11, 17, 22, 126, 191, 22, 24, 38, 131, 24, 24, 36, 255, 0, 137, 190, 85, 85, 85, 85, 255, 0, 136, 190, 85, 85, 85, 85, 12, 106, 25, 209, 8, 24, 2, 0, 9, 2, 26, 126, 13, 106, 28, 209, 13, 1, 169, 1, 0, 128, 84, 220, 12, 0, 127, 12, 255, 0, 136, 190, 0, 80, 213, 62, 11, 11, 28, 126, 10, 0, 193, 209, 8, 20, 38, 132, 14, 19, 30, 46, 0, 0, 49, 188, 8, 21, 16, 2, 14, 31, 20, 46, 239, 47, 228, 183, 8, 21, 20, 2, 255, 2, 28, 126, 171, 170, 42, 62, 255, 0, 136, 190, 171, 170, 42, 61, 8, 20, 28, 44, 14, 0, 193, 209, 14, 21, 194, 3, 10, 21, 30, 10, 14, 31, 20, 44, 255, 0, 136, 190, 8, 227, 130, 180, 255, 0, 137, 190, 24, 114, 177, 66, 112, 15, 140, 191, 13, 21, 26, 44, 12, 0, 68, 208, 8, 17, 0, 0, 9, 18, 132, 124, 12, 21, 26, 44, 106, 12, 140, 134, 9, 18, 130, 124, 134, 22, 16, 34, 12, 27, 20, 2, 106, 12, 234, 135, 8, 0, 136, 210, 10, 17, 2, 0, 255, 2, 20, 126, 0, 0, 128, 127, 255, 0, 136, 190, 208, 142, 206, 194, 8, 21, 16, 0, 8, 18, 150, 124, 128, 16, 16, 0, 5, 15, 138, 125, 242, 16, 16, 10, 255, 2, 18, 126, 0, 0, 192, 127, 255, 0, 136, 190, 0, 0, 128, 255, 8, 19, 16, 0, 128, 14, 138, 125, 8, 0, 194, 208, 5, 17, 0, 0, 128, 16, 16, 0, 255, 2, 18, 126, 0, 0, 128, 127, 8, 0, 0, 209, 8, 19, 34, 0, 5, 19, 132, 125, 8, 19, 16, 0, 7, 19, 152, 125, 8, 11, 14, 0, 242, 10, 138, 125, 242, 14, 10, 0, 255, 2, 14, 126, 174, 71, 97, 189, 255, 0, 136, 190, 61, 10, 135, 63, 5, 0, 193, 209, 5, 17, 28, 4, 
4, 1, 254, 190, 2, 126, 254, 137, 242, 2, 10, 126, 2, 1, 254, 190, 131, 0, 6, 192, 32, 0, 0, 0, 127, 192, 140, 191, 1, 1, 14, 192, 0, 0, 0, 0, 128, 2, 40, 126, 127, 192, 140, 191, 0, 95, 32, 240, 17, 3, 1, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 0, 172, 0, 148, 19, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 8, 134, 255, 255, 0, 0, 4, 255, 132, 146, 16, 0, 16, 0, 5, 255, 5, 134, 255, 255, 0, 0, 8, 10, 8, 146, 4, 11, 4, 146, 5, 12, 5, 146, 3, 0, 10, 192, 0, 0, 0, 0, 131, 2, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 72, 0, 0, 0, 127, 192, 140, 191, 67, 0, 2, 192, 80, 0, 0, 0, 0, 2, 6, 126, 2, 2, 8, 126, 0, 0, 255, 209, 8, 6, 2, 4, 1, 0, 255, 209, 4, 8, 6, 4, 10, 2, 6, 126, 2, 0, 198, 208, 1, 27, 
0, 0, 12, 0, 134, 125, 2, 0, 255, 209, 5, 6, 10, 4, 106, 2, 130, 135, 127, 192, 140, 191, 1, 4, 134, 125, 2, 106, 234, 135, 126, 1, 128, 190, 0, 106, 254, 137, 28, 0, 136, 191, 3, 2, 10, 192, 24, 0, 0, 0, 131, 0, 6, 192, 40, 0, 0, 0, 3, 1, 2, 192, 48, 0, 0, 0, 127, 192, 140, 191, 4, 3, 14, 192, 0, 0, 0, 0, 2, 0, 6, 104, 3, 2, 8, 104, 4, 4, 10, 104, 128, 2, 12, 126, 127, 192, 140, 191, 0, 95, 0, 240, 3, 3, 3, 0, 3, 1, 10, 192, 56, 0, 0, 0, 5, 2, 14, 192, 0, 0, 0, 0, 127, 192, 140, 191, 4, 0, 14, 104, 5, 2, 16, 104, 6, 4, 18, 104, 128, 2, 20, 126, 112, 15, 140, 191, 0, 95, 32, 240, 7, 3, 2, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 129, 0, 172, 0, 148, 0, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 0, 5, 0, 5, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 2, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 4, 134, 255, 255, 0, 0, 4, 10, 4, 146, 3, 0, 6, 192, 0, 0, 0, 0, 127, 192, 140, 191, 67, 0, 2, 192, 72, 0, 0, 0, 0, 2, 2, 126, 0, 0, 255, 209, 4, 2, 2, 4, 126, 1, 130, 190, 127, 192, 140, 191, 0, 0, 209, 208, 0, 3, 0, 0, 20, 0, 136, 191, 3, 2, 10, 192, 24, 0, 0, 0, 3, 0, 2, 192, 40, 0, 0, 0, 127, 192, 140, 191, 4, 3, 10, 192, 0, 0, 0, 0, 0, 0, 2, 104, 127, 192, 140, 191, 0, 32, 12, 224, 1, 1, 3, 128, 3, 0, 2, 192, 56, 0, 0, 0, 5, 1, 10, 192, 0, 0, 0, 0, 127, 192, 140, 191, 0, 0, 0, 104, 112, 15, 140, 191, 0, 32, 28, 224, 0, 1, 1, 128, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 195, 0, 172, 0, 148, 19, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 13, 0, 13, 0, 0, 0, 28, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 8, 134, 255, 255, 0, 0, 4, 255, 132, 146, 16, 0, 16, 0, 5, 255, 5, 134, 255, 255, 0, 0, 8, 10, 8, 146, 4, 11, 4, 146, 5, 12, 5, 146, 3, 0, 6, 192, 0, 0, 0, 0, 
127, 192, 140, 191, 67, 0, 2, 192, 72, 0, 0, 0, 0, 2, 6, 126, 0, 0, 255, 209, 8, 6, 2, 4, 126, 1, 130, 190, 127, 192, 140, 191, 0, 0, 209, 208, 0, 3, 0, 0, 29, 0, 136, 191, 3, 2, 14, 192, 8, 0, 0, 0, 3, 0, 2, 192, 40, 0, 0, 0, 127, 192, 140, 191, 6, 4, 10, 192, 0, 0, 0, 0, 0, 0, 6, 104, 127, 192, 140, 191, 0, 32, 12, 224, 3, 3, 4, 128, 3, 4, 10, 192, 56, 0, 0, 0, 7, 5, 14, 192, 0, 0, 0, 0, 5, 10, 0, 128, 127, 192, 140, 191, 18, 2, 14, 126, 4, 8, 1, 128, 17, 2, 16, 126, 16, 0, 18, 104, 11, 0, 255, 209, 0, 14, 10, 4, 10, 0, 255, 209, 1, 16, 6, 4, 128, 2, 24, 126, 112, 15, 140, 191, 0, 95, 32, 240, 9, 3, 5, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 194, 0, 172, 0, 148, 19, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 9, 0, 9, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 8, 134, 255, 255, 0, 0, 4, 255, 132, 146, 16, 0, 16, 0, 5, 255, 5, 134, 255, 255, 0, 0, 8, 10, 8, 146, 4, 11, 4, 146, 5, 12, 5, 146, 3, 0, 6, 192, 0, 0, 0, 0, 127, 192, 140, 191, 67, 0, 2, 192, 72, 0, 0, 0, 0, 2, 6, 126, 0, 0, 255, 209, 8, 6, 2, 4, 126, 1, 130, 190, 127, 192, 140, 191, 0, 0, 209, 208, 0, 3, 0, 0, 31, 0, 136, 191, 3, 2, 14, 192, 8, 0, 0, 0, 3, 0, 6, 192, 40, 0, 0, 0, 127, 192, 140, 191, 67, 2, 2, 192, 48, 
0, 0, 0, 6, 4, 14, 192, 0, 0, 0, 0, 5, 10, 5, 128, 127, 192, 140, 191, 9, 2, 6, 126, 4, 8, 4, 128, 1, 2, 8, 126, 0, 0, 10, 104, 7, 0, 255, 209, 5, 6, 10, 4, 6, 0, 255, 209, 4, 8, 6, 4, 128, 2, 16, 126, 0, 95, 0, 240, 5, 1, 4, 0, 3, 0, 2, 192, 56, 0, 0, 0, 7, 1, 10, 192, 0, 0, 0, 0, 127, 192, 140, 191, 0, 0, 0, 104, 112, 15, 140, 191, 0, 32, 28, 224, 0, 1, 1, 128, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 0, 172, 0, 148, 19, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 11, 0, 11, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 8, 134, 255, 255, 0, 0, 4, 255, 132, 146, 16, 0, 16, 0, 5, 255, 5, 134, 255, 255, 0, 0, 8, 10, 8, 146, 4, 11, 4, 146, 5, 12, 5, 146, 3, 0, 10, 192, 0, 0, 0, 0, 131, 2, 6, 192, 16, 0, 0, 0, 3, 3, 6, 192, 96, 0, 0, 0, 127, 192, 140, 191, 67, 0, 2, 192, 104, 0, 0, 0, 0, 2, 6, 126, 2, 2, 8, 126, 0, 0, 255, 209, 8, 6, 2, 4, 1, 0, 255, 209, 4, 8, 6, 4, 10, 2, 6, 126, 2, 0, 193, 208, 1, 27, 0, 0, 12, 0, 136, 125, 2, 0, 255, 209, 5, 6, 10, 4, 106, 2, 130, 134, 127, 192, 140, 191, 1, 4, 136, 125, 2, 106, 128, 134, 0, 32, 128, 190, 55, 0, 136, 191, 3, 2, 10, 192, 80, 0, 0, 0, 127, 192, 140, 191, 8, 0, 14, 104, 9, 2, 16, 104, 10, 4, 18, 104, 131, 0, 2, 192, 112, 0, 0, 0, 3, 1, 6, 192, 24, 0, 0, 0, 
127, 192, 140, 191, 2, 130, 0, 191, 30, 0, 133, 191, 2, 129, 0, 191, 13, 0, 132, 191, 3, 2, 10, 192, 48, 0, 0, 0, 2, 3, 14, 192, 0, 0, 0, 0, 127, 192, 140, 191, 8, 2, 0, 126, 9, 2, 2, 126, 10, 2, 4, 126, 11, 2, 6, 126, 128, 2, 20, 126, 0, 95, 32, 240, 7, 0, 3, 0, 27, 0, 130, 191, 2, 128, 0, 191, 25, 0, 132, 191, 3, 2, 10, 192, 32, 0, 0, 0, 2, 3, 14, 192, 0, 0, 0, 0, 127, 192, 140, 191, 8, 2, 0, 126, 9, 2, 2, 126, 10, 2, 4, 126, 11, 2, 6, 126, 128, 2, 20, 126, 0, 95, 32, 240, 7, 0, 3, 0, 12, 0, 130, 191, 3, 2, 10, 192, 64, 0, 0, 0, 2, 3, 14, 192, 0, 0, 0, 0, 127, 192, 140, 191, 8, 2, 0, 126, 9, 2, 2, 126, 10, 2, 4, 126, 11, 2, 6, 126, 128, 2, 20, 126, 0, 95, 32, 240, 7, 0, 3, 0, 0, 0, 129, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 65, 0, 172, 0, 148, 0, 0, 0, 43, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 0, 5, 0, 5, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 2, 192, 4, 0, 0, 0, 127, 192, 140, 191, 4, 255, 4, 134, 255, 255, 0, 0, 4, 10, 4, 146, 3, 0, 6, 192, 0, 0, 0, 0, 
127, 192, 140, 191, 67, 0, 2, 192, 96, 0, 0, 0, 0, 2, 2, 126, 0, 0, 255, 209, 4, 2, 2, 4, 126, 1, 130, 190, 127, 192, 140, 191, 0, 0, 209, 208, 0, 3, 0, 0, 50, 0, 136, 191, 3, 0, 2, 192, 80, 0, 0, 0, 127, 192, 140, 191, 0, 0, 0, 104, 3, 0, 2, 192, 112, 0, 0, 0, 3, 1, 6, 192, 24, 0, 0, 0, 127, 192, 140, 191, 0, 130, 0, 191, 28, 0, 133, 191, 0, 129, 0, 191, 12, 0, 132, 191, 3, 2, 10, 192, 48, 0, 0, 0, 2, 1, 10, 192, 0, 0, 0, 0, 127, 192, 140, 191, 8, 2, 2, 126, 9, 2, 4, 126, 10, 2, 6, 126, 11, 2, 8, 126, 0, 32, 28, 224, 0, 1, 1, 128, 25, 0, 130, 191, 0, 128, 0, 191, 23, 0, 132, 191, 3, 2, 10, 192, 32, 0, 0, 0, 2, 1, 10, 192, 0, 0, 0, 0, 127, 192, 140, 191, 8, 2, 2, 126, 9, 2, 4, 126, 10, 2, 6, 126, 11, 2, 8, 126, 0, 32, 28, 224, 0, 1, 1, 128, 11, 0, 130, 191, 3, 2, 10, 192, 64, 0, 0, 0, 2, 1, 10, 192, 0, 0, 0, 0, 127, 192, 140, 191, 8, 2, 2, 126, 9, 2, 4, 126, 10, 2, 6, 126, 11, 2, 8, 126, 0, 32, 28, 224, 0, 1, 1, 128, 0, 0, 129, 191, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 2, 0, 0, 0, 0, 0, 0, 40, 0, 0, 0, 1, 0, 4, 0, 8, 2, 0, 0, 0, 0, 0, 0, 8, 4, 0, 0, 0, 0, 0, 0, 76, 0, 0, 0, 1, 0, 4, 0, 16, 6, 0, 0, 0, 0, 0, 0, 8, 4, 0, 0, 0, 0, 0, 0, 118, 0, 0, 0, 26, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 188, 4, 0, 0, 0, 0, 0, 0, 149, 0, 0, 0, 26, 0, 5, 0, 0, 5, 0, 0, 0, 0, 0, 0, 168, 4, 0, 0, 0, 0, 0, 0, 180, 0, 0, 0, 26, 0, 5, 0, 0, 10, 0, 0, 0, 0, 0, 0, 20, 2, 0, 0, 0, 0, 0, 0, 209, 0, 0, 0, 26, 0, 5, 0, 0, 13, 0, 0, 0, 0, 0, 0, 72, 13, 0, 0, 0, 0, 0, 0, 249, 0, 0, 0, 26, 0, 5, 0, 0, 27, 0, 0, 0, 0, 0, 0, 20, 2, 0, 0, 0, 0, 0, 0, 33, 1, 0, 0, 26, 0, 5, 0, 0, 30, 0, 0, 0, 0, 0, 0, 160, 1, 0, 0, 0, 0, 0, 0, 58, 1, 0, 0, 26, 0, 5, 0, 0, 32, 0, 0, 0, 0, 0, 0, 220, 1, 0, 0, 0, 0, 0, 0, 90, 1, 0, 0, 26, 0, 5, 0, 0, 34, 0, 0, 0, 0, 0, 0, 228, 1, 0, 0, 0, 0, 0, 0, 122, 1, 0, 0, 26, 0, 5, 0, 0, 36, 0, 0, 0, 0, 0, 0, 124, 2, 0, 0, 0, 0, 0, 0, 144, 1, 0, 0, 26, 0, 5, 0, 0, 39, 0, 0, 0, 0, 0, 0, 24, 2, 0, 0, 
0, 0, 0, 0, 170, 1, 0, 0, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 207, 1, 0, 0, 3, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 15, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 184, 15, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 216, 15, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 224, 15, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 72, 17, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 17, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100, 19, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 108, 19, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 140, 19, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 148, 19, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 252, 20, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 21, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52, 23, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 23, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92, 23, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100, 23, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 204, 24, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 212, 24, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 0, 0, 0, 0, 88, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 3, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 1, 0, 0, 0, 0, 0, 0, 229, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 240, 2, 0, 0, 0, 0, 0, 0, 200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 0, 0, 0, 1, 0, 0, 0, 3, 0, 160, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 184, 3, 0, 0, 0, 0, 0, 0, 24, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0, 0, 0, 1, 0, 0, 0, 7, 0, 192, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 24, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 66, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 55, 0, 0, 0, 0, 0, 0, 128, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 152, 56, 0, 0, 0, 0, 0, 0, 176, 1, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 5, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0}; } } ROCR-Runtime-rocm-5.0.0/src/image/blit_src/000077500000000000000000000000001420110115200202755ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/image/blit_src/CMakeLists.txt000066400000000000000000000200241420110115200230330ustar00rootroot00000000000000################################################################################ ## ## The University of Illinois/NCSA ## Open Source License (NCSA) ## ## Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. ## ## Developed by: ## ## AMD Research and AMD HSA Software Development ## ## Advanced Micro Devices, Inc. 
##
## www.amd.com
##
## Permission is hereby granted, free of charge, to any person obtaining a copy
## of this software and associated documentation files (the "Software"), to
## deal with the Software without restriction, including without limitation
## the rights to use, copy, modify, merge, publish, distribute, sublicense,
## and/or sell copies of the Software, and to permit persons to whom the
## Software is furnished to do so, subject to the following conditions:
##
## - Redistributions of source code must retain the above copyright notice,
## this list of conditions and the following disclaimers.
## - Redistributions in binary form must reproduce the above copyright
## notice, this list of conditions and the following disclaimers in
## the documentation and/or other materials provided with the distribution.
## - Neither the names of Advanced Micro Devices, Inc,
## nor the names of its contributors may be used to endorse or promote
## products derived from this Software without specific prior written
## permission.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
## DEALINGS WITH THE SOFTWARE.
##
################################################################################

cmake_minimum_required ( VERSION 3.7 )

# Flag to abort before executing after default initialization of cache variables
set (QUIT 0)

# Import target 'clang'
find_package(Clang REQUIRED HINTS ${CMAKE_INSTALL_PREFIX}/llvm ${CMAKE_PREFIX_PATH}/llvm PATHS /opt/rocm/llvm )

# Device libs doesn't support find_package yet.
set(PREFIX_HINTS "")
foreach(hint "/amdgcn/bitcode" "/lib/bitcode" "/lib/x86_64/bitcode")
  foreach(path ${CMAKE_PREFIX_PATH})
    string(APPEND path ${hint})
    list(APPEND PREFIX_HINTS ${path})
  endforeach(path)
endforeach(hint)

get_include_path(BITCODE_DIR "Bitcode library path" RESULT FOUND NAMES "opencl.bc" "opencl.amdgcn.bc"
    HINTS "${CMAKE_INSTALL_PREFIX}/amdgcn/bitcode" "${CMAKE_INSTALL_PREFIX}/lib/bitcode" "${CMAKE_INSTALL_PREFIX}/lib/x86_64/bitcode" "${PREFIX_HINTS}"
    PATHS "/opt/rocm/amdgcn/bitcode" "/opt/rocm/lib/bitcode" "/opt/rocm/lib" "/opt/rocm/lib/x86_64/bitcode")
if (NOT ${FOUND})
  set (QUIT 1)
endif()

# Determine the target devices if not specified
if (NOT DEFINED TARGET_DEVICES)
  set (TARGET_DEVICES "gfx700;gfx701;gfx702;gfx801;gfx802;gfx803;gfx805;gfx810;gfx900;gfx902;gfx904;gfx906;gfx908;gfx909;gfx90a;gfx90c;gfx1010;gfx1011;gfx1012;gfx1013;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035")
endif()
set( TARGET_DEVICES ${TARGET_DEVICES} CACHE STRING "Build targets" FORCE )

# End of default configuration and path checking.
# Quit if configuration is incomplete.
if (QUIT)
  message(FATAL_ERROR "Configuration halted.")
  return()
endif()

if(${CMAKE_VERBOSE_MAKEFILE})
  get_property(clang_path TARGET clang PROPERTY LOCATION)
  message("Using clang from: ${clang_path}")
  message("Build Setting:")
  message(" Target Devices*: ${TARGET_DEVICES}")
  message(" (Specify \";\" separated list of target IDs.)")
  message(" Clang path: ${clang_path}")
  message(" Bitcode Dir: ${BITCODE_DIR}")
endif()

##==========================================
## Add custom command to generate a kernel code object file
##==========================================
function(gen_kernel_bc TARGET_ID INPUT_FILE OUTPUT_FILE)
  string (REGEX MATCH "^gfx([^:]+)" GFXIP "${TARGET_ID}")
  set (GFXIP_NUMBER "${CMAKE_MATCH_1}")

  # Report syntactically invalid target IDs and terminate.
  if (NOT GFXIP)
    message(FATAL_ERROR "Invalid target (${TARGET_ID}) specified for generating BLIT kernel")
    return()
  endif()

  # Determine if device-libs is following old or new layout
  if(EXISTS "${BITCODE_DIR}/opencl.amdgcn.bc")
    set(BITCODE_ARGS "-nogpulib -Xclang -mlink-bitcode-file -Xclang ${BITCODE_DIR}/opencl.amdgcn.bc -Xclang -mlink-bitcode-file -Xclang ${BITCODE_DIR}/ockl.amdgcn.bc -Xclang -mlink-bitcode-file -Xclang ${BITCODE_DIR}/ocml.amdgcn.bc -Xclang -mlink-bitcode-file -Xclang ${BITCODE_DIR}/oclc_daz_opt_on.amdgcn.bc -Xclang -mlink-bitcode-file -Xclang ${BITCODE_DIR}/oclc_isa_version_${GFXIP_NUMBER}.amdgcn.bc -Xclang -mlink-bitcode-file -Xclang ${BITCODE_DIR}/oclc_unsafe_math_off.amdgcn.bc -Xclang -mlink-bitcode-file -Xclang ${BITCODE_DIR}/oclc_finite_only_off.amdgcn.bc")
  else()
    set(BITCODE_ARGS "--hip-device-lib-path=${BITCODE_DIR}")
  endif()

  separate_arguments(CLANG_ARG_LIST UNIX_COMMAND
    "-O2 -x cl -cl-denorms-are-zero -cl-std=CL2.0 -target amdgcn-amd-amdhsa -Xclang -finclude-default-header -mcpu=${TARGET_ID} ${BITCODE_ARGS} -o ${OUTPUT_FILE} ${INPUT_FILE}")

  ## Add custom command to produce a code object file.
  ## This depends on the kernel source file & compiler.
  ## It does not pick up device-libs changes. It is not clear
  ## how to do that after conversion to --rocm-path is done.
  add_custom_command(OUTPUT ${OUTPUT_FILE} COMMAND clang ${CLANG_ARG_LIST}
    DEPENDS ${INPUT_FILE} clang
    COMMENT "BUILDING bitcode for ${OUTPUT_FILE}..."
    VERBATIM)

  if(${CMAKE_VERBOSE_MAKEFILE})
    message(" Kernel Source: " ${INPUT_FILE})
    message(" Kernel Bitcode: " ${OUTPUT_FILE})
  endif()
endfunction(gen_kernel_bc)

##==========================================
## Find device code object name and forward to custom command
##==========================================
function(build_kernel BLIT_NAME TARGET_ID)
  string (REGEX MATCH "^gfx([^:]+)" GFXIP "${TARGET_ID}")

  # Report syntactically invalid target IDs and terminate.
  if (NOT GFXIP)
    message(FATAL_ERROR "Invalid target (${TARGET_ID}) specified for generating BLIT kernel (${BLIT_NAME})")
    return()
  endif()

  ## generate kernel bitcodes
  set (CODE_OBJECT_FILE "${BLIT_NAME}_${GFXIP}")
  set (CL_FILE ${CMAKE_CURRENT_SOURCE_DIR}/imageblit_kernels.cl)
  gen_kernel_bc(${TARGET_ID} ${CL_FILE} ${CODE_OBJECT_FILE})

  ## Build a list of code object file names
  ## These will be target dependencies.
  set (HSACO_TARG_LIST ${HSACO_TARG_LIST} "${CODE_OBJECT_FILE}" PARENT_SCOPE)
endfunction(build_kernel)

##==========================================
## Build the kernel for a list of devices
##==========================================
function(build_kernel_for_devices BLIT_NAME)
  set(HSACO_TARG_LIST "")
  foreach(dev ${TARGET_DEVICES})
    if(${CMAKE_VERBOSE_MAKEFILE})
      message("\n Generating: ${dev} ...")
    endif()
    build_kernel(${BLIT_NAME} ${dev})
  endforeach(dev)
  set(HSACO_TARG_LIST ${HSACO_TARG_LIST} PARENT_SCOPE)
endfunction(build_kernel_for_devices)

##==========================================
## Create BLIT Code Object blobs file
##==========================================
function(generate_blit_file BFILE)
  ## Add a custom command that generates opencl_blit_objects.cpp
  ## This depends on all the generated code object files and the C++ generator script.
  add_custom_command(OUTPUT ${BFILE}.cpp
    COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/create_hsaco_ascii_file.sh ${CMAKE_CURRENT_BINARY_DIR}/${BFILE}.cpp
    DEPENDS ${HSACO_TARG_LIST} create_hsaco_ascii_file.sh )

  ## Export a target that builds (and depends on) opencl_blit_objects.cpp
  add_custom_target( ${BFILE} DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${BFILE}.cpp )
endfunction(generate_blit_file)

build_kernel_for_devices("ocl_blit_object")
generate_blit_file("opencl_blit_objects")
ROCR-Runtime-rocm-5.0.0/src/image/blit_src/README.md000066400000000000000000000050071420110115200215560ustar00rootroot00000000000000
## OVERVIEW

This directory contains the CMakeLists.txt that automatically generates the ASCII code object file "opencl_blit_objects.cpp". That file holds the code object blobs of the image BLIT kernels for the devices supported on ROCm. The image library loads these blobs, and they must be regenerated whenever a new device is introduced.

## ADD NEW DEVICE

To add a new supported device, follow these steps:

1. Declare an extern variable for the device gfxNNN by adding the line "extern uint32_t ocl_blit_object_gfxNNN[];" to "blit_kernel.cpp".
2. Update the BlitKernel::GetPatchedBlitObject() function to support the device by setting "blit_code_object" to "ocl_blit_object_gfxNNN[]".
3. Add the target to the TARGET_DEVICES list in CMakeLists.txt. Use the target ID syntax: the target GFX IP name, optionally followed by settings for target features such as XNACK and SRAMECC. If a feature is omitted, the generated code runs with either setting of that feature. For example, "gfx908" produces code that runs with any setting, while "gfx908:sramecc+:xnack-" produces code that runs only when SRAMECC is enabled and XNACK is disabled.
4. Rebuild the image library.
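To illustrate the target ID syntax from step 3, the C++ sketch below splits an ID into its processor name and its explicitly set features. It is not part of the image library. The names `TargetId`, `FeatureSetting`, `ParseTargetId` and `GetFeature` are hypothetical, and the sketch assumes each feature is written as a name followed by `+` or `-`. The `gfxip_number` field holds the same text that the `^gfx([^:]+)` regular expression in CMakeLists.txt captures for the `oclc_isa_version_*` bitcode file.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical setting of a target feature such as "xnack" or "sramecc".
enum class FeatureSetting { Any, On, Off };

// Hypothetical parsed form of a target ID such as "gfx908:sramecc+:xnack-".
struct TargetId {
  std::string processor;     // GFX IP name, e.g. "gfx908"
  std::string gfxip_number;  // text after "gfx", e.g. "908" (what ^gfx([^:]+) captures)
  std::map<std::string, FeatureSetting> features;  // only explicitly set features
};

// Returns std::nullopt if the ID does not start with "gfx" followed by at least
// one character, or if a feature field is not of the form <name>+ or <name>-.
std::optional<TargetId> ParseTargetId(const std::string& id) {
  std::istringstream in(id);
  std::string field;
  if (!std::getline(in, field, ':') || field.size() <= 3 ||
      field.compare(0, 3, "gfx") != 0) {
    return std::nullopt;
  }
  TargetId target;
  target.processor = field;
  target.gfxip_number = field.substr(3);
  // Every remaining ':'-separated field is a feature setting.
  while (std::getline(in, field, ':')) {
    if (field.size() < 2) return std::nullopt;
    const char sign = field.back();
    if (sign != '+' && sign != '-') return std::nullopt;
    target.features[field.substr(0, field.size() - 1)] =
        (sign == '+') ? FeatureSetting::On : FeatureSetting::Off;
  }
  return target;
}

// A feature that the target ID omits defaults to "any setting".
FeatureSetting GetFeature(const TargetId& target, const std::string& name) {
  auto it = target.features.find(name);
  return it == target.features.end() ? FeatureSetting::Any : it->second;
}
```

With this sketch, "gfx908:sramecc+:xnack-" reports SRAMECC as on and XNACK as off, while plain "gfx90a" reports every feature as `Any`. Feature names are kept exactly as written. They are not checked against the features a given processor supports.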
## REQUIREMENT

To create the code object file, the compiler generates the bitcode of the kernels. This requires the following bitcode libraries:

    opencl.bc
    ocml.bc
    irif.bc
    oclc_correctly_rounded_sqrt_off.bc
    oclc_daz_opt_on.bc
    oclc_finite_only_off.bc
    oclc_isa_version_.bc
    oclc_unsafe_math_off.bc

where is the gfxip number of the GPU. A CMake variable specifies the directory containing the bitcode libraries.

CMake requires several variables to build the code object file. All of them have default values and are defined as follows:

    OPENCL_DIR     - the location of the installed OpenCL (Default: /opt/rocm/opencl)
    BITCODE_DIR    - the directory containing the bitcode libraries (Default: /opt/rocm/amdgcn/bitcode)
    LLVM_DIR       - the directory containing the clang, llvm-link and llvm-dis executables (Default: ${PROJECT_BUILD_DIR}/../lightning/bin)
    TARGET_DEVICES - list of GPU targets for kernel builds (e.g. "gfx900;gfx902") (Default: the TARGET_DEVICES list set in CMakeLists.txt)

## STEPS TO BUILD

    $ mkdir build
    $ cd build
    $ cmake -DOPENCL_DIR=${OPENCL_DIR} -DBITCODE_DIR=${BITCODE_DIR} -DLLVM_DIR=${LLVM_DIR} -DTARGET_DEVICES="${TARGET_DEVICES}" ..
    $ make opencl_blit_objects.cpp
ROCR-Runtime-rocm-5.0.0/src/image/blit_src/create_hsaco_ascii_file.sh000077500000000000000000000051401420110115200254230ustar00rootroot00000000000000
#!/bin/bash -e
################################################################################
##
## The University of Illinois/NCSA
## Open Source License (NCSA)
##
## Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
##
## Developed by:
##
## AMD Research and AMD HSA Software Development
##
## Advanced Micro Devices, Inc.
##
## www.amd.com
##
## Permission is hereby granted, free of charge, to any person obtaining a copy
## of this software and associated documentation files (the "Software"), to
## deal with the Software without restriction, including without limitation
## the rights to use, copy, modify, merge, publish, distribute, sublicense,
## and/or sell copies of the Software, and to permit persons to whom the
## Software is furnished to do so, subject to the following conditions:
##
## - Redistributions of source code must retain the above copyright notice,
## this list of conditions and the following disclaimers.
## - Redistributions in binary form must reproduce the above copyright
## notice, this list of conditions and the following disclaimers in
## the documentation and/or other materials provided with the distribution.
## - Neither the names of Advanced Micro Devices, Inc,
## nor the names of its contributors may be used to endorse or promote
## products derived from this Software without specific prior written
## permission.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
## THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
## OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
## ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
## DEALINGS WITH THE SOFTWARE.
##
################################################################################

opencl_blit_file="$1"

if ! command -v xxd >/dev/null
then
  echo "xxd not found!"
  exit 1
fi

# Create the file in a temporary location and then move it into place atomically
{ cat < "$opencl_blit_file"
ROCR-Runtime-rocm-5.0.0/src/image/blit_src/imageblit_kernels.cl000066400000000000000000000541521420110115200243040ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in
// the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
// nor the names of its contributors may be used to endorse or promote
// products derived from this Software without specific prior written
// permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

/// Kernel code for HSA image import/export/copy/clear in OpenCL C form.

uint4 read_image(__read_only image1d_t src1d, __read_only image2d_t src2d,
                 __read_only image3d_t src3d, __read_only image1d_array_t src1da,
                 __read_only image2d_array_t src2da, uint format, int4 coords) {
  switch (format) {
    case 0:  // 1D
      return read_imageui(src1d, coords.x);
      break;
    case 1:  // 2D
      return read_imageui(src2d, coords.xy);
      break;
    case 2:  // 3D
      return read_imageui(src3d, coords);
      break;
    case 3:  // 1DA
      return read_imageui(src1da, coords.xy);
      break;
    case 4:  // 2DA
      return read_imageui(src2da, coords);
      break;
    // case 5: //1DB
    //   return read_imageui(src1db, coords.x);
    //   break;
    default:  // Critical failure.
      return 0;
  }
}

void write_image(__write_only image1d_t src1d, __write_only image2d_t src2d,
                 __write_only image3d_t src3d, __write_only image1d_array_t src1da,
                 __write_only image2d_array_t src2da, uint format, int4 coords,
                 uint4 texel) {
  switch (format) {
    case 0:  // 1D
      write_imageui(src1d, coords.x, texel);
      break;
    case 1:  // 2D
      write_imageui(src2d, coords.xy, texel);
      break;
    case 2:  // 3D
      write_imageui(src3d, coords, texel);
      break;
    case 3:  // 1DA
      write_imageui(src1da, coords.xy, texel);
      break;
    case 4:  // 2DA
      write_imageui(src2da, coords, texel);
      break;
    // case 5: //1DB
    //   write_imageui(src1db, coords.x, texel);
    //   break;
    default:  // Critical failure.
      return;
  }
}

float4 read_image_float(__read_only image1d_t src1d, __read_only image2d_t src2d,
                        __read_only image3d_t src3d, __read_only image1d_array_t src1da,
                        __read_only image2d_array_t src2da, uint format, int4 coords) {
  switch (format) {
    case 0:  // 1D
      return read_imagef(src1d, coords.x);
      break;
    case 1:  // 2D
      return read_imagef(src2d, coords.xy);
      break;
    case 2:  // 3D
      return read_imagef(src3d, coords);
      break;
    case 3:  // 1DA
      return read_imagef(src1da, coords.xy);
      break;
    case 4:  // 2DA
      return read_imagef(src2da, coords);
      break;
    default:  // Critical failure.
      return 0;
  }
}

void write_image_float(__write_only image1d_t src1d, __write_only image2d_t src2d,
                       __write_only image3d_t src3d, __write_only image1d_array_t src1da,
                       __write_only image2d_array_t src2da, uint format, int4 coords,
                       float4 texel) {
  switch (format) {
    case 0:  // 1D
      write_imagef(src1d, coords.x, texel);
      break;
    case 1:  // 2D
      write_imagef(src2d, coords.xy, texel);
      break;
    case 2:  // 3D
      write_imagef(src3d, coords, texel);
      break;
    case 3:  // 1DA
      write_imagef(src1da, coords.xy, texel);
      break;
    case 4:  // 2DA
      write_imagef(src2da, coords, texel);
      break;
    default:  // Critical failure.
      return;
  }
}

void write_image_int(__write_only image1d_t src1d, __write_only image2d_t src2d,
                     __write_only image3d_t src3d, __write_only image1d_array_t src1da,
                     __write_only image2d_array_t src2da, uint format, int4 coords,
                     int4 texel) {
  switch (format) {
    case 0:  // 1D
      write_imagei(src1d, coords.x, texel);
      break;
    case 1:  // 2D
      write_imagei(src2d, coords.xy, texel);
      break;
    case 2:  // 3D
      write_imagei(src3d, coords, texel);
      break;
    case 3:  // 1DA
      write_imagei(src1da, coords.xy, texel);
      break;
    case 4:  // 2DA
      write_imagei(src2da, coords, texel);
      break;
    default:  // Critical failure.
      return;
  }
}

//image handle is repeated since OCL doesn't allow pointers to or casting of images.
// dst is start of output pixel in destination buffer
// format.x is element count
// format.y is element size
// format.z is max(dword per pixel, 1)
// format.w is texture type.
// srcOrigin is start pixel address.
// No export for 64, 96, 128 bit formats
__kernel void copy_image_to_buffer(
    __read_only image1d_t src1d, __read_only image2d_t src2d,
    __read_only image3d_t src3d, __read_only image1d_array_t src1da,
    __read_only image2d_array_t src2da, __global void* const dst, int4 srcOrigin,
    uint4 format, ulong pitch, ulong slice_pitch) {
  ulong idxDst;
  int4 coordsSrc;
  uint4 texel;

  __global uchar* const dstUChar = (__global uchar* const)dst;
  __global ushort* const dstUShort = (__global ushort* const)dst;
  __global uint* const dstUInt = (__global uint* const)dst;

  coordsSrc.x = get_global_id(0);
  coordsSrc.y = get_global_id(1);
  coordsSrc.z = get_global_id(2);
  coordsSrc.w = 0;

  idxDst = (coordsSrc.z * slice_pitch + coordsSrc.y * pitch + coordsSrc.x) * format.z;

  coordsSrc.x += srcOrigin.x;
  coordsSrc.y += srcOrigin.y;
  coordsSrc.z += srcOrigin.z;

  texel = read_image(src1d, src2d, src3d, src1da, src2da, format.w, coordsSrc);

  // Check components
  switch (format.x) {
    case 1:
      // Check size
      switch (format.y) {
        case 1:
          dstUChar[idxDst] = texel.x;
          break;
        case 2:
          dstUShort[idxDst] = texel.x;
          break;
        case 4:
          dstUInt[idxDst] = texel.x;
          break;
      }
      break;
    case 2:
      // Check size
      switch (format.y) {
        case 1:
          dstUShort[idxDst] = texel.x | (texel.y << 8);
          break;
        case 2:
          dstUInt[idxDst] = texel.x | (texel.y << 16);
          break;
        case 4:
          dstUInt[idxDst++] = texel.x;
          dstUInt[idxDst] = texel.y;
          break;
      }
      break;
    case 4:
      // Check size
      switch (format.y) {
        case 1:
          dstUInt[idxDst] = texel.x | (texel.y << 8) | (texel.z << 16) | (texel.w << 24);
          break;
        case 2:
          dstUInt[idxDst++] = texel.x | (texel.y << 16);
          dstUInt[idxDst] = texel.z | (texel.w << 16);
          break;
        case 4:
          dstUInt[idxDst++] = texel.x;
          dstUInt[idxDst++] = texel.y;
          dstUInt[idxDst++] = texel.z;
          dstUInt[idxDst] = texel.w;
          break;
      }
      break;
  }
}

__kernel void
copy_buffer_to_image(__global uint* src, __write_only image1d_t dst1d,
                     __write_only image2d_t dst2d, __write_only image3d_t dst3d,
                     __write_only image1d_array_t dst1da,
                     __write_only image2d_array_t dst2da, int4 dstOrigin,
                     uint4 format, ulong pitch, ulong slice_pitch) {
  ulong idxSrc;
  int4 coordsDst;
  uint4 texel;
  __global uint* srcUInt = src;
  __global ushort* srcUShort = (__global ushort*)src;
  __global uchar* srcUChar = (__global uchar*)src;
  ushort tmpUShort;
  uint tmpUInt;

  coordsDst.x = get_global_id(0);
  coordsDst.y = get_global_id(1);
  coordsDst.z = get_global_id(2);
  coordsDst.w = 0;

  idxSrc = (coordsDst.z * slice_pitch + coordsDst.y * pitch + coordsDst.x) * format.z;

  coordsDst.x += dstOrigin.x;
  coordsDst.y += dstOrigin.y;
  coordsDst.z += dstOrigin.z;

  // Check components
  switch (format.x) {
    case 1:
      // Check size
      switch (format.y) {
        case 1:
          texel.x = (uint)srcUChar[idxSrc];
          break;
        case 2:
          texel.x = (uint)srcUShort[idxSrc];
          break;
        case 4:
          texel.x = srcUInt[idxSrc];
          break;
      }
      break;
    case 2:
      // Check size
      switch (format.y) {
        case 1:
          tmpUShort = srcUShort[idxSrc];
          texel.x = (uint)(tmpUShort & 0xff);
          texel.y = (uint)(tmpUShort >> 8);
          break;
        case 2:
          tmpUInt = srcUInt[idxSrc];
          texel.x = (tmpUInt & 0xffff);
          texel.y = (tmpUInt >> 16);
          break;
        case 4:
          texel.x = srcUInt[idxSrc++];
          texel.y = srcUInt[idxSrc];
          break;
      }
      break;
    case 4:
      // Check size
      switch (format.y) {
        case 1:
          tmpUInt = srcUInt[idxSrc];
          texel.x = tmpUInt & 0xff;
          texel.y = (tmpUInt >> 8) & 0xff;
          texel.z = (tmpUInt >> 16) & 0xff;
          texel.w = (tmpUInt >> 24) & 0xff;
          break;
        case 2:
          tmpUInt = srcUInt[idxSrc++];
          texel.x = tmpUInt & 0xffff;
          texel.y = (tmpUInt >> 16);
          tmpUInt = srcUInt[idxSrc];
          texel.z = tmpUInt & 0xffff;
          texel.w = (tmpUInt >> 16);
          break;
        case 4:
          texel.x = srcUInt[idxSrc++];
          texel.y = srcUInt[idxSrc++];
          texel.z = srcUInt[idxSrc++];
          texel.w = srcUInt[idxSrc];
          break;
      }
      break;
  }

  // Write the final pixel
  write_image(dst1d, dst2d, dst3d, dst1da, dst2da, format.w, coordsDst, texel);
}

__kernel void
copy_image_default(__read_only image1d_t src1d, __read_only image2d_t src2d,
                   __read_only image3d_t src3d, __read_only image1d_array_t src1da,
                   __read_only image2d_array_t src2da, __write_only image1d_t dst1d,
                   __write_only image2d_t dst2d, __write_only image3d_t dst3d,
                   __write_only image1d_array_t dst1da,
                   __write_only image2d_array_t dst2da, int4 srcOrigin,
                   int4 dstOrigin, int srcFormat, int dstFormat) {
  int4 coordsDst;
  int4 coordsSrc;

  coordsDst.x = get_global_id(0);
  coordsDst.y = get_global_id(1);
  coordsDst.z = get_global_id(2);
  coordsDst.w = 0;

  coordsSrc = srcOrigin + coordsDst;
  coordsDst += dstOrigin;

  uint4 texel;
  texel = read_image(src1d, src2d, src3d, src1da, src2da, srcFormat, coordsSrc);
  write_image(dst1d, dst2d, dst3d, dst1da, dst2da, dstFormat, coordsDst, texel);
}

float linear_to_standard_rgba(float l_val) {
  float s_val = l_val;
  if (isnan(s_val)) s_val = 0.0f;

  if (s_val > 1.0f) {
    s_val = 1.0f;
  } else if (s_val < 0.0f) {
    s_val = 0.0f;
  } else if (s_val < 0.0031308f) {
    s_val = 12.92f * s_val;
  } else {
    s_val = (1.055f * pow(s_val, 5.0f / 12.0f)) - 0.055f;
  }

  return s_val;
}

__kernel void copy_image_linear_to_standard(
    __read_only image1d_t src1d, __read_only image2d_t src2d,
    __read_only image3d_t src3d, __read_only image1d_array_t src1da,
    __read_only image2d_array_t src2da, int srcFormat,
    __write_only image1d_t dst1d, __write_only image2d_t dst2d,
    __write_only image3d_t dst3d, __write_only image1d_array_t dst1da,
    __write_only image2d_array_t dst2da, int dstFormat, int4 srcOrigin,
    int4 dstOrigin) {
  int4 coordsDst;
  int4 coordsSrc;

  coordsDst.x = get_global_id(0);
  coordsDst.y = get_global_id(1);
  coordsDst.z = get_global_id(2);
  coordsDst.w = 0;

  coordsSrc = srcOrigin + coordsDst;
  coordsDst += dstOrigin;

  float4 texel;
  texel = read_image_float(src1d, src2d, src3d, src1da, src2da, srcFormat, coordsSrc);

  texel.x = linear_to_standard_rgba(texel.x);
  texel.y = linear_to_standard_rgba(texel.y);
  texel.z = linear_to_standard_rgba(texel.z);

  write_image_float(dst1d, dst2d, dst3d,
                    dst1da, dst2da, dstFormat, coordsDst, texel);
}

__kernel void copy_image_standard_to_linear(
    __read_only image1d_t src1d, __read_only image2d_t src2d,
    __read_only image3d_t src3d, __read_only image1d_array_t src1da,
    __read_only image2d_array_t src2da, int srcFormat,
    __write_only image1d_t dst1d, __write_only image2d_t dst2d,
    __write_only image3d_t dst3d, __write_only image1d_array_t dst1da,
    __write_only image2d_array_t dst2da, int dstFormat, int4 srcOrigin,
    int4 dstOrigin) {
  int4 coordsDst;
  int4 coordsSrc;

  coordsDst.x = get_global_id(0);
  coordsDst.y = get_global_id(1);
  coordsDst.z = get_global_id(2);
  coordsDst.w = 0;

  coordsSrc = srcOrigin + coordsDst;
  coordsDst += dstOrigin;

  float4 texel;
  texel = read_image_float(src1d, src2d, src3d, src1da, src2da, srcFormat, coordsSrc);
  write_image_float(dst1d, dst2d, dst3d, dst1da, dst2da, dstFormat, coordsDst, texel);
}

__kernel void copy_image_1db(
    __read_only image1d_buffer_t src1d, __read_only image2d_t src2d,
    __read_only image3d_t src3d, __read_only image1d_array_t src1da,
    __read_only image2d_array_t src2da, int srcFormat,
    __write_only image1d_t dst1d, __write_only image2d_t dst2d,
    __write_only image3d_t dst3d, __write_only image1d_array_t dst1da,
    __write_only image2d_array_t dst2da, int dstFormat, int4 srcOrigin,
    int4 dstOrigin) {
  int coordDst;
  int coordSrc;

  coordDst = get_global_id(0);

  coordSrc = srcOrigin.x + coordDst;
  coordDst += dstOrigin.x;

  uint4 texel;
  texel = read_imageui(src1d, coordSrc);
  write_imageui(dst1d, coordDst, texel);
}

__kernel void copy_image_1db_to_reg(
    __read_only image1d_buffer_t src1d, __read_only image2d_t src2d,
    __read_only image3d_t src3d, __read_only image1d_array_t src1da,
    __read_only image2d_array_t src2da, int srcFormat,
    __write_only image1d_t dst1d, __write_only image2d_t dst2d,
    __write_only image3d_t dst3d, __write_only image1d_array_t dst1da,
    __write_only image2d_array_t dst2da, int dstFormat, int4 srcOrigin,
    int4 dstOrigin) {
  int4 coordsDst;
  int coordSrc;

  coordsDst.x = get_global_id(0);
  coordsDst.y = get_global_id(1);
  coordsDst.z = get_global_id(2);
  coordsDst.w = 0;

  coordSrc = srcOrigin.x + coordsDst.x;
  coordsDst += dstOrigin;

  uint4 texel;
  texel = read_imageui(src1d, coordSrc);
  write_imageui(dst1d, coordsDst.x, texel);
}

__kernel void copy_image_reg_to_1db(
    __read_only image1d_t src1d, __read_only image2d_t src2d,
    __read_only image3d_t src3d, __read_only image1d_array_t src1da,
    __read_only image2d_array_t src2da, int srcFormat,
    __write_only image1d_buffer_t dst1d, __write_only image2d_t dst2d,
    __write_only image3d_t dst3d, __write_only image1d_array_t dst1da,
    __write_only image2d_array_t dst2da, int dstFormat, int4 srcOrigin,
    int4 dstOrigin) {
  int coordDst;
  int4 coordsSrc;

  coordsSrc.x = get_global_id(0);
  coordsSrc.y = get_global_id(1);
  coordsSrc.z = get_global_id(2);
  coordsSrc.w = 0;

  coordDst = dstOrigin.x + coordsSrc.x;
  coordsSrc += srcOrigin;

  uint4 texel;
  texel = read_imageui(src1d, coordsSrc.x);
  write_imageui(dst1d, coordDst, texel);
}

__kernel void clear_image(__write_only image1d_t dst1d, __write_only image2d_t dst2d,
                          __write_only image3d_t dst3d,
                          __write_only image1d_array_t dst1da,
                          __write_only image2d_array_t dst2da, int dstFormat,
                          uint type, uint4 fill_data, int4 origin) {
  int4 coords;

  coords.x = get_global_id(0);
  coords.y = get_global_id(1);
  coords.z = get_global_id(2);
  coords.w = 0;

  coords += origin;

  // Check components. as_float4()/as_int4() reinterpret the fill bits without
  // the pointer type-punning of *(float4*)&fill_data.
  switch (type) {
    case 0:
      write_image_float(dst1d, dst2d, dst3d, dst1da, dst2da, dstFormat, coords,
                        as_float4(fill_data));
      break;
    case 1:
      write_image_int(dst1d, dst2d, dst3d, dst1da, dst2da, dstFormat, coords,
                      as_int4(fill_data));
      break;
    case 2:
      write_image(dst1d, dst2d, dst3d, dst1da, dst2da, dstFormat, coords, fill_data);
      break;
  }
}

__kernel void clear_image_1db(__write_only image1d_buffer_t dst1d,
                              __write_only image2d_t dst2d,
                              __write_only image3d_t dst3d,
                              __write_only image1d_array_t dst1da,
                              __write_only image2d_array_t dst2da, int dstFormat,
                              uint4 fill_data, int4 origin, uint type) {
  // Only coords.x is used; zero the other lanes so the vector add below does
  // not read uninitialized values.
  int4 coords = (int4)(0, 0, 0, 0);

  coords.x =
      get_global_id(0);
  coords += origin;

  // Check components
  switch (type) {
    case 0:
      write_imagef(dst1d, coords.x, as_float4(fill_data));
      break;
    case 1:
      write_imagei(dst1d, coords.x, as_int4(fill_data));
      break;
    case 2:
      write_imageui(dst1d, coords.x, fill_data);
      break;
  }
}

ROCR-Runtime-rocm-5.0.0/src/image/device_info.cpp

////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include
#include

#include "core/inc/hsa_internal.h"
#include "device_info.h"
#include "addrlib/src/amdgpu_asic_addr.h"

namespace rocr {
namespace image {

uint32_t MajorVerFromDevID(uint32_t dev_id) { return dev_id >> 8; }

uint32_t MinorVerFromDevID(uint32_t dev_id) { return (dev_id >> 4) & 0xF; }

uint32_t StepFromDevID(uint32_t dev_id) { return dev_id & 0xF; }

hsa_status_t GetGPUAsicID(hsa_agent_t agent, uint32_t *chip_id) {
  char asic_name[64];
  assert(chip_id != nullptr);

  hsa_status_t status = HSA::hsa_agent_get_info(
      agent, static_cast<hsa_agent_info_t>(HSA_AGENT_INFO_NAME), &asic_name);

  assert(status == HSA_STATUS_SUCCESS);
  if (status != HSA_STATUS_SUCCESS) {
    return status;
  }

  std::string a_str(asic_name);

  assert(a_str.compare(0, 3, "gfx", 3) == 0);
  a_str.erase(0, 3);

  // Load chip_id accounting for stepping and minor in hex and major in dec.
  *chip_id = std::stoi(a_str.substr(a_str.length() - 2), nullptr, 16);
  *chip_id += (std::stoi(a_str.substr(0, a_str.length() - 2)) << 8);

  return HSA_STATUS_SUCCESS;
}

uint32_t DevIDToAddrLibFamily(uint32_t dev_id) {
  uint32_t major_ver = MajorVerFromDevID(dev_id);
  uint32_t minor_ver = MinorVerFromDevID(dev_id);
  uint32_t step = StepFromDevID(dev_id);

  // FAMILY_UNKNOWN 0xFF
  // FAMILY_SI - Southern Islands: Tahiti (P), Pitcairn (PM), Cape Verde (M), Bali (V)
  // FAMILY_TN - Fusion Trinity: Devastator - DVST (M), Scrapper (V)
  // FAMILY_CI - Sea Islands: Hawaii (P), Maui (P), Bonaire (M)
  // FAMILY_KV - Fusion Kaveri: Spectre, Spooky; Fusion Kabini: Kalindi
  // FAMILY_VI - Volcanic Islands: Iceland (V), Tonga (M)
  // FAMILY_CZ - Carrizo, Nolan, Amur
  // FAMILY_PI - Pirate Islands
  // FAMILY_AI - Arctic Islands
  // FAMILY_RV - Raven
  // FAMILY_NV - Navi
  switch (major_ver) {
    case 6:
      switch (minor_ver) {
        case 0:
          switch (step) {
            case 0:
            case 1:
              return FAMILY_SI;
            default:
              return FAMILY_UNKNOWN;
          }
        default:
          return FAMILY_UNKNOWN;
      }
    case 7:
      switch (minor_ver) {
        case 0:
          switch (step) {
            case 0:
            case 1:
            case 2:
              return FAMILY_CI;
            case 3:
              return FAMILY_KV;
            default:
              return FAMILY_UNKNOWN;
          }
        default:
          return FAMILY_UNKNOWN;
      }
    case 8:
      switch (minor_ver) {
        case 0:
          switch (step) {
            case 0:
            case 2:
            case 3:
            case 4:
              return FAMILY_VI;
            case 1:
              return FAMILY_CZ;
            default:
              return FAMILY_UNKNOWN;
          }
        default:
          return FAMILY_UNKNOWN;
      }
    case 9:
      switch (minor_ver) {
        case 0:
          switch (step) {
            case 0:
            case 1:
            case 4:   // Vega12
            case 6:   // Vega20
            case 8:   // Arcturus
            case 10:  // Aldebaran
              return FAMILY_AI;
            case 2:
            case 3:
              return FAMILY_RV;
            default:
              return FAMILY_UNKNOWN;
          }
        default:
          return FAMILY_UNKNOWN;
      }
    case 10:
      switch (minor_ver) {
        case 0:
        case 1:  // Navi
        case 3:
          switch (step) {
            case 0:
            case 1:
            case 2:
            case 3:
            case 4:
            case 5:
              return FAMILY_NV;
            default:
              return FAMILY_UNKNOWN;
          }
        default:
          return FAMILY_UNKNOWN;
      }
    default:
      return FAMILY_UNKNOWN;
  }

  assert(0);  // We should have already returned
}

}  // namespace image
}  // namespace rocr

ROCR-Runtime-rocm-5.0.0/src/image/device_info.h

////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_INC_DEVICE_INFO_H_
#define HSA_RUNTIME_CORE_INC_DEVICE_INFO_H_

#include "stdint.h"
#include "inc/hsa.h"

namespace rocr {
namespace image {

uint32_t MajorVerFromDevID(uint32_t dev_id);
uint32_t MinorVerFromDevID(uint32_t dev_id);
uint32_t StepFromDevID(uint32_t dev_id);
uint32_t DevIDToAddrLibFamily(uint32_t dev_id);
hsa_status_t GetGPUAsicID(hsa_agent_t agent, uint32_t *chip_id);

}  // namespace image
}  // namespace rocr

#endif  // HSA_RUNTIME_CORE_INC_DEVICE_INFO_H_

ROCR-Runtime-rocm-5.0.0/src/image/hsa_ext_image.cpp

////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "image_runtime.h"
#include "image/inc/hsa_ext_image_impl.h"
#include "core/inc/exceptions.h"

namespace rocr {
namespace AMD {
hsa_status_t handleException();

template <typename T> static __forceinline T handleExceptionT() {
  handleException();
  abort();
  return T();
}
}  // namespace AMD

#define TRY try {
#define CATCH } catch(...) { return AMD::handleException(); }
#define CATCHRET(RETURN_TYPE) } catch(...) \
    { return AMD::handleExceptionT<RETURN_TYPE>(); }

namespace image {

//---------------------------------------------------------------------------//
//  Utility routines
//---------------------------------------------------------------------------//
static void enforceDefaultPitch(hsa_agent_t agent,
                                const hsa_ext_image_descriptor_t* image_descriptor,
                                size_t& image_data_row_pitch,
                                size_t& image_data_slice_pitch) {
  // Set default pitch
  if (image_data_row_pitch == 0) {
    auto manager = ImageRuntime::instance()->image_manager(agent);
    assert((manager != nullptr) && "Image manager should already exist.");
    image_data_row_pitch = image_descriptor->width *
        manager->GetImageProperty(agent, image_descriptor->format,
                                  image_descriptor->geometry).element_size;
  }

  // Set default slice pitch
  if ((image_data_slice_pitch == 0) &&
      ((image_descriptor->depth != 0) || (image_descriptor->array_size != 0))) {
    switch (image_descriptor->geometry) {
      case HSA_EXT_IMAGE_GEOMETRY_3D:
      case HSA_EXT_IMAGE_GEOMETRY_2DA:
      case HSA_EXT_IMAGE_GEOMETRY_2DADEPTH: {
        image_data_slice_pitch = image_data_row_pitch * image_descriptor->height;
        break;
      }
      case HSA_EXT_IMAGE_GEOMETRY_1DA: {
        image_data_slice_pitch = image_data_row_pitch;
        break;
      }
      default:
        fprintf(stderr, "Depth set on single layer image geometry.\n");
        // assert(false && "Depth set on single layer image geometry.");
    }
  }
}

//---------------------------------------------------------------------------//
//  APIs that implement Image functionality
//---------------------------------------------------------------------------//

hsa_status_t hsa_amd_image_get_info_max_dim(hsa_agent_t agent, hsa_agent_info_t attribute,
                                            void* value) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if (value == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->GetImageInfoMaxDimension(agent, attribute, value);
  CATCH;
}

hsa_status_t hsa_ext_image_get_capability(hsa_agent_t agent,
                                          hsa_ext_image_geometry_t image_geometry,
                                          const
                                          hsa_ext_image_format_t* image_format,
                                          uint32_t* capability_mask) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if ((image_format == NULL) || (capability_mask == NULL) ||
      (image_geometry < HSA_EXT_IMAGE_GEOMETRY_1D) ||
      (image_geometry > HSA_EXT_IMAGE_GEOMETRY_2DADEPTH)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->GetImageCapability(agent, *image_format, image_geometry,
                                                      *capability_mask);
  CATCH;
}

hsa_status_t hsa_ext_image_data_get_info(hsa_agent_t agent,
                                         const hsa_ext_image_descriptor_t* image_descriptor,
                                         hsa_access_permission_t access_permission,
                                         hsa_ext_image_data_info_t* image_data_info) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if ((image_descriptor == NULL) || (image_data_info == NULL) ||
      (access_permission < HSA_ACCESS_PERMISSION_RO) ||
      (access_permission > HSA_ACCESS_PERMISSION_RW)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->GetImageSizeAndAlignment(
      agent, *image_descriptor, HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE, 0, 0, *image_data_info);
  CATCH;
}

hsa_status_t hsa_ext_image_create(hsa_agent_t agent,
                                  const hsa_ext_image_descriptor_t* image_descriptor,
                                  const void* image_data,
                                  hsa_access_permission_t access_permission,
                                  hsa_ext_image_t* image) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if (image_descriptor == NULL || image_data == NULL || image == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->CreateImageHandle(
      agent, *image_descriptor, image_data, access_permission,
      HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE, 0, 0, *image);
  CATCH;
}

hsa_status_t hsa_ext_image_destroy(hsa_agent_t agent, hsa_ext_image_t image) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  return ImageRuntime::instance()->DestroyImageHandle(image);
  CATCH;
}

hsa_status_t hsa_ext_image_copy(hsa_agent_t agent, hsa_ext_image_t src_image,
                                const hsa_dim3_t* src_offset, hsa_ext_image_t dst_image,
                                const hsa_dim3_t* dst_offset, const hsa_dim3_t* range) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if (src_image.handle == 0 || dst_image.handle == 0 || src_offset == NULL ||
      dst_offset == NULL || range == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->CopyImage(src_image, dst_image, *src_offset, *dst_offset,
                                             *range);
  CATCH;
}

hsa_status_t hsa_ext_image_import(hsa_agent_t agent, const void* src_memory,
                                  size_t src_row_pitch, size_t src_slice_pitch,
                                  hsa_ext_image_t dst_image,
                                  const hsa_ext_image_region_t* image_region) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if (src_memory == NULL || dst_image.handle == 0 || image_region == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->CopyBufferToImage(src_memory, src_row_pitch,
                                                     src_slice_pitch, dst_image,
                                                     *image_region);
  CATCH;
}

hsa_status_t hsa_ext_image_export(hsa_agent_t agent, hsa_ext_image_t src_image,
                                  void* dst_memory, size_t dst_row_pitch,
                                  size_t dst_slice_pitch,
                                  const hsa_ext_image_region_t* image_region) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if (dst_memory == NULL || src_image.handle == 0 || image_region == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->CopyImageToBuffer(src_image, dst_memory, dst_row_pitch,
                                                     dst_slice_pitch, *image_region);
  CATCH;
}

hsa_status_t hsa_ext_image_clear(hsa_agent_t agent, hsa_ext_image_t image, const void* data,
                                 const hsa_ext_image_region_t* image_region) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if (image.handle == 0 || image_region == NULL || data == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->FillImage(image, data, *image_region);
  CATCH;
}

hsa_status_t hsa_ext_sampler_create(hsa_agent_t agent,
                                    const hsa_ext_sampler_descriptor_t* sampler_descriptor,
                                    hsa_ext_sampler_t* sampler) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if (sampler_descriptor == NULL || sampler == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->CreateSamplerHandle(agent, *sampler_descriptor, *sampler);
  CATCH;
}

hsa_status_t hsa_ext_sampler_destroy(hsa_agent_t agent, hsa_ext_sampler_t sampler) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  return ImageRuntime::instance()->DestroySamplerHandle(sampler);
  CATCH;
}

hsa_status_t hsa_ext_image_get_capability_with_layout(
    hsa_agent_t agent, hsa_ext_image_geometry_t image_geometry,
    const hsa_ext_image_format_t* image_format, hsa_ext_image_data_layout_t image_data_layout,
    uint32_t* capability_mask) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if ((image_format == NULL) || (capability_mask == NULL) ||
      (image_geometry < HSA_EXT_IMAGE_GEOMETRY_1D) ||
      (image_geometry > HSA_EXT_IMAGE_GEOMETRY_2DADEPTH) ||
      (image_data_layout != HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->GetImageCapability(agent, *image_format, image_geometry,
                                                      *capability_mask);
  CATCH;
}

hsa_status_t hsa_ext_image_data_get_info_with_layout(
    hsa_agent_t agent, const hsa_ext_image_descriptor_t* image_descriptor,
    hsa_access_permission_t access_permission, hsa_ext_image_data_layout_t image_data_layout,
    size_t image_data_row_pitch, size_t image_data_slice_pitch,
    hsa_ext_image_data_info_t* image_data_info) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if ((image_descriptor == NULL) || (image_data_info == NULL) ||
      (access_permission < HSA_ACCESS_PERMISSION_RO) ||
      (access_permission > HSA_ACCESS_PERMISSION_RW) ||
      (image_data_layout != HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  enforceDefaultPitch(agent, image_descriptor, image_data_row_pitch, image_data_slice_pitch);

  return ImageRuntime::instance()->GetImageSizeAndAlignment(
      agent, *image_descriptor,
      image_data_layout, image_data_row_pitch, image_data_slice_pitch, *image_data_info);
  CATCH;
}

hsa_status_t hsa_ext_image_create_with_layout(
    hsa_agent_t agent, const hsa_ext_image_descriptor_t* image_descriptor,
    const void* image_data, hsa_access_permission_t access_permission,
    hsa_ext_image_data_layout_t image_data_layout, size_t image_data_row_pitch,
    size_t image_data_slice_pitch, hsa_ext_image_t* image) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if (image_descriptor == NULL || image_data == NULL || image == NULL ||
      image_data_layout != HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  enforceDefaultPitch(agent, image_descriptor, image_data_row_pitch, image_data_slice_pitch);

  return ImageRuntime::instance()->CreateImageHandle(
      agent, *image_descriptor, image_data, access_permission, image_data_layout,
      image_data_row_pitch, image_data_slice_pitch, *image);
  CATCH;
}

hsa_status_t hsa_amd_image_create(hsa_agent_t agent,
                                  const hsa_ext_image_descriptor_t* image_descriptor,
                                  const hsa_amd_image_descriptor_t* image_layout,
                                  const void* image_data,
                                  hsa_access_permission_t access_permission,
                                  hsa_ext_image_t* image) {
  TRY;
  if (agent.handle == 0) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  if (image_descriptor == NULL || image_data == NULL || image == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  return ImageRuntime::instance()->CreateImageHandleWithLayout(
      agent, *image_descriptor, image_layout, image_data, access_permission, *image);
  CATCH;
}

void LoadImage(core::ImageExtTableInternal* image_api,
               decltype(::hsa_amd_image_create)** interface_api) {
  image_api->hsa_ext_image_get_capability_fn = hsa_ext_image_get_capability;
  image_api->hsa_ext_image_data_get_info_fn = hsa_ext_image_data_get_info;
  image_api->hsa_ext_image_create_fn = hsa_ext_image_create;
  image_api->hsa_ext_image_import_fn = hsa_ext_image_import;
  image_api->hsa_ext_image_export_fn = hsa_ext_image_export;
  image_api->hsa_ext_image_copy_fn =
      hsa_ext_image_copy;
  image_api->hsa_ext_image_clear_fn = hsa_ext_image_clear;
  image_api->hsa_ext_image_destroy_fn = hsa_ext_image_destroy;
  image_api->hsa_ext_sampler_create_fn = hsa_ext_sampler_create;
  image_api->hsa_ext_sampler_destroy_fn = hsa_ext_sampler_destroy;
  image_api->hsa_ext_image_get_capability_with_layout_fn =
      hsa_ext_image_get_capability_with_layout;
  image_api->hsa_ext_image_data_get_info_with_layout_fn =
      hsa_ext_image_data_get_info_with_layout;
  image_api->hsa_ext_image_create_with_layout_fn = hsa_ext_image_create_with_layout;
  image_api->hsa_amd_image_get_info_max_dim_fn = hsa_amd_image_get_info_max_dim;
  *interface_api = hsa_amd_image_create;
}

void ReleaseImageRsrcs() { ImageRuntime::DestroySingleton(); }

}  // namespace image
}  // namespace rocr

ROCR-Runtime-rocm-5.0.0/src/image/image_lut.h

////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef AMD_HSA_EXT_IMAGE_IMAGE_LUT_H
#define AMD_HSA_EXT_IMAGE_IMAGE_LUT_H

#include

#include "inc/hsa_ext_image.h"
#include "resource.h"
#include "util.h"

namespace rocr {
namespace image {

class ImageLut {
 public:
  ImageLut() {}

  virtual ~ImageLut() {}

  virtual uint32_t MapGeometry(hsa_ext_image_geometry_t geometry) const = 0;

  virtual ImageProperty MapFormat(const hsa_ext_image_format_t& format,
                                  hsa_ext_image_geometry_t geometry) const = 0;

  virtual Swizzle MapSwizzle(hsa_ext_image_channel_order32_t order) const = 0;

  virtual uint32_t GetMaxWidth(hsa_ext_image_geometry_t geometry) const = 0;

  virtual uint32_t GetMaxHeight(hsa_ext_image_geometry_t geometry) const = 0;

  virtual uint32_t GetMaxDepth(hsa_ext_image_geometry_t geometry) const = 0;

  virtual uint32_t GetMaxArraySize(hsa_ext_image_geometry_t geometry) const = 0;

 private:
  DISALLOW_COPY_AND_ASSIGN(ImageLut);
};

}  // namespace image
}  // namespace rocr

#endif  // AMD_HSA_EXT_IMAGE_IMAGE_LUT_H
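`ImageLut` hides per-ASIC lookup tables behind pure virtual getters. `ImageLutKv` in the next file implements it with static arrays indexed by the HSA geometry and channel enums. The following is a minimal, self-contained sketch of that table-driven pattern. It is not ROCr code. `Geometry`, `DimensionLut`, `ExampleLut`, and `kExampleMaxDim` are hypothetical names. The limits are copied from the 1D/2D/3D rows of `kMaxDimensionLut_` in `image_lut_kv.cpp` for illustration only. One deliberate difference: the ROCr getters index the table directly, while this sketch returns 0 for an out-of-range geometry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for hsa_ext_image_geometry_t (first three values only).
enum Geometry : uint32_t { kGeom1D = 0, kGeom2D = 1, kGeom3D = 2, kGeomCount = 3 };

// Abstract interface: callers query limits without knowing the ASIC.
class DimensionLut {
 public:
  virtual ~DimensionLut() = default;
  virtual uint32_t GetMaxWidth(Geometry geometry) const = 0;
  virtual uint32_t GetMaxDepth(Geometry geometry) const = 0;
};

// Each row: width, height, depth, array_size (illustrative values).
static constexpr uint32_t kExampleMaxDim[kGeomCount][4] = {
    {16384, 1, 1, 1},         // 1D
    {16384, 16384, 1, 1},     // 2D
    {16384, 16384, 8192, 1},  // 3D
};

// Table-backed implementation, similar in shape to ImageLutKv.
class ExampleLut : public DimensionLut {
 public:
  uint32_t GetMaxWidth(Geometry geometry) const override {
    return Lookup(geometry, 0);
  }
  uint32_t GetMaxDepth(Geometry geometry) const override {
    return Lookup(geometry, 2);
  }

 private:
  // Returns 0 for an out-of-range geometry instead of reading past the table.
  static uint32_t Lookup(Geometry geometry, int column) {
    assert(column >= 0 && column < 4);
    return (geometry < kGeomCount) ? kExampleMaxDim[geometry][column] : 0;
  }
};
```

A caller holding only a `const DimensionLut&` gets 16384 from `GetMaxWidth(kGeom2D)` and 8192 from `GetMaxDepth(kGeom3D)`. Supporting a new ASIC then means adding another subclass and table, not touching the callers.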
ROCR-Runtime-rocm-5.0.0/src/image/image_lut_kv.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "image_lut_kv.h"

#include "resource_kv.h"

namespace rocr {
namespace image {

const uint32_t ImageLutKv::kGeometryLut_[GEOMETRY_COUNT] = {
    SQ_RSRC_IMG_1D,        // HSA_EXT_IMAGE_GEOMETRY_1D
    SQ_RSRC_IMG_2D,        // HSA_EXT_IMAGE_GEOMETRY_2D
    SQ_RSRC_IMG_3D,        // HSA_EXT_IMAGE_GEOMETRY_3D
    SQ_RSRC_IMG_1D_ARRAY,  // HSA_EXT_IMAGE_GEOMETRY_1DA
    SQ_RSRC_IMG_2D_ARRAY,  // HSA_EXT_IMAGE_GEOMETRY_2DA
    0,                     // HSA_EXT_IMAGE_GEOMETRY_1DB
    SQ_RSRC_IMG_2D,        // HSA_EXT_IMAGE_GEOMETRY_2DDEPTH
    SQ_RSRC_IMG_2D_ARRAY   // HSA_EXT_IMAGE_GEOMETRY_2DADEPTH
};

const ImageProperty ImageLutKv::kPropLut_[ORDER_COUNT][TYPE_COUNT] = {
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_A
     {RW, 1, FMT_8, TYPE_SNORM}, {RW, 2, FMT_16, TYPE_SNORM},
     {RW, 1, FMT_8, TYPE_UNORM}, {RW, 2, FMT_16, TYPE_UNORM},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 1, FMT_8, TYPE_SINT}, {RW, 2, FMT_16, TYPE_SINT},
     {RW, 4, FMT_32, TYPE_SINT}, {RW, 1, FMT_8, TYPE_UINT},
     {RW, 2, FMT_16, TYPE_UINT}, {RW, 4, FMT_32, TYPE_UINT},
     {RW, 2, FMT_16, TYPE_FLOAT}, {RW, 4, FMT_32, TYPE_FLOAT}},
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_R
     {RW, 1, FMT_8, TYPE_SNORM}, {RW, 2, FMT_16, TYPE_SNORM},
     {RW, 1, FMT_8, TYPE_UNORM}, {RW, 2, FMT_16, TYPE_UNORM},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 1, FMT_8, TYPE_SINT}, {RW, 2, FMT_16, TYPE_SINT},
     {RW, 4, FMT_32, TYPE_SINT}, {RW, 1, FMT_8, TYPE_UINT},
     {RW, 2, FMT_16, TYPE_UINT}, {RW, 4, FMT_32, TYPE_UINT},
     {RW, 2, FMT_16, TYPE_FLOAT}, {RW, 4, FMT_32, TYPE_FLOAT}},
    {0},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RX
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_RG
     {RW, 2, FMT_8_8, TYPE_SNORM}, {RW, 4, FMT_16_16, TYPE_SNORM},
     {RW, 2, FMT_8_8, TYPE_UNORM}, {RW, 4, FMT_16_16, TYPE_UNORM},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 2, FMT_8_8, TYPE_SINT}, {RW, 4, FMT_16_16, TYPE_SINT},
     {RW, 8, FMT_32_32, TYPE_SINT}, {RW, 2, FMT_8_8, TYPE_UINT},
     {RW, 4, FMT_16_16, TYPE_UINT}, {RW, 8, FMT_32_32, TYPE_UINT},
     {RW, 4, FMT_16_16, TYPE_FLOAT}, {RW, 8, FMT_32_32, TYPE_FLOAT}},
    {0},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RGX
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_RA
     {RW, 2, FMT_8_8, TYPE_SNORM}, {RW, 4, FMT_16_16, TYPE_SNORM},
     {RW, 2, FMT_8_8, TYPE_UNORM}, {RW, 4, FMT_16_16, TYPE_UNORM},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 2, FMT_8_8, TYPE_SINT}, {RW, 4, FMT_16_16, TYPE_SINT},
     {RW, 8, FMT_32_32, TYPE_SINT}, {RW, 2, FMT_8_8, TYPE_UINT},
     {RW, 4, FMT_16_16, TYPE_UINT}, {RW, 8, FMT_32_32, TYPE_UINT},
     {RW, 4, FMT_16_16, TYPE_FLOAT}, {RW, 8, FMT_32_32, TYPE_FLOAT}},
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_RGB
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {RW, 2, FMT_1_5_5_5, TYPE_UNORM},
     {RW, 2, FMT_5_6_5, TYPE_UNORM}, {RW, 4, FMT_2_10_10_10, TYPE_UNORM},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}},
    {0},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA
     {RW, 4, FMT_8_8_8_8, TYPE_SNORM}, {RW, 8, FMT_16_16_16_16, TYPE_SNORM},
     {RW, 4, FMT_8_8_8_8, TYPE_UNORM}, {RW, 8, FMT_16_16_16_16, TYPE_UNORM},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 4, FMT_8_8_8_8, TYPE_SINT}, {RW, 8, FMT_16_16_16_16, TYPE_SINT},
     {RW, 16, FMT_32_32_32_32, TYPE_SINT}, {RW, 4, FMT_8_8_8_8, TYPE_UINT},
     {RW, 8, FMT_16_16_16_16, TYPE_UINT}, {RW, 16, FMT_32_32_32_32, TYPE_UINT},
     {RW, 8, FMT_16_16_16_16, TYPE_FLOAT},
     {RW, 16, FMT_32_32_32_32, TYPE_FLOAT}},
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA
     {RW, 4, FMT_8_8_8_8, TYPE_SNORM}, {0, 0, 0, 0},
     {RW, 4, FMT_8_8_8_8, TYPE_UNORM}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 4, FMT_8_8_8_8, TYPE_SINT}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 4, FMT_8_8_8_8, TYPE_UINT},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}},
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB
     {RW, 4, FMT_8_8_8_8, TYPE_SNORM}, {0, 0, 0, 0},
     {RW, 4, FMT_8_8_8_8, TYPE_UNORM}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 4, FMT_8_8_8_8, TYPE_SINT}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 4, FMT_8_8_8_8, TYPE_UINT},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}},
    {0},  // HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR
    {0},  // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB
    {0},  // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA
     {0, 0, 0, 0}, {0, 0, 0, 0}, {RO, 4, FMT_8_8_8_8, TYPE_SRGB},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}},
    {0},  // HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY
     {RW, 1, FMT_8, TYPE_SNORM}, {RW, 2, FMT_16, TYPE_SNORM},
     {RW, 1, FMT_8, TYPE_UNORM}, {RW, 2, FMT_16, TYPE_UNORM},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 2, FMT_16, TYPE_FLOAT}, {RW, 4, FMT_32, TYPE_FLOAT}},
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE
     {RW, 1, FMT_8, TYPE_SNORM}, {RW, 2, FMT_16, TYPE_SNORM},
     {RW, 1, FMT_8, TYPE_UNORM}, {RW, 2, FMT_16, TYPE_UNORM},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0},
     {RW, 2, FMT_16, TYPE_FLOAT}, {RW, 4, FMT_32, TYPE_FLOAT}},
    {// HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {ROWO, 2, FMT_16, TYPE_UNORM},
     // TODO: 24 bit
     {0, 3, FMT_32, TYPE_UNORM},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0},
     {0, 0, 0, 0}, {0, 0, 0, 0},
     {ROWO, 4, FMT_32, TYPE_FLOAT}},
    {0}  // HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL
};

const Swizzle ImageLutKv::kSwizzleLut_[ORDER_COUNT] = {
    {SEL_0, SEL_0, SEL_0, SEL_X},  // HSA_EXT_IMAGE_CHANNEL_ORDER_A
    {SEL_X, SEL_0, SEL_0, SEL_1},  // HSA_EXT_IMAGE_CHANNEL_ORDER_R
    {SEL_X, SEL_0, SEL_0, SEL_1},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RX
    {SEL_X, SEL_Y, SEL_0, SEL_1},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RG
    {SEL_X, SEL_Y, SEL_0, SEL_1},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RGX
    {SEL_X, SEL_0, SEL_0, SEL_Y},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RA
    {SEL_Z, SEL_Y, SEL_X, SEL_1},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RGB
    {SEL_Z, SEL_Y, SEL_X, SEL_1},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX
    {SEL_X, SEL_Y, SEL_Z, SEL_W},  // HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA
    {SEL_Z, SEL_Y, SEL_X, SEL_W},  // HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA
    {SEL_Y, SEL_Z, SEL_W, SEL_X},  // HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB
    {SEL_W, SEL_Z, SEL_Y, SEL_X},  // HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR
    {SEL_X, SEL_Y, SEL_Z, SEL_1},  // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB
    {SEL_X, SEL_Y, SEL_Z, SEL_1},  // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX
    {SEL_X, SEL_Y, SEL_Z, SEL_W},  // HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA
    {SEL_Z, SEL_Y, SEL_X, SEL_W},  // HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA
    {SEL_X, SEL_X, SEL_X, SEL_X},  // HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY
    {SEL_X, SEL_X, SEL_X, SEL_1},  // HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE
    {SEL_X, SEL_0, SEL_0, SEL_0},  // HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH
    {SEL_Y, SEL_0, SEL_0, SEL_0}   // HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL
};

const uint32_t ImageLutKv::kMaxDimensionLut_[GEOMETRY_COUNT][4] = {
    {16384, 1, 1, 1},          // HSA_EXT_IMAGE_GEOMETRY_1D
    {16384, 16384, 1, 1},      // HSA_EXT_IMAGE_GEOMETRY_2D
    {16384, 16384, 8192, 1},   // HSA_EXT_IMAGE_GEOMETRY_3D
    {16384, 1, 1, 8192},       // HSA_EXT_IMAGE_GEOMETRY_1DA
    {16384, 16384, 1, 8192},   // HSA_EXT_IMAGE_GEOMETRY_2DA
    {4294967295, 1, 1, 1},     // HSA_EXT_IMAGE_GEOMETRY_1DB
    {16384, 16384, 1, 1},      // HSA_EXT_IMAGE_GEOMETRY_2DDEPTH
    {16384, 16384, 1, 8192}    // HSA_EXT_IMAGE_GEOMETRY_2DADEPTH
};

uint32_t ImageLutKv::MapGeometry(hsa_ext_image_geometry_t geometry) const {
  switch (geometry) {
    case HSA_EXT_IMAGE_GEOMETRY_1D:
    case HSA_EXT_IMAGE_GEOMETRY_2D:
    case HSA_EXT_IMAGE_GEOMETRY_3D:
    case HSA_EXT_IMAGE_GEOMETRY_1DA:
    case HSA_EXT_IMAGE_GEOMETRY_2DA:
    case HSA_EXT_IMAGE_GEOMETRY_1DB:
    case HSA_EXT_IMAGE_GEOMETRY_2DDEPTH:
    case HSA_EXT_IMAGE_GEOMETRY_2DADEPTH:
      return kGeometryLut_[geometry];
    default:
      assert(false && "Should not reach here");
      return static_cast<uint32_t>(-1);
  }
}

ImageProperty ImageLutKv::MapFormat(const hsa_ext_image_format_t& format,
                                    hsa_ext_image_geometry_t geometry) const {
  switch (geometry) {
    case HSA_EXT_IMAGE_GEOMETRY_1D:
    case HSA_EXT_IMAGE_GEOMETRY_2D:
    case HSA_EXT_IMAGE_GEOMETRY_3D:
    case HSA_EXT_IMAGE_GEOMETRY_1DA:
    case HSA_EXT_IMAGE_GEOMETRY_2DA:
      return kPropLut_[format.channel_order][format.channel_type];
    case HSA_EXT_IMAGE_GEOMETRY_1DB:
      switch (format.channel_order) {
        // Hardware does not support buffer access to sRGB images.
        case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB:
        case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX:
        case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA:
        case HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA:
          break;
        default:
          switch (format.channel_type) {
            // Hardware does not support buffer access to 555/565 packed
            // images.
            case HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555:
            case HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565:
              break;
            default:
              return kPropLut_[format.channel_order][format.channel_type];
          }
      }
      break;
    case HSA_EXT_IMAGE_GEOMETRY_2DDEPTH:
    case HSA_EXT_IMAGE_GEOMETRY_2DADEPTH:
      switch (format.channel_order) {
        case HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH:
        case HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL:
          return kPropLut_[format.channel_order][format.channel_type];
        default:
          break;
      }
      break;
    default:
      assert(false && "Should not reach here");
      break;
  }

  ImageProperty prop = {0};
  return prop;
}

Swizzle ImageLutKv::MapSwizzle(hsa_ext_image_channel_order32_t order) const {
  const Swizzle invalid_swizzle = {0xff, 0xff, 0xff, 0xff};

  switch (order) {
    case HSA_EXT_IMAGE_CHANNEL_ORDER_A:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_R:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RX:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RG:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RGX:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RA:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RGB:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL:
      return kSwizzleLut_[order];
    default:
      assert(false && "Should not reach here");
      return invalid_swizzle;
  }
}

uint32_t ImageLutKv::GetMaxWidth(hsa_ext_image_geometry_t geometry) const {
  return kMaxDimensionLut_[geometry][0];
}

uint32_t ImageLutKv::GetMaxHeight(hsa_ext_image_geometry_t geometry) const {
  return kMaxDimensionLut_[geometry][1];
}

uint32_t ImageLutKv::GetMaxDepth(hsa_ext_image_geometry_t geometry) const {
  return kMaxDimensionLut_[geometry][2];
}

uint32_t ImageLutKv::GetMaxArraySize(hsa_ext_image_geometry_t geometry) const {
  return kMaxDimensionLut_[geometry][3];
}

uint32_t ImageLutKv::GetPixelSize(uint8_t data_format,
                                  uint8_t data_type) const {
  // Currently supports only the formats that ROCr can create.
  switch (data_format) {
    case FMT_1_5_5_5:
      return 2;
    case FMT_16:
      return 2;
    case FMT_16_16:
      return 4;
    case FMT_16_16_16_16:
      return 8;
    case FMT_2_10_10_10:
      return 4;
    // SPK: Where is unorm returning 3? Was this a Hawaii specific thing?
    case FMT_32:
      return (data_type == TYPE_UNORM) ? 3 : 4;
    case FMT_32_32:
      return 8;
    case FMT_32_32_32_32:
      return 16;
    case FMT_5_6_5:
      return 2;
    case FMT_8:
      return 1;
    case FMT_8_8:
      return 2;
    case FMT_8_8_8_8:
      return 4;
    default:
      return 0;
  }
}

}  // namespace image
}  // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/image/image_lut_kv.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef AMD_HSA_EXT_IMAGE_IMAGE_LUT_KV_H
#define AMD_HSA_EXT_IMAGE_IMAGE_LUT_KV_H

#include "image_lut.h"

namespace rocr {
namespace image {

class ImageLutKv : public ImageLut {
 public:
  ImageLutKv() {}

  virtual ~ImageLutKv() {}

  virtual uint32_t MapGeometry(hsa_ext_image_geometry_t geometry) const;

  virtual ImageProperty MapFormat(const hsa_ext_image_format_t& format,
                                  hsa_ext_image_geometry_t geometry) const;

  virtual Swizzle MapSwizzle(hsa_ext_image_channel_order32_t order) const;

  virtual uint32_t GetMaxWidth(hsa_ext_image_geometry_t geometry) const;

  virtual uint32_t GetMaxHeight(hsa_ext_image_geometry_t geometry) const;

  virtual uint32_t GetMaxDepth(hsa_ext_image_geometry_t geometry) const;

  virtual uint32_t GetMaxArraySize(hsa_ext_image_geometry_t geometry) const;

  uint32_t GetPixelSize(uint8_t data_format, uint8_t data_type) const;

 private:
  // Lookup table of image geometry to device geometry enum.
  static const uint32_t kGeometryLut_[GEOMETRY_COUNT];

  // Lookup table of channel format property. Based on HSA Programmer's
  // Reference Manual 1.0P Table 9-4 Channel Order, Channel type and Image
  // Geometry Combinations.
  static const ImageProperty kPropLut_[ORDER_COUNT][TYPE_COUNT];

  // Lookup table of channel order swizzle.
  static const Swizzle kSwizzleLut_[ORDER_COUNT];

  // Lookup table of image geometry to max dimension.
  // Each record contains four values: width, height, depth, array_size.
  static const uint32_t kMaxDimensionLut_[GEOMETRY_COUNT][4];

  DISALLOW_COPY_AND_ASSIGN(ImageLutKv);
};

}  // namespace image
}  // namespace rocr

#endif  // AMD_HSA_EXT_IMAGE_IMAGE_LUT_KV_H
ROCR-Runtime-rocm-5.0.0/src/image/image_manager.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "inc/hsa_ext_amd.h"
#include "inc/hsa_ext_image.h"
#include "core/inc/hsa_ext_amd_impl.h"
#include "image_manager.h"
#include "image_runtime.h"

#include
#include
#include
#include

#if (defined(WIN32) || defined(_WIN32))
#define NOMINMAX
__inline long int lrintf(float f) { return _mm_cvtss_si32(_mm_load_ss(&f)); }
#endif

namespace rocr {
namespace image {

Image* Image::Create(hsa_agent_t agent) {
  hsa_amd_memory_pool_t pool = ImageRuntime::instance()->kernarg_pool();
  Image* image = NULL;
  hsa_status_t status = AMD::hsa_amd_memory_pool_allocate(
      pool, sizeof(Image), 0, reinterpret_cast<void**>(&image));
  assert(status == HSA_STATUS_SUCCESS);
  if (status != HSA_STATUS_SUCCESS) return NULL;

  new (image) Image();

  status = AMD::hsa_amd_agents_allow_access(1, &agent, NULL, image);
  if (status != HSA_STATUS_SUCCESS) {
    Image::Destroy(image);
    return NULL;
  }

  return image;
}

void Image::Destroy(const Image* image) {
  assert(image != NULL);

  image->~Image();

  hsa_status_t status =
      AMD::hsa_amd_memory_pool_free(const_cast<Image*>(image));
  assert(status == HSA_STATUS_SUCCESS);
}

Sampler* Sampler::Create(hsa_agent_t agent) {
  hsa_amd_memory_pool_t pool = ImageRuntime::instance()->kernarg_pool();
  Sampler* sampler = NULL;
  hsa_status_t status = AMD::hsa_amd_memory_pool_allocate(
      pool, sizeof(Sampler), 0, reinterpret_cast<void**>(&sampler));
  if (status != HSA_STATUS_SUCCESS) return NULL;

  new (sampler) Sampler();

  status = AMD::hsa_amd_agents_allow_access(1, &agent, NULL, sampler);
  if (status != HSA_STATUS_SUCCESS) {
    Sampler::Destroy(sampler);
    return NULL;
  }

  return sampler;
}

void Sampler::Destroy(const Sampler* sampler) {
  assert(sampler != NULL);
  sampler->~Sampler();

  hsa_status_t status =
      AMD::hsa_amd_memory_pool_free(const_cast<Sampler*>(sampler));
  assert(status == HSA_STATUS_SUCCESS);
}

ImageManager::ImageManager() {}

ImageManager::~ImageManager() {}

hsa_status_t ImageManager::CopyBufferToImage(
    const void* src_memory, size_t src_row_pitch, size_t src_slice_pitch,
    const Image& dst_image, const hsa_ext_image_region_t& image_region) {
  // Image::Create returns NULL if the allocation or the access grant fails.
  Image* src_image = Image::Create(dst_image.component);
  if (src_image == NULL) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;

  src_image->component = dst_image.component;
  src_image->desc = dst_image.desc;
  src_image->data = const_cast<void*>(src_memory);
  src_image->permission = HSA_ACCESS_PERMISSION_RO;
  src_image->row_pitch = src_row_pitch;
  src_image->slice_pitch = src_slice_pitch;

  const hsa_dim3_t dst_origin = image_region.offset;
  const hsa_dim3_t src_origin = {0};
  const hsa_dim3_t copy_size = image_region.range;

  hsa_status_t status = ImageManager::CopyImage(
      dst_image, *src_image, dst_origin, src_origin, copy_size);

  Image::Destroy(src_image);

  return status;
}

hsa_status_t ImageManager::CopyImageToBuffer(
    const Image& src_image, void* dst_memory, size_t dst_row_pitch,
    size_t dst_slice_pitch, const hsa_ext_image_region_t& image_region) {
  // Treat the buffer as an image, since our images are not tiled anyway.
  Image* dst_image = Image::Create(src_image.component);
  if (dst_image == NULL) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;

  dst_image->component = src_image.component;
  dst_image->desc = src_image.desc;  // The width, height, and depth are ignored.
  dst_image->data = dst_memory;
  dst_image->permission = HSA_ACCESS_PERMISSION_WO;
  dst_image->row_pitch = dst_row_pitch;
  dst_image->slice_pitch = dst_slice_pitch;

  const hsa_dim3_t dst_origin = {0};
  const hsa_dim3_t src_origin = image_region.offset;
  const hsa_dim3_t copy_size = image_region.range;

  hsa_status_t status = ImageManager::CopyImage(
      *dst_image, src_image, dst_origin, src_origin, copy_size);

  Image::Destroy(dst_image);

  return status;
}

hsa_status_t ImageManager::CopyImage(const Image& dst_image,
                                     const Image& src_image,
                                     const hsa_dim3_t& dst_origin,
                                     const hsa_dim3_t& src_origin,
                                     const hsa_dim3_t size) {
  ImageProperty dst_image_prop = GetImageProperty(
      dst_image.component, dst_image.desc.format, dst_image.desc.geometry);
  assert(dst_image_prop.cap != HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED);

  const size_t dst_element_size = dst_image_prop.element_size;
  assert(dst_element_size != 0);

  ImageProperty src_image_prop = GetImageProperty(
      src_image.component, src_image.desc.format, src_image.desc.geometry);
  assert(src_image_prop.cap != HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED);

  const size_t src_element_size = src_image_prop.element_size;
  assert(src_element_size != 0);

  const hsa_ext_image_format_t src_format = src_image.desc.format;
  const hsa_ext_image_channel_order32_t src_order = src_format.channel_order;
  const hsa_ext_image_channel_type32_t src_type = src_format.channel_type;

  const hsa_ext_image_format_t dst_format = dst_image.desc.format;
  const hsa_ext_image_channel_order32_t dst_order = dst_format.channel_order;
  const hsa_ext_image_channel_type32_t dst_type = dst_format.channel_type;

  bool linear_to_standard_rgb = false;
  bool standard_to_linear_rgb = false;
  if ((src_order != dst_order) || (src_type != dst_type)) {
    // Source and destination format must be the same, except for
    // SRGBA <--> RGBA images.
    if ((src_type == HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8) &&
        (dst_type == HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8)) {
      if ((src_order == HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA) &&
          (dst_order == HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA)) {
        standard_to_linear_rgb = true;
      } else if ((src_order == HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA) &&
                 (dst_order == HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA)) {
        linear_to_standard_rgb = true;
      } else {
        return HSA_STATUS_ERROR_INVALID_ARGUMENT;
      }
    } else {
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    }
  }

  // The formats are either identical or differ only between RGBA and SRGBA
  // with UNORM_INT8 channels, so both element sizes are equal.
  const size_t element_size = src_element_size;

  // Row and slice pitches in bytes.
  const size_t dst_row_pitch =
      std::max(dst_image.row_pitch, size.x * element_size);
  const size_t dst_slice_pitch = std::max(
      dst_image.slice_pitch,
      dst_row_pitch *
          (dst_image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DA ? 1 : size.y));

  const size_t src_row_pitch =
      std::max(src_image.row_pitch, size.x * element_size);
  const size_t src_slice_pitch = std::max(
      src_image.slice_pitch,
      src_row_pitch *
          (src_image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DA ? 1 : size.y));

  size_t src_offset = src_origin.x;
  size_t dst_offset = dst_origin.x;
  size_t copy_size = size.x;

  // Calculate the source offset in bytes.
  src_offset *= element_size;
  src_offset += src_row_pitch * src_origin.y;
  src_offset += src_slice_pitch * src_origin.z;

  // Calculate the destination offset in bytes.
  dst_offset *= element_size;
  dst_offset += dst_row_pitch * dst_origin.y;
  dst_offset += dst_slice_pitch * dst_origin.z;

  copy_size *= element_size;

  // Get the destination and source memory.
  unsigned char* dst = static_cast<unsigned char*>(dst_image.data);
  const unsigned char* src = static_cast<const unsigned char*>(src_image.data);

  if (!linear_to_standard_rgb && !standard_to_linear_rgb) {
    // Copy the memory row by row.
    for (size_t slice = 0; slice < size.z; ++slice) {
      size_t src_offset_temp = src_offset + slice * src_slice_pitch;
      size_t dst_offset_temp = dst_offset + slice * dst_slice_pitch;
      for (size_t rows = 0; rows < size.y; ++rows) {
        std::memcpy((dst + dst_offset_temp), (src + src_offset_temp),
                    copy_size);
        src_offset_temp += src_row_pitch;
        dst_offset_temp += dst_row_pitch;
      }
    }
  } else {
    // Copy per pixel between RGBA-SRGBA images.
    for (size_t slice = 0; slice < size.z; ++slice) {
      size_t src_offset_temp = src_offset + slice * src_slice_pitch;
      size_t dst_offset_temp = dst_offset + slice * dst_slice_pitch;
      for (size_t rows = 0; rows < size.y; ++rows) {
        const uint8_t* src_pixel = src + src_offset_temp;
        uint8_t* dst_pixel = dst + dst_offset_temp;
        if (linear_to_standard_rgb) {
          for (size_t cols = 0; cols < size.x; ++cols) {
            dst_pixel[0] =
                Denormalize(LinearToStandardRGB(Normalize(src_pixel[0])));  // R
            dst_pixel[1] =
                Denormalize(LinearToStandardRGB(Normalize(src_pixel[1])));  // G
            dst_pixel[2] =
                Denormalize(LinearToStandardRGB(Normalize(src_pixel[2])));  // B
            dst_pixel[3] = src_pixel[3];                                    // A
            src_pixel += element_size;
            dst_pixel += element_size;
          }
        } else {
          assert(standard_to_linear_rgb);
          for (size_t cols = 0; cols < size.x; ++cols) {
            dst_pixel[0] =
                Denormalize(StandardToLinearRGB(Normalize(src_pixel[0])));  // R
            dst_pixel[1] =
                Denormalize(StandardToLinearRGB(Normalize(src_pixel[1])));  // G
            dst_pixel[2] =
                Denormalize(StandardToLinearRGB(Normalize(src_pixel[2])));  // B
            dst_pixel[3] = src_pixel[3];                                    // A
            src_pixel += element_size;
            dst_pixel += element_size;
          }
        }
        src_offset_temp += src_row_pitch;
        dst_offset_temp += dst_row_pitch;
      }
    }
  }

  return HSA_STATUS_SUCCESS;
}

uint16_t ImageManager::FloatToHalf(float in) {
  volatile union {
    float f;
    uint32_t u;
  } fu;

  fu.f = in;

  const uint16_t sign_bit_16 = (fu.u >> 16) & 0x8000;
  const uint32_t exp_32 = (fu.u >> 23) & 0xff;
  const uint32_t mantissa_32 = (fu.u) & 0x7fffff;

  if (exp_32 == 0 && mantissa_32 == 0) {
    // Zero.
    return sign_bit_16;
  } else if (exp_32 == 0xff) {
    if (mantissa_32 == 0) {
      // Inf.
      return (sign_bit_16 | 0x7c00);
    } else if ((mantissa_32 & 0x400000)) {
      // Quiet NaN.
      return (sign_bit_16 | 0x7e00);
    } else {
      // Signaling NaN.
      return (sign_bit_16 | 0x7c01);
    }
  } else {
    const uint32_t kMaxExpNormal = 0x477fe000 >> 23;     // 65504.
    const uint32_t kMinExpNormal = 0x38800000 >> 23;     // 2^-14.
    const uint32_t kMinExpSubnormal = 0x33800000 >> 23;  // 2^-24.

    if (exp_32 > kMaxExpNormal) {
      // Half overflow.
      // TODO: clamp it to max half float or +Inf.
      return (sign_bit_16 | 0x7bff);
    } else if (exp_32 < kMinExpSubnormal) {
      // Half underflow.
      return (sign_bit_16);
    } else if (exp_32 < kMinExpNormal) {
      // Half subnormal.
      return (sign_bit_16 |
              ((0x0400 | (mantissa_32 >> 13)) >> (127 - exp_32 - 14)));
    } else {
      // Half normal.
      return (sign_bit_16 |
              (((exp_32 - 127 + 15) << 10) | (mantissa_32 >> 13)));
    }
  }
}

float ImageManager::Normalize(uint8_t u_val) {
  if (u_val == 0) {
    return 0.0f;
  } else if (u_val == UINT8_MAX) {
    return 1.0f;
  } else {
    return std::min(
        std::max(static_cast<float>(u_val) / static_cast<float>(UINT8_MAX),
                 0.0f),
        1.0f);
  }
}

uint8_t ImageManager::Denormalize(float f_val) {
  const unsigned long kScale = UINT8_MAX;
  return std::min(
      static_cast<unsigned long>(std::max(lrintf(kScale * f_val), 0l)),
      kScale);
}

float ImageManager::StandardToLinearRGB(float s_val) {
  // Map an sRGB value to the linear RGB color space, based on the HSA
  // Programmer's Reference Manual version 1.0 Provisional, chapter 7.1.4.1.2
  // Standard RGB (s-Form).
  double l_val = (double)s_val;
  l_val = (l_val <= 0.04045f) ? (l_val / 12.92f)
                              : pow(((l_val + 0.055f) / 1.055f), 2.4f);
  return l_val;
}

float ImageManager::LinearToStandardRGB(float l_val) {
  // Map a linear RGB value to the sRGB color space, based on the HSA
  // Programmer's Reference Manual version 1.0 Provisional, chapter 7.1.4.1.2
  // Standard RGB (s-Form).
  double s_val = (double)l_val;
#if (defined(WIN32) || defined(_WIN32))
  if (_isnan(s_val)) s_val = 0.0;
#else
  if (std::isnan(s_val)) s_val = 0.0;
#endif

  if (s_val > 1.0) {
    s_val = 1.0;
  } else if (s_val < 0.0) {
    s_val = 0.0;
  } else if (s_val < 0.0031308) {
    s_val = 12.92 * s_val;
  } else {
    s_val = (1.055 * pow(s_val, 5.0 / 12.0)) - 0.055;
  }

  return s_val;
}

void ImageManager::FormatPattern(const hsa_ext_image_format_t& format,
                                 const void* pattern_in, void* pattern_out) {
  const int kR = 0;
  const int kG = 1;
  const int kB = 2;
  const int kA = 3;

  int index[4] = {0};
  int num_channel = 0;

  switch (format.channel_order) {
    case HSA_EXT_IMAGE_CHANNEL_ORDER_A:
      index[0] = kA;
      num_channel = 1;
      break;
    case HSA_EXT_IMAGE_CHANNEL_ORDER_R:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RX:
      index[0] = kR;
      num_channel = 1;
      break;
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RG:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RGX:
      index[0] = kR;
      index[1] = kG;
      num_channel = 2;
      break;
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RA:
      index[0] = kR;
      index[1] = kA;
      num_channel = 2;
      break;
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RGB:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX:
      index[0] = kR;
      index[1] = kG;
      index[2] = kB;
      num_channel = 3;
      break;
    case HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA:
      index[0] = kR;
      index[1] = kG;
      index[2] = kB;
      index[3] = kA;
      num_channel = 4;
      break;
    case HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA:
      index[0] = kB;
      index[1] = kG;
      index[2] = kR;
      index[3] = kA;
      num_channel = 4;
      break;
    case HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB:
      index[0] = kA;
      index[1] = kR;
      index[2] = kG;
      index[3] = kB;
      num_channel = 4;
      break;
    case HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR:
      index[0] = kA;
      index[1] = kB;
      index[2] = kG;
      index[3] = kR;
      num_channel = 4;
      break;
    case HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL:
      index[0] = kR;
      num_channel = 1;
      break;
    default:
      assert(false && "Should not reach here.");
      break;
  }

  const float* pattern_in_f = NULL;
  const int32_t* pattern_in_i32 = NULL;
  const uint32_t* pattern_in_ui32 = NULL;

  float new_pattern_in_f[4] = {0};
  if ((format.channel_order == HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB) ||
      (format.channel_order == HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX) ||
      (format.channel_order == HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA) ||
      (format.channel_order == HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA)) {
    pattern_in_f = reinterpret_cast<const float*>(pattern_in);
    new_pattern_in_f[0] = LinearToStandardRGB(pattern_in_f[0]);
    new_pattern_in_f[1] = LinearToStandardRGB(pattern_in_f[1]);
    new_pattern_in_f[2] = LinearToStandardRGB(pattern_in_f[2]);
    new_pattern_in_f[3] = pattern_in_f[3];

    pattern_in_f = new_pattern_in_f;
  } else {
    pattern_in_f = reinterpret_cast<const float*>(pattern_in);
    pattern_in_i32 = reinterpret_cast<const int32_t*>(pattern_in);
    pattern_in_ui32 = reinterpret_cast<const uint32_t*>(pattern_in);
  }

  for (int c = 0; c < num_channel; ++c) {
    switch (format.channel_type) {
      case HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT8: {
        int8_t* pattern_out_i8 = reinterpret_cast<int8_t*>(pattern_out);
        const long kScale = INT8_MAX;
        const long conv = lrintf(kScale * pattern_in_f[index[c]]);
        pattern_out_i8[c] = std::min(std::max(conv, -kScale - 1l), kScale);
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT16: {
        int16_t* pattern_out_i16 = reinterpret_cast<int16_t*>(pattern_out);
        const long kScale = INT16_MAX;
        const long conv = lrintf(kScale * pattern_in_f[index[c]]);
        pattern_out_i16[c] = std::min(std::max(conv, -kScale - 1l), kScale);
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8: {
        uint8_t* pattern_out_ui8 = reinterpret_cast<uint8_t*>(pattern_out);
        const unsigned long kScale = UINT8_MAX;
        const long conv = lrintf(kScale * pattern_in_f[index[c]]);
        pattern_out_ui8[c] =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), kScale);
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT16: {
        uint16_t* pattern_out_ui16 = reinterpret_cast<uint16_t*>(pattern_out);
        const unsigned long kScale =
            UINT16_MAX;
        const long conv = lrintf(kScale * pattern_in_f[index[c]]);
        pattern_out_ui16[c] =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), kScale);
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT24: {
        typedef struct Order24 {
          uint32_t r : 24;
        } Order24;
        Order24* pattern_out_u24 = reinterpret_cast<Order24*>(pattern_out);
        const unsigned long kScale = 0xffffff;
        const long conv = lrintf(kScale * pattern_in_f[index[c]]);
        pattern_out_u24[c].r =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), kScale);
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555: {
        typedef struct Order555 {
          uint32_t b : 5;
          uint32_t g : 5;
          uint32_t r : 5;
        } Order555;
        Order555* pattern_out_u555 = reinterpret_cast<Order555*>(pattern_out);
        const unsigned long kScale = 0x1f;
        long conv = lrintf(kScale * pattern_in_f[index[0]]);
        pattern_out_u555->r =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), kScale);
        conv = lrintf(kScale * pattern_in_f[index[1]]);
        pattern_out_u555->g =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), kScale);
        conv = lrintf(kScale * pattern_in_f[index[2]]);
        pattern_out_u555->b =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), kScale);
        return;
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565: {
        typedef struct Order565 {
          uint32_t b : 5;
          uint32_t g : 6;
          uint32_t r : 5;
        } Order565;
        Order565* pattern_out_u565 = reinterpret_cast<Order565*>(pattern_out);
        unsigned long scale = 0x1f;
        long conv = lrintf(scale * pattern_in_f[index[0]]);
        pattern_out_u565->r =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), scale);
        scale = 0x3f;
        conv = lrintf(scale * pattern_in_f[index[1]]);
        pattern_out_u565->g =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), scale);
        scale = 0x1f;
        conv = lrintf(scale * pattern_in_f[index[2]]);
        pattern_out_u565->b =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), scale);
        return;
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010: {
        typedef struct Order101010 {
          uint32_t b : 10;
          uint32_t g : 10;
          uint32_t r : 10;
        } Order101010;
        Order101010* pattern_out_u101010 =
            reinterpret_cast<Order101010*>(pattern_out);
        const unsigned long kScale = 0x3ff;
        long conv = lrintf(kScale * pattern_in_f[index[0]]);
        pattern_out_u101010->r =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), kScale);
        conv = lrintf(kScale * pattern_in_f[index[1]]);
        pattern_out_u101010->g =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), kScale);
        conv = lrintf(kScale * pattern_in_f[index[2]]);
        pattern_out_u101010->b =
            std::min(static_cast<unsigned long>(std::max(conv, 0l)), kScale);
        return;
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT8: {
        int8_t* pattern_out_i8 = reinterpret_cast<int8_t*>(pattern_out);
        pattern_out_i8[c] = pattern_in_i32[index[c]];
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT16: {
        int16_t* pattern_out_i16 = reinterpret_cast<int16_t*>(pattern_out);
        pattern_out_i16[c] = pattern_in_i32[index[c]];
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT32: {
        int32_t* pattern_out_i32 = reinterpret_cast<int32_t*>(pattern_out);
        pattern_out_i32[c] = pattern_in_i32[index[c]];
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8: {
        uint8_t* pattern_out_ui8 = reinterpret_cast<uint8_t*>(pattern_out);
        pattern_out_ui8[c] = pattern_in_ui32[index[c]];
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16: {
        uint16_t* pattern_out_ui16 = reinterpret_cast<uint16_t*>(pattern_out);
        pattern_out_ui16[c] = pattern_in_ui32[index[c]];
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32: {
        uint32_t* pattern_out_ui32 = reinterpret_cast<uint32_t*>(pattern_out);
        pattern_out_ui32[c] = pattern_in_ui32[index[c]];
      } break;
      case HSA_EXT_IMAGE_CHANNEL_TYPE_HALF_FLOAT: {
        // Convert the float pattern value to f16.
        uint16_t* pattern_out_ui16 = reinterpret_cast<uint16_t*>(pattern_out);
        pattern_out_ui16[c] = FloatToHalf(pattern_in_f[index[c]]);
        break;
      }
      case HSA_EXT_IMAGE_CHANNEL_TYPE_FLOAT: {
        float* pattern_out_f = reinterpret_cast<float*>(pattern_out);
        pattern_out_f[c] = pattern_in_f[index[c]];
      } break;
      default:
        assert(false && "Should not reach here.");
        break;
    }
  }
}

hsa_status_t ImageManager::FillImage(const Image& image, const void* pattern,
                                     const hsa_ext_image_region_t& region) {
  const hsa_dim3_t origin = region.offset;
  const hsa_dim3_t size = region.range;

  ImageProperty image_prop = GetImageProperty(
      image.component, image.desc.format, image.desc.geometry);
  assert(image_prop.cap != HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED);

  const size_t element_size = image_prop.element_size;
  assert(element_size != 0);

  const size_t row_pitch = image.row_pitch;
  const size_t slice_pitch = image.slice_pitch;

  // Map memory.
  unsigned char* fill_mem = static_cast<unsigned char*>(image.data);

  char fill_value[4 * sizeof(int)] = {0};
  FormatPattern(image.desc.format, pattern, fill_value);

  // Calculate offset.
  size_t offset = origin.x * element_size;
  offset += row_pitch * origin.y;
  offset += slice_pitch * origin.z;

  // Fill the image memory with the pattern.
  for (size_t slice = 0; slice < size.z; ++slice) {
    size_t offset_temp = offset + slice * slice_pitch;
    for (size_t rows = 0; rows < size.y; ++rows) {
      size_t pix_offset = offset_temp;

      // Copy pattern per pixel.
      for (size_t column = 0; column < size.x; ++column) {
        memcpy((fill_mem + pix_offset), fill_value, element_size);
        pix_offset += element_size;
      }

      offset_temp += row_pitch;
    }
  }

  return HSA_STATUS_SUCCESS;
}

}  // namespace image
}  // namespace rocr

ROCR-Runtime-rocm-5.0.0/src/image/image_manager.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
//   this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
//   notice, this list of conditions and the following disclaimers in
//   the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
//   nor the names of its contributors may be used to endorse or promote
//   products derived from this Software without specific prior written
//   permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef AMD_HSA_EXT_IMAGE_IMAGE_MANAGER_H
#define AMD_HSA_EXT_IMAGE_IMAGE_MANAGER_H

#include

#include "inc/hsa.h"
#include "inc/hsa_ext_image.h"
#include "resource.h"
#include "util.h"

namespace rocr {
namespace image {

/// @brief Abstract class for creating AMD agent-specific image / sampler
/// resources and data transfer.
class ImageManager {
 public:
  explicit ImageManager();
  virtual ~ImageManager();

  virtual hsa_status_t Initialize(hsa_agent_t agent_handle) = 0;

  virtual void Cleanup() = 0;

  /// @brief Retrieve the device-specific image property of a certain format
  /// and geometry.
  virtual ImageProperty GetImageProperty(
      hsa_agent_t component, const hsa_ext_image_format_t& format,
      hsa_ext_image_geometry_t geometry) const = 0;

  /// @brief Retrieve the device-specific maximum supported width, height,
  /// depth, and array size of an image geometry.
  virtual void GetImageInfoMaxDimension(hsa_agent_t component,
                                        hsa_ext_image_geometry_t geometry,
                                        uint32_t& width, uint32_t& height,
                                        uint32_t& depth,
                                        uint32_t& array_size) const = 0;

  /// @brief Calculate the size and alignment of the backing storage of an
  /// image.
  virtual hsa_status_t CalculateImageSizeAndAlignment(
      hsa_agent_t component, const hsa_ext_image_descriptor_t& desc,
      hsa_ext_image_data_layout_t image_data_layout,
      size_t image_data_row_pitch, size_t image_data_slice_pitch,
      hsa_ext_image_data_info_t& image_info) const = 0;

  /// @brief Fill image structure with device-specific image object.
  virtual hsa_status_t PopulateImageSrd(Image& image) const = 0;

  /// @brief Fill image structure with device-specific image object using the
  /// given metadata descriptor.
  virtual hsa_status_t PopulateImageSrd(Image& image,
                                        const metadata_amd_t* desc) const = 0;

  /// @brief Modify device-specific image object according to the specified
  /// new format.
  virtual hsa_status_t ModifyImageSrd(
      Image& image, hsa_ext_image_format_t& new_format) const = 0;

  /// @brief Fill sampler structure with device-specific sampler object.
  virtual hsa_status_t PopulateSamplerSrd(Sampler& sampler) const = 0;

  /// @brief Copy the content of a linear memory to an image object.
  virtual hsa_status_t CopyBufferToImage(
      const void* src_memory, size_t src_row_pitch, size_t src_slice_pitch,
      const Image& dst_image, const hsa_ext_image_region_t& image_region);

  /// @brief Copy the content of an image object to a linear memory.
  virtual hsa_status_t CopyImageToBuffer(
      const Image& src_image, void* dst_memory, size_t dst_row_pitch,
      size_t dst_slice_pitch, const hsa_ext_image_region_t& image_region);

  /// @brief Transfer backing storage between images.
  virtual hsa_status_t CopyImage(const Image& dst_image,
                                 const Image& src_image,
                                 const hsa_dim3_t& dst_origin,
                                 const hsa_dim3_t& src_origin,
                                 const hsa_dim3_t size);

  /// @brief Fill image backing storage using host copy.
  virtual hsa_status_t FillImage(const Image& image, const void* pattern,
                                 const hsa_ext_image_region_t& region);

 protected:
  static uint16_t FloatToHalf(float in);

  static inline float Normalize(uint8_t u_val);

  static inline uint8_t Denormalize(float f_val);

  static float StandardToLinearRGB(float s_val);

  static float LinearToStandardRGB(float l_val);

  static void FormatPattern(const hsa_ext_image_format_t& format,
                            const void* pattern_in, void* pattern_out);

 private:
  DISALLOW_COPY_AND_ASSIGN(ImageManager);
};

}  // namespace image
}  // namespace rocr

#endif  // AMD_HSA_EXT_IMAGE_IMAGE_MANAGER_H

ROCR-Runtime-rocm-5.0.0/src/image/image_manager_ai.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
//   this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
//   notice, this list of conditions and the following disclaimers in
//   the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
//   nor the names of its contributors may be used to endorse or promote
//   products derived from this Software without specific prior written
//   permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#define NOMINMAX
#include "image_manager_ai.h"

#include
#include
#include

#include "hsakmt.h"
#include "inc/hsa_ext_amd.h"
#include "core/inc/hsa_internal.h"
#include "addrlib/src/core/addrlib.h"
#include "image_runtime.h"
#include "resource.h"
#include "resource_ai.h"
#include "util.h"
#include "device_info.h"

namespace rocr {
namespace image {

ImageManagerAi::ImageManagerAi() : ImageManagerKv() {}

ImageManagerAi::~ImageManagerAi() {}

hsa_status_t ImageManagerAi::CalculateImageSizeAndAlignment(
    hsa_agent_t component, const hsa_ext_image_descriptor_t& desc,
    hsa_ext_image_data_layout_t image_data_layout,
    size_t image_data_row_pitch, size_t image_data_slice_pitch,
    hsa_ext_image_data_info_t& image_info) const {
  ADDR2_COMPUTE_SURFACE_INFO_OUTPUT out = {0};
  hsa_profile_t profile;

  hsa_status_t status =
      HSA::hsa_agent_get_info(component, HSA_AGENT_INFO_PROFILE, &profile);
  // Do not read an uninitialized profile if the query failed.
  if (status != HSA_STATUS_SUCCESS) return status;

  Image::TileMode tileMode = Image::TileMode::LINEAR;
  if (image_data_layout == HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE) {
    tileMode = (profile == HSA_PROFILE_BASE &&
                desc.geometry != HSA_EXT_IMAGE_GEOMETRY_1DB) ?
        Image::TileMode::TILED : Image::TileMode::LINEAR;
  }
  if (GetAddrlibSurfaceInfoAi(component, desc, tileMode, image_data_row_pitch,
                              image_data_slice_pitch, out) == (uint32_t)(-1)) {
    return HSA_STATUS_ERROR;
  }

  size_t rowPitch = (out.bpp >> 3) * out.pitch;
  size_t slicePitch = rowPitch * out.height;
  if (desc.geometry != HSA_EXT_IMAGE_GEOMETRY_1DB &&
      image_data_layout == HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR &&
      ((image_data_row_pitch && (rowPitch != image_data_row_pitch)) ||
       (image_data_slice_pitch && (slicePitch != image_data_slice_pitch)))) {
    return static_cast<hsa_status_t>(
        HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED);
  }

  image_info.size = out.surfSize;
  assert(image_info.size != 0);
  image_info.alignment = out.baseAlign;
  assert(image_info.alignment != 0);

  return HSA_STATUS_SUCCESS;
}

static const uint64_t kLimitSystem = 1ULL << 48;

bool ImageManagerAi::IsLocalMemory(const void* address) const {
  return true;
}

hsa_status_t ImageManagerAi::PopulateImageSrd(
    Image& image, const metadata_amd_t* descriptor) const {
  metadata_amd_ai_t* desc = (metadata_amd_ai_t*)descriptor;
  bool atc_access = true;
  const void* image_data_addr = image.data;

  ImageProperty image_prop =
      image_lut_.MapFormat(image.desc.format, image.desc.geometry);
  if ((image_prop.cap == HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED) ||
      (image_prop.element_size == 0))
    return (hsa_status_t)HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED;

  const Swizzle swizzle =
      image_lut_.MapSwizzle(image.desc.format.channel_order);

  if (IsLocalMemory(image.data)) {
    atc_access = false;
    image_data_addr = reinterpret_cast<const void*>(
        reinterpret_cast<uintptr_t>(image.data) - local_memory_base_address_);
  }

  image.srd[0] = desc->word0.u32All;
  image.srd[1] = desc->word1.u32All;
  image.srd[2] = desc->word2.u32All;
  image.srd[3] = desc->word3.u32All;
  image.srd[4] = desc->word4.u32All;
  image.srd[5] = desc->word5.u32All;
  image.srd[6] = desc->word6.u32All;
  image.srd[7] = desc->word7.u32All;

  if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) {
    sq_buf_rsrc_word0_u word0;
    sq_buf_rsrc_word1_u word1;
    sq_buf_rsrc_word2_u word2;
    sq_buf_rsrc_word3_u word3;

    word0.val = 0;
    word0.f.base_address = PtrLow32(image_data_addr);

    word1.val = image.srd[1];
    word1.f.base_address_hi = PtrHigh32(image_data_addr);
    word1.f.stride = image_prop.element_size;

    word3.val = image.srd[3];
    word3.f.dst_sel_x = swizzle.x;
    word3.f.dst_sel_y = swizzle.y;
    word3.f.dst_sel_z = swizzle.z;
    word3.f.dst_sel_w = swizzle.w;
    word3.f.num_format = image_prop.data_type;
    word3.f.data_format = image_prop.data_format;
    word3.f.index_stride = image_prop.element_size;

    image.srd[0] = word0.val;
    image.srd[1] = word1.val;
    image.srd[3] = word3.val;
  } else {
    uint32_t hwPixelSize =
        image_lut_.GetPixelSize(desc->word1.bitfields.DATA_FORMAT,
                                desc->word1.bitfields.NUM_FORMAT);
    if (image_prop.element_size != hwPixelSize)
      return (hsa_status_t)HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED;

    ((SQ_IMG_RSRC_WORD0*)(&image.srd[0]))->bits.BASE_ADDRESS =
        PtrLow40Shift8(image_data_addr);
    ((SQ_IMG_RSRC_WORD1*)(&image.srd[1]))->bits.BASE_ADDRESS_HI =
        PtrHigh64Shift40(image_data_addr);
    ((SQ_IMG_RSRC_WORD1*)(&image.srd[1]))->bits.DATA_FORMAT =
        image_prop.data_format;
    ((SQ_IMG_RSRC_WORD1*)(&image.srd[1]))->bits.NUM_FORMAT =
        image_prop.data_type;
    ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.DST_SEL_X = swizzle.x;
    ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.DST_SEL_Y = swizzle.y;
    ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.DST_SEL_Z = swizzle.z;
    ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.DST_SEL_W = swizzle.w;
    if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DA ||
        image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1D) {
      ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.TYPE =
          image_lut_.MapGeometry(image.desc.geometry);
    }

    // Imported metadata holds the offset to the metadata; add the image base
    // address.
    uintptr_t meta =
        uintptr_t(((SQ_IMG_RSRC_WORD5*)(&image.srd[5]))->bits.META_DATA_ADDRESS_HI) << 40;
    meta |= uintptr_t(((SQ_IMG_RSRC_WORD7*)(&image.srd[7]))->bits.META_DATA_ADDRESS) << 8;
    meta += reinterpret_cast<uintptr_t>(image_data_addr);

    ((SQ_IMG_RSRC_WORD7*)(&image.srd[7]))->bits.META_DATA_ADDRESS =
        PtrLow40Shift8((void*)meta);
    ((SQ_IMG_RSRC_WORD5*)(&image.srd[5]))->bits.META_DATA_ADDRESS_HI =
        PtrHigh64Shift40((void*)meta);
  }

  // Looks like this is only used for CPU copies.
  image.row_pitch = 0;  // desc->word4.bits.pitch+1*desc->word3.bits.element_size;
  image.slice_pitch = 0;  // desc->;

  // Used by HSAIL shader ABI.
  image.srd[8] = image.desc.format.channel_type;
  image.srd[9] = image.desc.format.channel_order;
  image.srd[10] = static_cast<uint32_t>(image.desc.width);

  return HSA_STATUS_SUCCESS;
}

static TEX_BC_SWIZZLE GetBcSwizzle(const Swizzle& swizzle) {
  SEL r = (SEL)swizzle.x;
  SEL g = (SEL)swizzle.y;
  SEL b = (SEL)swizzle.z;
  SEL a = (SEL)swizzle.w;

  TEX_BC_SWIZZLE bcSwizzle = TEX_BC_Swizzle_XYZW;

  if (a == SEL_X) {
    // Have to use either TEX_BC_Swizzle_WZYX or TEX_BC_Swizzle_WXYZ.
    //
    // For the pre-defined border color values (white, opaque black,
    // transparent black), the only thing that matters is that the alpha
    // channel winds up in the correct place (because the RGB channels are all
    // the same), so either of these TEX_BC_Swizzle enumerations will work.
    // Not sure what happens with border color palettes.
    if (b == SEL_Y) {
      // ABGR
      bcSwizzle = TEX_BC_Swizzle_WZYX;
    } else if ((r == SEL_X) && (g == SEL_X) && (b == SEL_X)) {
      // RGBA
      bcSwizzle = TEX_BC_Swizzle_XYZW;
    } else {
      // ARGB
      bcSwizzle = TEX_BC_Swizzle_WXYZ;
    }
  } else if (r == SEL_X) {
    // Have to use either TEX_BC_Swizzle_XYZW or TEX_BC_Swizzle_XWYZ.
    if (g == SEL_Y) {
      // RGBA
      bcSwizzle = TEX_BC_Swizzle_XYZW;
    } else if ((g == SEL_X) && (b == SEL_X) && (a == SEL_W)) {
      // RGBA
      bcSwizzle = TEX_BC_Swizzle_XYZW;
    } else {
      // RAGB
      bcSwizzle = TEX_BC_Swizzle_XWYZ;
    }
  } else if (g == SEL_X) {
    // GRAB, have to use TEX_BC_Swizzle_YXWZ.
    bcSwizzle = TEX_BC_Swizzle_YXWZ;
  } else if (b == SEL_X) {
    // BGRA, have to use TEX_BC_Swizzle_ZYXW.
    bcSwizzle = TEX_BC_Swizzle_ZYXW;
  }

  return bcSwizzle;
}

hsa_status_t ImageManagerAi::PopulateImageSrd(Image& image) const {
  ImageProperty image_prop =
      image_lut_.MapFormat(image.desc.format, image.desc.geometry);
  assert(image_prop.cap != HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED);
  assert(image_prop.element_size != 0);

  bool atc_access = true;
  const void* image_data_addr = image.data;

  if (IsLocalMemory(image.data)) {
    atc_access = false;
    image_data_addr = reinterpret_cast<const void*>(
        reinterpret_cast<uintptr_t>(image.data) - local_memory_base_address_);
  }

  if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) {
    sq_buf_rsrc_word0_u word0;
    sq_buf_rsrc_word1_u word1;
    sq_buf_rsrc_word2_u word2;
    sq_buf_rsrc_word3_u word3;

    word0.val = 0;
    word0.f.base_address = PtrLow32(image_data_addr);

    word1.val = 0;
    word1.f.base_address_hi = PtrHigh32(image_data_addr);
    word1.f.stride = image_prop.element_size;
    word1.f.swizzle_enable = false;
    word1.f.cache_swizzle = false;

    word2.f.num_records = image.desc.width * image_prop.element_size;

    const Swizzle swizzle =
        image_lut_.MapSwizzle(image.desc.format.channel_order);
    word3.val = 0;
    word3.f.dst_sel_x = swizzle.x;
    word3.f.dst_sel_y = swizzle.y;
    word3.f.dst_sel_z = swizzle.z;
    word3.f.dst_sel_w = swizzle.w;
    word3.f.num_format = image_prop.data_type;
    word3.f.data_format = image_prop.data_format;
    word3.f.index_stride = image_prop.element_size;
    word3.f.type = image_lut_.MapGeometry(image.desc.geometry);

    image.srd[0] = word0.val;
    image.srd[1] = word1.val;
    image.srd[2] = word2.val;
    image.srd[3] = word3.val;

    image.row_pitch = image.desc.width * image_prop.element_size;
    image.slice_pitch = image.row_pitch;
  } else {
    sq_img_rsrc_word0_u word0;
    sq_img_rsrc_word1_u word1;
    sq_img_rsrc_word2_u word2;
    sq_img_rsrc_word3_u word3;
    sq_img_rsrc_word4_u word4;
    sq_img_rsrc_word5_u word5;
    sq_img_rsrc_word6_u word6;
    sq_img_rsrc_word7_u word7;

    ADDR2_COMPUTE_SURFACE_INFO_OUTPUT out = {0};

    uint32_t swizzleMode = GetAddrlibSurfaceInfoAi(
        image.component, image.desc, image.tile_mode, image.row_pitch,
        image.slice_pitch, out);
    if (swizzleMode == (uint32_t)(-1)) {
      return HSA_STATUS_ERROR;
    }

    assert((out.bpp / 8) == image_prop.element_size);

    const size_t row_pitch_size = out.pitch * image_prop.element_size;

    word0.f.base_address = PtrLow40Shift8(image_data_addr);

    word1.val = 0;
    word1.f.base_address_hi = PtrHigh64Shift40(image_data_addr);
    word1.f.min_lod = 0;
    word1.f.data_format = image_prop.data_format;
    word1.f.num_format = image_prop.data_type;

    word2.val = 0;
    word2.f.width = image.desc.width - 1;
    word2.f.height = image.desc.height - 1;
    word2.f.perf_mod = 0;

    const Swizzle swizzle =
        image_lut_.MapSwizzle(image.desc.format.channel_order);
    word3.val = 0;
    word3.f.dst_sel_x = swizzle.x;
    word3.f.dst_sel_y = swizzle.y;
    word3.f.dst_sel_z = swizzle.z;
    word3.f.dst_sel_w = swizzle.w;
    word3.f.sw_mode = swizzleMode;
    word3.f.type = image_lut_.MapGeometry(image.desc.geometry);

    const bool image_array =
        (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DA ||
         image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_2DA ||
         image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_2DADEPTH);
    const bool image_3d = (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_3D);

    word4.val = 0;
    word4.f.depth =
        (image_array)
            ? std::max(image.desc.array_size, static_cast<size_t>(1)) - 1
            : (image_3d) ?
                         image.desc.depth - 1 : 0;
    word4.f.pitch = out.pitch - 1;
    word4.f.bc_swizzle = GetBcSwizzle(swizzle);

    word5.val = 0;
    word6.val = 0;
    word7.val = 0;

    image.srd[0] = word0.val;
    image.srd[1] = word1.val;
    image.srd[2] = word2.val;
    image.srd[3] = word3.val;
    image.srd[4] = word4.val;
    image.srd[5] = word5.val;
    image.srd[6] = word6.val;
    image.srd[7] = word7.val;

    image.row_pitch = row_pitch_size;
    image.slice_pitch = out.sliceSize;
  }

  image.srd[8] = image.desc.format.channel_type;
  image.srd[9] = image.desc.format.channel_order;
  image.srd[10] = static_cast<uint32_t>(image.desc.width);

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageManagerAi::ModifyImageSrd(
    Image& image, hsa_ext_image_format_t& new_format) const {
  image.desc.format = new_format;

  ImageProperty image_prop =
      image_lut_.MapFormat(image.desc.format, image.desc.geometry);
  assert(image_prop.cap != HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED);
  assert(image_prop.element_size != 0);

  if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) {
    const Swizzle swizzle =
        image_lut_.MapSwizzle(image.desc.format.channel_order);
    SQ_BUF_RSRC_WORD3* word3 =
        reinterpret_cast<SQ_BUF_RSRC_WORD3*>(&image.srd[3]);
    word3->bits.DST_SEL_X = swizzle.x;
    word3->bits.DST_SEL_Y = swizzle.y;
    word3->bits.DST_SEL_Z = swizzle.z;
    word3->bits.DST_SEL_W = swizzle.w;
    word3->bits.NUM_FORMAT = image_prop.data_type;
    word3->bits.DATA_FORMAT = image_prop.data_format;
  } else {
    SQ_IMG_RSRC_WORD1* word1 =
        reinterpret_cast<SQ_IMG_RSRC_WORD1*>(&image.srd[1]);
    word1->bits.DATA_FORMAT = image_prop.data_format;
    word1->bits.NUM_FORMAT = image_prop.data_type;

    const Swizzle swizzle =
        image_lut_.MapSwizzle(image.desc.format.channel_order);
    SQ_IMG_RSRC_WORD3* word3 =
        reinterpret_cast<SQ_IMG_RSRC_WORD3*>(&image.srd[3]);
    word3->bits.DST_SEL_X = swizzle.x;
    word3->bits.DST_SEL_Y = swizzle.y;
    word3->bits.DST_SEL_Z = swizzle.z;
    word3->bits.DST_SEL_W = swizzle.w;
  }

  image.srd[8] = image.desc.format.channel_type;
  image.srd[9] = image.desc.format.channel_order;
  image.srd[10] = static_cast<uint32_t>(image.desc.width);

  return HSA_STATUS_SUCCESS;
}

hsa_status_t
ImageManagerAi::PopulateSamplerSrd(Sampler& sampler) const {
  const hsa_ext_sampler_descriptor_t sampler_descriptor = sampler.desc;

  SQ_IMG_SAMP_WORD0 word0;
  SQ_IMG_SAMP_WORD1 word1;
  SQ_IMG_SAMP_WORD2 word2;
  SQ_IMG_SAMP_WORD3 word3;

  word0.u32All = 0;
  switch (sampler_descriptor.address_mode) {
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_EDGE:
      word0.bits.CLAMP_X = static_cast(SQ_TEX_CLAMP_LAST_TEXEL);
      break;
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_BORDER:
      word0.bits.CLAMP_X = static_cast(SQ_TEX_CLAMP_BORDER);
      break;
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_MIRRORED_REPEAT:
      word0.bits.CLAMP_X = static_cast(SQ_TEX_MIRROR);
      break;
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_UNDEFINED:
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_REPEAT:
      word0.bits.CLAMP_X = static_cast(SQ_TEX_WRAP);
      break;
    default:
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }
  word0.bits.CLAMP_Y = word0.bits.CLAMP_X;
  word0.bits.CLAMP_Z = word0.bits.CLAMP_X;
  word0.bits.FORCE_UNNORMALIZED = (sampler_descriptor.coordinate_mode ==
                                   HSA_EXT_SAMPLER_COORDINATE_MODE_UNNORMALIZED);

  word1.u32All = 0;
  word1.bits.MAX_LOD = 4095;

  word2.u32All = 0;
  switch (sampler_descriptor.filter_mode) {
    case HSA_EXT_SAMPLER_FILTER_MODE_NEAREST:
      word2.bits.XY_MAG_FILTER = static_cast(SQ_TEX_XY_FILTER_POINT);
      break;
    case HSA_EXT_SAMPLER_FILTER_MODE_LINEAR:
      word2.bits.XY_MAG_FILTER = static_cast(SQ_TEX_XY_FILTER_BILINEAR);
      break;
    default:
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }
  word2.bits.XY_MIN_FILTER = word2.bits.XY_MAG_FILTER;
  word2.bits.Z_FILTER = SQ_TEX_Z_FILTER_NONE;
  word2.bits.MIP_FILTER = SQ_TEX_MIP_FILTER_NONE;

  word3.u32All = 0;

  // TODO: check this bit with HSAIL spec.
  word3.bits.BORDER_COLOR_TYPE = SQ_TEX_BORDER_COLOR_TRANS_BLACK;

  sampler.srd[0] = word0.u32All;
  sampler.srd[1] = word1.u32All;
  sampler.srd[2] = word2.u32All;
  sampler.srd[3] = word3.u32All;

  return HSA_STATUS_SUCCESS;
}

uint32_t ImageManagerAi::GetAddrlibSurfaceInfoAi(
    hsa_agent_t component, const hsa_ext_image_descriptor_t& desc,
    Image::TileMode tileMode, size_t image_data_row_pitch,
    size_t image_data_slice_pitch,
    ADDR2_COMPUTE_SURFACE_INFO_OUTPUT& out) const {
  const ImageProperty image_prop =
      GetImageProperty(component, desc.format, desc.geometry);

  const AddrFormat addrlib_format = GetAddrlibFormat(image_prop);

  const uint32_t width = static_cast<uint32_t>(desc.width);
  const uint32_t height = static_cast<uint32_t>(desc.height);
  static const size_t kMinNumSlice = 1;
  const uint32_t num_slice = static_cast<uint32_t>(
      std::max(kMinNumSlice, std::max(desc.array_size, desc.depth)));

  ADDR2_COMPUTE_SURFACE_INFO_INPUT in = {0};
  in.size = sizeof(ADDR2_COMPUTE_SURFACE_INFO_INPUT);
  in.format = addrlib_format;
  in.bpp = static_cast<uint32_t>(image_prop.element_size) * 8;
  in.width = width;
  in.height = height;
  in.numSlices = num_slice;
  in.pitchInElement = image_data_row_pitch / image_prop.element_size;
  switch (desc.geometry) {
    case HSA_EXT_IMAGE_GEOMETRY_1D:
    case HSA_EXT_IMAGE_GEOMETRY_1DB:
    case HSA_EXT_IMAGE_GEOMETRY_1DA:
      in.resourceType = ADDR_RSRC_TEX_1D;
      break;

    case HSA_EXT_IMAGE_GEOMETRY_2D:
    case HSA_EXT_IMAGE_GEOMETRY_2DDEPTH:
    case HSA_EXT_IMAGE_GEOMETRY_2DA:
    case HSA_EXT_IMAGE_GEOMETRY_2DADEPTH:
      in.resourceType = ADDR_RSRC_TEX_2D;
      break;

    case HSA_EXT_IMAGE_GEOMETRY_3D:
      in.resourceType = ADDR_RSRC_TEX_3D;
      break;
  }
  in.flags.texture = 1;

  ADDR2_GET_PREFERRED_SURF_SETTING_INPUT prefSettingsInput = {0};
  ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT prefSettingsOutput = {0};

  prefSettingsInput.size = sizeof(prefSettingsInput);
  prefSettingsInput.flags = in.flags;
  prefSettingsInput.bpp = in.bpp;
  prefSettingsInput.format = in.format;
  prefSettingsInput.width = in.width;
  prefSettingsInput.height = in.height;
  prefSettingsInput.numFrags = in.numFrags;
  prefSettingsInput.numSamples = in.numSamples;
  prefSettingsInput.numMipLevels = in.numMipLevels;
  prefSettingsInput.numSlices = in.numSlices;
  prefSettingsInput.resourceLoction = ADDR_RSRC_LOC_UNDEF;
  prefSettingsInput.resourceType = in.resourceType;

  // Disallow all swizzles but linear.
  if (tileMode == Image::TileMode::LINEAR) {
    prefSettingsInput.forbiddenBlock.macroThin4KB = 1;
    prefSettingsInput.forbiddenBlock.macroThick4KB = 1;
    prefSettingsInput.forbiddenBlock.macroThin64KB = 1;
    prefSettingsInput.forbiddenBlock.macroThick64KB = 1;
  }

  // Never allow the 256B swizzle modes.
  prefSettingsInput.forbiddenBlock.micro = 1;
  // Never allow variable-size block modes.
  prefSettingsInput.forbiddenBlock.var = 1;

  if (ADDR_OK != Addr2GetPreferredSurfaceSetting(addr_lib_, &prefSettingsInput,
                                                 &prefSettingsOutput)) {
    return (uint32_t)(-1);
  }

  in.swizzleMode = prefSettingsOutput.swizzleMode;

  out.size = sizeof(ADDR2_COMPUTE_SURFACE_INFO_OUTPUT);
  if (ADDR_OK != Addr2ComputeSurfaceInfo(addr_lib_, &in, &out)) {
    return (uint32_t)(-1);
  }
  if (out.surfSize == 0) {
    return (uint32_t)(-1);
  }

  return in.swizzleMode;
}

}  // namespace image
}  // namespace rocr

ROCR-Runtime-rocm-5.0.0/src/image/image_manager_ai.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_EXT_IMAGE_IMAGE_MANAGER_AI_H
#define HSA_RUNTIME_EXT_IMAGE_IMAGE_MANAGER_AI_H

#include "addrlib/inc/addrinterface.h"
#include "image_manager_kv.h"

namespace rocr {
namespace image {

class ImageManagerAi : public ImageManagerKv {
 public:
  explicit ImageManagerAi();
  virtual ~ImageManagerAi();

  /// @brief Calculate the size and alignment of the backing storage of an
  /// image.
  virtual hsa_status_t CalculateImageSizeAndAlignment(
      hsa_agent_t component, const hsa_ext_image_descriptor_t& desc,
      hsa_ext_image_data_layout_t image_data_layout,
      size_t image_data_row_pitch, size_t image_data_slice_pitch,
      hsa_ext_image_data_info_t& image_info) const;

  /// @brief Fill image structure with device specific image object.
  virtual hsa_status_t PopulateImageSrd(Image& image) const;

  /// @brief Fill image structure with device specific image object using the given format.
  virtual hsa_status_t PopulateImageSrd(Image& image, const metadata_amd_t* desc) const;

  /// @brief Modify device specific image object according to the specified
  /// new format.
  virtual hsa_status_t ModifyImageSrd(Image& image,
                                      hsa_ext_image_format_t& new_format) const;

  /// @brief Fill sampler structure with device specific sampler object.
  virtual hsa_status_t PopulateSamplerSrd(Sampler& sampler) const;

 protected:
  uint32_t GetAddrlibSurfaceInfoAi(hsa_agent_t component,
                                   const hsa_ext_image_descriptor_t& desc,
                                   Image::TileMode tileMode,
                                   size_t image_data_row_pitch,
                                   size_t image_data_slice_pitch,
                                   ADDR2_COMPUTE_SURFACE_INFO_OUTPUT& out) const;

  bool IsLocalMemory(const void* address) const;

 private:
  DISALLOW_COPY_AND_ASSIGN(ImageManagerAi);
};

} // namespace image
} // namespace rocr
#endif // HSA_RUNTIME_EXT_IMAGE_IMAGE_MANAGER_AI_H
ROCR-Runtime-rocm-5.0.0/src/image/image_manager_kv.cpp000077500000000000000000001062401420110115200224620ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#define NOMINMAX
#include "image_manager_kv.h"

#include
#include
#include

#include "hsakmt.h"
#include "inc/hsa_ext_amd.h"
#include "core/inc/hsa_internal.h"
#include "core/inc/hsa_ext_amd_impl.h"
#include "addrlib/inc/addrinterface.h"
#include "addrlib/src/core/addrlib.h"
#include "image_runtime.h"
#include "resource.h"
#include "resource_kv.h"
#include "util.h"
#include "device_info.h"

namespace rocr {
namespace image {

ImageManagerKv::ImageManagerKv() : ImageManager() {}

ImageManagerKv::~ImageManagerKv() {}

hsa_status_t ImageManagerKv::Initialize(hsa_agent_t agent_handle) {
  agent_ = agent_handle;

  hsa_status_t status = GetGPUAsicID(agent_, &chip_id_);
  uint32_t major_ver = MajorVerFromDevID(chip_id_);
  assert(status == HSA_STATUS_SUCCESS);
  family_type_ = DevIDToAddrLibFamily(chip_id_);

  HsaGpuTileConfig tileConfig = {0};
  unsigned int tc[40];
  unsigned int mtc[40];
  tileConfig.TileConfig = &tc[0];
  tileConfig.NumTileConfigs = 40;
  tileConfig.MacroTileConfig = &mtc[0];
  tileConfig.NumMacroTileConfigs = 40;

  uint32_t node_id = 0;
  status = HSA::hsa_agent_get_info(
      agent_, static_cast(HSA_AMD_AGENT_INFO_DRIVER_NODE_ID), &node_id);
  assert(status == HSA_STATUS_SUCCESS);

  HSAKMT_STATUS stat =
      hsaKmtGetTileConfig(node_id, &tileConfig);
  assert(stat == HSAKMT_STATUS_SUCCESS);

  // Initialize address library.
  // TODO(bwicakso) hard coded based on UGL parameters.
  // Need to get this information from KMD.
  addr_lib_ = NULL;

  ADDR_CREATE_INPUT addr_create_input = {0};
  ADDR_CREATE_OUTPUT addr_create_output = {0};

  if (major_ver >= 9) {
    addr_create_input.chipEngine = CIASICIDGFXENGINE_ARCTICISLAND;
  } else {
    addr_create_input.chipEngine = CIASICIDGFXENGINE_SOUTHERNISLAND;
  }

  addr_create_input.chipFamily = family_type_;
  addr_create_input.chipRevision = 0;  // TODO(bwicakso): find how to get this.

  ADDR_CREATE_FLAGS create_flags = {0};
  create_flags.value = 0;
  create_flags.useTileIndex = 1;
  addr_create_input.createFlags = create_flags;

  addr_create_input.callbacks.allocSysMem = AllocSysMem;
  addr_create_input.callbacks.freeSysMem = FreeSysMem;
  addr_create_input.callbacks.debugPrint = 0;

  ADDR_REGISTER_VALUE reg_val = {0};
  reg_val.gbAddrConfig = tileConfig.GbAddrConfig;
  reg_val.noOfBanks = tileConfig.NumBanks;
  reg_val.noOfRanks = tileConfig.NumRanks;
  reg_val.pTileConfig = tileConfig.TileConfig;
  reg_val.noOfEntries = tileConfig.NumTileConfigs;
  reg_val.noOfMacroEntries = tileConfig.NumMacroTileConfigs;
  reg_val.pMacroTileConfig = tileConfig.MacroTileConfig;

  addr_create_input.regValue = reg_val;

  addr_create_input.minPitchAlignPixels = 0;

  ADDR_E_RETURNCODE addr_ret =
      AddrCreate(&addr_create_input, &addr_create_output);

  if (addr_ret == ADDR_OK) {
    addr_lib_ = addr_create_output.hLib;
  } else {
    return HSA_STATUS_ERROR;
  }

  // ImageManagerKv::Initialize is called on the first call to
  // hsa_ext_image_*, so checking the coherency mode here is fine as long as
  // the coherency mode is changed before any call to hsa_ext_image_create.
  hsa_amd_coherency_type_t coherency_type;
  status = AMD::hsa_amd_coherency_get_type(agent_, &coherency_type);
  assert(status == HSA_STATUS_SUCCESS);
  mtype_ = (coherency_type == HSA_AMD_COHERENCY_TYPE_COHERENT) ?
      3 : 1;
  // TODO: handle the case where the call to hsa_set_memory_type happens after
  // hsa_ext_image_create.

  hsa_region_t local_region = {0};
  status = HSA::hsa_agent_iterate_regions(agent_, GetLocalMemoryRegion,
                                          &local_region);
  assert(status == HSA_STATUS_SUCCESS);

  local_memory_base_address_ = 0;
  if (local_region.handle != 0) {
    status = HSA::hsa_region_get_info(
        local_region, static_cast(HSA_AMD_REGION_INFO_BASE),
        &local_memory_base_address_);
    assert(status == HSA_STATUS_SUCCESS);
  }

  // Zero the queue object so it can be created on demand.
  blit_queue_.queue_ = NULL;
  blit_queue_.cached_index_ = 0;

  return HSA_STATUS_SUCCESS;
}

void ImageManagerKv::Cleanup() {
  if (blit_queue_.queue_ != NULL) {
    HSA::hsa_queue_destroy(blit_queue_.queue_);
  }

  if (addr_lib_ != NULL) {
    AddrDestroy(addr_lib_);
  }
}

ImageProperty ImageManagerKv::GetImageProperty(
    hsa_agent_t component, const hsa_ext_image_format_t& format,
    hsa_ext_image_geometry_t geometry) const {
  return image_lut_.MapFormat(format, geometry);
}

void ImageManagerKv::GetImageInfoMaxDimension(hsa_agent_t component,
                                              hsa_ext_image_geometry_t geometry,
                                              uint32_t& width, uint32_t& height,
                                              uint32_t& depth,
                                              uint32_t& array_size) const {
  width = image_lut_.GetMaxWidth(geometry);
  height = image_lut_.GetMaxHeight(geometry);
  depth = image_lut_.GetMaxDepth(geometry);
  array_size = image_lut_.GetMaxArraySize(geometry);
}

hsa_status_t ImageManagerKv::CalculateImageSizeAndAlignment(
    hsa_agent_t component, const hsa_ext_image_descriptor_t& desc,
    hsa_ext_image_data_layout_t image_data_layout,
    size_t image_data_row_pitch,
    size_t image_data_slice_pitch,
    hsa_ext_image_data_info_t& image_info) const {
  ADDR_COMPUTE_SURFACE_INFO_OUTPUT out = {0};
  hsa_profile_t profile;

  hsa_status_t status =
      HSA::hsa_agent_get_info(component, HSA_AGENT_INFO_PROFILE, &profile);
  if (status != HSA_STATUS_SUCCESS) {
    // Do not read 'profile' if the query failed; it would be uninitialized.
    return status;
  }

  Image::TileMode tileMode = Image::TileMode::LINEAR;
  if (image_data_layout == HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE) {
    tileMode = (profile == HSA_PROFILE_BASE &&
                desc.geometry !=
                    HSA_EXT_IMAGE_GEOMETRY_1DB) ?
        Image::TileMode::TILED : Image::TileMode::LINEAR;
  }
  if (!GetAddrlibSurfaceInfo(component, desc, tileMode,
                             image_data_row_pitch, image_data_slice_pitch,
                             out)) {
    return HSA_STATUS_ERROR;
  }

  size_t rowPitch = (out.bpp >> 3) * out.pitch;
  size_t slicePitch = rowPitch * out.height;
  if (desc.geometry != HSA_EXT_IMAGE_GEOMETRY_1DB &&
      image_data_layout == HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR &&
      ((image_data_row_pitch && (rowPitch != image_data_row_pitch)) ||
       (image_data_slice_pitch && (slicePitch != image_data_slice_pitch)))) {
    return static_cast(HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED);
  }

  image_info.size = out.surfSize;
  assert(image_info.size != 0);
  image_info.alignment = out.baseAlign;
  assert(image_info.alignment != 0);

  return HSA_STATUS_SUCCESS;
}

static const uint64_t kLimitSystem = 1ULL << 48;

bool ImageManagerKv::IsLocalMemory(const void* address) const {
  uintptr_t u_address = reinterpret_cast(address);

  uint32_t major_ver = MajorVerFromDevID(chip_id_);
  if (major_ver >= 8) {
    return true;
  }

#ifdef HSA_LARGE_MODEL
  // Fast path without querying local memory region info.
  // User mode system memory addressable by CPU is 0 to 2^48.
  return (u_address >= kLimitSystem);
#else
  // No local memory on 32 bit.
  return false;
#endif
}

hsa_status_t ImageManagerKv::PopulateImageSrd(Image& image, const metadata_amd_t* descriptor) const {
  metadata_amd_ci_vi_t* desc = (metadata_amd_ci_vi_t*)descriptor;
  bool atc_access = true;
  uint32_t mtype = mtype_;
  const void* image_data_addr = image.data;

  ImageProperty image_prop = image_lut_.MapFormat(image.desc.format, image.desc.geometry);
  if ((image_prop.cap == HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED) ||
      (image_prop.element_size == 0))
    return (hsa_status_t)HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED;

  uint32_t hwPixelSize = image_lut_.GetPixelSize(desc->word1.bitfields.data_format,
                                                 desc->word1.bitfields.num_format);
  if (image_prop.element_size != hwPixelSize)
    return (hsa_status_t)HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED;

  const Swizzle swizzle = image_lut_.MapSwizzle(image.desc.format.channel_order);

  if (IsLocalMemory(image.data)) {
    atc_access = false;
    mtype = 1;
    image_data_addr = reinterpret_cast(
        reinterpret_cast(image.data) - local_memory_base_address_);
  }

  image.srd[0] = desc->word0.u32_all;
  image.srd[1] = desc->word1.u32_all;
  image.srd[2] = desc->word2.u32_all;
  image.srd[3] = desc->word3.u32_all;
  image.srd[4] = desc->word4.u32_all;
  image.srd[5] = desc->word5.u32_all;
  image.srd[6] = desc->word6.u32_all;
  image.srd[7] = desc->word7.u32_all;

  ((SQ_IMG_RSRC_WORD0*)(&image.srd[0]))->bits.base_address = PtrLow40Shift8(image_data_addr);
  ((SQ_IMG_RSRC_WORD1*)(&image.srd[1]))->bits.base_address_hi = PtrHigh64Shift40(image_data_addr);
  ((SQ_IMG_RSRC_WORD1*)(&image.srd[1]))->bits.data_format = image_prop.data_format;
  ((SQ_IMG_RSRC_WORD1*)(&image.srd[1]))->bits.num_format = image_prop.data_type;
  ((SQ_IMG_RSRC_WORD1*)(&image.srd[1]))->bits.mtype = mtype;
  ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.atc = atc_access;
  ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.dst_sel_x = swizzle.x;
  ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.dst_sel_y = swizzle.y;
  ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.dst_sel_z = swizzle.z;
  ((SQ_IMG_RSRC_WORD3*)(&image.srd[3]))->bits.dst_sel_w =
      swizzle.w;
  ((SQ_IMG_RSRC_WORD7*)(&image.srd[7]))->bits.meta_data_address +=
      PtrLow40Shift8(image_data_addr);

  // Looks like this is only used for CPU copies.
  image.row_pitch = (desc->word4.bits.pitch + 1) * image_prop.element_size;
  image.slice_pitch = image.row_pitch * (desc->word2.bits.height + 1);

  // Used by HSAIL shader ABI
  image.srd[8] = image.desc.format.channel_type;
  image.srd[9] = image.desc.format.channel_order;
  image.srd[10] = static_cast(image.desc.width);

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageManagerKv::PopulateImageSrd(Image& image) const {
  ImageProperty image_prop =
      image_lut_.MapFormat(image.desc.format, image.desc.geometry);
  assert(image_prop.cap != HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED);
  assert(image_prop.element_size != 0);

  bool atc_access = true;
  uint32_t mtype = mtype_;
  const void* image_data_addr = image.data;

  if (IsLocalMemory(image.data)) {
    atc_access = false;
    mtype = 1;
    image_data_addr = reinterpret_cast(
        reinterpret_cast(image.data) - local_memory_base_address_);
  }

  if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) {
    SQ_BUF_RSRC_WORD0 word0;
    SQ_BUF_RSRC_WORD1 word1;
    SQ_BUF_RSRC_WORD2 word2;
    SQ_BUF_RSRC_WORD3 word3;

    word0.u32_all = 0;
    word0.bits.base_address = PtrLow32(image_data_addr);

    word1.u32_all = 0;
    word1.bits.base_address_hi = PtrHigh32(image_data_addr);
    word1.bits.stride = image_prop.element_size;
    word1.bits.swizzle_enable = false;
    word1.bits.cache_swizzle = false;

    uint32_t major_ver = MajorVerFromDevID(chip_id_);
    word2.bits.num_records = (major_ver < 8) ?
        image.desc.width : image.desc.width * image_prop.element_size;

    const Swizzle swizzle =
        image_lut_.MapSwizzle(image.desc.format.channel_order);
    word3.u32_all = 0;
    word3.bits.dst_sel_x = swizzle.x;
    word3.bits.dst_sel_y = swizzle.y;
    word3.bits.dst_sel_z = swizzle.z;
    word3.bits.dst_sel_w = swizzle.w;
    word3.bits.num_format = image_prop.data_type;
    word3.bits.data_format = image_prop.data_format;
    word3.bits.atc = atc_access;
    word3.bits.element_size = image_prop.element_size;
    word3.bits.type = image_lut_.MapGeometry(image.desc.geometry);
    word3.bits.mtype = mtype;

    image.srd[0] = word0.u32_all;
    image.srd[1] = word1.u32_all;
    image.srd[2] = word2.u32_all;
    image.srd[3] = word3.u32_all;

    image.row_pitch = image.desc.width * image_prop.element_size;
    image.slice_pitch = image.row_pitch;
  } else {
    SQ_IMG_RSRC_WORD0 word0;
    SQ_IMG_RSRC_WORD1 word1;
    SQ_IMG_RSRC_WORD2 word2;
    SQ_IMG_RSRC_WORD3 word3;
    SQ_IMG_RSRC_WORD4 word4;
    SQ_IMG_RSRC_WORD5 word5;
    SQ_IMG_RSRC_WORD6 word6;
    SQ_IMG_RSRC_WORD7 word7;

    ADDR_COMPUTE_SURFACE_INFO_OUTPUT out = {0};

    if (!GetAddrlibSurfaceInfo(image.component, image.desc, image.tile_mode,
                               image.row_pitch, image.slice_pitch, out)) {
      return HSA_STATUS_ERROR;
    }

    assert((out.bpp / 8) == image_prop.element_size);

    const size_t row_pitch_size = out.pitch * image_prop.element_size;

    word0.bits.base_address = PtrLow40Shift8(image_data_addr);

    word1.u32_all = 0;
    word1.bits.base_address_hi = PtrHigh64Shift40(image_data_addr);
    word1.bits.min_lod = 0;
    word1.bits.data_format = image_prop.data_format;
    word1.bits.num_format = image_prop.data_type;
    word1.bits.mtype = mtype;

    word2.u32_all = 0;
    word2.bits.width = image.desc.width - 1;
    word2.bits.height = image.desc.height - 1;
    word2.bits.perf_mod = 0;
    word2.bits.interlaced = 0;

    const Swizzle swizzle =
        image_lut_.MapSwizzle(image.desc.format.channel_order);
    word3.u32_all = 0;
    word3.bits.dst_sel_x = swizzle.x;
    word3.bits.dst_sel_y = swizzle.y;
    word3.bits.dst_sel_z = swizzle.z;
    word3.bits.dst_sel_w = swizzle.w;
    word3.bits.tiling_index =
        out.tileIndex;
    word3.bits.pow2_pad = (IsPowerOfTwo(row_pitch_size) &&
                           IsPowerOfTwo(image.desc.height)) ? 1 : 0;
    word3.bits.type = image_lut_.MapGeometry(image.desc.geometry);
    word3.bits.atc = atc_access;

    const bool image_array =
        (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DA ||
         image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_2DA ||
         image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_2DADEPTH);
    const bool image_3d = (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_3D);

    word4.u32_all = 0;
    word4.bits.depth =
        (image_array)
            ? std::max(image.desc.array_size, static_cast(1)) - 1
            : (image_3d) ? image.desc.depth - 1 : 0;
    word4.bits.pitch = out.pitch - 1;

    word5.u32_all = 0;
    word5.bits.last_array =
        (image_array)
            ? (std::max(image.desc.array_size, static_cast(1)) - 1)
            : 0;

    word6.u32_all = 0;
    word7.u32_all = 0;

    image.srd[0] = word0.u32_all;
    image.srd[1] = word1.u32_all;
    image.srd[2] = word2.u32_all;
    image.srd[3] = word3.u32_all;
    image.srd[4] = word4.u32_all;
    image.srd[5] = word5.u32_all;
    image.srd[6] = word6.u32_all;
    image.srd[7] = word7.u32_all;

    image.row_pitch = row_pitch_size;
    image.slice_pitch = out.sliceSize;
  }

  image.srd[8] = image.desc.format.channel_type;
  image.srd[9] = image.desc.format.channel_order;
  image.srd[10] = static_cast(image.desc.width);

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageManagerKv::ModifyImageSrd(
    Image& image, hsa_ext_image_format_t& new_format) const {
  image.desc.format = new_format;

  ImageProperty image_prop =
      image_lut_.MapFormat(image.desc.format, image.desc.geometry);
  assert(image_prop.cap != HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED);
  assert(image_prop.element_size != 0);

  if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) {
    const Swizzle swizzle =
        image_lut_.MapSwizzle(image.desc.format.channel_order);
    SQ_BUF_RSRC_WORD3* word3 =
        reinterpret_cast(&image.srd[3]);
    word3->bits.dst_sel_x = swizzle.x;
    word3->bits.dst_sel_y = swizzle.y;
    word3->bits.dst_sel_z = swizzle.z;
    word3->bits.dst_sel_w = swizzle.w;
    word3->bits.num_format = image_prop.data_type;
    word3->bits.data_format = image_prop.data_format;
  } else {
    SQ_IMG_RSRC_WORD1* word1 =
        reinterpret_cast(&image.srd[1]);
    word1->bits.data_format = image_prop.data_format;
    word1->bits.num_format = image_prop.data_type;

    const Swizzle swizzle =
        image_lut_.MapSwizzle(image.desc.format.channel_order);
    SQ_IMG_RSRC_WORD3* word3 =
        reinterpret_cast(&image.srd[3]);
    word3->bits.dst_sel_x = swizzle.x;
    word3->bits.dst_sel_y = swizzle.y;
    word3->bits.dst_sel_z = swizzle.z;
    word3->bits.dst_sel_w = swizzle.w;
  }

  image.srd[8] = image.desc.format.channel_type;
  image.srd[9] = image.desc.format.channel_order;
  image.srd[10] = static_cast(image.desc.width);

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageManagerKv::PopulateSamplerSrd(Sampler& sampler) const {
  const hsa_ext_sampler_descriptor_t sampler_descriptor = sampler.desc;

  SQ_IMG_SAMP_WORD0 word0;
  SQ_IMG_SAMP_WORD1 word1;
  SQ_IMG_SAMP_WORD2 word2;
  SQ_IMG_SAMP_WORD3 word3;

  word0.u32_all = 0;
  switch (sampler_descriptor.address_mode) {
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_EDGE:
      word0.bits.clamp_x = static_cast(SQ_TEX_CLAMP_LAST_TEXEL);
      break;
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_BORDER:
      word0.bits.clamp_x = static_cast(SQ_TEX_CLAMP_BORDER);
      break;
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_MIRRORED_REPEAT:
      word0.bits.clamp_x = static_cast(SQ_TEX_MIRROR);
      break;
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_UNDEFINED:
    case HSA_EXT_SAMPLER_ADDRESSING_MODE_REPEAT:
      word0.bits.clamp_x = static_cast(SQ_TEX_WRAP);
      break;
    default:
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }
  word0.bits.clamp_y = word0.bits.clamp_x;
  word0.bits.clamp_z = word0.bits.clamp_x;
  word0.bits.force_unormalized = (sampler_descriptor.coordinate_mode ==
                                  HSA_EXT_SAMPLER_COORDINATE_MODE_UNNORMALIZED);

  word1.u32_all = 0;
  word1.bits.max_lod = 4095;

  word2.u32_all = 0;
  switch (sampler_descriptor.filter_mode) {
    case HSA_EXT_SAMPLER_FILTER_MODE_NEAREST:
      word2.bits.xy_mag_filter = static_cast(SQ_TEX_XY_FILTER_POINT);
      break;
    case HSA_EXT_SAMPLER_FILTER_MODE_LINEAR:
      word2.bits.xy_mag_filter = static_cast(SQ_TEX_XY_FILTER_BILINEAR);
      break;
    default:
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }
  word2.bits.xy_min_filter = word2.bits.xy_mag_filter;
  word2.bits.z_filter = SQ_TEX_Z_FILTER_NONE;
  word2.bits.mip_filter = SQ_TEX_MIP_FILTER_NONE;

  word3.u32_all = 0;

  // TODO: check this bit with HSAIL spec.
  word3.bits.border_color_type = SQ_TEX_BORDER_COLOR_TRANS_BLACK;

  sampler.srd[0] = word0.u32_all;
  sampler.srd[1] = word1.u32_all;
  sampler.srd[2] = word2.u32_all;
  sampler.srd[3] = word3.u32_all;

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageManagerKv::CopyBufferToImage(
    const void* src_memory, size_t src_row_pitch, size_t src_slice_pitch,
    const Image& dst_image, const hsa_ext_image_region_t& image_region) {
  if (BlitQueueInit().queue_ == NULL) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }

  return ImageRuntime::instance()->blit_kernel().CopyBufferToImage(
      blit_queue_, blit_code_catalog_, src_memory, src_row_pitch,
      src_slice_pitch, dst_image, image_region);
}

hsa_status_t ImageManagerKv::CopyImageToBuffer(
    const Image& src_image, void* dst_memory, size_t dst_row_pitch,
    size_t dst_slice_pitch, const hsa_ext_image_region_t& image_region) {
  if (BlitQueueInit().queue_ == NULL) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }

  return ImageRuntime::instance()->blit_kernel().CopyImageToBuffer(
      blit_queue_, blit_code_catalog_, src_image, dst_memory, dst_row_pitch,
      dst_slice_pitch, image_region);
}

hsa_status_t ImageManagerKv::CopyImage(const Image& dst_image,
                                       const Image& src_image,
                                       const hsa_dim3_t& dst_origin,
                                       const hsa_dim3_t& src_origin,
                                       const hsa_dim3_t size) {
  if (BlitQueueInit().queue_ == NULL) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }

  const hsa_ext_image_format_t src_format = src_image.desc.format;
  const hsa_ext_image_channel_order32_t src_order = src_format.channel_order;
  const hsa_ext_image_channel_type32_t src_type = src_format.channel_type;
  const hsa_ext_image_format_t dst_format = dst_image.desc.format;
  const hsa_ext_image_channel_order32_t
      dst_order = dst_format.channel_order;
  const hsa_ext_image_channel_type32_t dst_type = dst_format.channel_type;

  BlitKernel::KernelOp copy_type = BlitKernel::KERNEL_OP_COPY_IMAGE_DEFAULT;

  if ((src_order == dst_order) && (src_type == dst_type)) {
    return ImageRuntime::instance()->blit_kernel().CopyImage(
        blit_queue_, blit_code_catalog_, dst_image, src_image, dst_origin,
        src_origin, size, copy_type);
  }

  // Source and destination format must be the same, except for
  // SRGBA <--> RGBA images.
  if ((src_type == HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8) &&
      (dst_type == HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8)) {
    if ((src_order == HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA) &&
        (dst_order == HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA)) {
      copy_type = BlitKernel::KERNEL_OP_COPY_IMAGE_STANDARD_TO_LINEAR;
    } else if ((src_order == HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA) &&
               (dst_order == HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA)) {
      copy_type = BlitKernel::KERNEL_OP_COPY_IMAGE_LINEAR_TO_STANDARD;
    }

    if (copy_type != BlitKernel::KERNEL_OP_COPY_IMAGE_DEFAULT) {
      // KV and CZ don't have write support for SRGBA image, so treat the
      // destination image as RGBA image.
      SQ_IMG_RSRC_WORD1* word1 = reinterpret_cast(
          &const_cast(dst_image).srd[1]);

      // Destination can be linear or standard, preserve the original value.
      uint32_t num_format_original = word1->bits.num_format;
      word1->bits.num_format = TYPE_UNORM;

      hsa_status_t status = ImageRuntime::instance()->blit_kernel().CopyImage(
          blit_queue_, blit_code_catalog_, dst_image, src_image, dst_origin,
          src_origin, size, copy_type);

      // Revert to the original format after the copy operation is finished.
      word1->bits.num_format = num_format_original;

      return status;
    }
  }

  return HSA_STATUS_ERROR_INVALID_ARGUMENT;
}

hsa_status_t ImageManagerKv::FillImage(const Image& image, const void* pattern,
                                       const hsa_ext_image_region_t& region) {
  if (BlitQueueInit().queue_ == NULL) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }

  Image* image_view = const_cast(&image);

  SQ_BUF_RSRC_WORD3* word3_buff = NULL;
  SQ_IMG_RSRC_WORD3* word3_image = NULL;
  uint32_t dst_sel_w_original = 0;
  if (image_view->desc.format.channel_type ==
      HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010) {
    // Force GPU to ignore the last two bits (alpha bits).
    if (image_view->desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) {
      word3_buff = reinterpret_cast(&image_view->srd[3]);
      dst_sel_w_original = word3_buff->bits.dst_sel_w;
      word3_buff->bits.dst_sel_w = SEL_0;
    } else {
      word3_image = reinterpret_cast(&image_view->srd[3]);
      dst_sel_w_original = word3_image->bits.dst_sel_w;
      word3_image->bits.dst_sel_w = SEL_0;
    }
  }

  SQ_IMG_RSRC_WORD1* word1 = NULL;
  uint32_t num_format_original = 0;
  const void* new_pattern = pattern;
  float fill_value[4] = {0};
  switch (image_view->desc.format.channel_order) {
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX:
    case HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA: {
      // KV and CZ don't have write support for SRGBA image, so convert pattern
      // to standard form and treat the image as RGBA image.
      const float* pattern_f = reinterpret_cast(pattern);
      fill_value[0] = LinearToStandardRGB(pattern_f[0]);
      fill_value[1] = LinearToStandardRGB(pattern_f[1]);
      fill_value[2] = LinearToStandardRGB(pattern_f[2]);
      fill_value[3] = pattern_f[3];
      new_pattern = fill_value;

      word1 = reinterpret_cast(&image_view->srd[1]);
      num_format_original = word1->bits.num_format;
      word1->bits.num_format = TYPE_UNORM;
    } break;
    default:
      break;
  }

  hsa_status_t status = ImageRuntime::instance()->blit_kernel().FillImage(
      blit_queue_, blit_code_catalog_, *image_view, new_pattern, region);

  // Revert to the original configuration.
  if (word3_buff != NULL) {
    word3_buff->bits.dst_sel_w = dst_sel_w_original;
  }

  if (word3_image != NULL) {
    word3_image->bits.dst_sel_w = dst_sel_w_original;
  }

  if (word1 != NULL) {
    word1->bits.num_format = num_format_original;
  }

  return status;
}

hsa_status_t ImageManagerKv::GetLocalMemoryRegion(hsa_region_t region,
                                                  void* data) {
  if (data == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  hsa_region_segment_t segment;
  hsa_status_t stat =
      HSA::hsa_region_get_info(region, HSA_REGION_INFO_SEGMENT, &segment);
  if (stat != HSA_STATUS_SUCCESS) {
    return stat;
  }

  if (segment != HSA_REGION_SEGMENT_GLOBAL) {
    return HSA_STATUS_SUCCESS;
  }

  uint32_t base = 0;
  stat = HSA::hsa_region_get_info(region, HSA_REGION_INFO_GLOBAL_FLAGS, &base);
  if (stat != HSA_STATUS_SUCCESS) {
    return stat;
  }

  if ((base & HSA_REGION_GLOBAL_FLAG_COARSE_GRAINED) != 0) {
    hsa_region_t* local_memory_region = (hsa_region_t*)data;
    *local_memory_region = region;
  }

  return HSA_STATUS_SUCCESS;
}

AddrFormat ImageManagerKv::GetAddrlibFormat(const ImageProperty& image_prop) {
  switch (image_prop.data_format) {
    case FMT_8:
      return ADDR_FMT_8;
      break;
    case FMT_16:
      return (image_prop.data_type != TYPE_FLOAT) ? ADDR_FMT_16
                                                  : ADDR_FMT_16_FLOAT;
      break;
    case FMT_8_8:
      return ADDR_FMT_8_8;
      break;
    case FMT_32:
      return (image_prop.data_type != TYPE_FLOAT) ?
          ADDR_FMT_32 : ADDR_FMT_32_FLOAT;
      break;
    case FMT_16_16:
      return (image_prop.data_type != TYPE_FLOAT) ? ADDR_FMT_16_16
                                                  : ADDR_FMT_16_16_FLOAT;
      break;
    case FMT_2_10_10_10:
      return ADDR_FMT_2_10_10_10;
      break;
    case FMT_8_8_8_8:
      return ADDR_FMT_8_8_8_8;
      break;
    case FMT_32_32:
      return (image_prop.data_type != TYPE_FLOAT) ? ADDR_FMT_32_32
                                                  : ADDR_FMT_32_32_FLOAT;
      break;
    case FMT_16_16_16_16:
      return (image_prop.data_type != TYPE_FLOAT)
                 ? ADDR_FMT_16_16_16_16
                 : ADDR_FMT_16_16_16_16_FLOAT;
      break;
    case FMT_32_32_32_32:
      return (image_prop.data_type != TYPE_FLOAT)
                 ? ADDR_FMT_32_32_32_32
                 : ADDR_FMT_32_32_32_32_FLOAT;
      break;
    case FMT_5_6_5:
      return ADDR_FMT_5_6_5;
      break;
    case FMT_1_5_5_5:
      return ADDR_FMT_1_5_5_5;
      break;
    case FMT_8_24:
      return ADDR_FMT_8_24;
      break;
    default:
      assert(false && "Should not reach here");
      return ADDR_FMT_INVALID;
      break;
  }

  assert(false && "Should not reach here");
  return ADDR_FMT_INVALID;
}

VOID* ADDR_API ImageManagerKv::AllocSysMem(const ADDR_ALLOCSYSMEM_INPUT* input) {
  return malloc(input->sizeInBytes);
}

ADDR_E_RETURNCODE ADDR_API
ImageManagerKv::FreeSysMem(const ADDR_FREESYSMEM_INPUT* input) {
  free(input->pVirtAddr);

  return ADDR_OK;
}

bool ImageManagerKv::GetAddrlibSurfaceInfo(
    hsa_agent_t component, const hsa_ext_image_descriptor_t& desc,
    Image::TileMode tileMode,
    size_t image_data_row_pitch,
    size_t image_data_slice_pitch,
    ADDR_COMPUTE_SURFACE_INFO_OUTPUT& out) const {
  const ImageProperty image_prop =
      GetImageProperty(component, desc.format, desc.geometry);

  const AddrFormat addrlib_format = GetAddrlibFormat(image_prop);

  const uint32_t width = static_cast(desc.width);
  const uint32_t height = static_cast(desc.height);
  static const size_t kMinNumSlice = 1;
  const uint32_t num_slice = static_cast(
      std::max(kMinNumSlice, std::max(desc.array_size, desc.depth)));

  uint32_t major_ver = MajorVerFromDevID(chip_id_);
  if (major_ver >= 9) {
    ADDR2_COMPUTE_SURFACE_INFO_INPUT in = {0};
    in.size = sizeof(ADDR2_COMPUTE_SURFACE_INFO_INPUT);
    in.format = addrlib_format;
    in.bpp =
        static_cast(image_prop.element_size) * 8;
    in.width = width;
    in.height = height;
    in.numSlices = num_slice;
    in.pitchInElement = image_data_row_pitch / image_prop.element_size;
    switch (desc.geometry) {
      case HSA_EXT_IMAGE_GEOMETRY_1D:
      case HSA_EXT_IMAGE_GEOMETRY_1DB:
        in.resourceType = ADDR_RSRC_TEX_1D;
        break;

      case HSA_EXT_IMAGE_GEOMETRY_2D:
      case HSA_EXT_IMAGE_GEOMETRY_2DDEPTH:
      case HSA_EXT_IMAGE_GEOMETRY_1DA:
        in.resourceType = ADDR_RSRC_TEX_2D;
        break;

      case HSA_EXT_IMAGE_GEOMETRY_3D:
      case HSA_EXT_IMAGE_GEOMETRY_2DA:
      case HSA_EXT_IMAGE_GEOMETRY_2DADEPTH:
        in.resourceType = ADDR_RSRC_TEX_3D;
        break;
    }
    in.flags.texture = 1;

    ADDR2_GET_PREFERRED_SURF_SETTING_INPUT prefSettingsInput = { 0 };
    ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT prefSettingsOutput = { 0 };

    prefSettingsInput.size = sizeof(prefSettingsInput);
    prefSettingsInput.flags = in.flags;
    prefSettingsInput.bpp = in.bpp;
    prefSettingsInput.format = in.format;
    prefSettingsInput.width = in.width;
    prefSettingsInput.height = in.height;
    prefSettingsInput.numFrags = in.numFrags;
    prefSettingsInput.numSamples = in.numSamples;
    prefSettingsInput.numMipLevels = in.numMipLevels;
    prefSettingsInput.numSlices = in.numSlices;
    prefSettingsInput.resourceLoction = ADDR_RSRC_LOC_UNDEF;
    prefSettingsInput.resourceType = in.resourceType;

    // Disallow all swizzles but linear.
    if (tileMode == Image::TileMode::LINEAR) {
      prefSettingsInput.forbiddenBlock.macroThin4KB = 1;
      prefSettingsInput.forbiddenBlock.macroThick4KB = 1;
      prefSettingsInput.forbiddenBlock.macroThin64KB = 1;
      prefSettingsInput.forbiddenBlock.macroThick64KB = 1;
    }

    prefSettingsInput.forbiddenBlock.micro = 1;  // but don't ever allow the 256b swizzle modes
    prefSettingsInput.forbiddenBlock.var = 1;    // and don't allow variable-size block modes

    if (ADDR_OK != Addr2GetPreferredSurfaceSetting(addr_lib_, &prefSettingsInput,
                                                   &prefSettingsOutput)) {
      return false;
    }

    in.swizzleMode = prefSettingsOutput.swizzleMode;

    // The size field belongs to the ADDR2 output struct that addrlib fills,
    // not to the ADDR_COMPUTE_SURFACE_INFO_OUTPUT returned to the caller.
    ADDR2_COMPUTE_SURFACE_INFO_OUTPUT out2 = {0};
    out2.size = sizeof(ADDR2_COMPUTE_SURFACE_INFO_OUTPUT);
    if (ADDR_OK != Addr2ComputeSurfaceInfo(addr_lib_, &in, &out2)) {
      return false;
    }

    out.pitch = out2.pitch;
    out.height = out2.height;
    out.surfSize = out2.surfSize;
    out.bpp = out2.bpp;
    out.baseAlign = out2.baseAlign;
    out.tileIndex = in.swizzleMode;
    out.sliceSize = out2.sliceSize;
    return true;
  }

  ADDR_COMPUTE_SURFACE_INFO_INPUT in = {0};
  in.size = sizeof(ADDR_COMPUTE_SURFACE_INFO_INPUT);
  in.tileMode = (tileMode == Image::TileMode::LINEAR) ? ADDR_TM_LINEAR_ALIGNED
                                                      : ADDR_TM_2D_TILED_THIN1;
  in.format = addrlib_format;
  in.bpp = static_cast(image_prop.element_size) * 8;
  in.numSamples = 1;
  in.width = width;
  in.height = height;
  in.numSlices = num_slice;
  in.flags.texture = 1;
  in.flags.noStencil = 1;
  in.flags.opt4Space = 0;
  in.tileType = ADDR_NON_DISPLAYABLE;
  in.tileIndex = -1;

  if (image_data_row_pitch != 0) {
    in.width = image_data_row_pitch / image_prop.element_size;
    // in.pitchAlign = image_data_row_pitch / image_prop.element_size;
    // in.heightAlign = image_data_slice_pitch / image_data_row_pitch;
  }

  if (ADDR_OK != AddrComputeSurfaceInfo(addr_lib_, &in, &out)) {
    return false;
  }
  assert(out.tileIndex != -1);

  return (out.tileIndex != -1) ?
true : false; } size_t ImageManagerKv::CalWorkingSizeBytes(hsa_ext_image_geometry_t geometry, hsa_dim3_t size_pixel, uint32_t element_size) const { switch (geometry) { case HSA_EXT_IMAGE_GEOMETRY_1D: case HSA_EXT_IMAGE_GEOMETRY_1DB: return size_pixel.x * element_size; case HSA_EXT_IMAGE_GEOMETRY_2D: case HSA_EXT_IMAGE_GEOMETRY_2DDEPTH: case HSA_EXT_IMAGE_GEOMETRY_1DA: return size_pixel.x * size_pixel.y * element_size; default: return size_pixel.x * size_pixel.y * size_pixel.z * element_size; } } BlitQueue& ImageManagerKv::BlitQueueInit() { if (blit_queue_.queue_ == NULL) { // Queue is a precious resource, so only create it when it is needed. std::lock_guard lock(lock_); if (blit_queue_.queue_ == NULL) { // Create the kernel queue. blit_queue_.cached_index_ = 0; uint32_t max_queue_size = 0; hsa_status_t status = HSA::hsa_agent_get_info(agent_, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &max_queue_size); status = HSA::hsa_queue_create(agent_, max_queue_size, HSA_QUEUE_TYPE_MULTI, NULL, NULL, UINT_MAX, UINT_MAX, &blit_queue_.queue_); if (HSA_STATUS_SUCCESS != status) { blit_queue_.queue_ = NULL; return blit_queue_; } // Get the kernel handles. status = ImageRuntime::instance()->blit_kernel().BuildBlitCode(agent_, blit_code_catalog_); if (HSA_STATUS_SUCCESS != status) { blit_code_catalog_.clear(); HSA::hsa_queue_destroy(blit_queue_.queue_); blit_queue_.queue_ = NULL; return blit_queue_; } } } assert(blit_queue_.queue_ != NULL && blit_code_catalog_.size() == BlitKernel::KERNEL_OP_COUNT); return blit_queue_; } } // namespace image } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/image/image_manager_kv.h000077500000000000000000000150511420110115200221260ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. 
// // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
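`ImageManagerKv::BlitQueueInit` in image_manager_kv.cpp creates the blit queue only on first use and tests `blit_queue_.queue_ == NULL` both before and after taking `lock_` (double-checked locking). The sketch below shows the same pattern with a hypothetical `Resource` type standing in for the HSA queue. One deliberate difference, labelled as an assumption of this sketch: the pointer is a `std::atomic` with acquire/release ordering, because the original reads a plain pointer outside the lock, which is formally a data race under the C++ memory model. The failure path of the original (destroying the queue and resetting it to NULL) is not modelled.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for an expensive, lazily created resource
// such as a kernel queue.
struct Resource {
  int id;
};

class LazyHolder {
 public:
  ~LazyHolder() { delete res_.load(std::memory_order_relaxed); }

  // Returns the shared resource, creating it on first use only.
  Resource* Get() {
    // Fast path: once the resource is published, no lock is taken.
    Resource* r = res_.load(std::memory_order_acquire);
    if (r == nullptr) {
      std::lock_guard<std::mutex> lock(lock_);
      // Re-check under the lock: another thread may have created it
      // between the first check and acquiring the mutex.
      r = res_.load(std::memory_order_relaxed);
      if (r == nullptr) {
        r = new Resource{++create_count_};
        res_.store(r, std::memory_order_release);
      }
    }
    return r;
  }

  // Number of times the resource was created (single-threaded use only).
  int create_count() const { return create_count_; }

 private:
  std::atomic<Resource*> res_{nullptr};
  std::mutex lock_;
  int create_count_ = 0;
};
```

`std::call_once` would be a simpler alternative when creation cannot fail; the double-checked form is closer to the original, which must be able to retry after a failed `hsa_queue_create`.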
// //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_EXT_IMAGE_IMAGE_MANAGER_KV_H #define HSA_RUNTIME_EXT_IMAGE_IMAGE_MANAGER_KV_H #include "addrlib/inc/addrinterface.h" #include "blit_kernel.h" #include "image_lut_kv.h" #include "image_manager.h" namespace rocr { namespace image { class ImageManagerKv : public ImageManager { public: explicit ImageManagerKv(); virtual ~ImageManagerKv(); virtual hsa_status_t Initialize(hsa_agent_t agent_handle); virtual void Cleanup(); /// @brief Retrieve device specific image property of a certain format /// and geometry. virtual ImageProperty GetImageProperty( hsa_agent_t component, const hsa_ext_image_format_t& format, hsa_ext_image_geometry_t geometry) const; /// @brief Retrieve device specific supported max width, height, depth, /// and array size of an image geometry. virtual void GetImageInfoMaxDimension(hsa_agent_t component, hsa_ext_image_geometry_t geometry, uint32_t& width, uint32_t& height, uint32_t& depth, uint32_t& array_size) const; /// @brief Calculate the size and alignment of the backing storage of an /// image. virtual hsa_status_t CalculateImageSizeAndAlignment( hsa_agent_t component, const hsa_ext_image_descriptor_t& desc, hsa_ext_image_data_layout_t image_data_layout, size_t image_data_row_pitch, size_t image_data_slice_pitch, hsa_ext_image_data_info_t& image_info) const; /// @brief Fill image structure with device specific image object. virtual hsa_status_t PopulateImageSrd(Image& image) const; /// @brief Fill image structure with device specific image object using the given format. virtual hsa_status_t PopulateImageSrd(Image& image, const metadata_amd_t* desc) const; /// @brief Modify device specific image object according to the specified /// new format. virtual hsa_status_t ModifyImageSrd(Image& image, hsa_ext_image_format_t& new_format) const; /// @brief Fill sampler structure with device specific sampler object. 
virtual hsa_status_t PopulateSamplerSrd(Sampler& sampler) const; // @brief Copy the content of a linear memory to an image object. virtual hsa_status_t CopyBufferToImage( const void* src_memory, size_t src_row_pitch, size_t src_slice_pitch, const Image& dst_image, const hsa_ext_image_region_t& image_region); /// @brief Copy the content of an image object to a linear memory. virtual hsa_status_t CopyImageToBuffer( const Image& src_image, void* dst_memory, size_t dst_row_pitch, size_t dst_slice_pitch, const hsa_ext_image_region_t& image_region); /// @brief Transfer images backing storage using agent copy. virtual hsa_status_t CopyImage(const Image& dst_image, const Image& src_image, const hsa_dim3_t& dst_origin, const hsa_dim3_t& src_origin, const hsa_dim3_t size); /// @brief Fill image backing storage using agent copy. virtual hsa_status_t FillImage(const Image& image, const void* pattern, const hsa_ext_image_region_t& region); protected: static hsa_status_t GetLocalMemoryRegion(hsa_region_t region, void* data); static AddrFormat GetAddrlibFormat(const ImageProperty& image_prop); static VOID* ADDR_API AllocSysMem(const ADDR_ALLOCSYSMEM_INPUT* input); static ADDR_E_RETURNCODE ADDR_API FreeSysMem(const ADDR_FREESYSMEM_INPUT* input); bool GetAddrlibSurfaceInfo(hsa_agent_t component, const hsa_ext_image_descriptor_t& desc, Image::TileMode tileMode, size_t image_data_row_pitch, size_t image_data_slice_pitch, ADDR_COMPUTE_SURFACE_INFO_OUTPUT& out) const; size_t CalWorkingSizeBytes(hsa_ext_image_geometry_t geometry, hsa_dim3_t size_pixel, uint32_t element_size) const; virtual bool IsLocalMemory(const void* address) const; BlitQueue& BlitQueueInit(); ImageLutKv image_lut_; ADDR_HANDLE addr_lib_; hsa_agent_t agent_; uint32_t family_type_; uint32_t chip_id_; BlitQueue blit_queue_; std::vector blit_code_catalog_; uint32_t mtype_; uintptr_t local_memory_base_address_; std::mutex lock_; private: DISALLOW_COPY_AND_ASSIGN(ImageManagerKv); }; } // namespace image } // namespace 
rocr #endif // HSA_RUNTIME_EXT_IMAGE_IMAGE_MANAGER_KV_H ROCR-Runtime-rocm-5.0.0/src/image/image_manager_nv.cpp000077500000000000000000000712411420110115200224670ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #define NOMINMAX #include "image_manager_nv.h" #include #include #include #include "inc/hsa_ext_amd.h" #include "core/inc/hsa_internal.h" #include "addrlib/src/core/addrlib.h" #include "image_runtime.h" #include "resource.h" #include "resource_nv.h" #include "util.h" #include "device_info.h" namespace rocr { namespace image { //----------------------------------------------------------------------------- // Workaround switch to combined format/type codes and missing gfx10 // specific look up table. Only covers types used in image_lut_kv.cpp. //----------------------------------------------------------------------------- struct formatconverstion_t { FMT fmt; type type; FORMAT format; }; // Format/Type to combined format code table. // Sorted and indexed to allow fast searches. 
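`GetCombinedFormat` in image_manager_nv.cpp searches `FormatLUT` starting at `FormatEntryPoint[fmt]`, scanning at most six entries, because all rows sharing one `FMT` value are stored contiguously. `FormatEntryPoint` appears to use 57 (the number of rows in `FormatLUT`) as a sentinel for `FMT` values with no rows, which makes the search range empty. The following is a reduced sketch of that lookup with hypothetical enums. The function body in this excerpt is truncated after the loop header, so the "not found" return value used here (`CFMT_INVALID`) is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, much smaller enums standing in for FMT, type and FORMAT.
enum Fmt : uint8_t { FMT_INVALID = 0, FMT_8 = 1, FMT_16 = 2, FMT_COUNT = 3 };
enum Type : uint8_t { TYPE_UNORM, TYPE_SNORM, TYPE_UINT };
enum Combined : int {
  CFMT_INVALID = -1,
  CFMT_8_UNORM,
  CFMT_8_UINT,
  CFMT_16_UNORM,
  CFMT_16_SNORM
};

struct Entry {
  Fmt fmt;
  Type type;
  Combined format;
};

// Rows sharing a Fmt value are contiguous.
static const Entry kLut[] = {
    {FMT_16, TYPE_UNORM, CFMT_16_UNORM},
    {FMT_16, TYPE_SNORM, CFMT_16_SNORM},
    {FMT_8, TYPE_UNORM, CFMT_8_UNORM},
    {FMT_8, TYPE_UINT, CFMT_8_UINT},
};
static const int kLutSize = sizeof(kLut) / sizeof(kLut[0]);

// First kLut row for each Fmt value; kLutSize means "no rows".
static const int kEntryPoint[FMT_COUNT] = {kLutSize, 2, 0};

// Longest run of rows sharing one Fmt (6 in the original table).
static const int kMaxTypesPerFmt = 2;

Combined Lookup(uint8_t fmt, uint8_t type) {
  assert(fmt < FMT_COUNT && "fmt out of range");
  const int start = kEntryPoint[fmt];
  const int stop = std::min(start + kMaxTypesPerFmt, kLutSize);
  for (int i = start; i < stop; ++i) {
    if (kLut[i].fmt == fmt && kLut[i].type == type) return kLut[i].format;
  }
  return CFMT_INVALID;
}
```

The entry-point table keeps each lookup to a handful of comparisons without requiring a fully dense two-dimensional `[fmt][type]` array.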
static const formatconverstion_t FormatLUT[] = { {FMT_1_5_5_5, TYPE_UNORM, CFMT_1_5_5_5_UNORM}, {FMT_10_10_10_2, TYPE_UNORM, CFMT_10_10_10_2_UNORM}, {FMT_10_10_10_2, TYPE_SNORM, CFMT_10_10_10_2_SNORM}, {FMT_10_10_10_2, TYPE_UINT, CFMT_10_10_10_2_UINT}, {FMT_10_10_10_2, TYPE_SINT, CFMT_10_10_10_2_SINT}, {FMT_16, TYPE_UNORM, CFMT_16_UNORM}, {FMT_16, TYPE_SNORM, CFMT_16_SNORM}, {FMT_16, TYPE_UINT, CFMT_16_UINT}, {FMT_16, TYPE_SINT, CFMT_16_SINT}, {FMT_16, TYPE_FLOAT, CFMT_16_FLOAT}, {FMT_16_16, TYPE_UNORM, CFMT_16_16_UNORM}, {FMT_16_16, TYPE_SNORM, CFMT_16_16_SNORM}, {FMT_16_16, TYPE_UINT, CFMT_16_16_UINT}, {FMT_16_16, TYPE_SINT, CFMT_16_16_SINT}, {FMT_16_16, TYPE_FLOAT, CFMT_16_16_FLOAT}, {FMT_16_16_16_16, TYPE_UNORM, CFMT_16_16_16_16_UNORM}, {FMT_16_16_16_16, TYPE_SNORM, CFMT_16_16_16_16_SNORM}, {FMT_16_16_16_16, TYPE_UINT, CFMT_16_16_16_16_UINT}, {FMT_16_16_16_16, TYPE_SINT, CFMT_16_16_16_16_SINT}, {FMT_16_16_16_16, TYPE_FLOAT, CFMT_16_16_16_16_FLOAT}, {FMT_2_10_10_10, TYPE_UNORM, CFMT_2_10_10_10_UNORM}, {FMT_2_10_10_10, TYPE_SNORM, CFMT_2_10_10_10_SNORM}, {FMT_2_10_10_10, TYPE_UINT, CFMT_2_10_10_10_UINT}, {FMT_2_10_10_10, TYPE_SINT, CFMT_2_10_10_10_SINT}, {FMT_24_8, TYPE_UNORM, CFMT_24_8_UNORM}, {FMT_24_8, TYPE_UINT, CFMT_24_8_UINT}, {FMT_32, TYPE_UINT, CFMT_32_UINT}, {FMT_32, TYPE_SINT, CFMT_32_SINT}, {FMT_32, TYPE_FLOAT, CFMT_32_FLOAT}, {FMT_32_32, TYPE_UINT, CFMT_32_32_UINT}, {FMT_32_32, TYPE_SINT, CFMT_32_32_SINT}, {FMT_32_32, TYPE_FLOAT, CFMT_32_32_FLOAT}, {FMT_32_32_32, TYPE_UINT, CFMT_32_32_32_UINT}, {FMT_32_32_32, TYPE_SINT, CFMT_32_32_32_SINT}, {FMT_32_32_32, TYPE_FLOAT, CFMT_32_32_32_FLOAT}, {FMT_32_32_32_32, TYPE_UINT, CFMT_32_32_32_32_UINT}, {FMT_32_32_32_32, TYPE_SINT, CFMT_32_32_32_32_SINT}, {FMT_32_32_32_32, TYPE_FLOAT, CFMT_32_32_32_32_FLOAT}, {FMT_5_5_5_1, TYPE_UNORM, CFMT_5_5_5_1_UNORM}, {FMT_5_6_5, TYPE_UNORM, CFMT_5_6_5_UNORM}, {FMT_8, TYPE_UNORM, CFMT_8_UNORM}, {FMT_8, TYPE_SNORM, CFMT_8_SNORM}, {FMT_8, TYPE_UINT, CFMT_8_UINT}, {FMT_8, 
TYPE_SINT, CFMT_8_SINT}, {FMT_8, TYPE_SRGB, CFMT_8_SRGB}, {FMT_8_24, TYPE_UNORM, CFMT_8_24_UNORM}, {FMT_8_24, TYPE_UINT, CFMT_8_24_UINT}, {FMT_8_8, TYPE_UNORM, CFMT_8_8_UNORM}, {FMT_8_8, TYPE_SNORM, CFMT_8_8_SNORM}, {FMT_8_8, TYPE_UINT, CFMT_8_8_UINT}, {FMT_8_8, TYPE_SINT, CFMT_8_8_SINT}, {FMT_8_8, TYPE_SRGB, CFMT_8_8_SRGB}, {FMT_8_8_8_8, TYPE_UNORM, CFMT_8_8_8_8_UNORM}, {FMT_8_8_8_8, TYPE_SNORM, CFMT_8_8_8_8_SNORM}, {FMT_8_8_8_8, TYPE_UINT, CFMT_8_8_8_8_UINT}, {FMT_8_8_8_8, TYPE_SINT, CFMT_8_8_8_8_SINT}, {FMT_8_8_8_8, TYPE_SRGB, CFMT_8_8_8_8_SRGB} }; static const int FormatLUTSize = sizeof(FormatLUT)/sizeof(formatconverstion_t); // Index in FormatLUT to start search, indexed by FMT enum. static const int FormatEntryPoint[] = { 57, 40, 5, 47, 26, 10, 57, 57, 1, 20, 52, 29, 15, 32, 35, 57, 39, 0, 38, 57, 45, 24 }; static FORMAT GetCombinedFormat(uint8_t fmt, uint8_t type) { assert(fmt < sizeof(FormatEntryPoint)/sizeof(int) && "FMT out of range."); int start = FormatEntryPoint[fmt]; int stop = std::min(start + 6, FormatLUTSize); // Only 6 types are used in image_lut_kv.cpp for(int i=start; i> 3) * out.pitch; size_t slicePitch = rowPitch * out.height; if (desc.geometry != HSA_EXT_IMAGE_GEOMETRY_1DB && image_data_layout == HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR && ((image_data_row_pitch && (rowPitch != image_data_row_pitch)) || (image_data_slice_pitch && (slicePitch != image_data_slice_pitch)))) { return static_cast( HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED); } image_info.size = out.surfSize; assert(image_info.size != 0); image_info.alignment = out.baseAlign; assert(image_info.alignment != 0); return HSA_STATUS_SUCCESS; } bool ImageManagerNv::IsLocalMemory(const void* address) const { return true; } hsa_status_t ImageManagerNv::PopulateImageSrd(Image& image, const metadata_amd_t* descriptor) const { const metadata_amd_nv_t* desc = reinterpret_cast(descriptor); bool atc_access = true; const void* image_data_addr = image.data; ImageProperty image_prop =
image_lut_.MapFormat(image.desc.format, image.desc.geometry); if ((image_prop.cap == HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED) || (image_prop.element_size == 0)) return (hsa_status_t)HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED; const Swizzle swizzle = image_lut_.MapSwizzle(image.desc.format.channel_order); if (IsLocalMemory(image.data)) { atc_access = false; image_data_addr = reinterpret_cast( reinterpret_cast(image.data) - local_memory_base_address_); } image.srd[0] = desc->word0.u32All; image.srd[1] = desc->word1.u32All; image.srd[2] = desc->word2.u32All; image.srd[3] = desc->word3.u32All; image.srd[4] = desc->word4.u32All; image.srd[5] = desc->word5.u32All; image.srd[6] = desc->word6.u32All; image.srd[7] = desc->word7.u32All; if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) { SQ_BUF_RSRC_WORD0 word0; SQ_BUF_RSRC_WORD1 word1; SQ_BUF_RSRC_WORD2 word2; SQ_BUF_RSRC_WORD3 word3; word0.val = 0; word0.f.BASE_ADDRESS = PtrLow32(image_data_addr); word1.val = image.srd[1]; word1.f.BASE_ADDRESS_HI = PtrHigh32(image_data_addr); word1.f.STRIDE = image_prop.element_size; word3.val = image.srd[3]; word3.f.DST_SEL_X = swizzle.x; word3.f.DST_SEL_Y = swizzle.y; word3.f.DST_SEL_Z = swizzle.z; word3.f.DST_SEL_W = swizzle.w; word3.f.FORMAT = GetCombinedFormat(image_prop.data_format, image_prop.data_type); word3.f.INDEX_STRIDE = image_prop.element_size; image.srd[0] = word0.val; image.srd[1] = word1.val; image.srd[3] = word3.val; } else { uint32_t hwPixelSize = image_lut_.GetPixelSize(image_prop.data_format, image_prop.data_type); if (image_prop.element_size != hwPixelSize) { return (hsa_status_t)HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED; } reinterpret_cast(&image.srd[0])->bits.BASE_ADDRESS = PtrLow40Shift8(image_data_addr); reinterpret_cast(&image.srd[1])->bits.BASE_ADDRESS_HI = PtrHigh64Shift40(image_data_addr); reinterpret_cast(&image.srd[1])->bits.FORMAT = GetCombinedFormat(image_prop.data_format, image_prop.data_type); reinterpret_cast(&image.srd[3])->bits.DST_SEL_X = 
swizzle.x; reinterpret_cast(&image.srd[3])->bits.DST_SEL_Y = swizzle.y; reinterpret_cast(&image.srd[3])->bits.DST_SEL_Z = swizzle.z; reinterpret_cast(&image.srd[3])->bits.DST_SEL_W = swizzle.w; if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DA || image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1D) { reinterpret_cast(&image.srd[3])->bits.TYPE = image_lut_.MapGeometry(image.desc.geometry); } // Imported metadata holds the offset to metadata, add the image base address. uintptr_t meta = uintptr_t(((SQ_IMG_RSRC_WORD7*)(&image.srd[7]))->bits.META_DATA_ADDRESS_HI) << 16; meta |= uintptr_t(((SQ_IMG_RSRC_WORD6*)(&image.srd[6]))->bits.META_DATA_ADDRESS) << 8; meta += reinterpret_cast(image_data_addr); ((SQ_IMG_RSRC_WORD6*)(&image.srd[6]))->bits.META_DATA_ADDRESS = PtrLow16Shift8((void*)meta); ((SQ_IMG_RSRC_WORD7*)(&image.srd[7]))->bits.META_DATA_ADDRESS_HI = PtrHigh64Shift16((void*)meta); } // Looks like this is only used for CPU copies. image.row_pitch = 0; image.slice_pitch = 0; // Used by HSAIL shader ABI image.srd[8] = image.desc.format.channel_type; image.srd[9] = image.desc.format.channel_order; image.srd[10] = static_cast(image.desc.width); return HSA_STATUS_SUCCESS; } static TEX_BC_SWIZZLE GetBcSwizzle(const Swizzle& swizzle) { SEL r = (SEL)swizzle.x; SEL g = (SEL)swizzle.y; SEL b = (SEL)swizzle.z; SEL a = (SEL)swizzle.w; TEX_BC_SWIZZLE bcSwizzle = TEX_BC_Swizzle_XYZW; if (a == SEL_X) { // Have to use either TEX_BC_Swizzle_WZYX or TEX_BC_Swizzle_WXYZ // // For the pre-defined border color values (white, opaque black, // transparent black), the only thing that matters is that the alpha // channel winds up in the correct place (because the RGB channels are // all the same) so either of these TEX_BC_Swizzle enumerations will // work. Not sure what happens with border color palettes. 
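The image descriptor code above stores the backing-store address in two fields (`BASE_ADDRESS` from `PtrLow40Shift8`, `BASE_ADDRESS_HI` from `PtrHigh64Shift40`) and rebuilds imported metadata addresses as `(META_DATA_ADDRESS_HI << 16) | (META_DATA_ADDRESS << 8)` plus the image base. The helpers below are a from-scratch sketch based on a reading of those helper names: bits [8, 40) of a 256-byte-aligned address go into the low field and bits [40, 64) into the high field. They are hypothetical and are not copied from util.h.

```cpp
#include <cassert>
#include <cstdint>

// Assumed meaning of PtrLow40Shift8: bits [8, 40) of the address.
inline uint32_t Low40Shift8(uint64_t addr) {
  const uint64_t low40 = addr & ((uint64_t{1} << 40) - 1);
  return static_cast<uint32_t>(low40 >> 8);
}

// Assumed meaning of PtrHigh64Shift40: bits [40, 64) of the address.
inline uint32_t High64Shift40(uint64_t addr) {
  return static_cast<uint32_t>(addr >> 40);
}

// Reassembles a 256-byte-aligned address from the two descriptor fields.
inline uint64_t Join40(uint32_t lo, uint32_t hi) {
  return (static_cast<uint64_t>(hi) << 40) | (static_cast<uint64_t>(lo) << 8);
}

// Mirrors the imported-metadata fix-up: the descriptor holds an offset
// split into bits [8, 16) (meta_lo8) and [16, 64) (meta_hi); the image
// base address is added to turn it into an absolute address.
inline uint64_t RebaseMeta(uint64_t meta_lo8, uint64_t meta_hi,
                           uint64_t image_base) {
  const uint64_t offset = (meta_hi << 16) | (meta_lo8 << 8);
  return offset + image_base;
}
```

Dropping the low 8 bits is only lossless for 256-byte-aligned addresses, which is why the round trip in a test like `Join40(Low40Shift8(a), High64Shift40(a)) == a` uses an aligned `a`.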
if (b == SEL_Y) { // ABGR bcSwizzle = TEX_BC_Swizzle_WZYX; } else if ((r == SEL_X) && (g == SEL_X) && (b == SEL_X)) { // RGBA bcSwizzle = TEX_BC_Swizzle_XYZW; } else { // ARGB bcSwizzle = TEX_BC_Swizzle_WXYZ; } } else if (r == SEL_X) { // Have to use either TEX_BC_Swizzle_XYZW or TEX_BC_Swizzle_XWYZ if (g == SEL_Y) { // RGBA bcSwizzle = TEX_BC_Swizzle_XYZW; } else if ((g == SEL_X) && (b == SEL_X) && (a == SEL_W)) { // RGBA bcSwizzle = TEX_BC_Swizzle_XYZW; } else { // RAGB bcSwizzle = TEX_BC_Swizzle_XWYZ; } } else if (g == SEL_X) { // GRAB, have to use TEX_BC_Swizzle_YXWZ bcSwizzle = TEX_BC_Swizzle_YXWZ; } else if (b == SEL_X) { // BGRA, have to use TEX_BC_Swizzle_ZYXW bcSwizzle = TEX_BC_Swizzle_ZYXW; } return bcSwizzle; } hsa_status_t ImageManagerNv::PopulateImageSrd(Image& image) const { ImageProperty image_prop = image_lut_.MapFormat(image.desc.format, image.desc.geometry); assert(image_prop.cap != HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED); assert(image_prop.element_size != 0); bool atc_access = true; const void* image_data_addr = image.data; if (IsLocalMemory(image.data)) { atc_access = false; image_data_addr = reinterpret_cast( reinterpret_cast(image.data) - local_memory_base_address_); } if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) { SQ_BUF_RSRC_WORD0 word0; SQ_BUF_RSRC_WORD1 word1; SQ_BUF_RSRC_WORD2 word2; SQ_BUF_RSRC_WORD3 word3; word0.val = 0; word0.f.BASE_ADDRESS = PtrLow32(image_data_addr); word1.val = 0; word1.f.BASE_ADDRESS_HI = PtrHigh32(image_data_addr); word1.f.STRIDE = image_prop.element_size; word1.f.SWIZZLE_ENABLE = false; word1.f.CACHE_SWIZZLE = false; word2.f.NUM_RECORDS = image.desc.width * image_prop.element_size; const Swizzle swizzle = image_lut_.MapSwizzle(image.desc.format.channel_order); word3.val = 0; word3.f.RESOURCE_LEVEL = 1; word3.f.DST_SEL_X = swizzle.x; word3.f.DST_SEL_Y = swizzle.y; word3.f.DST_SEL_Z = swizzle.z; word3.f.DST_SEL_W = swizzle.w; word3.f.FORMAT = GetCombinedFormat(image_prop.data_format, 
image_prop.data_type); word3.f.INDEX_STRIDE = image_prop.element_size; word3.f.TYPE = image_lut_.MapGeometry(image.desc.geometry); image.srd[0] = word0.val; image.srd[1] = word1.val; image.srd[2] = word2.val; image.srd[3] = word3.val; image.row_pitch = image.desc.width * image_prop.element_size; image.slice_pitch = image.row_pitch; } else { SQ_IMG_RSRC_WORD0 word0; SQ_IMG_RSRC_WORD1 word1; SQ_IMG_RSRC_WORD2 word2; SQ_IMG_RSRC_WORD3 word3; SQ_IMG_RSRC_WORD4 word4; SQ_IMG_RSRC_WORD5 word5; SQ_IMG_RSRC_WORD5 word6; SQ_IMG_RSRC_WORD5 word7; ADDR2_COMPUTE_SURFACE_INFO_OUTPUT out = {0}; uint32_t swizzleMode = GetAddrlibSurfaceInfoNv( image.component, image.desc, image.tile_mode, image.row_pitch, image.slice_pitch, out); if (swizzleMode == (uint32_t)(-1)) { return HSA_STATUS_ERROR; } assert((out.bpp / 8) == image_prop.element_size); const size_t row_pitch_size = out.pitch * image_prop.element_size; word0.f.BASE_ADDRESS = PtrLow40Shift8(image_data_addr); word1.val = 0; word1.f.BASE_ADDRESS_HI = PtrHigh64Shift40(image_data_addr); word1.f.MIN_LOD = 0; word1.f.FORMAT = GetCombinedFormat(image_prop.data_format, image_prop.data_type); // Only take the lowest 2 bits of (image.desc.width - 1) word1.f.WIDTH = BitSelect<0, 1>(image.desc.width - 1); word2.val = 0; // Take the high 12 bits of (image.desc.width - 1) word2.f.WIDTH_HI = BitSelect<2, 13>(image.desc.width - 1); word2.f.HEIGHT = image.desc.height ? 
image.desc.height - 1 : 0; word2.f.RESOURCE_LEVEL = 1; const Swizzle swizzle = image_lut_.MapSwizzle(image.desc.format.channel_order); word3.val = 0; word3.f.DST_SEL_X = swizzle.x; word3.f.DST_SEL_Y = swizzle.y; word3.f.DST_SEL_Z = swizzle.z; word3.f.DST_SEL_W = swizzle.w; word3.f.SW_MODE = swizzleMode; word3.f.BC_SWIZZLE = GetBcSwizzle(swizzle); word3.f.TYPE = image_lut_.MapGeometry(image.desc.geometry); const bool image_array = (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DA || image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_2DA || image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_2DADEPTH); const bool image_3d = (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_3D); word4.val = 0; word4.f.DEPTH = (image_array) // Doesn't hurt but isn't array_size already >0? ? std::max(image.desc.array_size, static_cast(1)) - 1 : (image_3d) ? image.desc.depth - 1 : 0; uint32_t minor_ver = MinorVerFromDevID(chip_id_); // For 1d, 2d and 2d-msaa in gfx1030 and beyond this is pitch-1 if ((minor_ver >= 3) && !image_array && !image_3d) word4.f.PITCH = out.pitch - 1; word5.val = 0; word6.val = 0; word7.val = 0; image.srd[0] = word0.val; image.srd[1] = word1.val; image.srd[2] = word2.val; image.srd[3] = word3.val; image.srd[4] = word4.val; image.srd[5] = word5.val; image.srd[6] = word6.val; image.srd[7] = word7.val; image.row_pitch = row_pitch_size; image.slice_pitch = out.sliceSize; } image.srd[8] = image.desc.format.channel_type; image.srd[9] = image.desc.format.channel_order; image.srd[10] = static_cast(image.desc.width); return HSA_STATUS_SUCCESS; } hsa_status_t ImageManagerNv::ModifyImageSrd( Image& image, hsa_ext_image_format_t& new_format) const { image.desc.format = new_format; ImageProperty image_prop = image_lut_.MapFormat(image.desc.format, image.desc.geometry); assert(image_prop.cap != HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED); assert(image_prop.element_size != 0); if (image.desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) { const Swizzle swizzle = 
image_lut_.MapSwizzle(image.desc.format.channel_order); SQ_BUF_RSRC_WORD3* word3 = reinterpret_cast(&image.srd[3]); word3->bits.DST_SEL_X = swizzle.x; word3->bits.DST_SEL_Y = swizzle.y; word3->bits.DST_SEL_Z = swizzle.z; word3->bits.DST_SEL_W = swizzle.w; word3->bits.FORMAT = GetCombinedFormat(image_prop.data_format, image_prop.data_type); } else { SQ_IMG_RSRC_WORD1* word1 = reinterpret_cast(&image.srd[1]); word1->bits.FORMAT = GetCombinedFormat(image_prop.data_format, image_prop.data_type); const Swizzle swizzle = image_lut_.MapSwizzle(image.desc.format.channel_order); SQ_IMG_RSRC_WORD3* word3 = reinterpret_cast(&image.srd[3]); word3->bits.DST_SEL_X = swizzle.x; word3->bits.DST_SEL_Y = swizzle.y; word3->bits.DST_SEL_Z = swizzle.z; word3->bits.DST_SEL_W = swizzle.w; } image.srd[8] = image.desc.format.channel_type; image.srd[9] = image.desc.format.channel_order; image.srd[10] = static_cast(image.desc.width); return HSA_STATUS_SUCCESS; } hsa_status_t ImageManagerNv::PopulateSamplerSrd(Sampler& sampler) const { const hsa_ext_sampler_descriptor_t sampler_descriptor = sampler.desc; SQ_IMG_SAMP_WORD0 word0; SQ_IMG_SAMP_WORD1 word1; SQ_IMG_SAMP_WORD2 word2; SQ_IMG_SAMP_WORD3 word3; word0.u32All = 0; switch (sampler_descriptor.address_mode) { case HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_EDGE: word0.bits.CLAMP_X = static_cast(SQ_TEX_CLAMP_LAST_TEXEL); break; case HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_BORDER: word0.bits.CLAMP_X = static_cast(SQ_TEX_CLAMP_BORDER); break; case HSA_EXT_SAMPLER_ADDRESSING_MODE_MIRRORED_REPEAT: word0.bits.CLAMP_X = static_cast(SQ_TEX_MIRROR); break; case HSA_EXT_SAMPLER_ADDRESSING_MODE_UNDEFINED: case HSA_EXT_SAMPLER_ADDRESSING_MODE_REPEAT: word0.bits.CLAMP_X = static_cast(SQ_TEX_WRAP); break; default: return HSA_STATUS_ERROR_INVALID_ARGUMENT; } word0.bits.CLAMP_Y = word0.bits.CLAMP_X; word0.bits.CLAMP_Z = word0.bits.CLAMP_X; word0.bits.FORCE_UNNORMALIZED = (sampler_descriptor.coordinate_mode == HSA_EXT_SAMPLER_COORDINATE_MODE_UNNORMALIZED); 
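In `ImageManagerNv::PopulateImageSrd(Image&)` above, the descriptor stores `width - 1` split across two words: the lowest 2 bits in `WORD1.WIDTH` (`BitSelect<0, 1>`) and the next 12 bits in `WORD2.WIDTH_HI` (`BitSelect<2, 13>`). The sketch below reproduces that split with a hypothetical `SelectBits<Lo, Hi>` helper (inclusive bit range, written here rather than taken from util.h) and checks that the two fields reassemble the original width. The 2 + 12 = 14 bits imply a maximum encodable width of 16384, assuming the field widths are exactly as the source comments state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bits [Lo, Hi] (inclusive) of value, moved to bit 0.
template <unsigned Lo, unsigned Hi>
inline uint32_t SelectBits(uint32_t value) {
  static_assert(Lo <= Hi && Hi < 32, "bit range must lie within 32 bits");
  const uint64_t mask = (uint64_t{1} << (Hi - Lo + 1)) - 1;
  return static_cast<uint32_t>((value >> Lo) & mask);
}

struct WidthFields {
  uint32_t width_lo;  // value for WORD1.WIDTH (2 bits)
  uint32_t width_hi;  // value for WORD2.WIDTH_HI (12 bits)
};

// The descriptor encodes (width - 1); width must be at least 1.
inline WidthFields SplitWidth(uint32_t width) {
  return {SelectBits<0, 1>(width - 1), SelectBits<2, 13>(width - 1)};
}

// Inverse of SplitWidth, useful for checking the encoding.
inline uint32_t JoinWidth(WidthFields f) {
  return ((f.width_hi << 2) | f.width_lo) + 1;
}
```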
word1.u32All = 0; word1.bits.MAX_LOD = 4095; word2.u32All = 0; switch (sampler_descriptor.filter_mode) { case HSA_EXT_SAMPLER_FILTER_MODE_NEAREST: word2.bits.XY_MAG_FILTER = static_cast(SQ_TEX_XY_FILTER_POINT); break; case HSA_EXT_SAMPLER_FILTER_MODE_LINEAR: word2.bits.XY_MAG_FILTER = static_cast(SQ_TEX_XY_FILTER_BILINEAR); break; default: return HSA_STATUS_ERROR_INVALID_ARGUMENT; } word2.bits.XY_MIN_FILTER = word2.bits.XY_MAG_FILTER; word2.bits.Z_FILTER = SQ_TEX_Z_FILTER_NONE; word2.bits.MIP_FILTER = SQ_TEX_MIP_FILTER_NONE; word3.u32All = 0; // TODO: check this bit with HSAIL spec. word3.bits.BORDER_COLOR_TYPE = SQ_TEX_BORDER_COLOR_TRANS_BLACK; sampler.srd[0] = word0.u32All; sampler.srd[1] = word1.u32All; sampler.srd[2] = word2.u32All; sampler.srd[3] = word3.u32All; return HSA_STATUS_SUCCESS; } uint32_t ImageManagerNv::GetAddrlibSurfaceInfoNv( hsa_agent_t component, const hsa_ext_image_descriptor_t& desc, Image::TileMode tileMode, size_t image_data_row_pitch, size_t image_data_slice_pitch, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT& out) const { const ImageProperty image_prop = GetImageProperty(component, desc.format, desc.geometry); const AddrFormat addrlib_format = GetAddrlibFormat(image_prop); const uint32_t width = static_cast(desc.width); const uint32_t height = static_cast(desc.height); static const size_t kMinNumSlice = 1; const uint32_t num_slice = static_cast( std::max(kMinNumSlice, std::max(desc.array_size, desc.depth))); uint32_t minor_ver = MinorVerFromDevID(chip_id_); ADDR2_COMPUTE_SURFACE_INFO_INPUT in = {0}; in.size = sizeof(ADDR2_COMPUTE_SURFACE_INFO_INPUT); in.format = addrlib_format; in.bpp = static_cast(image_prop.element_size) * 8; in.width = width; in.height = height; in.numSlices = num_slice; // Custom Pitch is supported in gfx1030 and beyond if (minor_ver >= 3) in.pitchInElement = image_data_row_pitch / image_prop.element_size; switch (desc.geometry) { case HSA_EXT_IMAGE_GEOMETRY_1D: case HSA_EXT_IMAGE_GEOMETRY_1DB: case HSA_EXT_IMAGE_GEOMETRY_1DA: 
in.resourceType = ADDR_RSRC_TEX_1D; break; case HSA_EXT_IMAGE_GEOMETRY_2D: case HSA_EXT_IMAGE_GEOMETRY_2DDEPTH: case HSA_EXT_IMAGE_GEOMETRY_2DA: case HSA_EXT_IMAGE_GEOMETRY_2DADEPTH: in.resourceType = ADDR_RSRC_TEX_2D; break; case HSA_EXT_IMAGE_GEOMETRY_3D: in.resourceType = ADDR_RSRC_TEX_3D; break; } in.flags.texture = 1; ADDR2_GET_PREFERRED_SURF_SETTING_INPUT prefSettingsInput = { 0 }; ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT prefSettingsOutput = { 0 }; prefSettingsInput.size = sizeof(prefSettingsInput); prefSettingsInput.flags = in.flags; prefSettingsInput.bpp = in.bpp; prefSettingsInput.format = in.format; prefSettingsInput.width = in.width; prefSettingsInput.height = in.height; prefSettingsInput.numFrags = in.numFrags; prefSettingsInput.numSamples = in.numSamples; prefSettingsInput.numMipLevels = in.numMipLevels; prefSettingsInput.numSlices = in.numSlices; prefSettingsInput.resourceLoction = ADDR_RSRC_LOC_UNDEF; prefSettingsInput.resourceType = in.resourceType; // Disallow all swizzles but linear. if (tileMode == Image::TileMode::LINEAR) { prefSettingsInput.forbiddenBlock.macroThin4KB = 1; prefSettingsInput.forbiddenBlock.macroThick4KB = 1; prefSettingsInput.forbiddenBlock.macroThin64KB = 1; prefSettingsInput.forbiddenBlock.macroThick64KB = 1; prefSettingsInput.forbiddenBlock.micro = 1; prefSettingsInput.forbiddenBlock.var = 1; } else { // Debug setting, simplifies buffer alignment until language runtimes have official gfx10 // support. 
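`GetAddrlibSurfaceInfoNv` maps image geometries to addrlib resource types differently from the KV path. `ImageManagerKv::GetAddrlibSurfaceInfo` treats 1DA as `ADDR_RSRC_TEX_2D` and 2DA/2DADEPTH as `ADDR_RSRC_TEX_3D`, while the gfx10 path keeps the base dimension (1DA to `TEX_1D`, 2DA/2DADEPTH to `TEX_2D`) and carries array layers only in `numSlices`, computed as `max(1, max(array_size, depth))`. A minimal sketch of the NV mapping, with hypothetical enums in place of the HSA and addrlib types:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical mirrors of hsa_ext_image_geometry_t and addrlib's
// resource types.
enum class Geometry { k1D, k2D, k3D, k1DA, k2DA, k1DB, k2DDepth, k2DADepth };
enum class ResourceType { kTex1D, kTex2D, kTex3D };

// gfx10-style mapping: array geometries keep their base dimension.
ResourceType NvResourceType(Geometry g) {
  switch (g) {
    case Geometry::k1D:
    case Geometry::k1DB:
    case Geometry::k1DA:
      return ResourceType::kTex1D;
    case Geometry::k3D:
      return ResourceType::kTex3D;
    default:  // 2D, 2DDEPTH, 2DA, 2DADEPTH
      return ResourceType::kTex2D;
  }
}

// Slice count handed to addrlib: at least 1, otherwise the larger of the
// array size and the depth (only one of them is non-zero in practice).
size_t NumSlices(size_t array_size, size_t depth) {
  const size_t kMinNumSlice = 1;
  return std::max(kMinNumSlice, std::max(array_size, depth));
}
```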
prefSettingsInput.forbiddenBlock.macroThin64KB = 1; prefSettingsInput.forbiddenBlock.macroThick64KB = 1; } // but don't ever allow the 256b swizzle modes //prefSettingsInput.forbiddenBlock.micro = 1; // and don't allow variable-size block modes //prefSettingsInput.forbiddenBlock.var = 1; if (ADDR_OK != Addr2GetPreferredSurfaceSetting(addr_lib_, &prefSettingsInput, &prefSettingsOutput)) { return (uint32_t)(-1); } in.swizzleMode = prefSettingsOutput.swizzleMode; out.size = sizeof(ADDR2_COMPUTE_SURFACE_INFO_OUTPUT); if (ADDR_OK != Addr2ComputeSurfaceInfo(addr_lib_, &in, &out)) { return (uint32_t)(-1); } if (out.surfSize == 0) { return (uint32_t)(-1); } return in.swizzleMode; } hsa_status_t ImageManagerNv::FillImage(const Image& image, const void* pattern, const hsa_ext_image_region_t& region) { if (BlitQueueInit().queue_ == NULL) { return HSA_STATUS_ERROR_OUT_OF_RESOURCES; } Image* image_view = const_cast(&image); SQ_BUF_RSRC_WORD3* word3_buff = NULL; SQ_IMG_RSRC_WORD3* word3_image = NULL; uint32_t dst_sel_w_original = 0; if (image_view->desc.format.channel_type == HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010) { // Force GPU to ignore the last two bits (alpha bits). 
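`ImageManagerNv::FillImage` temporarily patches the image descriptor through a `const_cast` (it forces `DST_SEL_W` to `SEL_0` for 10:10:10 UNORM images and swaps `FORMAT` for sRGB images), runs the blit, and then restores the saved values by hand. The following is a sketch of an RAII alternative with a hypothetical `ScopedOverride` class that restores the value even on an early return. Because C++ does not allow taking the address of a bit-field, the guard in this sketch operates on a whole 32-bit descriptor word (for example `srd[3]`) rather than on `bits.DST_SEL_W` directly.

```cpp
#include <cassert>
#include <cstdint>

// Overwrites *field for the lifetime of the guard and restores the saved
// value when the guard goes out of scope.
template <typename T>
class ScopedOverride {
 public:
  ScopedOverride(T* field, T temporary) : field_(field), saved_(*field) {
    *field_ = temporary;
  }
  ~ScopedOverride() { *field_ = saved_; }

  ScopedOverride(const ScopedOverride&) = delete;
  ScopedOverride& operator=(const ScopedOverride&) = delete;

 private:
  T* field_;
  T saved_;
};
```

Usage would be along the lines of computing the patched word first (copy `srd[3]`, modify the bit-field on the copy) and then constructing `ScopedOverride<uint32_t> guard(&srd[3], patched);` before the blit call.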
if (image_view->desc.geometry == HSA_EXT_IMAGE_GEOMETRY_1DB) { word3_buff = reinterpret_cast(&image_view->srd[3]); dst_sel_w_original = word3_buff->bits.DST_SEL_W; word3_buff->bits.DST_SEL_W = SEL_0; } else { word3_image = reinterpret_cast(&image_view->srd[3]); dst_sel_w_original = word3_image->bits.DST_SEL_W; word3_image->bits.DST_SEL_W = SEL_0; } } SQ_IMG_RSRC_WORD1* word1 = NULL; uint32_t num_format_original = 0; const void* new_pattern = pattern; float fill_value[4] = {0}; switch (image_view->desc.format.channel_order) { case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA: case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB: case HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX: case HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA: { // KV and CZ don't have write support for SRGBA image, so convert pattern // to standard form and treat the image as RGBA image. const float* pattern_f = reinterpret_cast(pattern); fill_value[0] = LinearToStandardRGB(pattern_f[0]); fill_value[1] = LinearToStandardRGB(pattern_f[1]); fill_value[2] = LinearToStandardRGB(pattern_f[2]); fill_value[3] = pattern_f[3]; new_pattern = fill_value; ImageProperty image_prop = image_lut_.MapFormat(image.desc.format, image.desc.geometry); word1 = reinterpret_cast(&image_view->srd[1]); num_format_original = word1->bits.FORMAT; word1->bits.FORMAT = GetCombinedFormat(image_prop.data_format, TYPE_UNORM); } break; default: break; } hsa_status_t status = ImageRuntime::instance()->blit_kernel().FillImage( blit_queue_, blit_code_catalog_, *image_view, new_pattern, region); // Revert back original configuration. 
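For sRGB channel orders, `FillImage` converts the red, green and blue components of the fill pattern with `LinearToStandardRGB`, leaves alpha unchanged, and then writes the image as if it were UNORM. The body of `LinearToStandardRGB` is not part of this excerpt; the sketch below assumes it follows the standard sRGB encoding curve (linear segment below 0.0031308, 1/2.4 power curve above) and uses hypothetical function names.

```cpp
#include <cassert>
#include <cmath>

// Standard sRGB encoding of a linear component in [0, 1]; assumed to match
// what LinearToStandardRGB does, not taken from the ROCR source.
float LinearToSrgb(float linear) {
  if (linear <= 0.0031308f) return 12.92f * linear;
  return 1.055f * std::pow(linear, 1.0f / 2.4f) - 0.055f;
}

// Converts an RGBA float fill pattern: colour channels are encoded,
// alpha stays linear, as in FillImage.
void EncodePattern(const float in[4], float out[4]) {
  for (int i = 0; i < 3; ++i) out[i] = LinearToSrgb(in[i]);
  out[3] = in[3];
}
```

Encoding on the host like this is what lets the blit kernel treat the surface as plain UNORM, which matches the source comment that the hardware lacks write support for sRGB images.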
  if (word3_buff != NULL) {
    word3_buff->bits.DST_SEL_W = dst_sel_w_original;
  }
  if (word3_image != NULL) {
    word3_image->bits.DST_SEL_W = dst_sel_w_original;
  }
  if (word1 != NULL) {
    word1->bits.FORMAT = num_format_original;
  }

  return status;
}

}  // namespace image
}  // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/image/image_manager_nv.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef EXT_IMAGE_IMAGE_MANAGER_NV_H_
#define EXT_IMAGE_IMAGE_MANAGER_NV_H_

#include "addrlib/inc/addrinterface.h"
#include "image_manager_kv.h"

namespace rocr {
namespace image {

class ImageManagerNv : public ImageManagerKv {
 public:
  ImageManagerNv();
  virtual ~ImageManagerNv();

  /// @brief Calculate the size and alignment of the backing storage of an
  /// image.
  virtual hsa_status_t CalculateImageSizeAndAlignment(
      hsa_agent_t component, const hsa_ext_image_descriptor_t& desc,
      hsa_ext_image_data_layout_t image_data_layout,
      size_t image_data_row_pitch, size_t image_data_slice_pitch,
      hsa_ext_image_data_info_t& image_info) const;

  /// @brief Fill image structure with device specific image object.
  virtual hsa_status_t PopulateImageSrd(Image& image) const;

  /// @brief Fill image structure with device specific image object using the
  /// given format.
  virtual hsa_status_t PopulateImageSrd(Image& image,
                                        const metadata_amd_t* desc) const;

  /// @brief Modify device specific image object according to the specified
  /// new format.
  virtual hsa_status_t ModifyImageSrd(Image& image,
                                      hsa_ext_image_format_t& new_format) const;

  /// @brief Fill sampler structure with device specific sampler object.
  virtual hsa_status_t PopulateSamplerSrd(Sampler& sampler) const;

  /// @brief Fill image backing storage using agent copy.
  virtual hsa_status_t FillImage(const Image& image, const void* pattern,
                                 const hsa_ext_image_region_t& region);

 protected:
  uint32_t GetAddrlibSurfaceInfoNv(hsa_agent_t component,
                                   const hsa_ext_image_descriptor_t& desc,
                                   Image::TileMode tileMode,
                                   size_t image_data_row_pitch,
                                   size_t image_data_slice_pitch,
                                   ADDR2_COMPUTE_SURFACE_INFO_OUTPUT& out) const;

  bool IsLocalMemory(const void* address) const;

 private:
  DISALLOW_COPY_AND_ASSIGN(ImageManagerNv);
};

}  // namespace image
}  // namespace rocr

#endif  // EXT_IMAGE_IMAGE_MANAGER_NV_H_
ROCR-Runtime-rocm-5.0.0/src/image/image_runtime.cpp
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#define NOMINMAX
#include "image_runtime.h"

#include 
#include 
#include 

#include "core/inc/hsa_internal.h"
#include "core/inc/hsa_ext_amd_impl.h"
#include "resource.h"
#include "image_manager_kv.h"
#include "image_manager_ai.h"
#include "image_manager_nv.h"
#include "device_info.h"

namespace rocr {
namespace image {

std::atomic<ImageRuntime*> ImageRuntime::instance_(NULL);
std::mutex ImageRuntime::instance_mutex_;

hsa_status_t FindKernelArgPool(hsa_amd_memory_pool_t pool, void* data) {
  assert(data != nullptr);

  hsa_status_t err;
  hsa_amd_segment_t segment;
  uint32_t flag;
  size_t size;

  err = AMD::hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT,
                                          &segment);
  assert(err == HSA_STATUS_SUCCESS);
  if (segment != HSA_AMD_SEGMENT_GLOBAL) return HSA_STATUS_SUCCESS;

  err = AMD::hsa_amd_memory_pool_get_info(
      pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS, &flag);
  assert(err == HSA_STATUS_SUCCESS);

  err = AMD::hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SIZE,
                                          &size);
  assert(err == HSA_STATUS_SUCCESS);

  if (((HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_KERNARG_INIT & flag) != 0) &&
      (size != 0)) {
    *(reinterpret_cast<hsa_amd_memory_pool_t*>(data)) = pool;

    // Found the kernarg pool, stop the iteration.
    return HSA_STATUS_INFO_BREAK;
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageRuntime::CreateImageManager(hsa_agent_t agent, void* data) {
  ImageRuntime* runtime = reinterpret_cast<ImageRuntime*>(data);

  hsa_device_type_t hsa_device_type;
  hsa_status_t hsa_error_code =
      HSA::hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &hsa_device_type);
  if (hsa_error_code != HSA_STATUS_SUCCESS) {
    return hsa_error_code;
  }

  if (hsa_device_type == HSA_DEVICE_TYPE_GPU) {
    uint32_t chip_id;
    hsa_error_code = GetGPUAsicID(agent, &chip_id);
    // chip_id is undefined if the query failed, so do not pick a manager.
    if (hsa_error_code != HSA_STATUS_SUCCESS) {
      return hsa_error_code;
    }
    uint32_t major_ver = MajorVerFromDevID(chip_id);

    ImageManager* image_manager;
    if (major_ver >= 10) {
      image_manager = new ImageManagerNv();
    } else if (major_ver >= 9) {
      image_manager = new ImageManagerAi();
    } else {
      image_manager = new ImageManagerKv();
    }

    hsa_error_code = image_manager->Initialize(agent);

    if (hsa_error_code != HSA_STATUS_SUCCESS) {
      delete image_manager;
      return hsa_error_code;
    }

    runtime->image_managers_[agent.handle] = image_manager;
  } else if (hsa_device_type == HSA_DEVICE_TYPE_CPU) {
    uint32_t caches[4] = {0};
    hsa_error_code =
        HSA::hsa_agent_get_info(agent, HSA_AGENT_INFO_CACHE_SIZE, caches);
    if (hsa_error_code != HSA_STATUS_SUCCESS) {
      return hsa_error_code;
    }

    runtime->cpu_l2_cache_size_ = caches[1];

    if (runtime->kernarg_pool_.handle == 0)
      hsa_amd_agent_iterate_memory_pools(agent, FindKernelArgPool,
                                         &runtime->kernarg_pool_);
  }

  return HSA_STATUS_SUCCESS;
}

ImageRuntime* ImageRuntime::instance() {
  ImageRuntime* instance = instance_.load(std::memory_order_acquire);
  if (instance == NULL) {
    // Protect the initialization from multithreaded access.
    std::lock_guard<std::mutex> lock(instance_mutex_);

    // Make sure we are not initializing it twice.
    instance = instance_.load(std::memory_order_relaxed);
    if (instance != NULL) {
      return instance;
    }

    instance = CreateSingleton();
    if (instance == NULL) {
      return NULL;
    }

    // UnloadCallback = &ext_image::ImageRuntime::DestroySingleton;
  }

  return instance;
}

ImageRuntime* ImageRuntime::CreateSingleton() {
  ImageRuntime* instance = new ImageRuntime();

  if (HSA_STATUS_SUCCESS != instance->blit_kernel_.Initialize()) {
    instance->Cleanup();
    delete instance;
    return NULL;
  }

  if (HSA_STATUS_SUCCESS !=
      HSA::hsa_iterate_agents(CreateImageManager, instance)) {
    instance->Cleanup();
    delete instance;
    return NULL;
  }

  assert(instance->kernarg_pool_.handle != 0);
  assert(instance->image_managers_.size() != 0);

  instance_.store(instance, std::memory_order_release);
  return instance;
}

void ImageRuntime::DestroySingleton() {
  ImageRuntime* instance = instance_.load(std::memory_order_acquire);
  if (instance == NULL) {
    return;
  }

  instance->Cleanup();

  instance_.store(NULL, std::memory_order_release);

  delete instance;
}

hsa_status_t ImageRuntime::GetImageInfoMaxDimension(hsa_agent_t component,
                                                    hsa_agent_info_t attribute,
                                                    void* value) {
  uint32_t* value_u32 = NULL;
  uint32_t* value_u32_v2 = NULL;
  uint32_t* value_u32_v3 = NULL;
  hsa_ext_image_geometry_t geometry;

  size_t image_attribute = static_cast<size_t>(attribute);
  switch (image_attribute) {
    case HSA_EXT_AGENT_INFO_IMAGE_1D_MAX_ELEMENTS:
      geometry = HSA_EXT_IMAGE_GEOMETRY_1D;
      value_u32 = static_cast<uint32_t*>(value);
      break;
    case HSA_EXT_AGENT_INFO_IMAGE_1DA_MAX_ELEMENTS:
      geometry = HSA_EXT_IMAGE_GEOMETRY_1DA;
      value_u32 = static_cast<uint32_t*>(value);
      break;
    case HSA_EXT_AGENT_INFO_IMAGE_1DB_MAX_ELEMENTS:
      geometry = HSA_EXT_IMAGE_GEOMETRY_1DB;
      value_u32 = static_cast<uint32_t*>(value);
      break;
    case HSA_EXT_AGENT_INFO_IMAGE_2D_MAX_ELEMENTS:
      geometry = HSA_EXT_IMAGE_GEOMETRY_2D;
      value_u32_v2 = static_cast<uint32_t*>(value);
      break;
    case HSA_EXT_AGENT_INFO_IMAGE_2DA_MAX_ELEMENTS:
      geometry = HSA_EXT_IMAGE_GEOMETRY_2DA;
      value_u32_v2 = static_cast<uint32_t*>(value);
      break;
    case HSA_EXT_AGENT_INFO_IMAGE_2DDEPTH_MAX_ELEMENTS:
      geometry = HSA_EXT_IMAGE_GEOMETRY_2DDEPTH;
      value_u32_v2 = static_cast<uint32_t*>(value);
      break;
    case HSA_EXT_AGENT_INFO_IMAGE_2DADEPTH_MAX_ELEMENTS:
      geometry = HSA_EXT_IMAGE_GEOMETRY_2DADEPTH;
      value_u32_v2 = static_cast<uint32_t*>(value);
      break;
    case HSA_EXT_AGENT_INFO_IMAGE_3D_MAX_ELEMENTS:
      geometry = HSA_EXT_IMAGE_GEOMETRY_3D;
      value_u32_v3 = static_cast<uint32_t*>(value);
      break;
    case HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS:
      geometry = HSA_EXT_IMAGE_GEOMETRY_2DA;
      value_u32 = static_cast<uint32_t*>(value);
      break;
    default:
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  uint32_t width = 0;
  uint32_t height = 0;
  uint32_t depth = 0;
  uint32_t array_size = 0;

  hsa_device_type_t device_type;
  hsa_status_t status =
      HSA::hsa_agent_get_info(component, HSA_AGENT_INFO_DEVICE, &device_type);
  if (status != HSA_STATUS_SUCCESS) {
    return status;
  }

  // Image is only supported on a GPU device.
  if (device_type == HSA_DEVICE_TYPE_GPU) {
    image_manager(component)->GetImageInfoMaxDimension(
        component, geometry, width, height, depth, array_size);
  }

  if (value_u32_v3 != NULL) {
    value_u32_v3[0] = width;
    value_u32_v3[1] = height;
    value_u32_v3[2] = depth;
  } else if (value_u32_v2 != NULL) {
    value_u32_v2[0] = width;
    value_u32_v2[1] = height;
  } else {
    *value_u32 = (image_attribute == HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS)
                     ? array_size
                     : width;
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageRuntime::GetImageCapability(
    hsa_agent_t component, const hsa_ext_image_format_t& format,
    hsa_ext_image_geometry_t geometry, uint32_t& capability) {
  hsa_device_type_t device_type;
  hsa_status_t status =
      HSA::hsa_agent_get_info(component, HSA_AGENT_INFO_DEVICE, &device_type);
  if (status != HSA_STATUS_SUCCESS) {
    return status;
  }

  if (device_type == HSA_DEVICE_TYPE_GPU) {
    ImageManager* manager = image_manager(component);
    capability = manager->GetImageProperty(component, format, geometry).cap;
  } else {
    // Image is only supported on a GPU device.
    capability = 0;
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageRuntime::GetImageSizeAndAlignment(
    hsa_agent_t component, const hsa_ext_image_descriptor_t& desc,
    hsa_ext_image_data_layout_t image_data_layout, size_t image_data_row_pitch,
    size_t image_data_slice_pitch, hsa_ext_image_data_info_t& image_info) {
  image_info.alignment = 0;
  image_info.size = 0;

  // Validate the image format and geometry.
  uint32_t capability = 0;
  hsa_status_t status =
      GetImageCapability(component, desc.format, desc.geometry, capability);
  if (status != HSA_STATUS_SUCCESS) {
    return status;
  }

  if (capability == 0) {
    return static_cast<hsa_status_t>(
        HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED);
  }

  const hsa_ext_image_geometry_t geometry = desc.geometry;
  uint32_t max_width = 0;
  uint32_t max_height = 0;
  uint32_t max_depth = 0;
  uint32_t max_array_size = 0;

  ImageManager* manager = image_manager(component);

  // Validate the image dimension.
  manager->GetImageInfoMaxDimension(component, geometry, max_width, max_height,
                                    max_depth, max_array_size);

  if (desc.width > max_width || desc.height > max_height ||
      desc.depth > max_depth || desc.array_size > max_array_size) {
    return static_cast<hsa_status_t>(
        HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED);
  }

  return manager->CalculateImageSizeAndAlignment(
      component, desc, image_data_layout, image_data_row_pitch,
      image_data_slice_pitch, image_info);
}

hsa_status_t ImageRuntime::CreateImageHandle(
    hsa_agent_t component, const hsa_ext_image_descriptor_t& image_descriptor,
    const void* image_data, const hsa_access_permission_t access_permission,
    hsa_ext_image_data_layout_t image_data_layout, size_t image_data_row_pitch,
    size_t image_data_slice_pitch, hsa_ext_image_t& image_handle) {
  image_handle.handle = 0;

  assert(image_data != NULL);

  // Validate image dimension.
  hsa_ext_image_data_info_t image_info = {0};
  hsa_status_t status = GetImageSizeAndAlignment(
      component, image_descriptor, image_data_layout, image_data_row_pitch,
      image_data_slice_pitch, image_info);
  if (status != HSA_STATUS_SUCCESS) {
    return status;
  }

  // Validate image address alignment.
  if (!IsMultipleOf(reinterpret_cast<size_t>(image_data),
                    image_info.alignment)) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  Image* image = Image::Create(component);
  if (image == NULL) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }
  image->component = component;
  image->desc = image_descriptor;
  image->permission = access_permission;
  image->data = const_cast<void*>(image_data);
  image->row_pitch = image_data_row_pitch;
  image->slice_pitch = image_data_slice_pitch;

  hsa_profile_t profile;
  status = HSA::hsa_agent_get_info(component, HSA_AGENT_INFO_PROFILE, &profile);
  if (status != HSA_STATUS_SUCCESS) {
    Image::Destroy(image);
    return status;
  }

  if (image_data_layout == HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR) {
    image->tile_mode = Image::TileMode::LINEAR;
  } else {
    // Only base-profile agents use a tiled layout, and 1DB (buffer-backed)
    // images always stay linear.
    Image::TileMode tileMode =
        (profile == HSA_PROFILE_BASE &&
         image_descriptor.geometry != HSA_EXT_IMAGE_GEOMETRY_1DB) ?
            Image::TileMode::TILED : Image::TileMode::LINEAR;
    image->tile_mode = tileMode;
  }

  status = image_manager(component)->PopulateImageSrd(*image);
  if (status != HSA_STATUS_SUCCESS) {
    Image::Destroy(image);
    return status;
  }

  image_handle.handle = image->Convert();

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageRuntime::CreateImageHandleWithLayout(
    hsa_agent_t component, const hsa_ext_image_descriptor_t& image_descriptor,
    const hsa_amd_image_descriptor_t* image_layout, const void* image_data,
    const hsa_access_permission_t access_permission,
    hsa_ext_image_t& image_handle) {
  if (!IsMultipleOf(image_data, 256)) return HSA_STATUS_ERROR_INVALID_ALLOCATION;

  if (image_layout->version != 1)
    return (hsa_status_t)HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED;

  uint32_t id;
  hsa_status_t err = HSA::hsa_agent_get_info(
      component, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_CHIP_ID, &id);
  if (err != HSA_STATUS_SUCCESS) return err;

  // The layout must target this device: AMD's PCI vendor ID (0x1002) in the
  // upper 16 bits, the agent's chip ID in the lower bits.
  if (image_layout->deviceID != (0x1002 << 16 | id))
    return (hsa_status_t)HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED;

  const metadata_amd_t* desc =
      reinterpret_cast<const metadata_amd_t*>(image_layout);
  Image* image = Image::Create(component);
  if (image == NULL) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }
  image->component = component;
  image->desc = image_descriptor;
  image->permission = access_permission;
  image->data = const_cast<void*>(image_data);
  image->tile_mode = Image::TILED;

  err = image_manager(component)->PopulateImageSrd(*image, desc);
  if (err != HSA_STATUS_SUCCESS) {
    Image::Destroy(image);
    return err;
  }

  image_handle.handle = image->Convert();
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageRuntime::DestroyImageHandle(
    const hsa_ext_image_t& image_handle) {
  const Image* image = Image::Convert(image_handle.handle);
  if (image == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  Image::Destroy(const_cast<Image*>(image));

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageRuntime::CopyBufferToImage(
    const void* src_memory, size_t src_row_pitch, size_t src_slice_pitch,
    const hsa_ext_image_t& dst_image_handle,
    const hsa_ext_image_region_t& image_region) {
  const Image* dst_image = Image::Convert(dst_image_handle.handle);
  if (dst_image == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  ImageManager* manager =
      image_manager(dst_image->component);

  return manager->CopyBufferToImage(src_memory, src_row_pitch, src_slice_pitch,
                                    *dst_image, image_region);
}

hsa_status_t ImageRuntime::CopyImageToBuffer(
    const hsa_ext_image_t& src_image_handle, void* dst_memory,
    size_t dst_row_pitch, size_t dst_slice_pitch,
    const hsa_ext_image_region_t& image_region) {
  const Image* src_image = Image::Convert(src_image_handle.handle);
  if (src_image == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  ImageManager* manager = image_manager(src_image->component);

  return manager->CopyImageToBuffer(*src_image, dst_memory, dst_row_pitch,
                                    dst_slice_pitch, image_region);
}

hsa_status_t ImageRuntime::CopyImage(const hsa_ext_image_t& src_image_handle,
                                     const hsa_ext_image_t& dst_image_handle,
                                     const hsa_dim3_t& src_origin,
                                     const hsa_dim3_t& dst_origin,
                                     const hsa_dim3_t size) {
  const Image* src_image = Image::Convert(src_image_handle.handle);
  if (src_image == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  const Image* dst_image = Image::Convert(dst_image_handle.handle);
  if (dst_image == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  if (src_image->component.handle != dst_image->component.handle) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  ImageManager* manager = image_manager(src_image->component);

  return manager->CopyImage(*dst_image, *src_image, dst_origin, src_origin,
                            size);
}

hsa_status_t ImageRuntime::FillImage(
    const hsa_ext_image_t& image_handle, const void* pattern,
    const hsa_ext_image_region_t& image_region) {
  const Image* image = Image::Convert(image_handle.handle);
  if (image == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  ImageManager* manager = image_manager(image->component);

  return manager->FillImage(*image, pattern, image_region);
}

hsa_status_t ImageRuntime::CreateSamplerHandle(
    hsa_agent_t component,
    const hsa_ext_sampler_descriptor_t& sampler_descriptor,
    hsa_ext_sampler_t& sampler_handle) {
  sampler_handle.handle = 0;

  hsa_device_type_t device_type;
  hsa_status_t status =
      HSA::hsa_agent_get_info(component, HSA_AGENT_INFO_DEVICE, &device_type);
  if (status != HSA_STATUS_SUCCESS) {
    return status;
  }

  // Sampler is only supported on a GPU device.
  if (device_type != HSA_DEVICE_TYPE_GPU) {
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  Sampler* sampler = Sampler::Create(component);
  if (sampler == NULL) {
    return HSA_STATUS_ERROR_OUT_OF_RESOURCES;
  }
  sampler->component = component;
  sampler->desc = sampler_descriptor;

  status = image_manager(component)->PopulateSamplerSrd(*sampler);
  if (status != HSA_STATUS_SUCCESS) {
    Sampler::Destroy(sampler);
    return status;
  }

  sampler_handle.handle = sampler->Convert();

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ImageRuntime::DestroySamplerHandle(
    hsa_ext_sampler_t& sampler_handle) {
  const Sampler* sampler = Sampler::Convert(sampler_handle.handle);
  if (sampler == NULL) {
    return HSA_STATUS_ERROR_INVALID_ARGUMENT;
  }

  Sampler::Destroy(sampler);

  return HSA_STATUS_SUCCESS;
}

ImageRuntime::ImageRuntime() : cpu_l2_cache_size_(0), kernarg_pool_({0}) {}

ImageRuntime::~ImageRuntime() {}

void ImageRuntime::Cleanup() {
  std::map<uint64_t, ImageManager*>::iterator it;
  for (it = image_managers_.begin(); it != image_managers_.end(); ++it) {
    it->second->Cleanup();
    delete it->second;
  }

  blit_kernel_.Cleanup();
}

}  // namespace image
}  // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/image/image_runtime.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_EXT_IMAGE_IMAGE_RUNTIME_H
#define HSA_RUNTIME_EXT_IMAGE_IMAGE_RUNTIME_H

#include <atomic>
#include <map>
#include <mutex>

#include "inc/hsa.h"
#include "inc/hsa_ext_image.h"
#include "inc/hsa_ext_amd.h"
#include "blit_kernel.h"
#include "image_manager.h"
#include "util.h"

namespace rocr {
namespace image {

class ImageRuntime {
 public:
  /// @brief Getter for the ImageRuntime singleton object.
  static ImageRuntime* instance();

  /// @brief Destroy singleton object.
  static void DestroySingleton();

  /// @brief Retrieve the maximum width, height, depth, or array size (in
  /// pixels) for a particular geometry on a component.
  hsa_status_t GetImageInfoMaxDimension(hsa_agent_t component,
                                        hsa_agent_info_t attribute,
                                        void* value);

  /// @brief Query image support with particular format and geometry.
  hsa_status_t GetImageCapability(hsa_agent_t component,
                                  const hsa_ext_image_format_t& format,
                                  hsa_ext_image_geometry_t geometry,
                                  uint32_t& capability);

  /// @brief Query the size and address alignment of the backing storage of
  /// the image.
  hsa_status_t GetImageSizeAndAlignment(
      hsa_agent_t component, const hsa_ext_image_descriptor_t& desc,
      hsa_ext_image_data_layout_t image_data_layout,
      size_t image_data_row_pitch, size_t image_data_slice_pitch,
      hsa_ext_image_data_info_t& image_info);

  /// @brief Create device image object and return its handle.
  hsa_status_t CreateImageHandle(
      hsa_agent_t component, const hsa_ext_image_descriptor_t& image_descriptor,
      const void* image_data, const hsa_access_permission_t access_permission,
      hsa_ext_image_data_layout_t image_data_layout,
      size_t image_data_row_pitch, size_t image_data_slice_pitch,
      hsa_ext_image_t& image);

  /// @brief Create device image object from a vendor-specific layout
  /// descriptor and return its handle.
  hsa_status_t CreateImageHandleWithLayout(
      hsa_agent_t component, const hsa_ext_image_descriptor_t& image_descriptor,
      const hsa_amd_image_descriptor_t* image_layout, const void* image_data,
      const hsa_access_permission_t access_permission, hsa_ext_image_t& image);

  /// @brief Destroy the device image object referenced by the handle.
  hsa_status_t DestroyImageHandle(const hsa_ext_image_t& image);

  /// @brief Copy the contents of linear memory to an image object.
  hsa_status_t CopyBufferToImage(const void* src_memory, size_t src_row_pitch,
                                 size_t src_slice_pitch,
                                 const hsa_ext_image_t& dst_image,
                                 const hsa_ext_image_region_t& image_region);

  /// @brief Copy the contents of an image object to linear memory.
  hsa_status_t CopyImageToBuffer(const hsa_ext_image_t& src_image,
                                 void* dst_memory, size_t dst_row_pitch,
                                 size_t dst_slice_pitch,
                                 const hsa_ext_image_region_t& image_region);

  /// @brief Copy the contents of an image object to another image object.
  hsa_status_t CopyImage(const hsa_ext_image_t& src_image,
                         const hsa_ext_image_t& dst_image,
                         const hsa_dim3_t& src_origin,
                         const hsa_dim3_t& dst_origin, const hsa_dim3_t size);

  /// @brief Fill the contents of an image object with a pattern.
  hsa_status_t FillImage(const hsa_ext_image_t& image, const void* pattern,
                         const hsa_ext_image_region_t& image_region);

  /// @brief Create device sampler object and return its handle.
  hsa_status_t CreateSamplerHandle(
      hsa_agent_t component,
      const hsa_ext_sampler_descriptor_t& sampler_descriptor,
      hsa_ext_sampler_t& sampler);

  /// @brief Destroy the device sampler object referenced by the handle.
  hsa_status_t DestroySamplerHandle(hsa_ext_sampler_t& sampler);

  ImageManager* image_manager(hsa_agent_t agent) {
    std::map<uint64_t, ImageManager*>::iterator it =
        image_managers_.find(agent.handle);
    return (it != image_managers_.end()) ?
                                          it->second : NULL;
  }

  BlitKernel& blit_kernel() { return blit_kernel_; }

  size_t cpu_l2_cache_size() const { return cpu_l2_cache_size_; }

  hsa_amd_memory_pool_t kernarg_pool() const { return kernarg_pool_; }

 private:
  /// @brief Initialize singleton object, must be called once.
  static ImageRuntime* CreateSingleton();

  static hsa_status_t CreateImageManager(hsa_agent_t agent, void* data);

  ImageRuntime();

  ~ImageRuntime();

  void Cleanup();

  /// Pointer to singleton object.
  static std::atomic<ImageRuntime*> instance_;
  static std::mutex instance_mutex_;

  /// @brief Maps each agent handle to its corresponding ::ImageManager
  /// object.
  std::map<uint64_t, ImageManager*> image_managers_;

  /// @brief Manages kernel for accessing images.
  BlitKernel blit_kernel_;

  size_t cpu_l2_cache_size_;

  hsa_amd_memory_pool_t kernarg_pool_;

  DISALLOW_COPY_AND_ASSIGN(ImageRuntime);
};

}  // namespace image
}  // namespace rocr

#endif  // HSA_RUNTIME_EXT_IMAGE_IMAGE_RUNTIME_H
ROCR-Runtime-rocm-5.0.0/src/image/inc/
ROCR-Runtime-rocm-5.0.0/src/image/inc/hsa_ext_image_impl.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2020-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_EXT_IMAGE_H
#define HSA_RUNTIME_EXT_IMAGE_H

#include "inc/hsa.h"
#include "inc/hsa_ext_amd.h"
#include "inc/hsa_ext_image.h"
#include "core/inc/hsa_ext_interface.h"

//---------------------------------------------------------------------------//
//    APIs that implement Image functionality
//---------------------------------------------------------------------------//

namespace rocr {
namespace image {

hsa_status_t hsa_amd_image_get_info_max_dim(hsa_agent_t agent,
                                            hsa_agent_info_t attribute,
                                            void* value);

hsa_status_t hsa_ext_image_get_capability(
    hsa_agent_t agent, hsa_ext_image_geometry_t image_geometry,
    const hsa_ext_image_format_t* image_format, uint32_t* capability_mask);

hsa_status_t hsa_ext_image_data_get_info(
    hsa_agent_t agent, const hsa_ext_image_descriptor_t* image_descriptor,
    hsa_access_permission_t access_permission,
    hsa_ext_image_data_info_t* image_data_info);

hsa_status_t hsa_ext_image_create(
    hsa_agent_t agent, const hsa_ext_image_descriptor_t* image_descriptor,
    const void* image_data, hsa_access_permission_t access_permission,
    hsa_ext_image_t* image);

hsa_status_t hsa_ext_image_destroy(hsa_agent_t agent, hsa_ext_image_t image);

hsa_status_t hsa_ext_image_copy(hsa_agent_t agent, hsa_ext_image_t src_image,
                                const hsa_dim3_t* src_offset,
                                hsa_ext_image_t dst_image,
                                const hsa_dim3_t* dst_offset,
                                const hsa_dim3_t* range);

hsa_status_t hsa_ext_image_import(hsa_agent_t agent, const void* src_memory,
                                  size_t src_row_pitch, size_t src_slice_pitch,
                                  hsa_ext_image_t dst_image,
                                  const hsa_ext_image_region_t* image_region);

hsa_status_t hsa_ext_image_export(hsa_agent_t agent, hsa_ext_image_t src_image,
                                  void* dst_memory, size_t dst_row_pitch,
                                  size_t dst_slice_pitch,
                                  const hsa_ext_image_region_t* image_region);

hsa_status_t hsa_ext_image_clear(hsa_agent_t agent, hsa_ext_image_t image,
                                 const void* data,
                                 const hsa_ext_image_region_t* image_region);

hsa_status_t
hsa_ext_sampler_create(hsa_agent_t agent,
                       const hsa_ext_sampler_descriptor_t* sampler_descriptor,
                       hsa_ext_sampler_t* sampler);

hsa_status_t hsa_ext_sampler_destroy(hsa_agent_t agent,
                                     hsa_ext_sampler_t sampler);

hsa_status_t hsa_ext_image_get_capability_with_layout(
    hsa_agent_t agent, hsa_ext_image_geometry_t image_geometry,
    const hsa_ext_image_format_t* image_format,
    hsa_ext_image_data_layout_t image_data_layout, uint32_t* capability_mask);

hsa_status_t hsa_ext_image_data_get_info_with_layout(
    hsa_agent_t agent, const hsa_ext_image_descriptor_t* image_descriptor,
    hsa_access_permission_t access_permission,
    hsa_ext_image_data_layout_t image_data_layout, size_t image_data_row_pitch,
    size_t image_data_slice_pitch, hsa_ext_image_data_info_t* image_data_info);

hsa_status_t hsa_ext_image_create_with_layout(
    hsa_agent_t agent, const hsa_ext_image_descriptor_t* image_descriptor,
    const void* image_data, hsa_access_permission_t access_permission,
    hsa_ext_image_data_layout_t image_data_layout, size_t image_data_row_pitch,
    size_t image_data_slice_pitch, hsa_ext_image_t* image);

hsa_status_t hsa_amd_image_create(
    hsa_agent_t agent, const hsa_ext_image_descriptor_t* image_descriptor,
    const hsa_amd_image_descriptor_t* image_layout, const void* image_data,
    hsa_access_permission_t access_permission, hsa_ext_image_t* image);

// Update the API table with function pointers that implement the image
// functionality.
void LoadImage(core::ImageExtTableInternal* image_api,
               decltype(::hsa_amd_image_create)** interface_api);

// Release resources acquired by the image implementation.
void ReleaseImageRsrcs();

}  // namespace image
}  // namespace rocr

#endif  // HSA_RUNTIME_EXT_IMAGE_H
ROCR-Runtime-rocm-5.0.0/src/image/resource.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
// //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_EXT_IMAGE_RESOURCE_H #define HSA_RUNTIME_EXT_IMAGE_RESOURCE_H #include #include #include "inc/hsa.h" #include "inc/hsa_ext_image.h" #include "util.h" #define HSA_IMAGE_OBJECT_SIZE_DWORD 12 #define HSA_IMAGE_OBJECT_ALIGNMENT 16 #define HSA_SAMPLER_OBJECT_SIZE_DWORD 8 #define HSA_SAMPLER_OBJECT_ALIGNMENT 16 #define GEOMETRY_COUNT 8 #define ORDER_COUNT 20 #define TYPE_COUNT 16 #define RO HSA_EXT_IMAGE_CAPABILITY_READ_ONLY #define ROWO \ (HSA_EXT_IMAGE_CAPABILITY_READ_ONLY | HSA_EXT_IMAGE_CAPABILITY_WRITE_ONLY) #define RW \ (HSA_EXT_IMAGE_CAPABILITY_READ_ONLY | HSA_EXT_IMAGE_CAPABILITY_WRITE_ONLY | \ HSA_EXT_IMAGE_CAPABILITY_READ_WRITE) namespace rocr { namespace image { typedef struct metadata_amd_s { uint32_t version; // Must be 1 uint32_t vendorID; // AMD | CZ uint32_t words[8]; uint32_t mip_offsets[0]; //Mip level offset bits [39:8] for each level (if any) } metadata_amd_t; /// @brief Structure to represent image access component. typedef struct Swizzle { uint8_t x; uint8_t y; uint8_t z; uint8_t w; } Swizzle; /// @brief Structure to contain the property of an image with a particular /// format and geometry. typedef struct ImageProperty { uint8_t cap; // hsa_ext_image_format_capability_t mask. uint8_t element_size; // size per pixel in bytes. uint8_t data_format; // device specific channel ordering. uint8_t data_type; // device specific channel type. } ImageProperty; /// @brief Structure to represent an HSA image object. typedef struct Image { private: Image() { component.handle = 0; permission = HSA_ACCESS_PERMISSION_RO; data = NULL; std::memset(srd, 0, sizeof(srd)); std::memset(&desc, 0, sizeof(desc)); row_pitch = slice_pitch = 0; tile_mode = LINEAR; } ~Image() {} public: typedef enum TileMode { LINEAR, TILED } TileMode; /// @brief Create an Image. static Image* Create(hsa_agent_t agent); /// @brief Destroy an Image. 
static void Destroy(const Image* image); /// @brief Convert from vendor representation to HSA handle. uint64_t Convert() const { return reinterpret_cast(srd); } /// @brief Convert from HSA handle to vendor representation. static Image* Convert(uint64_t handle) { return reinterpret_cast(handle - offsetof(Image, srd)); } // Vendor specific image object. __ALIGNED__( HSA_IMAGE_OBJECT_ALIGNMENT) uint32_t srd[HSA_IMAGE_OBJECT_SIZE_DWORD]; // HSA component of the image object. hsa_agent_t component; // HSA image descriptor of the image object. hsa_ext_image_descriptor_t desc; // HSA image access permission of the image object. hsa_access_permission_t permission; // Backing storage of the image object. void* data; // Device specific row pitch of the image object in size. size_t row_pitch; // Device specific slice pitch of the image object in size. size_t slice_pitch; // Device specific tile mode TileMode tile_mode; } Image; /// @brief Structure to represent an HSA sampler object. typedef struct Sampler { private: Sampler() { component.handle = 0; std::memset(srd, 0, sizeof(srd)); std::memset(&desc, 0, sizeof(desc)); } ~Sampler() {} public: /// @brief Create a Sampler. static Sampler* Create(hsa_agent_t agent); /// @brief Destroy a Sampler. static void Destroy(const Sampler* sampler); /// @brief Convert from vendor representation to HSA handle. uint64_t Convert() { return reinterpret_cast(srd); } /// @brief Convert from HSA handle to vendor representation. static Sampler* Convert(uint64_t handle) { return reinterpret_cast(handle - offsetof(Sampler, srd)); } // Vendor specific sampler object. __ALIGNED__(HSA_SAMPLER_OBJECT_ALIGNMENT) uint32_t srd[HSA_SAMPLER_OBJECT_SIZE_DWORD]; // HSA component of the sampler object. hsa_agent_t component; // HSA sampler descriptor of the image object. 
hsa_ext_sampler_descriptor_t desc; } Sampler; } // namespace image } // namespace rocr #endif // HSA_RUNTIME_EXT_IMAGE_RESOURCE_H ROCR-Runtime-rocm-5.0.0/src/image/resource_ai.h000066400000000000000000001444701420110115200211570ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_EXT_IMAGE_RESOURCE_AI_H #define HSA_RUNTIME_EXT_IMAGE_RESOURCE_AI_H #if defined(LITTLEENDIAN_CPU) #elif defined(BIGENDIAN_CPU) #else #error "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined" #endif namespace rocr { namespace image { union SQ_BUF_RSRC_WORD0 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int BASE_ADDRESS : 32; #elif defined(BIGENDIAN_CPU) unsigned int BASE_ADDRESS : 32; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_BUF_RSRC_WORD1 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int BASE_ADDRESS_HI : 16; unsigned int STRIDE : 14; unsigned int CACHE_SWIZZLE : 1; unsigned int SWIZZLE_ENABLE : 1; #elif defined(BIGENDIAN_CPU) unsigned int SWIZZLE_ENABLE : 1; unsigned int CACHE_SWIZZLE : 1; unsigned int STRIDE : 14; unsigned int BASE_ADDRESS_HI : 16; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_BUF_RSRC_WORD2 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int NUM_RECORDS : 32; #elif defined(BIGENDIAN_CPU) unsigned int NUM_RECORDS : 32; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_BUF_RSRC_WORD3 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int DST_SEL_X : 3; unsigned int DST_SEL_Y : 3; unsigned int DST_SEL_Z : 3; unsigned int DST_SEL_W : 3; unsigned int NUM_FORMAT : 3; unsigned int DATA_FORMAT : 4; unsigned int USER_VM_ENABLE : 1; unsigned int USER_VM_MODE : 1; unsigned int INDEX_STRIDE : 2; unsigned int ADD_TID_ENABLE : 1; unsigned int : 3; unsigned int NV : 1; unsigned int : 2; unsigned int TYPE : 2; #elif defined(BIGENDIAN_CPU) 
unsigned int TYPE : 2; unsigned int : 2; unsigned int NV : 1; unsigned int : 3; unsigned int ADD_TID_ENABLE : 1; unsigned int INDEX_STRIDE : 2; unsigned int USER_VM_MODE : 1; unsigned int USER_VM_ENABLE : 1; unsigned int DATA_FORMAT : 4; unsigned int NUM_FORMAT : 3; unsigned int DST_SEL_W : 3; unsigned int DST_SEL_Z : 3; unsigned int DST_SEL_Y : 3; unsigned int DST_SEL_X : 3; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_RSRC_WORD0 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int BASE_ADDRESS : 32; #elif defined(BIGENDIAN_CPU) unsigned int BASE_ADDRESS : 32; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_RSRC_WORD1 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int BASE_ADDRESS_HI : 8; unsigned int MIN_LOD : 12; unsigned int DATA_FORMAT : 6; unsigned int NUM_FORMAT : 4; unsigned int NV : 1; unsigned int META_DIRECT : 1; #elif defined(BIGENDIAN_CPU) unsigned int META_DIRECT : 1; unsigned int NV : 1; unsigned int NUM_FORMAT : 4; unsigned int DATA_FORMAT : 6; unsigned int MIN_LOD : 12; unsigned int BASE_ADDRESS_HI : 8; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_RSRC_WORD2 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int WIDTH : 14; unsigned int HEIGHT : 14; unsigned int PERF_MOD : 3; unsigned int : 1; #elif defined(BIGENDIAN_CPU) unsigned int : 1; unsigned int PERF_MOD : 3; unsigned int HEIGHT : 14; unsigned int WIDTH : 14; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_RSRC_WORD3 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int DST_SEL_X : 3; unsigned int DST_SEL_Y : 3; unsigned int DST_SEL_Z : 3; unsigned int DST_SEL_W : 3; unsigned int BASE_LEVEL : 4; unsigned int LAST_LEVEL : 4; unsigned int SW_MODE : 5; unsigned int : 3; unsigned int TYPE : 4; #elif defined(BIGENDIAN_CPU) unsigned int TYPE : 4; unsigned int : 3; unsigned int SW_MODE : 5; unsigned 
int LAST_LEVEL : 4; unsigned int BASE_LEVEL : 4; unsigned int DST_SEL_W : 3; unsigned int DST_SEL_Z : 3; unsigned int DST_SEL_Y : 3; unsigned int DST_SEL_X : 3; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_RSRC_WORD4 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int DEPTH : 13; unsigned int PITCH : 16; unsigned int BC_SWIZZLE : 3; #elif defined(BIGENDIAN_CPU) unsigned int BC_SWIZZLE : 3; unsigned int PITCH : 16; unsigned int DEPTH : 13; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_RSRC_WORD5 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int BASE_ARRAY : 13; unsigned int ARRAY_PITCH : 4; unsigned int META_DATA_ADDRESS_HI : 8; unsigned int META_LINEAR : 1; unsigned int META_PIPE_ALIGNED : 1; unsigned int META_RB_ALIGNED : 1; unsigned int MAX_MIP : 4; #elif defined(BIGENDIAN_CPU) unsigned int MAX_MIP : 4; unsigned int META_RB_ALIGNED : 1; unsigned int META_PIPE_ALIGNED : 1; unsigned int META_LINEAR : 1; unsigned int META_DATA_ADDRESS_HI : 8; unsigned int ARRAY_PITCH : 4; unsigned int BASE_ARRAY : 13; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_RSRC_WORD6 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int MIN_LOD_WARN : 12; unsigned int COUNTER_BANK_ID : 8; unsigned int LOD_HDW_CNT_EN : 1; unsigned int COMPRESSION_EN : 1; unsigned int ALPHA_IS_ON_MSB : 1; unsigned int COLOR_TRANSFORM : 1; unsigned int LOST_ALPHA_BITS : 4; unsigned int LOST_COLOR_BITS : 4; #elif defined(BIGENDIAN_CPU) unsigned int LOST_COLOR_BITS : 4; unsigned int LOST_ALPHA_BITS : 4; unsigned int COLOR_TRANSFORM : 1; unsigned int ALPHA_IS_ON_MSB : 1; unsigned int COMPRESSION_EN : 1; unsigned int LOD_HDW_CNT_EN : 1; unsigned int COUNTER_BANK_ID : 8; unsigned int MIN_LOD_WARN : 12; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_RSRC_WORD7 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int 
META_DATA_ADDRESS : 32; #elif defined(BIGENDIAN_CPU) unsigned int META_DATA_ADDRESS : 32; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_SAMP_WORD0 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int CLAMP_X : 3; unsigned int CLAMP_Y : 3; unsigned int CLAMP_Z : 3; unsigned int MAX_ANISO_RATIO : 3; unsigned int DEPTH_COMPARE_FUNC : 3; unsigned int FORCE_UNNORMALIZED : 1; unsigned int ANISO_THRESHOLD : 3; unsigned int MC_COORD_TRUNC : 1; unsigned int FORCE_DEGAMMA : 1; unsigned int ANISO_BIAS : 6; unsigned int TRUNC_COORD : 1; unsigned int DISABLE_CUBE_WRAP : 1; unsigned int FILTER_MODE : 2; unsigned int COMPAT_MODE : 1; #elif defined(BIGENDIAN_CPU) unsigned int COMPAT_MODE : 1; unsigned int FILTER_MODE : 2; unsigned int DISABLE_CUBE_WRAP : 1; unsigned int TRUNC_COORD : 1; unsigned int ANISO_BIAS : 6; unsigned int FORCE_DEGAMMA : 1; unsigned int MC_COORD_TRUNC : 1; unsigned int ANISO_THRESHOLD : 3; unsigned int FORCE_UNNORMALIZED : 1; unsigned int DEPTH_COMPARE_FUNC : 3; unsigned int MAX_ANISO_RATIO : 3; unsigned int CLAMP_Z : 3; unsigned int CLAMP_Y : 3; unsigned int CLAMP_X : 3; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_SAMP_WORD1 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int MIN_LOD : 12; unsigned int MAX_LOD : 12; unsigned int PERF_MIP : 4; unsigned int PERF_Z : 4; #elif defined(BIGENDIAN_CPU) unsigned int PERF_Z : 4; unsigned int PERF_MIP : 4; unsigned int MAX_LOD : 12; unsigned int MIN_LOD : 12; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_SAMP_WORD2 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int LOD_BIAS : 14; unsigned int LOD_BIAS_SEC : 6; unsigned int XY_MAG_FILTER : 2; unsigned int XY_MIN_FILTER : 2; unsigned int Z_FILTER : 2; unsigned int MIP_FILTER : 2; unsigned int MIP_POINT_PRECLAMP : 1; unsigned int BLEND_ZERO_PRT : 1; unsigned int FILTER_PREC_FIX : 1; unsigned int ANISO_OVERRIDE : 1; 
#elif defined(BIGENDIAN_CPU) unsigned int ANISO_OVERRIDE : 1; unsigned int FILTER_PREC_FIX : 1; unsigned int BLEND_ZERO_PRT : 1; unsigned int MIP_POINT_PRECLAMP : 1; unsigned int MIP_FILTER : 2; unsigned int Z_FILTER : 2; unsigned int XY_MIN_FILTER : 2; unsigned int XY_MAG_FILTER : 2; unsigned int LOD_BIAS_SEC : 6; unsigned int LOD_BIAS : 14; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; union SQ_IMG_SAMP_WORD3 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int BORDER_COLOR_PTR : 12; unsigned int SKIP_DEGAMMA : 1; unsigned int : 17; unsigned int BORDER_COLOR_TYPE : 2; #elif defined(BIGENDIAN_CPU) unsigned int BORDER_COLOR_TYPE : 2; unsigned int : 17; unsigned int SKIP_DEGAMMA : 1; unsigned int BORDER_COLOR_PTR : 12; #endif } bitfields, bits; unsigned int u32All; signed int i32All; float f32All; }; #define SQ_BUF_RSRC_WORD0_REG_SIZE 32 #define SQ_BUF_RSRC_WORD0_BASE_ADDRESS_SIZE 32 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_buf_rsrc_word0_t { unsigned int base_address : SQ_BUF_RSRC_WORD0_BASE_ADDRESS_SIZE; } sq_buf_rsrc_word0_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_buf_rsrc_word0_t { unsigned int base_address : SQ_BUF_RSRC_WORD0_BASE_ADDRESS_SIZE; } sq_buf_rsrc_word0_t; #endif typedef union { unsigned int val : 32; sq_buf_rsrc_word0_t f; } sq_buf_rsrc_word0_u; #define SQ_BUF_RSRC_WORD1_REG_SIZE 32 #define SQ_BUF_RSRC_WORD1_BASE_ADDRESS_HI_SIZE 16 #define SQ_BUF_RSRC_WORD1_STRIDE_SIZE 14 #define SQ_BUF_RSRC_WORD1_CACHE_SWIZZLE_SIZE 1 #define SQ_BUF_RSRC_WORD1_SWIZZLE_ENABLE_SIZE 1 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_buf_rsrc_word1_t { unsigned int base_address_hi : SQ_BUF_RSRC_WORD1_BASE_ADDRESS_HI_SIZE; unsigned int stride : SQ_BUF_RSRC_WORD1_STRIDE_SIZE; unsigned int cache_swizzle : SQ_BUF_RSRC_WORD1_CACHE_SWIZZLE_SIZE; unsigned int swizzle_enable : SQ_BUF_RSRC_WORD1_SWIZZLE_ENABLE_SIZE; } sq_buf_rsrc_word1_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_buf_rsrc_word1_t { unsigned int 
swizzle_enable : SQ_BUF_RSRC_WORD1_SWIZZLE_ENABLE_SIZE; unsigned int cache_swizzle : SQ_BUF_RSRC_WORD1_CACHE_SWIZZLE_SIZE; unsigned int stride : SQ_BUF_RSRC_WORD1_STRIDE_SIZE; unsigned int base_address_hi : SQ_BUF_RSRC_WORD1_BASE_ADDRESS_HI_SIZE; } sq_buf_rsrc_word1_t; #endif typedef union { unsigned int val : 32; sq_buf_rsrc_word1_t f; } sq_buf_rsrc_word1_u; #define SQ_BUF_RSRC_WORD2_REG_SIZE 32 #define SQ_BUF_RSRC_WORD2_NUM_RECORDS_SIZE 32 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_buf_rsrc_word2_t { unsigned int num_records : SQ_BUF_RSRC_WORD2_NUM_RECORDS_SIZE; } sq_buf_rsrc_word2_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_buf_rsrc_word2_t { unsigned int num_records : SQ_BUF_RSRC_WORD2_NUM_RECORDS_SIZE; } sq_buf_rsrc_word2_t; #endif typedef union { unsigned int val : 32; sq_buf_rsrc_word2_t f; } sq_buf_rsrc_word2_u; #define SQ_BUF_RSRC_WORD3_REG_SIZE 32 #define SQ_BUF_RSRC_WORD3_DST_SEL_X_SIZE 3 #define SQ_BUF_RSRC_WORD3_DST_SEL_Y_SIZE 3 #define SQ_BUF_RSRC_WORD3_DST_SEL_Z_SIZE 3 #define SQ_BUF_RSRC_WORD3_DST_SEL_W_SIZE 3 #define SQ_BUF_RSRC_WORD3_NUM_FORMAT_SIZE 3 #define SQ_BUF_RSRC_WORD3_DATA_FORMAT_SIZE 4 #define SQ_BUF_RSRC_WORD3_USER_VM_ENABLE_SIZE 1 #define SQ_BUF_RSRC_WORD3_USER_VM_MODE_SIZE 1 #define SQ_BUF_RSRC_WORD3_INDEX_STRIDE_SIZE 2 #define SQ_BUF_RSRC_WORD3_ADD_TID_ENABLE_SIZE 1 #define SQ_BUF_RSRC_WORD3_NV_SIZE 1 #define SQ_BUF_RSRC_WORD3_TYPE_SIZE 2 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_buf_rsrc_word3_t { unsigned int dst_sel_x : SQ_BUF_RSRC_WORD3_DST_SEL_X_SIZE; unsigned int dst_sel_y : SQ_BUF_RSRC_WORD3_DST_SEL_Y_SIZE; unsigned int dst_sel_z : SQ_BUF_RSRC_WORD3_DST_SEL_Z_SIZE; unsigned int dst_sel_w : SQ_BUF_RSRC_WORD3_DST_SEL_W_SIZE; unsigned int num_format : SQ_BUF_RSRC_WORD3_NUM_FORMAT_SIZE; unsigned int data_format : SQ_BUF_RSRC_WORD3_DATA_FORMAT_SIZE; unsigned int user_vm_enable : SQ_BUF_RSRC_WORD3_USER_VM_ENABLE_SIZE; unsigned int user_vm_mode : SQ_BUF_RSRC_WORD3_USER_VM_MODE_SIZE; unsigned int index_stride : 
SQ_BUF_RSRC_WORD3_INDEX_STRIDE_SIZE; unsigned int add_tid_enable : SQ_BUF_RSRC_WORD3_ADD_TID_ENABLE_SIZE; unsigned int : 3; unsigned int nv : SQ_BUF_RSRC_WORD3_NV_SIZE; unsigned int : 2; unsigned int type : SQ_BUF_RSRC_WORD3_TYPE_SIZE; } sq_buf_rsrc_word3_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_buf_rsrc_word3_t { unsigned int type : SQ_BUF_RSRC_WORD3_TYPE_SIZE; unsigned int : 2; unsigned int nv : SQ_BUF_RSRC_WORD3_NV_SIZE; unsigned int : 3; unsigned int add_tid_enable : SQ_BUF_RSRC_WORD3_ADD_TID_ENABLE_SIZE; unsigned int index_stride : SQ_BUF_RSRC_WORD3_INDEX_STRIDE_SIZE; unsigned int user_vm_mode : SQ_BUF_RSRC_WORD3_USER_VM_MODE_SIZE; unsigned int user_vm_enable : SQ_BUF_RSRC_WORD3_USER_VM_ENABLE_SIZE; unsigned int data_format : SQ_BUF_RSRC_WORD3_DATA_FORMAT_SIZE; unsigned int num_format : SQ_BUF_RSRC_WORD3_NUM_FORMAT_SIZE; unsigned int dst_sel_w : SQ_BUF_RSRC_WORD3_DST_SEL_W_SIZE; unsigned int dst_sel_z : SQ_BUF_RSRC_WORD3_DST_SEL_Z_SIZE; unsigned int dst_sel_y : SQ_BUF_RSRC_WORD3_DST_SEL_Y_SIZE; unsigned int dst_sel_x : SQ_BUF_RSRC_WORD3_DST_SEL_X_SIZE; } sq_buf_rsrc_word3_t; #endif typedef union { unsigned int val : 32; sq_buf_rsrc_word3_t f; } sq_buf_rsrc_word3_u; #define SQ_IMG_RSRC_WORD0_REG_SIZE 32 #define SQ_IMG_RSRC_WORD0_BASE_ADDRESS_SIZE 32 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_rsrc_word0_t { unsigned int base_address : SQ_IMG_RSRC_WORD0_BASE_ADDRESS_SIZE; } sq_img_rsrc_word0_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_rsrc_word0_t { unsigned int base_address : SQ_IMG_RSRC_WORD0_BASE_ADDRESS_SIZE; } sq_img_rsrc_word0_t; #endif typedef union { unsigned int val : 32; sq_img_rsrc_word0_t f; } sq_img_rsrc_word0_u; #define SQ_IMG_RSRC_WORD1_REG_SIZE 32 #define SQ_IMG_RSRC_WORD1_BASE_ADDRESS_HI_SIZE 8 #define SQ_IMG_RSRC_WORD1_MIN_LOD_SIZE 12 #define SQ_IMG_RSRC_WORD1_DATA_FORMAT_SIZE 6 #define SQ_IMG_RSRC_WORD1_NUM_FORMAT_SIZE 4 #define SQ_IMG_RSRC_WORD1_NV_SIZE 1 #define SQ_IMG_RSRC_WORD1_META_DIRECT_SIZE 1 #if 
defined(LITTLEENDIAN_CPU) typedef struct _sq_img_rsrc_word1_t { unsigned int base_address_hi : SQ_IMG_RSRC_WORD1_BASE_ADDRESS_HI_SIZE; unsigned int min_lod : SQ_IMG_RSRC_WORD1_MIN_LOD_SIZE; unsigned int data_format : SQ_IMG_RSRC_WORD1_DATA_FORMAT_SIZE; unsigned int num_format : SQ_IMG_RSRC_WORD1_NUM_FORMAT_SIZE; unsigned int nv : SQ_IMG_RSRC_WORD1_NV_SIZE; unsigned int meta_direct : SQ_IMG_RSRC_WORD1_META_DIRECT_SIZE; } sq_img_rsrc_word1_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_rsrc_word1_t { unsigned int meta_direct : SQ_IMG_RSRC_WORD1_META_DIRECT_SIZE; unsigned int nv : SQ_IMG_RSRC_WORD1_NV_SIZE; unsigned int num_format : SQ_IMG_RSRC_WORD1_NUM_FORMAT_SIZE; unsigned int data_format : SQ_IMG_RSRC_WORD1_DATA_FORMAT_SIZE; unsigned int min_lod : SQ_IMG_RSRC_WORD1_MIN_LOD_SIZE; unsigned int base_address_hi : SQ_IMG_RSRC_WORD1_BASE_ADDRESS_HI_SIZE; } sq_img_rsrc_word1_t; #endif typedef union { unsigned int val : 32; sq_img_rsrc_word1_t f; } sq_img_rsrc_word1_u; #define SQ_IMG_RSRC_WORD2_REG_SIZE 32 #define SQ_IMG_RSRC_WORD2_WIDTH_SIZE 14 #define SQ_IMG_RSRC_WORD2_HEIGHT_SIZE 14 #define SQ_IMG_RSRC_WORD2_PERF_MOD_SIZE 3 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_rsrc_word2_t { unsigned int width : SQ_IMG_RSRC_WORD2_WIDTH_SIZE; unsigned int height : SQ_IMG_RSRC_WORD2_HEIGHT_SIZE; unsigned int perf_mod : SQ_IMG_RSRC_WORD2_PERF_MOD_SIZE; unsigned int : 1; } sq_img_rsrc_word2_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_rsrc_word2_t { unsigned int : 1; unsigned int perf_mod : SQ_IMG_RSRC_WORD2_PERF_MOD_SIZE; unsigned int height : SQ_IMG_RSRC_WORD2_HEIGHT_SIZE; unsigned int width : SQ_IMG_RSRC_WORD2_WIDTH_SIZE; } sq_img_rsrc_word2_t; #endif typedef union { unsigned int val : 32; sq_img_rsrc_word2_t f; } sq_img_rsrc_word2_u; #define SQ_IMG_RSRC_WORD3_REG_SIZE 32 #define SQ_IMG_RSRC_WORD3_DST_SEL_X_SIZE 3 #define SQ_IMG_RSRC_WORD3_DST_SEL_Y_SIZE 3 #define SQ_IMG_RSRC_WORD3_DST_SEL_Z_SIZE 3 #define SQ_IMG_RSRC_WORD3_DST_SEL_W_SIZE 3 #define 
SQ_IMG_RSRC_WORD3_BASE_LEVEL_SIZE 4 #define SQ_IMG_RSRC_WORD3_LAST_LEVEL_SIZE 4 #define SQ_IMG_RSRC_WORD3_SW_MODE_SIZE 5 #define SQ_IMG_RSRC_WORD3_TYPE_SIZE 4 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_rsrc_word3_t { unsigned int dst_sel_x : SQ_IMG_RSRC_WORD3_DST_SEL_X_SIZE; unsigned int dst_sel_y : SQ_IMG_RSRC_WORD3_DST_SEL_Y_SIZE; unsigned int dst_sel_z : SQ_IMG_RSRC_WORD3_DST_SEL_Z_SIZE; unsigned int dst_sel_w : SQ_IMG_RSRC_WORD3_DST_SEL_W_SIZE; unsigned int base_level : SQ_IMG_RSRC_WORD3_BASE_LEVEL_SIZE; unsigned int last_level : SQ_IMG_RSRC_WORD3_LAST_LEVEL_SIZE; unsigned int sw_mode : SQ_IMG_RSRC_WORD3_SW_MODE_SIZE; unsigned int : 3; unsigned int type : SQ_IMG_RSRC_WORD3_TYPE_SIZE; } sq_img_rsrc_word3_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_rsrc_word3_t { unsigned int type : SQ_IMG_RSRC_WORD3_TYPE_SIZE; unsigned int : 3; unsigned int sw_mode : SQ_IMG_RSRC_WORD3_SW_MODE_SIZE; unsigned int last_level : SQ_IMG_RSRC_WORD3_LAST_LEVEL_SIZE; unsigned int base_level : SQ_IMG_RSRC_WORD3_BASE_LEVEL_SIZE; unsigned int dst_sel_w : SQ_IMG_RSRC_WORD3_DST_SEL_W_SIZE; unsigned int dst_sel_z : SQ_IMG_RSRC_WORD3_DST_SEL_Z_SIZE; unsigned int dst_sel_y : SQ_IMG_RSRC_WORD3_DST_SEL_Y_SIZE; unsigned int dst_sel_x : SQ_IMG_RSRC_WORD3_DST_SEL_X_SIZE; } sq_img_rsrc_word3_t; #endif typedef union { unsigned int val : 32; sq_img_rsrc_word3_t f; } sq_img_rsrc_word3_u; #define SQ_IMG_RSRC_WORD4_REG_SIZE 32 #define SQ_IMG_RSRC_WORD4_DEPTH_SIZE 13 #define SQ_IMG_RSRC_WORD4_PITCH_SIZE 16 #define SQ_IMG_RSRC_WORD4_BC_SWIZZLE_SIZE 3 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_rsrc_word4_t { unsigned int depth : SQ_IMG_RSRC_WORD4_DEPTH_SIZE; unsigned int pitch : SQ_IMG_RSRC_WORD4_PITCH_SIZE; unsigned int bc_swizzle : SQ_IMG_RSRC_WORD4_BC_SWIZZLE_SIZE; } sq_img_rsrc_word4_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_rsrc_word4_t { unsigned int bc_swizzle : SQ_IMG_RSRC_WORD4_BC_SWIZZLE_SIZE; unsigned int pitch : SQ_IMG_RSRC_WORD4_PITCH_SIZE; unsigned int 
depth : SQ_IMG_RSRC_WORD4_DEPTH_SIZE; } sq_img_rsrc_word4_t; #endif typedef union { unsigned int val : 32; sq_img_rsrc_word4_t f; } sq_img_rsrc_word4_u; #define SQ_IMG_RSRC_WORD5_REG_SIZE 32 #define SQ_IMG_RSRC_WORD5_BASE_ARRAY_SIZE 13 #define SQ_IMG_RSRC_WORD5_ARRAY_PITCH_SIZE 4 #define SQ_IMG_RSRC_WORD5_META_DATA_ADDRESS_SIZE 8 #define SQ_IMG_RSRC_WORD5_META_LINEAR_SIZE 1 #define SQ_IMG_RSRC_WORD5_META_PIPE_ALIGNED_SIZE 1 #define SQ_IMG_RSRC_WORD5_META_RB_ALIGNED_SIZE 1 #define SQ_IMG_RSRC_WORD5_MAX_MIP_SIZE 4 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_rsrc_word5_t { unsigned int base_array : SQ_IMG_RSRC_WORD5_BASE_ARRAY_SIZE; unsigned int array_pitch : SQ_IMG_RSRC_WORD5_ARRAY_PITCH_SIZE; unsigned int meta_data_address : SQ_IMG_RSRC_WORD5_META_DATA_ADDRESS_SIZE; unsigned int meta_linear : SQ_IMG_RSRC_WORD5_META_LINEAR_SIZE; unsigned int meta_pipe_aligned : SQ_IMG_RSRC_WORD5_META_PIPE_ALIGNED_SIZE; unsigned int meta_rb_aligned : SQ_IMG_RSRC_WORD5_META_RB_ALIGNED_SIZE; unsigned int max_mip : SQ_IMG_RSRC_WORD5_MAX_MIP_SIZE; } sq_img_rsrc_word5_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_rsrc_word5_t { unsigned int max_mip : SQ_IMG_RSRC_WORD5_MAX_MIP_SIZE; unsigned int meta_rb_aligned : SQ_IMG_RSRC_WORD5_META_RB_ALIGNED_SIZE; unsigned int meta_pipe_aligned : SQ_IMG_RSRC_WORD5_META_PIPE_ALIGNED_SIZE; unsigned int meta_linear : SQ_IMG_RSRC_WORD5_META_LINEAR_SIZE; unsigned int meta_data_address : SQ_IMG_RSRC_WORD5_META_DATA_ADDRESS_SIZE; unsigned int array_pitch : SQ_IMG_RSRC_WORD5_ARRAY_PITCH_SIZE; unsigned int base_array : SQ_IMG_RSRC_WORD5_BASE_ARRAY_SIZE; } sq_img_rsrc_word5_t; #endif typedef union { unsigned int val : 32; sq_img_rsrc_word5_t f; } sq_img_rsrc_word5_u; #define SQ_IMG_RSRC_WORD6_REG_SIZE 32 #define SQ_IMG_RSRC_WORD6_MIN_LOD_WARN_SIZE 12 #define SQ_IMG_RSRC_WORD6_COUNTER_BANK_ID_SIZE 8 #define SQ_IMG_RSRC_WORD6_LOD_HDW_CNT_EN_SIZE 1 #define SQ_IMG_RSRC_WORD6_COMPRESSION_EN_SIZE 1 #define SQ_IMG_RSRC_WORD6_ALPHA_IS_ON_MSB_SIZE 1 
#define SQ_IMG_RSRC_WORD6_COLOR_TRANSFORM_SIZE 1 #define SQ_IMG_RSRC_WORD6_LOST_ALPHA_BITS_SIZE 4 #define SQ_IMG_RSRC_WORD6_LOST_COLOR_BITS_SIZE 4 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_rsrc_word6_t { unsigned int min_lod_warn : SQ_IMG_RSRC_WORD6_MIN_LOD_WARN_SIZE; unsigned int counter_bank_id : SQ_IMG_RSRC_WORD6_COUNTER_BANK_ID_SIZE; unsigned int lod_hdw_cnt_en : SQ_IMG_RSRC_WORD6_LOD_HDW_CNT_EN_SIZE; unsigned int compression_en : SQ_IMG_RSRC_WORD6_COMPRESSION_EN_SIZE; unsigned int alpha_is_on_msb : SQ_IMG_RSRC_WORD6_ALPHA_IS_ON_MSB_SIZE; unsigned int color_transform : SQ_IMG_RSRC_WORD6_COLOR_TRANSFORM_SIZE; unsigned int lost_alpha_bits : SQ_IMG_RSRC_WORD6_LOST_ALPHA_BITS_SIZE; unsigned int lost_color_bits : SQ_IMG_RSRC_WORD6_LOST_COLOR_BITS_SIZE; } sq_img_rsrc_word6_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_rsrc_word6_t { unsigned int lost_color_bits : SQ_IMG_RSRC_WORD6_LOST_COLOR_BITS_SIZE; unsigned int lost_alpha_bits : SQ_IMG_RSRC_WORD6_LOST_ALPHA_BITS_SIZE; unsigned int color_transform : SQ_IMG_RSRC_WORD6_COLOR_TRANSFORM_SIZE; unsigned int alpha_is_on_msb : SQ_IMG_RSRC_WORD6_ALPHA_IS_ON_MSB_SIZE; unsigned int compression_en : SQ_IMG_RSRC_WORD6_COMPRESSION_EN_SIZE; unsigned int lod_hdw_cnt_en : SQ_IMG_RSRC_WORD6_LOD_HDW_CNT_EN_SIZE; unsigned int counter_bank_id : SQ_IMG_RSRC_WORD6_COUNTER_BANK_ID_SIZE; unsigned int min_lod_warn : SQ_IMG_RSRC_WORD6_MIN_LOD_WARN_SIZE; } sq_img_rsrc_word6_t; #endif typedef union { unsigned int val : 32; sq_img_rsrc_word6_t f; } sq_img_rsrc_word6_u; #define SQ_IMG_RSRC_WORD7_REG_SIZE 32 #define SQ_IMG_RSRC_WORD7_META_DATA_ADDRESS_SIZE 32 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_rsrc_word7_t { unsigned int meta_data_address : SQ_IMG_RSRC_WORD7_META_DATA_ADDRESS_SIZE; } sq_img_rsrc_word7_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_rsrc_word7_t { unsigned int meta_data_address : SQ_IMG_RSRC_WORD7_META_DATA_ADDRESS_SIZE; } sq_img_rsrc_word7_t; #endif typedef union { unsigned int val : 
32; sq_img_rsrc_word7_t f; } sq_img_rsrc_word7_u; #define SQ_IMG_SAMP_WORD0_REG_SIZE 32 #define SQ_IMG_SAMP_WORD0_CLAMP_X_SIZE 3 #define SQ_IMG_SAMP_WORD0_CLAMP_Y_SIZE 3 #define SQ_IMG_SAMP_WORD0_CLAMP_Z_SIZE 3 #define SQ_IMG_SAMP_WORD0_MAX_ANISO_RATIO_SIZE 3 #define SQ_IMG_SAMP_WORD0_DEPTH_COMPARE_FUNC_SIZE 3 #define SQ_IMG_SAMP_WORD0_FORCE_UNNORMALIZED_SIZE 1 #define SQ_IMG_SAMP_WORD0_ANISO_THRESHOLD_SIZE 3 #define SQ_IMG_SAMP_WORD0_MC_COORD_TRUNC_SIZE 1 #define SQ_IMG_SAMP_WORD0_FORCE_DEGAMMA_SIZE 1 #define SQ_IMG_SAMP_WORD0_ANISO_BIAS_SIZE 6 #define SQ_IMG_SAMP_WORD0_TRUNC_COORD_SIZE 1 #define SQ_IMG_SAMP_WORD0_DISABLE_CUBE_WRAP_SIZE 1 #define SQ_IMG_SAMP_WORD0_FILTER_MODE_SIZE 2 #define SQ_IMG_SAMP_WORD0_COMPAT_MODE_SIZE 1 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_samp_word0_t { unsigned int clamp_x : SQ_IMG_SAMP_WORD0_CLAMP_X_SIZE; unsigned int clamp_y : SQ_IMG_SAMP_WORD0_CLAMP_Y_SIZE; unsigned int clamp_z : SQ_IMG_SAMP_WORD0_CLAMP_Z_SIZE; unsigned int max_aniso_ratio : SQ_IMG_SAMP_WORD0_MAX_ANISO_RATIO_SIZE; unsigned int depth_compare_func : SQ_IMG_SAMP_WORD0_DEPTH_COMPARE_FUNC_SIZE; unsigned int force_unnormalized : SQ_IMG_SAMP_WORD0_FORCE_UNNORMALIZED_SIZE; unsigned int aniso_threshold : SQ_IMG_SAMP_WORD0_ANISO_THRESHOLD_SIZE; unsigned int mc_coord_trunc : SQ_IMG_SAMP_WORD0_MC_COORD_TRUNC_SIZE; unsigned int force_degamma : SQ_IMG_SAMP_WORD0_FORCE_DEGAMMA_SIZE; unsigned int aniso_bias : SQ_IMG_SAMP_WORD0_ANISO_BIAS_SIZE; unsigned int trunc_coord : SQ_IMG_SAMP_WORD0_TRUNC_COORD_SIZE; unsigned int disable_cube_wrap : SQ_IMG_SAMP_WORD0_DISABLE_CUBE_WRAP_SIZE; unsigned int filter_mode : SQ_IMG_SAMP_WORD0_FILTER_MODE_SIZE; unsigned int compat_mode : SQ_IMG_SAMP_WORD0_COMPAT_MODE_SIZE; } sq_img_samp_word0_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_samp_word0_t { unsigned int compat_mode : SQ_IMG_SAMP_WORD0_COMPAT_MODE_SIZE; unsigned int filter_mode : SQ_IMG_SAMP_WORD0_FILTER_MODE_SIZE; unsigned int disable_cube_wrap : 
SQ_IMG_SAMP_WORD0_DISABLE_CUBE_WRAP_SIZE; unsigned int trunc_coord : SQ_IMG_SAMP_WORD0_TRUNC_COORD_SIZE; unsigned int aniso_bias : SQ_IMG_SAMP_WORD0_ANISO_BIAS_SIZE; unsigned int force_degamma : SQ_IMG_SAMP_WORD0_FORCE_DEGAMMA_SIZE; unsigned int mc_coord_trunc : SQ_IMG_SAMP_WORD0_MC_COORD_TRUNC_SIZE; unsigned int aniso_threshold : SQ_IMG_SAMP_WORD0_ANISO_THRESHOLD_SIZE; unsigned int force_unnormalized : SQ_IMG_SAMP_WORD0_FORCE_UNNORMALIZED_SIZE; unsigned int depth_compare_func : SQ_IMG_SAMP_WORD0_DEPTH_COMPARE_FUNC_SIZE; unsigned int max_aniso_ratio : SQ_IMG_SAMP_WORD0_MAX_ANISO_RATIO_SIZE; unsigned int clamp_z : SQ_IMG_SAMP_WORD0_CLAMP_Z_SIZE; unsigned int clamp_y : SQ_IMG_SAMP_WORD0_CLAMP_Y_SIZE; unsigned int clamp_x : SQ_IMG_SAMP_WORD0_CLAMP_X_SIZE; } sq_img_samp_word0_t; #endif typedef union { unsigned int val : 32; sq_img_samp_word0_t f; } sq_img_samp_word0_u; #define SQ_IMG_SAMP_WORD1_REG_SIZE 32 #define SQ_IMG_SAMP_WORD1_MIN_LOD_SIZE 12 #define SQ_IMG_SAMP_WORD1_MAX_LOD_SIZE 12 #define SQ_IMG_SAMP_WORD1_PERF_MIP_SIZE 4 #define SQ_IMG_SAMP_WORD1_PERF_Z_SIZE 4 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_samp_word1_t { unsigned int min_lod : SQ_IMG_SAMP_WORD1_MIN_LOD_SIZE; unsigned int max_lod : SQ_IMG_SAMP_WORD1_MAX_LOD_SIZE; unsigned int perf_mip : SQ_IMG_SAMP_WORD1_PERF_MIP_SIZE; unsigned int perf_z : SQ_IMG_SAMP_WORD1_PERF_Z_SIZE; } sq_img_samp_word1_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_samp_word1_t { unsigned int perf_z : SQ_IMG_SAMP_WORD1_PERF_Z_SIZE; unsigned int perf_mip : SQ_IMG_SAMP_WORD1_PERF_MIP_SIZE; unsigned int max_lod : SQ_IMG_SAMP_WORD1_MAX_LOD_SIZE; unsigned int min_lod : SQ_IMG_SAMP_WORD1_MIN_LOD_SIZE; } sq_img_samp_word1_t; #endif typedef union { unsigned int val : 32; sq_img_samp_word1_t f; } sq_img_samp_word1_u; #define SQ_IMG_SAMP_WORD2_REG_SIZE 32 #define SQ_IMG_SAMP_WORD2_LOD_BIAS_SIZE 14 #define SQ_IMG_SAMP_WORD2_LOD_BIAS_SEC_SIZE 6 #define SQ_IMG_SAMP_WORD2_XY_MAG_FILTER_SIZE 2 #define 
SQ_IMG_SAMP_WORD2_XY_MIN_FILTER_SIZE 2 #define SQ_IMG_SAMP_WORD2_Z_FILTER_SIZE 2 #define SQ_IMG_SAMP_WORD2_MIP_FILTER_SIZE 2 #define SQ_IMG_SAMP_WORD2_MIP_POINT_PRECLAMP_SIZE 1 #define SQ_IMG_SAMP_WORD2_BLEND_ZERO_PRT_SIZE 1 #define SQ_IMG_SAMP_WORD2_FILTER_PREC_FIX_SIZE 1 #define SQ_IMG_SAMP_WORD2_ANISO_OVERRIDE_SIZE 1 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_samp_word2_t { unsigned int lod_bias : SQ_IMG_SAMP_WORD2_LOD_BIAS_SIZE; unsigned int lod_bias_sec : SQ_IMG_SAMP_WORD2_LOD_BIAS_SEC_SIZE; unsigned int xy_mag_filter : SQ_IMG_SAMP_WORD2_XY_MAG_FILTER_SIZE; unsigned int xy_min_filter : SQ_IMG_SAMP_WORD2_XY_MIN_FILTER_SIZE; unsigned int z_filter : SQ_IMG_SAMP_WORD2_Z_FILTER_SIZE; unsigned int mip_filter : SQ_IMG_SAMP_WORD2_MIP_FILTER_SIZE; unsigned int mip_point_preclamp : SQ_IMG_SAMP_WORD2_MIP_POINT_PRECLAMP_SIZE; unsigned int blend_zero_prt : SQ_IMG_SAMP_WORD2_BLEND_ZERO_PRT_SIZE; unsigned int filter_prec_fix : SQ_IMG_SAMP_WORD2_FILTER_PREC_FIX_SIZE; unsigned int aniso_override : SQ_IMG_SAMP_WORD2_ANISO_OVERRIDE_SIZE; } sq_img_samp_word2_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_samp_word2_t { unsigned int aniso_override : SQ_IMG_SAMP_WORD2_ANISO_OVERRIDE_SIZE; unsigned int filter_prec_fix : SQ_IMG_SAMP_WORD2_FILTER_PREC_FIX_SIZE; unsigned int blend_zero_prt : SQ_IMG_SAMP_WORD2_BLEND_ZERO_PRT_SIZE; unsigned int mip_point_preclamp : SQ_IMG_SAMP_WORD2_MIP_POINT_PRECLAMP_SIZE; unsigned int mip_filter : SQ_IMG_SAMP_WORD2_MIP_FILTER_SIZE; unsigned int z_filter : SQ_IMG_SAMP_WORD2_Z_FILTER_SIZE; unsigned int xy_min_filter : SQ_IMG_SAMP_WORD2_XY_MIN_FILTER_SIZE; unsigned int xy_mag_filter : SQ_IMG_SAMP_WORD2_XY_MAG_FILTER_SIZE; unsigned int lod_bias_sec : SQ_IMG_SAMP_WORD2_LOD_BIAS_SEC_SIZE; unsigned int lod_bias : SQ_IMG_SAMP_WORD2_LOD_BIAS_SIZE; } sq_img_samp_word2_t; #endif typedef union { unsigned int val : 32; sq_img_samp_word2_t f; } sq_img_samp_word2_u; #define SQ_IMG_SAMP_WORD3_REG_SIZE 32 #define 
SQ_IMG_SAMP_WORD3_BORDER_COLOR_PTR_SIZE 12 #define SQ_IMG_SAMP_WORD3_SKIP_DEGAMMA_SIZE 1 #define SQ_IMG_SAMP_WORD3_BORDER_COLOR_TYPE_SIZE 2 #if defined(LITTLEENDIAN_CPU) typedef struct _sq_img_samp_word3_t { unsigned int border_color_ptr : SQ_IMG_SAMP_WORD3_BORDER_COLOR_PTR_SIZE; unsigned int skip_degamma : SQ_IMG_SAMP_WORD3_SKIP_DEGAMMA_SIZE; unsigned int : 17; unsigned int border_color_type : SQ_IMG_SAMP_WORD3_BORDER_COLOR_TYPE_SIZE; } sq_img_samp_word3_t; #elif defined(BIGENDIAN_CPU) typedef struct _sq_img_samp_word3_t { unsigned int border_color_type : SQ_IMG_SAMP_WORD3_BORDER_COLOR_TYPE_SIZE; unsigned int : 17; unsigned int skip_degamma : SQ_IMG_SAMP_WORD3_SKIP_DEGAMMA_SIZE; unsigned int border_color_ptr : SQ_IMG_SAMP_WORD3_BORDER_COLOR_PTR_SIZE; } sq_img_samp_word3_t; #endif typedef union { unsigned int val : 32; sq_img_samp_word3_t f; } sq_img_samp_word3_u; typedef enum FMT { FMT_INVALID = 0x00000000, FMT_8 = 0x00000001, FMT_16 = 0x00000002, FMT_8_8 = 0x00000003, FMT_32 = 0x00000004, FMT_16_16 = 0x00000005, FMT_10_11_11 = 0x00000006, FMT_11_11_10 = 0x00000007, FMT_10_10_10_2 = 0x00000008, FMT_2_10_10_10 = 0x00000009, FMT_8_8_8_8 = 0x0000000a, FMT_32_32 = 0x0000000b, FMT_16_16_16_16 = 0x0000000c, FMT_32_32_32 = 0x0000000d, FMT_32_32_32_32 = 0x0000000e, FMT_RESERVED_4 = 0x0000000f, FMT_5_6_5 = 0x00000010, FMT_1_5_5_5 = 0x00000011, FMT_5_5_5_1 = 0x00000012, FMT_4_4_4_4 = 0x00000013, FMT_8_24 = 0x00000014, FMT_24_8 = 0x00000015, FMT_X24_8_32_FLOAT = 0x00000016, FMT_RESERVED_33 = 0x00000017, FMT_11_11_10_FLOAT = 0x00000018, FMT_16_FLOAT = 0x00000019, FMT_32_FLOAT = 0x0000001a, FMT_16_16_FLOAT = 0x0000001b, FMT_8_24_FLOAT = 0x0000001c, FMT_24_8_FLOAT = 0x0000001d, FMT_32_32_FLOAT = 0x0000001e, FMT_10_11_11_FLOAT = 0x0000001f, FMT_16_16_16_16_FLOAT = 0x00000020, FMT_3_3_2 = 0x00000021, FMT_6_5_5 = 0x00000022, FMT_32_32_32_32_FLOAT = 0x00000023, FMT_RESERVED_36 = 0x00000024, FMT_1 = 0x00000025, FMT_1_REVERSED = 0x00000026, FMT_GB_GR = 0x00000027, FMT_BG_RG = 
0x00000028, FMT_32_AS_8 = 0x00000029, FMT_32_AS_8_8 = 0x0000002a, FMT_5_9_9_9_SHAREDEXP = 0x0000002b, FMT_8_8_8 = 0x0000002c, FMT_16_16_16 = 0x0000002d, FMT_16_16_16_FLOAT = 0x0000002e, FMT_4_4 = 0x0000002f, FMT_32_32_32_FLOAT = 0x00000030, FMT_BC1 = 0x00000031, FMT_BC2 = 0x00000032, FMT_BC3 = 0x00000033, FMT_BC4 = 0x00000034, FMT_BC5 = 0x00000035, FMT_BC6 = 0x00000036, FMT_BC7 = 0x00000037, FMT_32_AS_32_32_32_32 = 0x00000038, FMT_APC3 = 0x00000039, FMT_APC4 = 0x0000003a, FMT_APC5 = 0x0000003b, FMT_APC6 = 0x0000003c, FMT_APC7 = 0x0000003d, FMT_CTX1 = 0x0000003e, FMT_RESERVED_63 = 0x0000003f, } FMT; typedef enum type { TYPE_UNORM = 0x00000000, TYPE_SNORM = 0x00000001, TYPE_USCALED = 0x00000002, TYPE_SSCALED = 0x00000003, TYPE_UINT = 0x00000004, TYPE_SINT = 0x00000005, TYPE_RESERVED_6 = 0x00000006, TYPE_FLOAT = 0x00000007, TYPE_RESERVED_8 = 0x00000008, TYPE_SRGB = 0x00000009, TYPE_UNORM_UINT = 0x0000000a, } type; typedef enum SEL { SEL_0 = 0x00000000, SEL_1 = 0x00000001, SEL_X = 0x00000004, SEL_Y = 0x00000005, SEL_Z = 0x00000006, SEL_W = 0x00000007, } SEL; typedef enum SQ_RSRC_IMG_TYPE { SQ_RSRC_IMG_1D = 0x00000008, SQ_RSRC_IMG_2D = 0x00000009, SQ_RSRC_IMG_3D = 0x0000000a, SQ_RSRC_IMG_1D_ARRAY = 0x0000000c, SQ_RSRC_IMG_2D_ARRAY = 0x0000000d, } SQ_RSRC_IMG_TYPE; typedef enum SQ_TEX_XY_FILTER { SQ_TEX_XY_FILTER_POINT = 0x00000000, SQ_TEX_XY_FILTER_BILINEAR = 0x00000001, SQ_TEX_XY_FILTER_ANISO_POINT = 0x00000002, SQ_TEX_XY_FILTER_ANISO_BILINEAR = 0x00000003, } SQ_TEX_XY_FILTER; typedef enum SQ_TEX_Z_FILTER { SQ_TEX_Z_FILTER_NONE = 0x00000000, SQ_TEX_Z_FILTER_POINT = 0x00000001, SQ_TEX_Z_FILTER_LINEAR = 0x00000002, } SQ_TEX_Z_FILTER; typedef enum SQ_TEX_MIP_FILTER { SQ_TEX_MIP_FILTER_NONE = 0x00000000, SQ_TEX_MIP_FILTER_POINT = 0x00000001, SQ_TEX_MIP_FILTER_LINEAR = 0x00000002, SQ_TEX_MIP_FILTER_POINT_ANISO_ADJ__VI = 0x00000003, } SQ_TEX_MIP_FILTER; typedef enum SQ_TEX_CLAMP { SQ_TEX_WRAP = 0x00000000, SQ_TEX_MIRROR = 0x00000001, SQ_TEX_CLAMP_LAST_TEXEL = 0x00000002, 
SQ_TEX_MIRROR_ONCE_LAST_TEXEL = 0x00000003, SQ_TEX_CLAMP_HALF_BORDER = 0x00000004, SQ_TEX_MIRROR_ONCE_HALF_BORDER = 0x00000005, SQ_TEX_CLAMP_BORDER = 0x00000006, SQ_TEX_MIRROR_ONCE_BORDER = 0x00000007, } SQ_TEX_CLAMP; typedef enum SQ_TEX_BORDER_COLOR { SQ_TEX_BORDER_COLOR_TRANS_BLACK = 0x00000000, SQ_TEX_BORDER_COLOR_OPAQUE_BLACK = 0x00000001, SQ_TEX_BORDER_COLOR_OPAQUE_WHITE = 0x00000002, SQ_TEX_BORDER_COLOR_REGISTER = 0x00000003, } SQ_TEX_BORDER_COLOR; typedef enum TEX_BC_SWIZZLE { TEX_BC_Swizzle_XYZW = 0x00000000, TEX_BC_Swizzle_XWYZ = 0x00000001, TEX_BC_Swizzle_WZYX = 0x00000002, TEX_BC_Swizzle_WXYZ = 0x00000003, TEX_BC_Swizzle_ZYXW = 0x00000004, TEX_BC_Swizzle_YXWZ = 0x00000005, } TEX_BC_SWIZZLE; typedef struct metadata_amd_ai_s { uint32_t version; // Must be 1 uint32_t vendorID; // AMD SQ_IMG_RSRC_WORD0 word0; SQ_IMG_RSRC_WORD1 word1; SQ_IMG_RSRC_WORD2 word2; SQ_IMG_RSRC_WORD3 word3; SQ_IMG_RSRC_WORD4 word4; SQ_IMG_RSRC_WORD5 word5; SQ_IMG_RSRC_WORD6 word6; SQ_IMG_RSRC_WORD7 word7; uint32_t mip_offsets[0]; //Mip level offset bits [39:8] for each level (if any) } metadata_amd_ai_t; } // namespace image } // namespace rocr #endif // HSA_RUNTIME_EXT_IMAGE_RESOURCE_AI_H ROCR-Runtime-rocm-5.0.0/src/image/resource_kv.h //////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_EXT_IMAGE_RESOURCE_KV_H #define HSA_RUNTIME_EXT_IMAGE_RESOURCE_KV_H #if defined(LITTLEENDIAN_CPU) #elif defined(BIGENDIAN_CPU) #else #error "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined" #endif namespace rocr { namespace image { union SQ_BUF_RSRC_WORD0 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int base_address : 32; #elif defined(BIGENDIAN_CPU) unsigned int base_address : 32; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_BUF_RSRC_WORD1 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int base_address_hi : 16; unsigned int stride : 14; unsigned int cache_swizzle : 1; unsigned int swizzle_enable : 1; #elif defined(BIGENDIAN_CPU) unsigned int swizzle_enable : 1; unsigned int cache_swizzle : 1; unsigned int stride : 14; unsigned int base_address_hi : 16; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_BUF_RSRC_WORD2 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int num_records : 32; #elif defined(BIGENDIAN_CPU) unsigned int num_records : 32; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_BUF_RSRC_WORD3 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int dst_sel_x : 3; unsigned int dst_sel_y : 3; unsigned int dst_sel_z : 3; unsigned int dst_sel_w : 3; unsigned int num_format : 3; unsigned int data_format : 4; unsigned int element_size : 2; unsigned int index_stride : 2; unsigned int add_tid_enable : 1; unsigned int atc : 1; unsigned int hash_enable : 1; unsigned int heap : 1; unsigned int mtype : 3; unsigned int type : 2; #elif defined(BIGENDIAN_CPU) unsigned int type : 2; unsigned int mtype : 3; unsigned int heap : 1; unsigned int hash_enable : 1; unsigned int atc : 1; unsigned int add_tid_enable : 1; unsigned int index_stride : 2; unsigned int element_size : 2; unsigned int data_format : 4; unsigned int 
num_format : 3; unsigned int dst_sel_w : 3; unsigned int dst_sel_z : 3; unsigned int dst_sel_y : 3; unsigned int dst_sel_x : 3; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_RSRC_WORD0 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int base_address : 32; #elif defined(BIGENDIAN_CPU) unsigned int base_address : 32; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_RSRC_WORD1 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int base_address_hi : 8; unsigned int min_lod : 12; unsigned int data_format : 6; unsigned int num_format : 4; unsigned int mtype : 2; #elif defined(BIGENDIAN_CPU) unsigned int mtype : 2; unsigned int num_format : 4; unsigned int data_format : 6; unsigned int min_lod : 12; unsigned int base_address_hi : 8; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_RSRC_WORD2 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int width : 14; unsigned int height : 14; unsigned int perf_mod : 3; unsigned int interlaced : 1; #elif defined(BIGENDIAN_CPU) unsigned int interlaced : 1; unsigned int perf_mod : 3; unsigned int height : 14; unsigned int width : 14; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_RSRC_WORD3 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int dst_sel_x : 3; unsigned int dst_sel_y : 3; unsigned int dst_sel_z : 3; unsigned int dst_sel_w : 3; unsigned int base_level : 4; unsigned int last_level : 4; unsigned int tiling_index : 5; unsigned int pow2_pad : 1; unsigned int mtype : 1; unsigned int atc : 1; unsigned int type : 4; #elif defined(BIGENDIAN_CPU) unsigned int type : 4; unsigned int atc : 1; unsigned int mtype : 1; unsigned int pow2_pad : 1; unsigned int tiling_index : 5; unsigned int last_level : 4; unsigned int base_level : 4; unsigned int dst_sel_w : 3; unsigned int dst_sel_z : 3; unsigned int dst_sel_y : 3; unsigned int 
dst_sel_x : 3; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_RSRC_WORD4 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int depth : 13; unsigned int pitch : 14; unsigned int : 5; #elif defined(BIGENDIAN_CPU) unsigned int : 5; unsigned int pitch : 14; unsigned int depth : 13; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_RSRC_WORD5 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int base_array : 13; unsigned int last_array : 13; unsigned int : 6; #elif defined(BIGENDIAN_CPU) unsigned int : 6; unsigned int last_array : 13; unsigned int base_array : 13; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_RSRC_WORD6 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int min_lod_warn : 12; unsigned int counter_bank_id : 8; unsigned int lod_hdw_cnt_en : 1; unsigned int compression_en : 1; unsigned int alpha_is_on_msb : 1; unsigned int color_transform : 1; unsigned int lost_alpha_bits : 4; unsigned int lost_color_bits : 4; #elif defined(BIGENDIAN_CPU) unsigned int lost_color_bits : 4; unsigned int lost_alpha_bits : 4; unsigned int color_transform : 1; unsigned int alpha_is_on_msb : 1; unsigned int compression_en : 1; unsigned int lod_hdw_cnt_en : 1; unsigned int counter_bank_id : 8; unsigned int min_lod_warn : 12; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_RSRC_WORD7 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int meta_data_address : 32; #elif defined(BIGENDIAN_CPU) unsigned int meta_data_address : 32; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_SAMP_WORD0 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int clamp_x : 3; unsigned int clamp_y : 3; unsigned int clamp_z : 3; unsigned int max_aniso_ratio : 3; unsigned int depth_compare_func : 3; unsigned int force_unormalized : 1; unsigned int 
aniso_threshold : 3; unsigned int mc_coord_trunc : 1; unsigned int force_degamma : 1; unsigned int aniso_bias : 6; unsigned int trunc_coord : 1; unsigned int disable_cube_wrap : 1; unsigned int filter_mode : 2; unsigned int compat_mode : 1; #elif defined(BIGENDIAN_CPU) unsigned int compat_mode : 1; unsigned int filter_mode : 2; unsigned int disable_cube_wrap : 1; unsigned int trunc_coord : 1; unsigned int aniso_bias : 6; unsigned int force_degamma : 1; unsigned int mc_coord_trunc : 1; unsigned int aniso_threshold : 3; unsigned int force_unormalized : 1; unsigned int depth_compare_func : 3; unsigned int max_aniso_ratio : 3; unsigned int clamp_z : 3; unsigned int clamp_y : 3; unsigned int clamp_x : 3; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_SAMP_WORD1 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int min_lod : 12; unsigned int max_lod : 12; unsigned int perf_mip : 4; unsigned int perf_z : 4; #elif defined(BIGENDIAN_CPU) unsigned int perf_z : 4; unsigned int perf_mip : 4; unsigned int max_lod : 12; unsigned int min_lod : 12; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; union SQ_IMG_SAMP_WORD2 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int lod_bias : 14; unsigned int lod_bias_sec : 6; unsigned int xy_mag_filter : 2; unsigned int xy_min_filter : 2; unsigned int z_filter : 2; unsigned int mip_filter : 2; unsigned int mip_point_preclamp : 1; unsigned int disable_lsb_ceil : 1; unsigned int filter_prec_fix : 1; unsigned int aniso_override_vi : 1; #elif defined(BIGENDIAN_CPU) unsigned int aniso_override_vi : 1; unsigned int filter_prec_fix : 1; unsigned int disable_lsb_ceil : 1; unsigned int mip_point_preclamp : 1; unsigned int mip_filter : 2; unsigned int z_filter : 2; unsigned int xy_min_filter : 2; unsigned int xy_mag_filter : 2; unsigned int lod_bias_sec : 6; unsigned int lod_bias : 14; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float 
f32_all; }; union SQ_IMG_SAMP_WORD3 { struct { #if defined(LITTLEENDIAN_CPU) unsigned int border_color_ptr : 12; unsigned int : 18; unsigned int border_color_type : 2; #elif defined(BIGENDIAN_CPU) unsigned int border_color_type : 2; unsigned int : 18; unsigned int border_color_ptr : 12; #endif } bitfields, bits; unsigned int u32_all; signed int i32_all; float f32_all; }; typedef enum FMT { FMT_INVALID = 0x00000000, FMT_8 = 0x00000001, FMT_16 = 0x00000002, FMT_8_8 = 0x00000003, FMT_32 = 0x00000004, FMT_16_16 = 0x00000005, FMT_10_10_10_2 = 0x00000008, FMT_2_10_10_10 = 0x00000009, FMT_8_8_8_8 = 0x0000000a, FMT_32_32 = 0x0000000b, FMT_16_16_16_16 = 0x0000000c, FMT_32_32_32 = 0x0000000d, FMT_32_32_32_32 = 0x0000000e, FMT_5_6_5 = 0x00000010, FMT_1_5_5_5 = 0x00000011, FMT_5_5_5_1 = 0x00000012, FMT_8_24 = 0x00000014, FMT_24_8 = 0x00000015, FMT_X24_8_32 = 0x00000016, FMT_RESERVED_24__SI__CI = 0x00000018 } FMT; typedef enum type { TYPE_UNORM = 0x00000000, TYPE_SNORM = 0x00000001, TYPE_UINT = 0x00000004, TYPE_SINT = 0x00000005, TYPE_FLOAT = 0x00000007, TYPE_SRGB = 0x00000009 } type; typedef enum SEL { SEL_0 = 0x00000000, SEL_1 = 0x00000001, SEL_X = 0x00000004, SEL_Y = 0x00000005, SEL_Z = 0x00000006, SEL_W = 0x00000007, } SEL; typedef enum SQ_RSRC_IMG_TYPE { SQ_RSRC_IMG_1D = 0x00000008, SQ_RSRC_IMG_2D = 0x00000009, SQ_RSRC_IMG_3D = 0x0000000a, SQ_RSRC_IMG_1D_ARRAY = 0x0000000c, SQ_RSRC_IMG_2D_ARRAY = 0x0000000d, } SQ_RSRC_IMG_TYPE; typedef enum SQ_TEX_XY_FILTER { SQ_TEX_XY_FILTER_POINT = 0x00000000, SQ_TEX_XY_FILTER_BILINEAR = 0x00000001, SQ_TEX_XY_FILTER_ANISO_POINT = 0x00000002, SQ_TEX_XY_FILTER_ANISO_BILINEAR = 0x00000003, } SQ_TEX_XY_FILTER; typedef enum SQ_TEX_Z_FILTER { SQ_TEX_Z_FILTER_NONE = 0x00000000, SQ_TEX_Z_FILTER_POINT = 0x00000001, SQ_TEX_Z_FILTER_LINEAR = 0x00000002, } SQ_TEX_Z_FILTER; typedef enum SQ_TEX_MIP_FILTER { SQ_TEX_MIP_FILTER_NONE = 0x00000000, SQ_TEX_MIP_FILTER_POINT = 0x00000001, SQ_TEX_MIP_FILTER_LINEAR = 0x00000002, 
SQ_TEX_MIP_FILTER_POINT_ANISO_ADJ__VI = 0x00000003, } SQ_TEX_MIP_FILTER; typedef enum SQ_TEX_CLAMP { SQ_TEX_WRAP = 0x00000000, SQ_TEX_MIRROR = 0x00000001, SQ_TEX_CLAMP_LAST_TEXEL = 0x00000002, SQ_TEX_MIRROR_ONCE_LAST_TEXEL = 0x00000003, SQ_TEX_CLAMP_HALF_BORDER = 0x00000004, SQ_TEX_MIRROR_ONCE_HALF_BORDER = 0x00000005, SQ_TEX_CLAMP_BORDER = 0x00000006, SQ_TEX_MIRROR_ONCE_BORDER = 0x00000007, } SQ_TEX_CLAMP; typedef enum SQ_TEX_BORDER_COLOR { SQ_TEX_BORDER_COLOR_TRANS_BLACK = 0x00000000, SQ_TEX_BORDER_COLOR_OPAQUE_BLACK = 0x00000001, SQ_TEX_BORDER_COLOR_OPAQUE_WHITE = 0x00000002, SQ_TEX_BORDER_COLOR_REGISTER = 0x00000003, } SQ_TEX_BORDER_COLOR; typedef struct metadata_amd_ci_vi_s { uint32_t version; // Must be 1 uint32_t vendorID; // AMD | CZ SQ_IMG_RSRC_WORD0 word0; SQ_IMG_RSRC_WORD1 word1; SQ_IMG_RSRC_WORD2 word2; SQ_IMG_RSRC_WORD3 word3; SQ_IMG_RSRC_WORD4 word4; SQ_IMG_RSRC_WORD5 word5; SQ_IMG_RSRC_WORD6 word6; SQ_IMG_RSRC_WORD7 word7; uint32_t mip_offsets[0]; //Mip level offset bits [39:8] for each level (if any) } metadata_amd_ci_vi_t; } // namespace image } // namespace rocr #endif // HSA_RUNTIME_EXT_IMAGE_RESOURCE_KV_H ROCR-Runtime-rocm-5.0.0/src/image/resource_nv.h //////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #ifndef EXT_IMAGE_RESOURCE_NV_H_ #define EXT_IMAGE_RESOURCE_NV_H_ #if defined(LITTLEENDIAN_CPU) #elif defined(BIGENDIAN_CPU) #else #error "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined" #endif namespace rocr { namespace image { /**********************************************************/ /**********************************************************/ #define SQ_BUF_RSC_WRD0_REG_SZ 32 #define SQ_BUF_RSC_WRD0_BASE_ADDRESS_SZ 32 struct sq_buf_rsrc_word0_t { #if defined(LITTLEENDIAN_CPU) unsigned int BASE_ADDRESS : SQ_BUF_RSC_WRD0_BASE_ADDRESS_SZ; #elif defined(BIGENDIAN_CPU) unsigned int BASE_ADDRESS : SQ_BUF_RSC_WRD0_BASE_ADDRESS_SZ; #endif }; union SQ_BUF_RSRC_WORD0 { sq_buf_rsrc_word0_t bitfields, bits, f; uint32_t val : SQ_BUF_RSC_WRD0_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_BUF_RSC_WRD1_REG_SZ 32 #define SQ_BUF_RSC_WRD1_BASE_ADDRESS_HI_SZ 16 #define SQ_BUF_RSC_WRD1_STRIDE_SZ 14 #define SQ_BUF_RSC_WRD1_CACHE_SWIZZLE_SZ 1 #define SQ_BUF_RSC_WRD1_SWIZZLE_ENABLE_SZ 1 struct sq_buf_rsrc_word1_t { #if defined(LITTLEENDIAN_CPU) unsigned int BASE_ADDRESS_HI : SQ_BUF_RSC_WRD1_BASE_ADDRESS_HI_SZ; unsigned int STRIDE : SQ_BUF_RSC_WRD1_STRIDE_SZ; unsigned int CACHE_SWIZZLE : SQ_BUF_RSC_WRD1_CACHE_SWIZZLE_SZ; unsigned int SWIZZLE_ENABLE : SQ_BUF_RSC_WRD1_SWIZZLE_ENABLE_SZ; #elif defined(BIGENDIAN_CPU) unsigned int SWIZZLE_ENABLE : SQ_BUF_RSC_WRD1_SWIZZLE_ENABLE_SZ; unsigned int CACHE_SWIZZLE : SQ_BUF_RSC_WRD1_CACHE_SWIZZLE_SZ; unsigned int STRIDE : SQ_BUF_RSC_WRD1_STRIDE_SZ; unsigned int BASE_ADDRESS_HI : SQ_BUF_RSC_WRD1_BASE_ADDRESS_HI_SZ; #endif }; union SQ_BUF_RSRC_WORD1 { sq_buf_rsrc_word1_t bitfields, bits, f; uint32_t val : SQ_BUF_RSC_WRD1_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_BUF_RSC_WRD2_REG_SZ 32 #define SQ_BUF_RSC_WRD2_NUM_RECORDS_SZ 32 struct sq_buf_rsrc_word2_t { #if defined(LITTLEENDIAN_CPU) 
unsigned int NUM_RECORDS : SQ_BUF_RSC_WRD2_NUM_RECORDS_SZ; #elif defined(BIGENDIAN_CPU) unsigned int NUM_RECORDS : SQ_BUF_RSC_WRD2_NUM_RECORDS_SZ; #endif }; union SQ_BUF_RSRC_WORD2 { sq_buf_rsrc_word2_t bitfields, bits, f; uint32_t val : SQ_BUF_RSC_WRD2_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_BUF_RSC_WRD3_REG_SZ 32 #define SQ_BUF_RSC_WRD3_DST_SEL_X_SZ 3 #define SQ_BUF_RSC_WRD3_DST_SEL_Y_SZ 3 #define SQ_BUF_RSC_WRD3_DST_SEL_Z_SZ 3 #define SQ_BUF_RSC_WRD3_DST_SEL_W_SZ 3 #define SQ_BUF_RSC_WRD3_FORMAT_SZ 7 #define SQ_BUF_RSC_WRD3_INDEX_STRIDE_SZ 2 #define SQ_BUF_RSC_WRD3_ADD_TID_ENABLE_SZ 1 #define SQ_BUF_RSC_WRD3_RESOURCE_LEVEL 1 #define SQ_BUF_RSC_WRD3_RESERVED_1 2 #define SQ_BUF_RSC_WORD3_OOB_SELECT_SZ 2 #define SQ_BUF_RSC_WRD3_TYPE_SZ 2 struct sq_buf_rsrc_word3_t { #if defined(LITTLEENDIAN_CPU) unsigned int DST_SEL_X : SQ_BUF_RSC_WRD3_DST_SEL_X_SZ; unsigned int DST_SEL_Y : SQ_BUF_RSC_WRD3_DST_SEL_Y_SZ; unsigned int DST_SEL_Z : SQ_BUF_RSC_WRD3_DST_SEL_Z_SZ; unsigned int DST_SEL_W : SQ_BUF_RSC_WRD3_DST_SEL_W_SZ; unsigned int FORMAT : SQ_BUF_RSC_WRD3_FORMAT_SZ; unsigned int : 2; unsigned int INDEX_STRIDE : SQ_BUF_RSC_WRD3_INDEX_STRIDE_SZ; unsigned int ADD_TID_ENABLE : SQ_BUF_RSC_WRD3_ADD_TID_ENABLE_SZ; unsigned int RESOURCE_LEVEL : SQ_BUF_RSC_WRD3_RESOURCE_LEVEL; unsigned int : 1; unsigned int RESERVED_1 : SQ_BUF_RSC_WRD3_RESERVED_1; unsigned int OOB_SELECT : SQ_BUF_RSC_WORD3_OOB_SELECT_SZ; unsigned int TYPE : SQ_BUF_RSC_WRD3_TYPE_SZ; #elif defined(BIGENDIAN_CPU) unsigned int TYPE : SQ_BUF_RSC_WRD3_TYPE_SZ; unsigned int OOB_SELECT : SQ_BUF_RSC_WORD3_OOB_SELECT_SZ; unsigned int RESERVED_1 : SQ_BUF_RSC_WRD3_RESERVED_1; unsigned int : 1; unsigned int RESOURCE_LEVEL : SQ_BUF_RSC_WRD3_RESOURCE_LEVEL; unsigned int ADD_TID_ENABLE : SQ_BUF_RSC_WRD3_ADD_TID_ENABLE_SZ; unsigned int INDEX_STRIDE : SQ_BUF_RSC_WRD3_INDEX_STRIDE_SZ; unsigned int : 2; unsigned int FORMAT : SQ_BUF_RSC_WRD3_FORMAT_SZ; unsigned int DST_SEL_W : 
SQ_BUF_RSC_WRD3_DST_SEL_W_SZ; unsigned int DST_SEL_Z : SQ_BUF_RSC_WRD3_DST_SEL_Z_SZ; unsigned int DST_SEL_Y : SQ_BUF_RSC_WRD3_DST_SEL_Y_SZ; unsigned int DST_SEL_X : SQ_BUF_RSC_WRD3_DST_SEL_X_SZ; #endif }; union SQ_BUF_RSRC_WORD3 { sq_buf_rsrc_word3_t bitfields, bits, f; uint32_t val : SQ_BUF_RSC_WRD3_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ /**********************************************************/ /**********************************************************/ #define SQ_IMG_RSC_WRD0_REG_SZ 32 #define SQ_IMG_RSC_WRD0_BASE_ADDRESS_SZ 32 struct sq_img_rsrc_word0_t { #if defined(LITTLEENDIAN_CPU) unsigned int BASE_ADDRESS : SQ_IMG_RSC_WRD0_BASE_ADDRESS_SZ; #elif defined(BIGENDIAN_CPU) unsigned int BASE_ADDRESS : SQ_IMG_RSC_WRD0_BASE_ADDRESS_SZ; #endif }; union SQ_IMG_RSRC_WORD0 { sq_img_rsrc_word0_t bitfields, bits, f; uint32_t val : SQ_IMG_RSC_WRD0_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_RSC_WRD1_REG_SZ 32 #define SQ_IMG_RSC_WRD1_BASE_ADDRESS_HI_SZ 8 #define SQ_IMG_RSC_WRD1_MIN_LOD_SZ 12 #define SQ_IMG_RSC_WRD1_FORMAT_SZ 9 #define SQ_IMG_RSC_WRD1_WIDTH_LO 2 struct sq_img_rsrc_word1_t{ #if defined(LITTLEENDIAN_CPU) unsigned int BASE_ADDRESS_HI : SQ_IMG_RSC_WRD1_BASE_ADDRESS_HI_SZ; unsigned int MIN_LOD : SQ_IMG_RSC_WRD1_MIN_LOD_SZ; unsigned int FORMAT : SQ_IMG_RSC_WRD1_FORMAT_SZ; unsigned int : 1; unsigned int WIDTH : SQ_IMG_RSC_WRD1_WIDTH_LO; #elif defined(BIGENDIAN_CPU) unsigned int WIDTH : SQ_IMG_RSC_WRD1_WIDTH_LO; unsigned int : 1; unsigned int FORMAT : SQ_IMG_RSC_WRD1_FORMAT_SZ; unsigned int MIN_LOD : SQ_IMG_RSC_WRD1_MIN_LOD_SZ; unsigned int BASE_ADDRESS_HI : SQ_IMG_RSC_WRD1_BASE_ADDRESS_HI_SZ; #endif }; union SQ_IMG_RSRC_WORD1 { sq_img_rsrc_word1_t bitfields, bits, f; uint32_t val : SQ_IMG_RSC_WRD1_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_RSC_WRD2_REG_SZ 32 #define SQ_IMG_RSC_WRD2_WIDTH_HI_SZ 12 #define SQ_IMG_RSC_WRD2_HEIGHT_SZ 14 #define 
SQ_IMG_RSC_WRD2_RESOURCE_LEVEL_SZ 1 struct sq_img_rsrc_word2_t { #if defined(LITTLEENDIAN_CPU) unsigned int WIDTH_HI : SQ_IMG_RSC_WRD2_WIDTH_HI_SZ; unsigned int : 2; unsigned int HEIGHT : SQ_IMG_RSC_WRD2_HEIGHT_SZ; unsigned int : 2; unsigned int : 1; unsigned int RESOURCE_LEVEL : SQ_IMG_RSC_WRD2_RESOURCE_LEVEL_SZ; #elif defined(BIGENDIAN_CPU) unsigned int RESOURCE_LEVEL : SQ_IMG_RSC_WRD2_RESOURCE_LEVEL_SZ; unsigned int : 1; unsigned int : 2; unsigned int HEIGHT : SQ_IMG_RSC_WRD2_HEIGHT_SZ; unsigned int : 2; unsigned int WIDTH_HI : SQ_IMG_RSC_WRD2_WIDTH_HI_SZ; #endif }; union SQ_IMG_RSRC_WORD2 { sq_img_rsrc_word2_t bitfields, bits, f; uint32_t val : SQ_IMG_RSC_WRD2_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_RSC_WRD3_REG_SZ 32 #define SQ_IMG_RSC_WRD3_DST_SEL_X_SZ 3 #define SQ_IMG_RSC_WRD3_DST_SEL_Y_SZ 3 #define SQ_IMG_RSC_WRD3_DST_SEL_Z_SZ 3 #define SQ_IMG_RSC_WRD3_DST_SEL_W_SZ 3 #define SQ_IMG_RSC_WRD3_BASE_LEVEL_SZ 4 #define SQ_IMG_RSC_WRD3_LAST_LEVEL_SZ 4 #define SQ_IMG_RSC_WRD3_SW_MODE_SZ 5 #define SQ_IMG_RSC_WRD3_BC_SWIZZLE_SZ 3 #define SQ_IMG_RSC_WRD3_TYPE_SZ 4 struct sq_img_rsrc_word3_t { #if defined(LITTLEENDIAN_CPU) unsigned int DST_SEL_X : SQ_IMG_RSC_WRD3_DST_SEL_X_SZ; unsigned int DST_SEL_Y : SQ_IMG_RSC_WRD3_DST_SEL_Y_SZ; unsigned int DST_SEL_Z : SQ_IMG_RSC_WRD3_DST_SEL_Z_SZ; unsigned int DST_SEL_W : SQ_IMG_RSC_WRD3_DST_SEL_W_SZ; unsigned int BASE_LEVEL : SQ_IMG_RSC_WRD3_BASE_LEVEL_SZ; unsigned int LAST_LEVEL : SQ_IMG_RSC_WRD3_LAST_LEVEL_SZ; unsigned int SW_MODE : SQ_IMG_RSC_WRD3_SW_MODE_SZ; unsigned int BC_SWIZZLE : SQ_IMG_RSC_WRD3_BC_SWIZZLE_SZ; unsigned int TYPE : SQ_IMG_RSC_WRD3_TYPE_SZ; #elif defined(BIGENDIAN_CPU) unsigned int TYPE : SQ_IMG_RSC_WRD3_TYPE_SZ; unsigned int BC_SWIZZLE : SQ_IMG_RSC_WRD3_BC_SWIZZLE_SZ; unsigned int SW_MODE : SQ_IMG_RSC_WRD3_SW_MODE_SZ; unsigned int LAST_LEVEL : SQ_IMG_RSC_WRD3_LAST_LEVEL_SZ; unsigned int BASE_LEVEL : SQ_IMG_RSC_WRD3_BASE_LEVEL_SZ; unsigned int
DST_SEL_W : SQ_IMG_RSC_WRD3_DST_SEL_W_SZ; unsigned int DST_SEL_Z : SQ_IMG_RSC_WRD3_DST_SEL_Z_SZ; unsigned int DST_SEL_Y : SQ_IMG_RSC_WRD3_DST_SEL_Y_SZ; unsigned int DST_SEL_X : SQ_IMG_RSC_WRD3_DST_SEL_X_SZ; #endif }; union SQ_IMG_RSRC_WORD3 { sq_img_rsrc_word3_t bitfields, bits, f; uint32_t val : SQ_IMG_RSC_WRD3_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_RSC_WRD4_REG_SZ 32 #define SQ_IMG_RSC_WRD4_DEPTH_SZ 13 #define SQ_IMG_RSC_WRD4_BASE_ARR_SZ 13 #define SQ_IMG_RSC_WRD4_PITCH_SZ 14 union sq_img_rsrc_word4_t { struct { #if defined(LITTLEENDIAN_CPU) // For arrays this is last slice in view, for 3D this is depth-1, For remaining this is pitch-1 unsigned int DEPTH : SQ_IMG_RSC_WRD4_DEPTH_SZ; unsigned int : 1; //Pitch[13] in gfx1030 unsigned int : 2; unsigned int BASE_ARRAY : SQ_IMG_RSC_WRD4_BASE_ARR_SZ; unsigned int : 3; #elif defined(BIGENDIAN_CPU) unsigned int : 3; unsigned int BASE_ARRAY : SQ_IMG_RSC_WRD4_BASE_ARR_SZ; unsigned int : 2; unsigned int : 1; //Pitch[13] in gfx1030 unsigned int DEPTH : SQ_IMG_RSC_WRD4_DEPTH_SZ; //Pitch[0:12] in gfx1030 #endif }; struct { #if defined(LITTLEENDIAN_CPU) // For 1d, 2d and 2d-msaa in gfx1030 this is pitch-1 unsigned int PITCH : SQ_IMG_RSC_WRD4_PITCH_SZ; unsigned int : SQ_IMG_RSC_WRD4_REG_SZ-SQ_IMG_RSC_WRD4_PITCH_SZ; #elif defined(BIGENDIAN_CPU) unsigned int : SQ_IMG_RSC_WRD4_REG_SZ-SQ_IMG_RSC_WRD4_PITCH_SZ; unsigned int PITCH : SQ_IMG_RSC_WRD4_PITCH_SZ; #endif }; }; union SQ_IMG_RSRC_WORD4 { sq_img_rsrc_word4_t bitfields, bits, f; uint32_t val : SQ_IMG_RSC_WRD4_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_RSC_WRD5_REG_SZ 32 #define SQ_IMG_RSC_WRD5_ARRAY_PITCH_SZ 4 #define SQ_IMG_RSC_WRD5_MAX_MIP_SZ 4 //#define SQ_IMG_RSC_WRD5_DSCAL_OR_MID_LOD_WRN_SZ 4 //#define SQ_IMG_RSC_WRD5_HSCAL_OR_MID_LOD_WRN_SZ 4 //#define SQ_IMG_RSC_WRD5_WSCAL_OR_MID_LOD_WRN_SZ 4 #define SQ_IMG_RSC_WRD5_MID_LOD_WRN_SZ 12 #define SQ_IMG_RSC_WRD5_PERF_MOD_SZ 3 #define 
SQ_IMG_RSC_WRD5_CORNER_SAMPLES_SZ 1 #define SQ_IMG_RSC_WRD5_LINKED_RESOURCE_SZ 1 #define SQ_IMG_RSC_WRD5_LOD_HDW_CNT_EN_SZ 1 #define SQ_IMG_RSC_WRD5_PRT_DEFAULT_SZ 1 #define SQ_IMG_RSC_WRD5_BIG_PAGE_SZ 1 struct sq_img_rsrc_word5_t { #if defined(LITTLEENDIAN_CPU) unsigned int ARRAY_PITCH : SQ_IMG_RSC_WRD5_ARRAY_PITCH_SZ; unsigned int MAX_MIP : SQ_IMG_RSC_WRD5_MAX_MIP_SZ; unsigned int MID_LOD_WRN : SQ_IMG_RSC_WRD5_MID_LOD_WRN_SZ; // unsigned int DSCAL_OR_MID_LOD_WRN : SQ_IMG_RSC_WRD5_DSCAL_OR_MID_LOD_WRN_SZ; // unsigned int HSCAL_OR_MID_LOD_WRN : SQ_IMG_RSC_WRD5_HSCAL_OR_MID_LOD_WRN_SZ; // unsigned int WSCAL_OR_MID_LOD_WRN : SQ_IMG_RSC_WRD5_WSCAL_OR_MID_LOD_WRN_SZ; unsigned int PERF_MOD : SQ_IMG_RSC_WRD5_PERF_MOD_SZ; unsigned int CORNER_SAMPLES : SQ_IMG_RSC_WRD5_CORNER_SAMPLES_SZ; unsigned int LINKED_RESOURCE : SQ_IMG_RSC_WRD5_LINKED_RESOURCE_SZ; unsigned int LOD_HDW_CNT_EN : SQ_IMG_RSC_WRD5_LOD_HDW_CNT_EN_SZ; unsigned int PRT_DEFAULT : SQ_IMG_RSC_WRD5_PRT_DEFAULT_SZ; unsigned int : 4; unsigned int BIG_PAGE : SQ_IMG_RSC_WRD5_BIG_PAGE_SZ; #elif defined(BIGENDIAN_CPU) unsigned int BIG_PAGE : SQ_IMG_RSC_WRD5_BIG_PAGE_SZ; unsigned int : 4; unsigned int PRT_DEFAULT : SQ_IMG_RSC_WRD5_PRT_DEFAULT_SZ; unsigned int LOD_HDW_CNT_EN : SQ_IMG_RSC_WRD5_LOD_HDW_CNT_EN_SZ; unsigned int LINKED_RESOURCE : SQ_IMG_RSC_WRD5_LINKED_RESOURCE_SZ; unsigned int CORNER_SAMPLES : SQ_IMG_RSC_WRD5_CORNER_SAMPLES_SZ; unsigned int PERF_MOD : SQ_IMG_RSC_WRD5_PERF_MOD_SZ; unsigned int MID_LOD_WRN : SQ_IMG_RSC_WRD5_MID_LOD_WRN_SZ; // unsigned int WSCAL_OR_MID_LOD_WRN : SQ_IMG_RSC_WRD5_WSCAL_OR_MID_LOD_WRN_SZ; // unsigned int HSCAL_OR_MID_LOD_WRN : SQ_IMG_RSC_WRD5_HSCAL_OR_MID_LOD_WRN_SZ; // unsigned int DSCAL_OR_MID_LOD_WRN : SQ_IMG_RSC_WRD5_DSCAL_OR_MID_LOD_WRN_SZ; unsigned int MAX_MIP : SQ_IMG_RSC_WRD5_MAX_MIP_SZ; unsigned int ARRAY_PITCH : SQ_IMG_RSC_WRD5_ARRAY_PITCH_SZ; #endif }; union SQ_IMG_RSRC_WORD5 { sq_img_rsrc_word5_t bitfields, bits, f; uint32_t val : SQ_IMG_RSC_WRD5_REG_SZ; uint32_t 
u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_RSC_WRD6_REG_SZ 32 #define SQ_IMG_RSC_WRD6_COUNTER_BANK_ID_SZ 8 #define SQ_IMG_RSC_WRD6_RESERVED_2_SZ 2 #define SQ_IMG_RSC_WRD6_ITERATE_256_SZ 1 #define SQ_IMG_RSC_WRD6_MAX_UNCOMP_BLK_SZ_SZ 2 #define SQ_IMG_RSC_WRD6_MAX_COMP_BLK_SZ_SZ 2 #define SQ_IMG_RSC_WRD6_META_PIPE_ALIGNED_SZ 1 #define SQ_IMG_RSC_WRD6_WRITE_COMPRESS_EN_SZ 1 #define SQ_IMG_RSC_WRD6_COMPRESSION_ENABLE_SZ 1 #define SQ_IMG_RSC_WRD6_ALPHA_IS_ON_MSB_SZ 1 #define SQ_IMG_RSC_WRD6_COLOR_TRANSFORM_SZ 1 #define SQ_IMG_RSC_WRD6_META_DATA_ADDR_SZ 8 struct sq_img_rsrc_word6_t { #if defined(LITTLEENDIAN_CPU) unsigned int COUNTER_BANK_ID : SQ_IMG_RSC_WRD6_COUNTER_BANK_ID_SZ; unsigned int RESERVED_2 : SQ_IMG_RSC_WRD6_RESERVED_2_SZ; unsigned int ITERATE_256 : SQ_IMG_RSC_WRD6_ITERATE_256_SZ; unsigned int : 4; unsigned int MAX_UNCOMP_BLK_SZ : SQ_IMG_RSC_WRD6_MAX_UNCOMP_BLK_SZ_SZ; unsigned int MAX_COMP_BLK_SZ : SQ_IMG_RSC_WRD6_MAX_COMP_BLK_SZ_SZ; unsigned int META_PIPE_ALIGNED : SQ_IMG_RSC_WRD6_META_PIPE_ALIGNED_SZ; unsigned int WRITE_COMPRESS_ENABLE : SQ_IMG_RSC_WRD6_WRITE_COMPRESS_EN_SZ; unsigned int COMPRESSION_ENABLE : SQ_IMG_RSC_WRD6_COMPRESSION_ENABLE_SZ; unsigned int ALPHA_IS_ON_MSB : SQ_IMG_RSC_WRD6_ALPHA_IS_ON_MSB_SZ; unsigned int COLOR_TRANSFORM : SQ_IMG_RSC_WRD6_COLOR_TRANSFORM_SZ; unsigned int META_DATA_ADDRESS : SQ_IMG_RSC_WRD6_META_DATA_ADDR_SZ; #elif defined(BIGENDIAN_CPU) unsigned int META_DATA_ADDRESS : SQ_IMG_RSC_WRD6_META_DATA_ADDR_SZ; unsigned int COLOR_TRANSFORM : SQ_IMG_RSC_WRD6_COLOR_TRANSFORM_SZ; unsigned int ALPHA_IS_ON_MSB : SQ_IMG_RSC_WRD6_ALPHA_IS_ON_MSB_SZ; unsigned int COMPRESSION_ENABLE : SQ_IMG_RSC_WRD6_COMPRESSION_ENABLE_SZ; unsigned int WRITE_COMPRESS_ENABLE : SQ_IMG_RSC_WRD6_WRITE_COMPRESS_EN_SZ; unsigned int META_PIPE_ALIGNED : SQ_IMG_RSC_WRD6_META_PIPE_ALIGNED_SZ; unsigned int MAX_COMP_BLK_SZ : SQ_IMG_RSC_WRD6_MAX_COMP_BLK_SZ_SZ; unsigned int MAX_UNCOMP_BLK_SZ : SQ_IMG_RSC_WRD6_MAX_UNCOMP_BLK_SZ_SZ; unsigned 
int : 4; unsigned int ITERATE_256 : SQ_IMG_RSC_WRD6_ITERATE_256_SZ; unsigned int RESERVED_2 : SQ_IMG_RSC_WRD6_RESERVED_2_SZ; unsigned int COUNTER_BANK_ID : SQ_IMG_RSC_WRD6_COUNTER_BANK_ID_SZ; #endif }; union SQ_IMG_RSRC_WORD6 { sq_img_rsrc_word6_t bitfields, bits, f; uint32_t val : SQ_IMG_RSC_WRD6_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_RSC_WRD7_REG_SZ 32 #define SQ_IMG_RSC_WRD7_META_DATA_ADDRESS_HI_SZ 32 struct sq_img_rsrc_word7_t { #if defined(LITTLEENDIAN_CPU) unsigned int META_DATA_ADDRESS_HI : SQ_IMG_RSC_WRD7_META_DATA_ADDRESS_HI_SZ; #elif defined(BIGENDIAN_CPU) unsigned int META_DATA_ADDRESS_HI : SQ_IMG_RSC_WRD7_META_DATA_ADDRESS_HI_SZ; #endif }; union SQ_IMG_RSRC_WORD7 { sq_img_rsrc_word7_t bitfields, bits, f; uint32_t val : SQ_IMG_RSC_WRD7_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ /**********************************************************/ /**********************************************************/ #define SQ_IMG_SAMP_WORD0_REG_SZ 32 #define SQ_IMG_SAMP_WORD0_CLAMP_X_SZ 3 #define SQ_IMG_SAMP_WORD0_CLAMP_Y_SZ 3 #define SQ_IMG_SAMP_WORD0_CLAMP_Z_SZ 3 #define SQ_IMG_SAMP_WORD0_MAX_ANISO_RATIO_SZ 3 #define SQ_IMG_SAMP_WORD0_DEPTH_COMPARE_FUNC_SZ 3 #define SQ_IMG_SAMP_WORD0_FORCE_UNNORMALIZED_SZ 1 #define SQ_IMG_SAMP_WORD0_ANISO_THRESHOLD_SZ 3 #define SQ_IMG_SAMP_WORD0_MC_COORD_TRUNC_SZ 1 #define SQ_IMG_SAMP_WORD0_FORCE_DEGAMMA_SZ 1 #define SQ_IMG_SAMP_WORD0_ANISO_BIAS_SZ 6 #define SQ_IMG_SAMP_WORD0_TRUNC_COORD_SZ 1 #define SQ_IMG_SAMP_WORD0_DISABLE_CUBE_WRAP_SZ 1 #define SQ_IMG_SAMP_WORD0_FILTER_MODE_SZ 2 #define SQ_IMG_SAMP_WORD0_SKIP_DEGAMMA_SZ 1 struct sq_img_samp_word0_t { #if defined(LITTLEENDIAN_CPU) unsigned int CLAMP_X : SQ_IMG_SAMP_WORD0_CLAMP_X_SZ; unsigned int CLAMP_Y : SQ_IMG_SAMP_WORD0_CLAMP_Y_SZ; unsigned int CLAMP_Z : SQ_IMG_SAMP_WORD0_CLAMP_Z_SZ; unsigned int MAX_ANISO_RATIO : SQ_IMG_SAMP_WORD0_MAX_ANISO_RATIO_SZ; unsigned int DEPTH_COMPARE_FUNC : 
SQ_IMG_SAMP_WORD0_DEPTH_COMPARE_FUNC_SZ; unsigned int FORCE_UNNORMALIZED : SQ_IMG_SAMP_WORD0_FORCE_UNNORMALIZED_SZ; unsigned int ANISO_THRESHOLD : SQ_IMG_SAMP_WORD0_ANISO_THRESHOLD_SZ; unsigned int MC_COORD_TRUNC : SQ_IMG_SAMP_WORD0_MC_COORD_TRUNC_SZ; unsigned int FORCE_DEGAMMA : SQ_IMG_SAMP_WORD0_FORCE_DEGAMMA_SZ; unsigned int ANISO_BIAS : SQ_IMG_SAMP_WORD0_ANISO_BIAS_SZ; unsigned int TRUNC_COORD : SQ_IMG_SAMP_WORD0_TRUNC_COORD_SZ; unsigned int DISABLE_CUBE_WRAP : SQ_IMG_SAMP_WORD0_DISABLE_CUBE_WRAP_SZ; unsigned int FILTER_MODE : SQ_IMG_SAMP_WORD0_FILTER_MODE_SZ; unsigned int SKIP_DEGAMMA : SQ_IMG_SAMP_WORD0_SKIP_DEGAMMA_SZ; #elif defined(BIGENDIAN_CPU) unsigned int SKIP_DEGAMMA : SQ_IMG_SAMP_WORD0_SKIP_DEGAMMA_SZ; unsigned int FILTER_MODE : SQ_IMG_SAMP_WORD0_FILTER_MODE_SZ; unsigned int DISABLE_CUBE_WRAP : SQ_IMG_SAMP_WORD0_DISABLE_CUBE_WRAP_SZ; unsigned int TRUNC_COORD : SQ_IMG_SAMP_WORD0_TRUNC_COORD_SZ; unsigned int ANISO_BIAS : SQ_IMG_SAMP_WORD0_ANISO_BIAS_SZ; unsigned int FORCE_DEGAMMA : SQ_IMG_SAMP_WORD0_FORCE_DEGAMMA_SZ; unsigned int MC_COORD_TRUNC : SQ_IMG_SAMP_WORD0_MC_COORD_TRUNC_SZ; unsigned int ANISO_THRESHOLD : SQ_IMG_SAMP_WORD0_ANISO_THRESHOLD_SZ; unsigned int FORCE_UNNORMALIZED : SQ_IMG_SAMP_WORD0_FORCE_UNNORMALIZED_SZ; unsigned int DEPTH_COMPARE_FUNC : SQ_IMG_SAMP_WORD0_DEPTH_COMPARE_FUNC_SZ; unsigned int MAX_ANISO_RATIO : SQ_IMG_SAMP_WORD0_MAX_ANISO_RATIO_SZ; unsigned int CLAMP_Z : SQ_IMG_SAMP_WORD0_CLAMP_Z_SZ; unsigned int CLAMP_Y : SQ_IMG_SAMP_WORD0_CLAMP_Y_SZ; unsigned int CLAMP_X : SQ_IMG_SAMP_WORD0_CLAMP_X_SZ; #endif }; union SQ_IMG_SAMP_WORD0 { sq_img_samp_word0_t bitfields, bits, f; uint32_t val : SQ_IMG_SAMP_WORD0_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_SAMP_WORD1_REG_SZ 32 #define SQ_IMG_SAMP_WORD1_MIN_LOD_SZ 12 #define SQ_IMG_SAMP_WORD1_MAX_LOD_SZ 12 #define SQ_IMG_SAMP_WORD1_PERF_MIP_SZ 4 #define SQ_IMG_SAMP_WORD1_PERF_Z_SZ 4 struct sq_img_samp_word1_t { #if defined(LITTLEENDIAN_CPU) 
unsigned int MIN_LOD : SQ_IMG_SAMP_WORD1_MIN_LOD_SZ; unsigned int MAX_LOD : SQ_IMG_SAMP_WORD1_MAX_LOD_SZ; unsigned int PERF_MIP : SQ_IMG_SAMP_WORD1_PERF_MIP_SZ; unsigned int PERF_Z : SQ_IMG_SAMP_WORD1_PERF_Z_SZ; #elif defined(BIGENDIAN_CPU) unsigned int PERF_Z : SQ_IMG_SAMP_WORD1_PERF_Z_SZ; unsigned int PERF_MIP : SQ_IMG_SAMP_WORD1_PERF_MIP_SZ; unsigned int MAX_LOD : SQ_IMG_SAMP_WORD1_MAX_LOD_SZ; unsigned int MIN_LOD : SQ_IMG_SAMP_WORD1_MIN_LOD_SZ; #endif }; union SQ_IMG_SAMP_WORD1 { sq_img_samp_word1_t bitfields, bits, f; uint32_t val : SQ_IMG_SAMP_WORD1_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_SAMP_WORD2_REG_SZ 32 #define SQ_IMG_SAMP_WORD2_BC_LRS_LB_SZ 12 #define SQ_IMG_SAMP_WORD2_BC_OR_BCT_SZ 2 #define SQ_IMG_SAMP_WORD2_LOD_BIAS_SEC_SZ 6 #define SQ_IMG_SAMP_WORD2_XY_MAG_FILTER_SZ 2 #define SQ_IMG_SAMP_WORD2_XY_MIN_FILTER_SZ 2 #define SQ_IMG_SAMP_WORD2_Z_FILTER_SZ 2 #define SQ_IMG_SAMP_WORD2_MIP_FILTER_SZ 2 #define SQ_IMG_SAMP_WORD2_MIP_POINT_PRECLAMP_SZ 1 #define SQ_IMG_SAMP_WORD2_ANISO_OVERRIDE_SZ 1 #define SQ_IMG_SAMP_WORD2_BLEND_ZERO_PRT_SZ 1 #define SQ_IMG_SAMP_WORD2_DERIV_ADJUST_ENABLE_SZ 1 struct sq_img_samp_word2_t { #if defined(LITTLEENDIAN_CPU) unsigned int BC_LRS_LB : SQ_IMG_SAMP_WORD2_BC_LRS_LB_SZ; unsigned int BC_OR_BCT : SQ_IMG_SAMP_WORD2_BC_OR_BCT_SZ; unsigned int LOD_BIAS_SEC : SQ_IMG_SAMP_WORD2_LOD_BIAS_SEC_SZ; unsigned int XY_MAG_FILTER : SQ_IMG_SAMP_WORD2_XY_MAG_FILTER_SZ; unsigned int XY_MIN_FILTER : SQ_IMG_SAMP_WORD2_XY_MIN_FILTER_SZ; unsigned int Z_FILTER : SQ_IMG_SAMP_WORD2_Z_FILTER_SZ; unsigned int MIP_FILTER : SQ_IMG_SAMP_WORD2_MIP_FILTER_SZ; unsigned int MIP_POINT_PRECLAMP : SQ_IMG_SAMP_WORD2_MIP_POINT_PRECLAMP_SZ; unsigned int ANISO_OVERRIDE : SQ_IMG_SAMP_WORD2_ANISO_OVERRIDE_SZ; unsigned int BLEND_ZERO_PRT : SQ_IMG_SAMP_WORD2_BLEND_ZERO_PRT_SZ; unsigned int DERIV_ADJUST_EN : SQ_IMG_SAMP_WORD2_DERIV_ADJUST_ENABLE_SZ; #elif defined(BIGENDIAN_CPU) unsigned int DERIV_ADJUST_EN : 
SQ_IMG_SAMP_WORD2_DERIV_ADJUST_ENABLE_SZ; unsigned int BLEND_ZERO_PRT : SQ_IMG_SAMP_WORD2_BLEND_ZERO_PRT_SZ; unsigned int ANISO_OVERRIDE : SQ_IMG_SAMP_WORD2_ANISO_OVERRIDE_SZ; unsigned int MIP_POINT_PRECLAMP : SQ_IMG_SAMP_WORD2_MIP_POINT_PRECLAMP_SZ; unsigned int MIP_FILTER : SQ_IMG_SAMP_WORD2_MIP_FILTER_SZ; unsigned int Z_FILTER : SQ_IMG_SAMP_WORD2_Z_FILTER_SZ; unsigned int XY_MIN_FILTER : SQ_IMG_SAMP_WORD2_XY_MIN_FILTER_SZ; unsigned int XY_MAG_FILTER : SQ_IMG_SAMP_WORD2_XY_MAG_FILTER_SZ; unsigned int LOD_BIAS_SEC : SQ_IMG_SAMP_WORD2_LOD_BIAS_SEC_SZ; unsigned int BC_OR_BCT : SQ_IMG_SAMP_WORD2_BC_OR_BCT_SZ; unsigned int BC_LRS_LB : SQ_IMG_SAMP_WORD2_BC_LRS_LB_SZ; #endif }; union SQ_IMG_SAMP_WORD2 { sq_img_samp_word2_t bitfields, bits, f; uint32_t val : SQ_IMG_SAMP_WORD2_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ #define SQ_IMG_SAMP_WORD3_REG_SZ 32 #define SQ_IMG_SAMP_WORD3_BCP_LRS_DAV_SZ 12 #define SQ_IMG_SAMP_WORD3_GRAD_ADJ_OR_DAV_SZ 16 #define SQ_IMG_SAMP_WORD3_RES_OR_DAV_SZ 2 #define SQ_IMG_SAMP_WORD3_BORD_COLOR_TYPE_SZ 2 struct sq_img_samp_word3_t { #if defined(LITTLEENDIAN_CPU) unsigned int BCP_LRS_DAV : SQ_IMG_SAMP_WORD3_BCP_LRS_DAV_SZ; unsigned int GRAD_ADJ_OR_DAV : SQ_IMG_SAMP_WORD3_GRAD_ADJ_OR_DAV_SZ; unsigned int RES_OR_DAV : SQ_IMG_SAMP_WORD3_RES_OR_DAV_SZ; unsigned int BORDER_COLOR_TYPE : SQ_IMG_SAMP_WORD3_BORD_COLOR_TYPE_SZ; #elif defined(BIGENDIAN_CPU) unsigned int BORDER_COLOR_TYPE : SQ_IMG_SAMP_WORD3_BORD_COLOR_TYPE_SZ; unsigned int RES_OR_DAV : SQ_IMG_SAMP_WORD3_RES_OR_DAV_SZ; unsigned int GRAD_ADJ_OR_DAV : SQ_IMG_SAMP_WORD3_GRAD_ADJ_OR_DAV_SZ; unsigned int BCP_LRS_DAV : SQ_IMG_SAMP_WORD3_BCP_LRS_DAV_SZ; #endif }; union SQ_IMG_SAMP_WORD3 { sq_img_samp_word3_t bitfields, bits, f; uint32_t val : SQ_IMG_SAMP_WORD3_REG_SZ; uint32_t u32All; int32_t i32All; float f32All; }; /***********/ /**************************************************************/ /**************************************************************/
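/*
 * Illustrative note (a sketch added for reference, not part of the upstream
 * header): where the SQ_IMG_SAMP_WORD0 fields land in u32All. This assumes
 * LITTLEENDIAN_CPU and a compiler that allocates bit-fields starting at the
 * least significant bit, as GCC, Clang and MSVC do on little-endian x86
 * targets (the C++ standard leaves bit-field order implementation-defined).
 * The widths come from the SQ_IMG_SAMP_WORD0_*_SZ macros above and sum to
 * exactly 32 bits:
 *
 *   CLAMP_X             bits [2:0]
 *   CLAMP_Y             bits [5:3]
 *   CLAMP_Z             bits [8:6]
 *   MAX_ANISO_RATIO     bits [11:9]
 *   DEPTH_COMPARE_FUNC  bits [14:12]
 *   FORCE_UNNORMALIZED  bit  15
 *   ANISO_THRESHOLD     bits [18:16]
 *   MC_COORD_TRUNC      bit  19
 *   FORCE_DEGAMMA       bit  20
 *   ANISO_BIAS          bits [26:21]
 *   TRUNC_COORD         bit  27
 *   DISABLE_CUBE_WRAP   bit  28
 *   FILTER_MODE         bits [30:29]
 *   SKIP_DEGAMMA        bit  31
 *
 * Worked example: with every other field zero, setting
 * CLAMP_X = CLAMP_Y = SQ_TEX_CLAMP_LAST_TEXEL (2) gives
 * u32All = 2 | (2 << 3) = 0x12.
 */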
/**************************************************************/ typedef enum FMT { FMT_INVALID = 0x00000000, FMT_8 = 0x00000001, FMT_16 = 0x00000002, FMT_8_8 = 0x00000003, FMT_32 = 0x00000004, FMT_16_16 = 0x00000005, FMT_10_11_11 = 0x00000006, FMT_11_11_10 = 0x00000007, FMT_10_10_10_2 = 0x00000008, FMT_2_10_10_10 = 0x00000009, FMT_8_8_8_8 = 0x0000000a, FMT_32_32 = 0x0000000b, FMT_16_16_16_16 = 0x0000000c, FMT_32_32_32 = 0x0000000d, FMT_32_32_32_32 = 0x0000000e, FMT_RESERVED_78 = 0x0000000f, FMT_5_6_5 = 0x00000010, FMT_1_5_5_5 = 0x00000011, FMT_5_5_5_1 = 0x00000012, FMT_4_4_4_4 = 0x00000013, FMT_8_24 = 0x00000014, FMT_24_8 = 0x00000015, FMT_X24_8_32 = 0x00000016, FMT_RESERVED_155 = 0x00000017, FMT_1 = 0x00000018, FMT_1_REVERSED = 0x00000019, FMT_GB_GR = 0x0000001a, FMT_BG_RG = 0x0000001b, FMT_4_4 = 0x0000001c, FMT_BC1 = 0x0000001d, FMT_BC2 = 0x0000001e, FMT_BC3 = 0x0000001f, FMT_BC4 = 0x00000020, FMT_BC5 = 0x00000021, FMT_BC6 = 0x00000022, FMT_BC7 = 0x00000023, FMT_6E4 = 0x00000024, FMT_5_9_9_9 = 0x00000025, FMT_FMASK8_S2 = 0x00000026, FMT_FMASK8_S4 = 0x00000027, FMT_FMASK8_S8 = 0x00000028, FMT_FMASK16_S16 = 0x00000029, FMT_FMASK16_S8 = 0x0000002a, FMT_FMASK32_S16 = 0x0000002b, FMT_FMASK32_S8 = 0x0000002c, FMT_FMASK64_S16 = 0x0000002d, FMT_ETC2_RGB = 0x0000002e, FMT_ETC2_RGBA = 0x0000002f, FMT_ETC2_R = 0x00000030, FMT_ETC2_RG = 0x00000031, FMT_ETC2_RGBA1 = 0x00000032, FMT_ASTC_2D_LDR = 0x00000033, FMT_ASTC_2D_HDR = 0x00000034, FMT_ASTC_2D_LDR_SRGB = 0x00000035, FMT_ASTC_3D_LDR = 0x00000036, FMT_ASTC_3D_HDR = 0x00000037, FMT_ASTC_3D_LDR_SRGB = 0x00000038, FMT_MM_8 = 0x00000039, FMT_MM_8_8 = 0x0000003a, FMT_MM_8_8_8_8 = 0x0000003b, FMT_MM_VYUY8 = 0x0000003c, FMT_MM_10_11_11 = 0x0000003d, FMT_MM_2_10_10_10 = 0x0000003e, FMT_MM_16_16_16_16 = 0x0000003f, FMT_10_IN_16 = 0x00000040, FMT_10_IN_16_16 = 0x00000041, FMT_10_IN_16_16_16_16 = 0x00000042, FMT_7E3 = 0x00000043, FMT_YCBCR = 0x00000044, } FMT; typedef enum type { TYPE_UNORM = 0x00000000, TYPE_SNORM = 0x00000001, 
TYPE_USCALED = 0x00000002, TYPE_SSCALED = 0x00000003, TYPE_UINT = 0x00000004, TYPE_SINT = 0x00000005, TYPE_RESERVED_6 = 0x00000006, TYPE_FLOAT = 0x00000007, TYPE_RESERVED_8 = 0x00000008, TYPE_SRGB = 0x00000009, TYPE_UNORM_UINT = 0x0000000a, TYPE_REVERSED_UNORM = 0x0000000b, TYPE_FLOAT_CLAMP = 0x0000000c, TYPE_F1 = 0x0000000d, TYPE_F2 = 0x0000000e, TYPE_F4 = 0x0000000f, TYPE_F8 = 0x00000010, TYPE_4X4 = 0x00000011, TYPE_5X4 = 0x00000012, TYPE_5X5 = 0x00000013, TYPE_6X5 = 0x00000014, TYPE_6X6 = 0x00000015, TYPE_8X5 = 0x00000016, TYPE_8X6 = 0x00000017, TYPE_8X8 = 0x00000018, TYPE_10X5 = 0x00000019, TYPE_10X6 = 0x0000001a, TYPE_10X8 = 0x0000001b, TYPE_10X10 = 0x0000001c, TYPE_12X10 = 0x0000001d, TYPE_12X12 = 0x0000001e, TYPE_3X3X3 = 0x0000001f, TYPE_4X4X3 = 0x00000020, TYPE_4X4X4 = 0x00000021, TYPE_5X4X4 = 0x00000022, TYPE_5X5X4 = 0x00000023, TYPE_6X5X5 = 0x00000024, TYPE_6X6X6 = 0x00000025 } type; enum FORMAT { CFMT_INVALID = 0, CFMT_8_UNORM = 1, CFMT_8_SNORM = 2, CFMT_8_UINT = 5, CFMT_8_SINT = 6, CFMT_16_UNORM = 7, CFMT_16_SNORM = 8, CFMT_16_UINT = 11, CFMT_16_SINT = 12, CFMT_16_FLOAT = 13, CFMT_8_8_UNORM = 14, CFMT_8_8_SNORM = 15, CFMT_8_8_UINT = 18, CFMT_8_8_SINT = 19, CFMT_32_UINT = 20, CFMT_32_SINT = 21, CFMT_32_FLOAT = 22, CFMT_16_16_UNORM = 23, CFMT_16_16_SNORM = 24, CFMT_16_16_UINT = 27, CFMT_16_16_SINT = 28, CFMT_16_16_FLOAT = 29, CFMT_10_10_10_2_UNORM = 44, CFMT_10_10_10_2_SNORM = 45, CFMT_10_10_10_2_UINT = 48, CFMT_10_10_10_2_SINT = 49, CFMT_2_10_10_10_UNORM = 50, CFMT_2_10_10_10_SNORM = 51, CFMT_2_10_10_10_UINT = 54, CFMT_2_10_10_10_SINT = 55, CFMT_8_8_8_8_UNORM = 56, CFMT_8_8_8_8_SNORM = 57, CFMT_8_8_8_8_UINT = 60, CFMT_8_8_8_8_SINT = 61, CFMT_32_32_UINT = 62, CFMT_32_32_SINT = 63, CFMT_32_32_FLOAT = 64, CFMT_16_16_16_16_UNORM = 65, CFMT_16_16_16_16_SNORM = 66, CFMT_16_16_16_16_UINT = 69, CFMT_16_16_16_16_SINT = 70, CFMT_16_16_16_16_FLOAT = 71, CFMT_32_32_32_UINT = 72, CFMT_32_32_32_SINT = 73, CFMT_32_32_32_FLOAT = 74, CFMT_32_32_32_32_UINT = 75, 
CFMT_32_32_32_32_SINT = 76, CFMT_32_32_32_32_FLOAT = 77, CFMT_8_SRGB = 128, CFMT_8_8_SRGB = 129, CFMT_8_8_8_8_SRGB = 130, CFMT_5_6_5_UNORM = 133, CFMT_1_5_5_5_UNORM = 134, CFMT_5_5_5_1_UNORM = 135, CFMT_8_24_UNORM = 141, CFMT_8_24_UINT = 142, CFMT_24_8_UNORM = 143, CFMT_24_8_UINT = 144 }; typedef enum SEL { SEL_0 = 0x00000000, SEL_1 = 0x00000001, SEL_X = 0x00000004, SEL_Y = 0x00000005, SEL_Z = 0x00000006, SEL_W = 0x00000007, } SEL; typedef enum SQ_RSRC_IMG_TYPE { SQ_RSRC_IMG_1D = 0x00000008, SQ_RSRC_IMG_2D = 0x00000009, SQ_RSRC_IMG_3D = 0x0000000a, SQ_RSRC_IMG_CUBE_ARRAY = 0x0000000b, SQ_RSRC_IMG_1D_ARRAY = 0x0000000c, SQ_RSRC_IMG_2D_ARRAY = 0x0000000d, SQ_RSRC_IMG_2D_MSAA = 0x0000000e, SQ_RSRC_IMG_2D_MSAA_ARRAY = 0x0000000f, } SQ_RSRC_IMG_TYPE; typedef enum SQ_TEX_XY_FILTER { SQ_TEX_XY_FILTER_POINT = 0x00000000, SQ_TEX_XY_FILTER_BILINEAR = 0x00000001, SQ_TEX_XY_FILTER_ANISO_POINT = 0x00000002, SQ_TEX_XY_FILTER_ANISO_BILINEAR = 0x00000003, } SQ_TEX_XY_FILTER; typedef enum SQ_TEX_Z_FILTER { SQ_TEX_Z_FILTER_NONE = 0x00000000, SQ_TEX_Z_FILTER_POINT = 0x00000001, SQ_TEX_Z_FILTER_LINEAR = 0x00000002, } SQ_TEX_Z_FILTER; typedef enum SQ_TEX_MIP_FILTER { SQ_TEX_MIP_FILTER_NONE = 0x00000000, SQ_TEX_MIP_FILTER_POINT = 0x00000001, SQ_TEX_MIP_FILTER_LINEAR = 0x00000002, SQ_TEX_MIP_FILTER_POINT_ANISO_ADJ__VI = 0x00000003, } SQ_TEX_MIP_FILTER; typedef enum SQ_TEX_CLAMP { SQ_TEX_WRAP = 0x00000000, SQ_TEX_MIRROR = 0x00000001, SQ_TEX_CLAMP_LAST_TEXEL = 0x00000002, SQ_TEX_MIRROR_ONCE_LAST_TEXEL = 0x00000003, SQ_TEX_CLAMP_HALF_BORDER = 0x00000004, SQ_TEX_MIRROR_ONCE_HALF_BORDER = 0x00000005, SQ_TEX_CLAMP_BORDER = 0x00000006, SQ_TEX_MIRROR_ONCE_BORDER = 0x00000007, } SQ_TEX_CLAMP; typedef enum SQ_TEX_BORDER_COLOR { SQ_TEX_BORDER_COLOR_TRANS_BLACK = 0x00000000, SQ_TEX_BORDER_COLOR_OPAQUE_BLACK = 0x00000001, SQ_TEX_BORDER_COLOR_OPAQUE_WHITE = 0x00000002, SQ_TEX_BORDER_COLOR_REGISTER = 0x00000003, } SQ_TEX_BORDER_COLOR; typedef enum TEX_BC_SWIZZLE { TEX_BC_Swizzle_XYZW = 0x00000000, 
TEX_BC_Swizzle_XWYZ = 0x00000001, TEX_BC_Swizzle_WZYX = 0x00000002, TEX_BC_Swizzle_WXYZ = 0x00000003, TEX_BC_Swizzle_ZYXW = 0x00000004, TEX_BC_Swizzle_YXWZ = 0x00000005, } TEX_BC_SWIZZLE; typedef struct metadata_amd_nv_s { uint32_t version; // Must be 1 uint32_t vendorID; // AMD SQ_IMG_RSRC_WORD0 word0; SQ_IMG_RSRC_WORD1 word1; SQ_IMG_RSRC_WORD2 word2; SQ_IMG_RSRC_WORD3 word3; SQ_IMG_RSRC_WORD4 word4; SQ_IMG_RSRC_WORD5 word5; SQ_IMG_RSRC_WORD6 word6; SQ_IMG_RSRC_WORD7 word7; uint32_t mip_offsets[0]; } metadata_amd_nv_t; } // namespace image } // namespace rocr #endif // EXT_IMAGE_RESOURCE_NV_H_ ROCR-Runtime-rocm-5.0.0/src/image/util.h000066400000000000000000000271471420110115200176350ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. 
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef HSA_RUNTIME_EXT_IMAGE_UTIL_H #define HSA_RUNTIME_EXT_IMAGE_UTIL_H #include "stdint.h" #include "stddef.h" #include "stdlib.h" #include #include #include #include #include "inc/hsa.h" namespace rocr { namespace image { #if defined(_MSC_VER) #define ALIGNED_(x) __declspec(align(x)) #else #if defined(__GNUC__) #define ALIGNED_(x) __attribute__((aligned(x))) #endif // __GNUC__ #endif // _MSC_VER #define MULTILINE(...) # __VA_ARGS__ } // namespace image } // namespace rocr #if defined(__GNUC__) #include "mm_malloc.h" #if defined(__i386__) || defined(__x86_64__) #include #else #error \ "Processor not identified. " \ "Need to provide a lightweight approximate clock interface (aka __rdtsc())." 
#endif namespace rocr { namespace image { #define __forceinline __inline__ __attribute__((always_inline)) static __forceinline void __debugbreak() { __builtin_trap(); } #define __declspec(x) __attribute__((x)) #undef __stdcall #define __stdcall // __attribute__((__stdcall__)) #define __ALIGNED__(x) __attribute__((aligned(x))) static __forceinline void* _aligned_malloc(size_t size, size_t alignment) { #ifdef _ISOC11_SOURCE return aligned_alloc(alignment, size); #else void* mem = NULL; if (0 != posix_memalign(&mem, alignment, size)) return NULL; return mem; #endif } static __forceinline void _aligned_free(void* ptr) { free(ptr); } #elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64)) #include "intrin.h" #define __ALIGNED__(x) __declspec(align(x)) namespace rocr { namespace image { #else #error "Compiler and/or processor not identified." #endif // A macro to delete the copy and move constructors and assignment operators #define DISALLOW_COPY_AND_ASSIGN(TypeName) \ TypeName(const TypeName&) = delete; \ TypeName(TypeName&&) = delete; \ void operator=(const TypeName&) = delete; \ void operator=(TypeName&&) = delete; template class ScopeGuard { public: explicit __forceinline ScopeGuard(const lambda& release) : release_(release), dismiss_(false) {} ScopeGuard(ScopeGuard& rhs) { *this = rhs; } __forceinline ~ScopeGuard() { if (!dismiss_) release_(); } __forceinline ScopeGuard& operator=(ScopeGuard& rhs) { dismiss_ = rhs.dismiss_; release_ = rhs.release_; rhs.dismiss_ = true; return *this; } __forceinline void Dismiss() { dismiss_ = true; } private: lambda release_; bool dismiss_; }; template static __forceinline ScopeGuard MakeScopeGuard(lambda rel) { return ScopeGuard(rel); } #define MAKE_SCOPE_GUARD_HELPER(lname, sname, ...) \ auto lname = __VA_ARGS__; \ ScopeGuard sname(lname); #define MAKE_SCOPE_GUARD(...)
\ MAKE_SCOPE_GUARD_HELPER(PASTE(scopeGuardLambda, __COUNTER__), PASTE(scopeGuard, __COUNTER__), \ __VA_ARGS__) #define MAKE_NAMED_SCOPE_GUARD(name, ...) \ MAKE_SCOPE_GUARD_HELPER(PASTE(scopeGuardLambda, __COUNTER__), name, __VA_ARGS__) /// @brief: Returns the smaller of two inputs; T must support the ">" /// operator. /// @param: a(Input), a reference to type T. /// @param: b(Input), a reference to type T. /// @return: T. template static __forceinline T Min(const T& a, const T& b) { return (a > b) ? b : a; } template static __forceinline T Min(const T& a, const T& b, Arg... args) { return Min(a, Min(b, args...)); } /// @brief: Returns the larger of two inputs; T must support the ">" operator. /// @param: a(Input), a reference to type T. /// @param: b(Input), a reference to type T. /// @return: T. template static __forceinline T Max(const T& a, const T& b) { return (b > a) ? b : a; } template static __forceinline T Max(const T& a, const T& b, Arg... args) { return Max(a, Max(b, args...)); } /// @brief: Deletes an object that was previously allocated with new. /// @param: ptr(Input), a pointer to the object. Passing NULL is a no-op. /// @return: void. struct DeleteObject { template void operator()(const T* ptr) const { delete ptr; } }; /// @brief: Returns true if val is a power of two. Be careful when passing 0: /// it also returns true for 0. /// @param: val(Input), the data to be checked. /// @return: bool. template static __forceinline bool IsPowerOfTwo(T val) { return (val & (val - 1)) == 0; } /// @brief: Rounds value down to the nearest multiple of alignment, which must /// be a power of two. An already aligned value is returned unchanged. /// @param: value(Input), value to be calculated. /// @param: alignment(Input), alignment value. /// @return: T.
template static __forceinline T AlignDown(T value, size_t alignment) { assert(IsPowerOfTwo(alignment)); return (T)(value & ~(alignment - 1)); } /// @brief: Pointer overload of AlignDown; for more /// info, see the previous description. /// @param: value(Input), pointer to type T. /// @param: alignment(Input), alignment value. /// @return: T*, pointer to type T. template static __forceinline T* AlignDown(T* value, size_t alignment) { return (T*)AlignDown((intptr_t)value, alignment); } /// @brief: Rounds value up to the nearest multiple of alignment, which must /// be a power of two. /// An already aligned value is returned unchanged. /// @param: value(Input), value to be calculated. /// @param: alignment(Input), alignment value. /// @return: T. template static __forceinline T AlignUp(T value, size_t alignment) { return AlignDown((T)(value + alignment - 1), alignment); } /// @brief: Pointer overload of AlignUp; for more /// info, see the previous description. /// @param: value(Input), pointer to type T. /// @param: alignment(Input), alignment value. /// @return: T*, pointer to type T. template static __forceinline T* AlignUp(T* value, size_t alignment) { return (T*)AlignDown((intptr_t)((uint8_t*)value + alignment - 1), alignment); } /// @brief: Returns true if value is a multiple of alignment, i.e. it is /// already aligned. /// @param: value(Input), value to be checked. /// @param: alignment(Input), alignment value. /// @return: bool. template static __forceinline bool IsMultipleOf(T value, size_t alignment) { return (AlignUp(value, alignment) == value); } /// @brief: Pointer overload of IsMultipleOf; for more /// info, see the previous description. /// @param: value(Input), pointer to type T. /// @param: alignment(Input), alignment value. /// @return: bool.
template static __forceinline bool IsMultipleOf(T* value, size_t alignment) { return (AlignUp(value, alignment) == value); } static __forceinline uint32_t NextPow2(uint32_t value) { if (value == 0) return 1; uint32_t v = value - 1; v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; return v + 1; } static __forceinline uint64_t NextPow2(uint64_t value) { if (value == 0) return 1; uint64_t v = value - 1; v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; v |= v >> 32; return v + 1; } static __forceinline bool strIsEmpty(const char* str) noexcept { return str[0] == '\0'; } static __forceinline std::string& ltrim(std::string& s) { auto it = std::find_if(s.begin(), s.end(), [](char c) { return !std::isspace(c, std::locale::classic()); }); s.erase(s.begin(), it); return s; } static __forceinline std::string& rtrim(std::string& s) { auto it = std::find_if(s.rbegin(), s.rend(), [](char c) { return !std::isspace(c, std::locale::classic()); }); s.erase(it.base(), s.end()); return s; } static __forceinline std::string& trim(std::string& s) { return ltrim(rtrim(s)); } template static __forceinline uint32_t BitSelect(T p) { static_assert(sizeof(T) <= sizeof(uintptr_t), "Type out of range."); static_assert(highBit < sizeof(uintptr_t)*8, "Bit index out of range."); uintptr_t ptr = p; if(highBit != (sizeof(uintptr_t)*8-1)) return (uint32_t)((ptr & ((1ull<<(highBit+1))-1)) >> lowBit); else return (uint32_t)(ptr >> lowBit); } inline uint32_t PtrLow16Shift8(const void* p) { uintptr_t ptr = reinterpret_cast(p); return (uint32_t)((ptr & 0xFFFFULL) >> 8); } inline uint32_t PtrHigh64Shift16(const void* p) { uintptr_t ptr = reinterpret_cast(p); return (uint32_t)((ptr & 0xFFFFFFFFFFFF0000ULL) >> 16); } inline uint32_t PtrLow40Shift8(const void* p) { uintptr_t ptr = reinterpret_cast(p); return (uint32_t)((ptr & 0xFFFFFFFFFFULL) >> 8); } inline uint32_t PtrHigh64Shift40(const void* p) { uintptr_t ptr = reinterpret_cast(p); return (uint32_t)((ptr & 
0xFFFFFF0000000000ULL) >> 40); } inline uint32_t PtrLow32(const void* p) { return static_cast(reinterpret_cast(p)); } inline uint32_t PtrHigh32(const void* p) { uint32_t ptr = 0; #ifdef HSA_LARGE_MODEL ptr = static_cast(reinterpret_cast(p) >> 32); #endif return ptr; } } // namespace image } // namespace rocr #endif // HSA_RUNTIME_EXT_IMAGE_UTIL_H ROCR-Runtime-rocm-5.0.0/src/inc/000077500000000000000000000000001420110115200161635ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/inc/Brig.h000066400000000000000000000777011420110115200172330ustar00rootroot00000000000000// University of Illinois/NCSA // Open Source License // // Copyright (c) 2013-2015, Advanced Micro Devices, Inc. // All rights reserved. // // Developed by: // // HSA Team // // Advanced Micro Devices, Inc // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy of // this software and associated documentation files (the "Software"), to deal with // the Software without restriction, including without limitation the rights to // use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies // of the Software, and to permit persons to whom the Software is furnished to do // so, subject to the following conditions: // // * Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // // * Redistributions in binary form must reproduce the above copyright notice, // this list of conditions and the following disclaimers in the // documentation and/or other materials provided with the distribution. // // * Neither the names of the LLVM Team, University of Illinois at // Urbana-Champaign, nor the names of its contributors may be used to // endorse or promote products derived from this Software without specific // prior written permission. 
// // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS // FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE // CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE // SOFTWARE. #ifndef INCLUDED_BRIG_H #define INCLUDED_BRIG_H #include /* size_t */ #include /* uintXX_t */ #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ /*========================================================================================*/ /* =======================================================================================*/ /* =======================================================================================*/ /* =======================================================================================*/ typedef uint32_t BrigCodeOffset32_t; typedef uint32_t BrigOperandOffset32_t; typedef uint32_t BrigDataOffset32_t; typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t; typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t; typedef BrigDataOffset32_t BrigDataOffsetString32_t; typedef uint32_t BrigVersion32_t; enum BrigVersion { BRIG_VERSION_HSAIL_MAJOR = 1, BRIG_VERSION_HSAIL_MINOR = 0, BRIG_VERSION_BRIG_MAJOR = 1, BRIG_VERSION_BRIG_MINOR = 0 }; typedef uint16_t BrigKind16_t; enum BrigKind { BRIG_KIND_NONE = 0x0000, BRIG_KIND_DIRECTIVE_BEGIN = 0x1000, BRIG_KIND_DIRECTIVE_ARG_BLOCK_END = 0x1000, BRIG_KIND_DIRECTIVE_ARG_BLOCK_START = 0x1001, BRIG_KIND_DIRECTIVE_COMMENT = 0x1002, BRIG_KIND_DIRECTIVE_CONTROL = 0x1003, BRIG_KIND_DIRECTIVE_EXTENSION = 0x1004, BRIG_KIND_DIRECTIVE_FBARRIER = 0x1005, BRIG_KIND_DIRECTIVE_FUNCTION = 0x1006, BRIG_KIND_DIRECTIVE_INDIRECT_FUNCTION = 0x1007, BRIG_KIND_DIRECTIVE_KERNEL = 0x1008, BRIG_KIND_DIRECTIVE_LABEL = 0x1009, 
BRIG_KIND_DIRECTIVE_LOC = 0x100a, BRIG_KIND_DIRECTIVE_MODULE = 0x100b, BRIG_KIND_DIRECTIVE_PRAGMA = 0x100c, BRIG_KIND_DIRECTIVE_SIGNATURE = 0x100d, BRIG_KIND_DIRECTIVE_VARIABLE = 0x100e, BRIG_KIND_DIRECTIVE_END = 0x100f, BRIG_KIND_INST_BEGIN = 0x2000, BRIG_KIND_INST_ADDR = 0x2000, BRIG_KIND_INST_ATOMIC = 0x2001, BRIG_KIND_INST_BASIC = 0x2002, BRIG_KIND_INST_BR = 0x2003, BRIG_KIND_INST_CMP = 0x2004, BRIG_KIND_INST_CVT = 0x2005, BRIG_KIND_INST_IMAGE = 0x2006, BRIG_KIND_INST_LANE = 0x2007, BRIG_KIND_INST_MEM = 0x2008, BRIG_KIND_INST_MEM_FENCE = 0x2009, BRIG_KIND_INST_MOD = 0x200a, BRIG_KIND_INST_QUERY_IMAGE = 0x200b, BRIG_KIND_INST_QUERY_SAMPLER = 0x200c, BRIG_KIND_INST_QUEUE = 0x200d, BRIG_KIND_INST_SEG = 0x200e, BRIG_KIND_INST_SEG_CVT = 0x200f, BRIG_KIND_INST_SIGNAL = 0x2010, BRIG_KIND_INST_SOURCE_TYPE = 0x2011, BRIG_KIND_INST_END = 0x2012, BRIG_KIND_OPERAND_BEGIN = 0x3000, BRIG_KIND_OPERAND_ADDRESS = 0x3000, BRIG_KIND_OPERAND_ALIGN = 0x3001, BRIG_KIND_OPERAND_CODE_LIST = 0x3002, BRIG_KIND_OPERAND_CODE_REF = 0x3003, BRIG_KIND_OPERAND_CONSTANT_BYTES = 0x3004, BRIG_KIND_OPERAND_RESERVED = 0x3005, BRIG_KIND_OPERAND_CONSTANT_IMAGE = 0x3006, BRIG_KIND_OPERAND_CONSTANT_OPERAND_LIST = 0x3007, BRIG_KIND_OPERAND_CONSTANT_SAMPLER = 0x3008, BRIG_KIND_OPERAND_OPERAND_LIST = 0x3009, BRIG_KIND_OPERAND_REGISTER = 0x300a, BRIG_KIND_OPERAND_STRING = 0x300b, BRIG_KIND_OPERAND_WAVESIZE = 0x300c, BRIG_KIND_OPERAND_END = 0x300d }; typedef uint8_t BrigAlignment8_t; enum BrigAlignment { BRIG_ALIGNMENT_NONE = 0, BRIG_ALIGNMENT_1 = 1, BRIG_ALIGNMENT_2 = 2, BRIG_ALIGNMENT_4 = 3, BRIG_ALIGNMENT_8 = 4, BRIG_ALIGNMENT_16 = 5, BRIG_ALIGNMENT_32 = 6, BRIG_ALIGNMENT_64 = 7, BRIG_ALIGNMENT_128 = 8, BRIG_ALIGNMENT_256 = 9, BRIG_ALIGNMENT_MAX = BRIG_ALIGNMENT_256 }; typedef uint8_t BrigAllocation8_t; enum BrigAllocation { BRIG_ALLOCATION_NONE = 0, BRIG_ALLOCATION_PROGRAM = 1, BRIG_ALLOCATION_AGENT = 2, BRIG_ALLOCATION_AUTOMATIC = 3 }; typedef uint8_t BrigAluModifier8_t; enum BrigAluModifierMask { 
BRIG_ALU_FTZ = 1 }; typedef uint8_t BrigAtomicOperation8_t; enum BrigAtomicOperation { BRIG_ATOMIC_ADD = 0, BRIG_ATOMIC_AND = 1, BRIG_ATOMIC_CAS = 2, BRIG_ATOMIC_EXCH = 3, BRIG_ATOMIC_LD = 4, BRIG_ATOMIC_MAX = 5, BRIG_ATOMIC_MIN = 6, BRIG_ATOMIC_OR = 7, BRIG_ATOMIC_ST = 8, BRIG_ATOMIC_SUB = 9, BRIG_ATOMIC_WRAPDEC = 10, BRIG_ATOMIC_WRAPINC = 11, BRIG_ATOMIC_XOR = 12, BRIG_ATOMIC_WAIT_EQ = 13, BRIG_ATOMIC_WAIT_NE = 14, BRIG_ATOMIC_WAIT_LT = 15, BRIG_ATOMIC_WAIT_GTE = 16, BRIG_ATOMIC_WAITTIMEOUT_EQ = 17, BRIG_ATOMIC_WAITTIMEOUT_NE = 18, BRIG_ATOMIC_WAITTIMEOUT_LT = 19, BRIG_ATOMIC_WAITTIMEOUT_GTE = 20 }; typedef uint8_t BrigCompareOperation8_t; enum BrigCompareOperation { BRIG_COMPARE_EQ = 0, BRIG_COMPARE_NE = 1, BRIG_COMPARE_LT = 2, BRIG_COMPARE_LE = 3, BRIG_COMPARE_GT = 4, BRIG_COMPARE_GE = 5, BRIG_COMPARE_EQU = 6, BRIG_COMPARE_NEU = 7, BRIG_COMPARE_LTU = 8, BRIG_COMPARE_LEU = 9, BRIG_COMPARE_GTU = 10, BRIG_COMPARE_GEU = 11, BRIG_COMPARE_NUM = 12, BRIG_COMPARE_NAN = 13, BRIG_COMPARE_SEQ = 14, BRIG_COMPARE_SNE = 15, BRIG_COMPARE_SLT = 16, BRIG_COMPARE_SLE = 17, BRIG_COMPARE_SGT = 18, BRIG_COMPARE_SGE = 19, BRIG_COMPARE_SGEU = 20, BRIG_COMPARE_SEQU = 21, BRIG_COMPARE_SNEU = 22, BRIG_COMPARE_SLTU = 23, BRIG_COMPARE_SLEU = 24, BRIG_COMPARE_SNUM = 25, BRIG_COMPARE_SNAN = 26, BRIG_COMPARE_SGTU = 27 }; typedef uint16_t BrigControlDirective16_t; enum BrigControlDirective { BRIG_CONTROL_NONE = 0, BRIG_CONTROL_ENABLEBREAKEXCEPTIONS = 1, BRIG_CONTROL_ENABLEDETECTEXCEPTIONS = 2, BRIG_CONTROL_MAXDYNAMICGROUPSIZE = 3, BRIG_CONTROL_MAXFLATGRIDSIZE = 4, BRIG_CONTROL_MAXFLATWORKGROUPSIZE = 5, BRIG_CONTROL_REQUIREDDIM = 6, BRIG_CONTROL_REQUIREDGRIDSIZE = 7, BRIG_CONTROL_REQUIREDWORKGROUPSIZE = 8, BRIG_CONTROL_REQUIRENOPARTIALWORKGROUPS = 9 }; typedef uint8_t BrigExecutableModifier8_t; enum BrigExecutableModifierMask { BRIG_EXECUTABLE_DEFINITION = 1 }; typedef uint8_t BrigImageChannelOrder8_t; enum BrigImageChannelOrder { BRIG_CHANNEL_ORDER_A = 0, BRIG_CHANNEL_ORDER_R = 1, 
BRIG_CHANNEL_ORDER_RX = 2, BRIG_CHANNEL_ORDER_RG = 3, BRIG_CHANNEL_ORDER_RGX = 4, BRIG_CHANNEL_ORDER_RA = 5, BRIG_CHANNEL_ORDER_RGB = 6, BRIG_CHANNEL_ORDER_RGBX = 7, BRIG_CHANNEL_ORDER_RGBA = 8, BRIG_CHANNEL_ORDER_BGRA = 9, BRIG_CHANNEL_ORDER_ARGB = 10, BRIG_CHANNEL_ORDER_ABGR = 11, BRIG_CHANNEL_ORDER_SRGB = 12, BRIG_CHANNEL_ORDER_SRGBX = 13, BRIG_CHANNEL_ORDER_SRGBA = 14, BRIG_CHANNEL_ORDER_SBGRA = 15, BRIG_CHANNEL_ORDER_INTENSITY = 16, BRIG_CHANNEL_ORDER_LUMINANCE = 17, BRIG_CHANNEL_ORDER_DEPTH = 18, BRIG_CHANNEL_ORDER_DEPTH_STENCIL = 19, BRIG_CHANNEL_ORDER_FIRST_USER_DEFINED = 128 }; typedef uint8_t BrigImageChannelType8_t; enum BrigImageChannelType { BRIG_CHANNEL_TYPE_SNORM_INT8 = 0, BRIG_CHANNEL_TYPE_SNORM_INT16 = 1, BRIG_CHANNEL_TYPE_UNORM_INT8 = 2, BRIG_CHANNEL_TYPE_UNORM_INT16 = 3, BRIG_CHANNEL_TYPE_UNORM_INT24 = 4, BRIG_CHANNEL_TYPE_UNORM_SHORT_555 = 5, BRIG_CHANNEL_TYPE_UNORM_SHORT_565 = 6, BRIG_CHANNEL_TYPE_UNORM_INT_101010 = 7, BRIG_CHANNEL_TYPE_SIGNED_INT8 = 8, BRIG_CHANNEL_TYPE_SIGNED_INT16 = 9, BRIG_CHANNEL_TYPE_SIGNED_INT32 = 10, BRIG_CHANNEL_TYPE_UNSIGNED_INT8 = 11, BRIG_CHANNEL_TYPE_UNSIGNED_INT16 = 12, BRIG_CHANNEL_TYPE_UNSIGNED_INT32 = 13, BRIG_CHANNEL_TYPE_HALF_FLOAT = 14, BRIG_CHANNEL_TYPE_FLOAT = 15, BRIG_CHANNEL_TYPE_FIRST_USER_DEFINED = 128 }; typedef uint8_t BrigImageGeometry8_t; enum BrigImageGeometry { BRIG_GEOMETRY_1D = 0, BRIG_GEOMETRY_2D = 1, BRIG_GEOMETRY_3D = 2, BRIG_GEOMETRY_1DA = 3, BRIG_GEOMETRY_2DA = 4, BRIG_GEOMETRY_1DB = 5, BRIG_GEOMETRY_2DDEPTH = 6, BRIG_GEOMETRY_2DADEPTH = 7, BRIG_GEOMETRY_FIRST_USER_DEFINED = 128 }; typedef uint8_t BrigImageQuery8_t; enum BrigImageQuery { BRIG_IMAGE_QUERY_WIDTH = 0, BRIG_IMAGE_QUERY_HEIGHT = 1, BRIG_IMAGE_QUERY_DEPTH = 2, BRIG_IMAGE_QUERY_ARRAY = 3, BRIG_IMAGE_QUERY_CHANNELORDER = 4, BRIG_IMAGE_QUERY_CHANNELTYPE = 5, BRIG_IMAGE_QUERY_FIRST_USER_DEFINED = 6 }; typedef uint8_t BrigLinkage8_t; enum BrigLinkage { BRIG_LINKAGE_NONE = 0, BRIG_LINKAGE_PROGRAM = 1, BRIG_LINKAGE_MODULE = 2, 
BRIG_LINKAGE_FUNCTION = 3, BRIG_LINKAGE_ARG = 4 }; typedef uint8_t BrigMachineModel8_t; enum BrigMachineModel { BRIG_MACHINE_SMALL = 0, BRIG_MACHINE_LARGE = 1, }; typedef uint8_t BrigMemoryModifier8_t; enum BrigMemoryModifierMask { BRIG_MEMORY_CONST = 1 }; typedef uint8_t BrigMemoryOrder8_t; enum BrigMemoryOrder { BRIG_MEMORY_ORDER_NONE = 0, BRIG_MEMORY_ORDER_RELAXED = 1, BRIG_MEMORY_ORDER_SC_ACQUIRE = 2, BRIG_MEMORY_ORDER_SC_RELEASE = 3, BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE = 4, }; typedef uint8_t BrigMemoryScope8_t; enum BrigMemoryScope { BRIG_MEMORY_SCOPE_NONE = 0, BRIG_MEMORY_SCOPE_WORKITEM = 1, BRIG_MEMORY_SCOPE_WAVEFRONT = 2, BRIG_MEMORY_SCOPE_WORKGROUP = 3, BRIG_MEMORY_SCOPE_AGENT = 4, BRIG_MEMORY_SCOPE_SYSTEM = 5, }; typedef uint16_t BrigOpcode16_t; enum BrigOpcode { BRIG_OPCODE_NOP = 0, BRIG_OPCODE_ABS = 1, BRIG_OPCODE_ADD = 2, BRIG_OPCODE_BORROW = 3, BRIG_OPCODE_CARRY = 4, BRIG_OPCODE_CEIL = 5, BRIG_OPCODE_COPYSIGN = 6, BRIG_OPCODE_DIV = 7, BRIG_OPCODE_FLOOR = 8, BRIG_OPCODE_FMA = 9, BRIG_OPCODE_FRACT = 10, BRIG_OPCODE_MAD = 11, BRIG_OPCODE_MAX = 12, BRIG_OPCODE_MIN = 13, BRIG_OPCODE_MUL = 14, BRIG_OPCODE_MULHI = 15, BRIG_OPCODE_NEG = 16, BRIG_OPCODE_REM = 17, BRIG_OPCODE_RINT = 18, BRIG_OPCODE_SQRT = 19, BRIG_OPCODE_SUB = 20, BRIG_OPCODE_TRUNC = 21, BRIG_OPCODE_MAD24 = 22, BRIG_OPCODE_MAD24HI = 23, BRIG_OPCODE_MUL24 = 24, BRIG_OPCODE_MUL24HI = 25, BRIG_OPCODE_SHL = 26, BRIG_OPCODE_SHR = 27, BRIG_OPCODE_AND = 28, BRIG_OPCODE_NOT = 29, BRIG_OPCODE_OR = 30, BRIG_OPCODE_POPCOUNT = 31, BRIG_OPCODE_XOR = 32, BRIG_OPCODE_BITEXTRACT = 33, BRIG_OPCODE_BITINSERT = 34, BRIG_OPCODE_BITMASK = 35, BRIG_OPCODE_BITREV = 36, BRIG_OPCODE_BITSELECT = 37, BRIG_OPCODE_FIRSTBIT = 38, BRIG_OPCODE_LASTBIT = 39, BRIG_OPCODE_COMBINE = 40, BRIG_OPCODE_EXPAND = 41, BRIG_OPCODE_LDA = 42, BRIG_OPCODE_MOV = 43, BRIG_OPCODE_SHUFFLE = 44, BRIG_OPCODE_UNPACKHI = 45, BRIG_OPCODE_UNPACKLO = 46, BRIG_OPCODE_PACK = 47, BRIG_OPCODE_UNPACK = 48, BRIG_OPCODE_CMOV = 49, BRIG_OPCODE_CLASS = 50, 
BRIG_OPCODE_NCOS = 51, BRIG_OPCODE_NEXP2 = 52, BRIG_OPCODE_NFMA = 53, BRIG_OPCODE_NLOG2 = 54, BRIG_OPCODE_NRCP = 55, BRIG_OPCODE_NRSQRT = 56, BRIG_OPCODE_NSIN = 57, BRIG_OPCODE_NSQRT = 58, BRIG_OPCODE_BITALIGN = 59, BRIG_OPCODE_BYTEALIGN = 60, BRIG_OPCODE_PACKCVT = 61, BRIG_OPCODE_UNPACKCVT = 62, BRIG_OPCODE_LERP = 63, BRIG_OPCODE_SAD = 64, BRIG_OPCODE_SADHI = 65, BRIG_OPCODE_SEGMENTP = 66, BRIG_OPCODE_FTOS = 67, BRIG_OPCODE_STOF = 68, BRIG_OPCODE_CMP = 69, BRIG_OPCODE_CVT = 70, BRIG_OPCODE_LD = 71, BRIG_OPCODE_ST = 72, BRIG_OPCODE_ATOMIC = 73, BRIG_OPCODE_ATOMICNORET = 74, BRIG_OPCODE_SIGNAL = 75, BRIG_OPCODE_SIGNALNORET = 76, BRIG_OPCODE_MEMFENCE = 77, BRIG_OPCODE_RDIMAGE = 78, BRIG_OPCODE_LDIMAGE = 79, BRIG_OPCODE_STIMAGE = 80, BRIG_OPCODE_IMAGEFENCE = 81, BRIG_OPCODE_QUERYIMAGE = 82, BRIG_OPCODE_QUERYSAMPLER = 83, BRIG_OPCODE_CBR = 84, BRIG_OPCODE_BR = 85, BRIG_OPCODE_SBR = 86, BRIG_OPCODE_BARRIER = 87, BRIG_OPCODE_WAVEBARRIER = 88, BRIG_OPCODE_ARRIVEFBAR = 89, BRIG_OPCODE_INITFBAR = 90, BRIG_OPCODE_JOINFBAR = 91, BRIG_OPCODE_LEAVEFBAR = 92, BRIG_OPCODE_RELEASEFBAR = 93, BRIG_OPCODE_WAITFBAR = 94, BRIG_OPCODE_LDF = 95, BRIG_OPCODE_ACTIVELANECOUNT = 96, BRIG_OPCODE_ACTIVELANEID = 97, BRIG_OPCODE_ACTIVELANEMASK = 98, BRIG_OPCODE_ACTIVELANEPERMUTE = 99, BRIG_OPCODE_CALL = 100, BRIG_OPCODE_SCALL = 101, BRIG_OPCODE_ICALL = 102, BRIG_OPCODE_RET = 103, BRIG_OPCODE_ALLOCA = 104, BRIG_OPCODE_CURRENTWORKGROUPSIZE = 105, BRIG_OPCODE_CURRENTWORKITEMFLATID = 106, BRIG_OPCODE_DIM = 107, BRIG_OPCODE_GRIDGROUPS = 108, BRIG_OPCODE_GRIDSIZE = 109, BRIG_OPCODE_PACKETCOMPLETIONSIG = 110, BRIG_OPCODE_PACKETID = 111, BRIG_OPCODE_WORKGROUPID = 112, BRIG_OPCODE_WORKGROUPSIZE = 113, BRIG_OPCODE_WORKITEMABSID = 114, BRIG_OPCODE_WORKITEMFLATABSID = 115, BRIG_OPCODE_WORKITEMFLATID = 116, BRIG_OPCODE_WORKITEMID = 117, BRIG_OPCODE_CLEARDETECTEXCEPT = 118, BRIG_OPCODE_GETDETECTEXCEPT = 119, BRIG_OPCODE_SETDETECTEXCEPT = 120, BRIG_OPCODE_ADDQUEUEWRITEINDEX = 121, 
BRIG_OPCODE_CASQUEUEWRITEINDEX = 122, BRIG_OPCODE_LDQUEUEREADINDEX = 123, BRIG_OPCODE_LDQUEUEWRITEINDEX = 124, BRIG_OPCODE_STQUEUEREADINDEX = 125, BRIG_OPCODE_STQUEUEWRITEINDEX = 126, BRIG_OPCODE_CLOCK = 127, BRIG_OPCODE_CUID = 128, BRIG_OPCODE_DEBUGTRAP = 129, BRIG_OPCODE_GROUPBASEPTR = 130, BRIG_OPCODE_KERNARGBASEPTR = 131, BRIG_OPCODE_LANEID = 132, BRIG_OPCODE_MAXCUID = 133, BRIG_OPCODE_MAXWAVEID = 134, BRIG_OPCODE_NULLPTR = 135, BRIG_OPCODE_WAVEID = 136, BRIG_OPCODE_FIRST_USER_DEFINED = 32768, }; typedef uint8_t BrigPack8_t; enum BrigPack { BRIG_PACK_NONE = 0, BRIG_PACK_PP = 1, BRIG_PACK_PS = 2, BRIG_PACK_SP = 3, BRIG_PACK_SS = 4, BRIG_PACK_S = 5, BRIG_PACK_P = 6, BRIG_PACK_PPSAT = 7, BRIG_PACK_PSSAT = 8, BRIG_PACK_SPSAT = 9, BRIG_PACK_SSSAT = 10, BRIG_PACK_SSAT = 11, BRIG_PACK_PSAT = 12 }; typedef uint8_t BrigProfile8_t; enum BrigProfile { BRIG_PROFILE_BASE = 0, BRIG_PROFILE_FULL = 1, }; typedef uint16_t BrigRegisterKind16_t; enum BrigRegisterKind { BRIG_REGISTER_KIND_CONTROL = 0, BRIG_REGISTER_KIND_SINGLE = 1, BRIG_REGISTER_KIND_DOUBLE = 2, BRIG_REGISTER_KIND_QUAD = 3 }; typedef uint8_t BrigRound8_t; enum BrigRound { BRIG_ROUND_NONE = 0, BRIG_ROUND_FLOAT_DEFAULT = 1, BRIG_ROUND_FLOAT_NEAR_EVEN = 2, BRIG_ROUND_FLOAT_ZERO = 3, BRIG_ROUND_FLOAT_PLUS_INFINITY = 4, BRIG_ROUND_FLOAT_MINUS_INFINITY = 5, BRIG_ROUND_INTEGER_NEAR_EVEN = 6, BRIG_ROUND_INTEGER_ZERO = 7, BRIG_ROUND_INTEGER_PLUS_INFINITY = 8, BRIG_ROUND_INTEGER_MINUS_INFINITY = 9, BRIG_ROUND_INTEGER_NEAR_EVEN_SAT = 10, BRIG_ROUND_INTEGER_ZERO_SAT = 11, BRIG_ROUND_INTEGER_PLUS_INFINITY_SAT = 12, BRIG_ROUND_INTEGER_MINUS_INFINITY_SAT = 13, BRIG_ROUND_INTEGER_SIGNALING_NEAR_EVEN = 14, BRIG_ROUND_INTEGER_SIGNALING_ZERO = 15, BRIG_ROUND_INTEGER_SIGNALING_PLUS_INFINITY = 16, BRIG_ROUND_INTEGER_SIGNALING_MINUS_INFINITY = 17, BRIG_ROUND_INTEGER_SIGNALING_NEAR_EVEN_SAT = 18, BRIG_ROUND_INTEGER_SIGNALING_ZERO_SAT = 19, BRIG_ROUND_INTEGER_SIGNALING_PLUS_INFINITY_SAT = 20, 
BRIG_ROUND_INTEGER_SIGNALING_MINUS_INFINITY_SAT = 21 }; typedef uint8_t BrigSamplerAddressing8_t; enum BrigSamplerAddressing { BRIG_ADDRESSING_UNDEFINED = 0, BRIG_ADDRESSING_CLAMP_TO_EDGE = 1, BRIG_ADDRESSING_CLAMP_TO_BORDER = 2, BRIG_ADDRESSING_REPEAT = 3, BRIG_ADDRESSING_MIRRORED_REPEAT = 4, BRIG_ADDRESSING_FIRST_USER_DEFINED = 128 }; typedef uint8_t BrigSamplerCoordNormalization8_t; enum BrigSamplerCoordNormalization { BRIG_COORD_UNNORMALIZED = 0, BRIG_COORD_NORMALIZED = 1 }; typedef uint8_t BrigSamplerFilter8_t; enum BrigSamplerFilter { BRIG_FILTER_NEAREST = 0, BRIG_FILTER_LINEAR = 1, BRIG_FILTER_FIRST_USER_DEFINED = 128 }; typedef uint8_t BrigSamplerQuery8_t; enum BrigSamplerQuery { BRIG_SAMPLER_QUERY_ADDRESSING = 0, BRIG_SAMPLER_QUERY_COORD = 1, BRIG_SAMPLER_QUERY_FILTER = 2 }; typedef uint32_t BrigSectionIndex32_t; enum BrigSectionIndex { BRIG_SECTION_INDEX_DATA = 0, BRIG_SECTION_INDEX_CODE = 1, BRIG_SECTION_INDEX_OPERAND = 2, BRIG_SECTION_INDEX_BEGIN_IMPLEMENTATION_DEFINED = 3, }; typedef uint8_t BrigSegCvtModifier8_t; enum BrigSegCvtModifierMask { BRIG_SEG_CVT_NONULL = 1 }; typedef uint8_t BrigSegment8_t; enum BrigSegment { BRIG_SEGMENT_NONE = 0, BRIG_SEGMENT_FLAT = 1, BRIG_SEGMENT_GLOBAL = 2, BRIG_SEGMENT_READONLY = 3, BRIG_SEGMENT_KERNARG = 4, BRIG_SEGMENT_GROUP = 5, BRIG_SEGMENT_PRIVATE = 6, BRIG_SEGMENT_SPILL = 7, BRIG_SEGMENT_ARG = 8, BRIG_SEGMENT_FIRST_USER_DEFINED = 128 }; enum { BRIG_TYPE_BASE_SIZE = 5, BRIG_TYPE_PACK_SIZE = 2, BRIG_TYPE_ARRAY_SIZE = 1, BRIG_TYPE_BASE_SHIFT = 0, BRIG_TYPE_PACK_SHIFT = BRIG_TYPE_BASE_SHIFT + BRIG_TYPE_BASE_SIZE, BRIG_TYPE_ARRAY_SHIFT = BRIG_TYPE_PACK_SHIFT + BRIG_TYPE_PACK_SIZE, BRIG_TYPE_BASE_MASK = ((1 << BRIG_TYPE_BASE_SIZE) - 1) << BRIG_TYPE_BASE_SHIFT, BRIG_TYPE_PACK_MASK = ((1 << BRIG_TYPE_PACK_SIZE) - 1) << BRIG_TYPE_PACK_SHIFT, BRIG_TYPE_ARRAY_MASK = ((1 << BRIG_TYPE_ARRAY_SIZE) - 1) << BRIG_TYPE_ARRAY_SHIFT, BRIG_TYPE_PACK_NONE = 0 << BRIG_TYPE_PACK_SHIFT, BRIG_TYPE_PACK_32 = 1 << BRIG_TYPE_PACK_SHIFT, 
BRIG_TYPE_PACK_64 = 2 << BRIG_TYPE_PACK_SHIFT, BRIG_TYPE_PACK_128 = 3 << BRIG_TYPE_PACK_SHIFT, BRIG_TYPE_ARRAY = 1 << BRIG_TYPE_ARRAY_SHIFT }; typedef uint16_t BrigType16_t; enum BrigType { BRIG_TYPE_NONE = 0, BRIG_TYPE_U8 = 1, BRIG_TYPE_U16 = 2, BRIG_TYPE_U32 = 3, BRIG_TYPE_U64 = 4, BRIG_TYPE_S8 = 5, BRIG_TYPE_S16 = 6, BRIG_TYPE_S32 = 7, BRIG_TYPE_S64 = 8, BRIG_TYPE_F16 = 9, BRIG_TYPE_F32 = 10, BRIG_TYPE_F64 = 11, BRIG_TYPE_B1 = 12, BRIG_TYPE_B8 = 13, BRIG_TYPE_B16 = 14, BRIG_TYPE_B32 = 15, BRIG_TYPE_B64 = 16, BRIG_TYPE_B128 = 17, BRIG_TYPE_SAMP = 18, BRIG_TYPE_ROIMG = 19, BRIG_TYPE_WOIMG = 20, BRIG_TYPE_RWIMG = 21, BRIG_TYPE_SIG32 = 22, BRIG_TYPE_SIG64 = 23, BRIG_TYPE_U8X4 = BRIG_TYPE_U8 | BRIG_TYPE_PACK_32, BRIG_TYPE_U8X8 = BRIG_TYPE_U8 | BRIG_TYPE_PACK_64, BRIG_TYPE_U8X16 = BRIG_TYPE_U8 | BRIG_TYPE_PACK_128, BRIG_TYPE_U16X2 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_32, BRIG_TYPE_U16X4 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_64, BRIG_TYPE_U16X8 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_128, BRIG_TYPE_U32X2 = BRIG_TYPE_U32 | BRIG_TYPE_PACK_64, BRIG_TYPE_U32X4 = BRIG_TYPE_U32 | BRIG_TYPE_PACK_128, BRIG_TYPE_U64X2 = BRIG_TYPE_U64 | BRIG_TYPE_PACK_128, BRIG_TYPE_S8X4 = BRIG_TYPE_S8 | BRIG_TYPE_PACK_32, BRIG_TYPE_S8X8 = BRIG_TYPE_S8 | BRIG_TYPE_PACK_64, BRIG_TYPE_S8X16 = BRIG_TYPE_S8 | BRIG_TYPE_PACK_128, BRIG_TYPE_S16X2 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_32, BRIG_TYPE_S16X4 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_64, BRIG_TYPE_S16X8 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_128, BRIG_TYPE_S32X2 = BRIG_TYPE_S32 | BRIG_TYPE_PACK_64, BRIG_TYPE_S32X4 = BRIG_TYPE_S32 | BRIG_TYPE_PACK_128, BRIG_TYPE_S64X2 = BRIG_TYPE_S64 | BRIG_TYPE_PACK_128, BRIG_TYPE_F16X2 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_32, BRIG_TYPE_F16X4 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_64, BRIG_TYPE_F16X8 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_128, BRIG_TYPE_F32X2 = BRIG_TYPE_F32 | BRIG_TYPE_PACK_64, BRIG_TYPE_F32X4 = BRIG_TYPE_F32 | BRIG_TYPE_PACK_128, BRIG_TYPE_F64X2 = BRIG_TYPE_F64 | BRIG_TYPE_PACK_128, BRIG_TYPE_U8_ARRAY = BRIG_TYPE_U8 | BRIG_TYPE_ARRAY, 
BRIG_TYPE_U16_ARRAY = BRIG_TYPE_U16 | BRIG_TYPE_ARRAY, BRIG_TYPE_U32_ARRAY = BRIG_TYPE_U32 | BRIG_TYPE_ARRAY, BRIG_TYPE_U64_ARRAY = BRIG_TYPE_U64 | BRIG_TYPE_ARRAY, BRIG_TYPE_S8_ARRAY = BRIG_TYPE_S8 | BRIG_TYPE_ARRAY, BRIG_TYPE_S16_ARRAY = BRIG_TYPE_S16 | BRIG_TYPE_ARRAY, BRIG_TYPE_S32_ARRAY = BRIG_TYPE_S32 | BRIG_TYPE_ARRAY, BRIG_TYPE_S64_ARRAY = BRIG_TYPE_S64 | BRIG_TYPE_ARRAY, BRIG_TYPE_F16_ARRAY = BRIG_TYPE_F16 | BRIG_TYPE_ARRAY, BRIG_TYPE_F32_ARRAY = BRIG_TYPE_F32 | BRIG_TYPE_ARRAY, BRIG_TYPE_F64_ARRAY = BRIG_TYPE_F64 | BRIG_TYPE_ARRAY, BRIG_TYPE_B8_ARRAY = BRIG_TYPE_B8 | BRIG_TYPE_ARRAY, BRIG_TYPE_B16_ARRAY = BRIG_TYPE_B16 | BRIG_TYPE_ARRAY, BRIG_TYPE_B32_ARRAY = BRIG_TYPE_B32 | BRIG_TYPE_ARRAY, BRIG_TYPE_B64_ARRAY = BRIG_TYPE_B64 | BRIG_TYPE_ARRAY, BRIG_TYPE_B128_ARRAY = BRIG_TYPE_B128 | BRIG_TYPE_ARRAY, BRIG_TYPE_SAMP_ARRAY = BRIG_TYPE_SAMP | BRIG_TYPE_ARRAY, BRIG_TYPE_ROIMG_ARRAY = BRIG_TYPE_ROIMG | BRIG_TYPE_ARRAY, BRIG_TYPE_WOIMG_ARRAY = BRIG_TYPE_WOIMG | BRIG_TYPE_ARRAY, BRIG_TYPE_RWIMG_ARRAY = BRIG_TYPE_RWIMG | BRIG_TYPE_ARRAY, BRIG_TYPE_SIG32_ARRAY = BRIG_TYPE_SIG32 | BRIG_TYPE_ARRAY, BRIG_TYPE_SIG64_ARRAY = BRIG_TYPE_SIG64 | BRIG_TYPE_ARRAY, BRIG_TYPE_U8X4_ARRAY = BRIG_TYPE_U8X4 | BRIG_TYPE_ARRAY, BRIG_TYPE_U8X8_ARRAY = BRIG_TYPE_U8X8 | BRIG_TYPE_ARRAY, BRIG_TYPE_U8X16_ARRAY = BRIG_TYPE_U8X16 | BRIG_TYPE_ARRAY, BRIG_TYPE_U16X2_ARRAY = BRIG_TYPE_U16X2 | BRIG_TYPE_ARRAY, BRIG_TYPE_U16X4_ARRAY = BRIG_TYPE_U16X4 | BRIG_TYPE_ARRAY, BRIG_TYPE_U16X8_ARRAY = BRIG_TYPE_U16X8 | BRIG_TYPE_ARRAY, BRIG_TYPE_U32X2_ARRAY = BRIG_TYPE_U32X2 | BRIG_TYPE_ARRAY, BRIG_TYPE_U32X4_ARRAY = BRIG_TYPE_U32X4 | BRIG_TYPE_ARRAY, BRIG_TYPE_U64X2_ARRAY = BRIG_TYPE_U64X2 | BRIG_TYPE_ARRAY, BRIG_TYPE_S8X4_ARRAY = BRIG_TYPE_S8X4 | BRIG_TYPE_ARRAY, BRIG_TYPE_S8X8_ARRAY = BRIG_TYPE_S8X8 | BRIG_TYPE_ARRAY, BRIG_TYPE_S8X16_ARRAY = BRIG_TYPE_S8X16 | BRIG_TYPE_ARRAY, BRIG_TYPE_S16X2_ARRAY = BRIG_TYPE_S16X2 | BRIG_TYPE_ARRAY, BRIG_TYPE_S16X4_ARRAY = BRIG_TYPE_S16X4 | BRIG_TYPE_ARRAY, 
BRIG_TYPE_S16X8_ARRAY = BRIG_TYPE_S16X8 | BRIG_TYPE_ARRAY, BRIG_TYPE_S32X2_ARRAY = BRIG_TYPE_S32X2 | BRIG_TYPE_ARRAY, BRIG_TYPE_S32X4_ARRAY = BRIG_TYPE_S32X4 | BRIG_TYPE_ARRAY, BRIG_TYPE_S64X2_ARRAY = BRIG_TYPE_S64X2 | BRIG_TYPE_ARRAY, BRIG_TYPE_F16X2_ARRAY = BRIG_TYPE_F16X2 | BRIG_TYPE_ARRAY, BRIG_TYPE_F16X4_ARRAY = BRIG_TYPE_F16X4 | BRIG_TYPE_ARRAY, BRIG_TYPE_F16X8_ARRAY = BRIG_TYPE_F16X8 | BRIG_TYPE_ARRAY, BRIG_TYPE_F32X2_ARRAY = BRIG_TYPE_F32X2 | BRIG_TYPE_ARRAY, BRIG_TYPE_F32X4_ARRAY = BRIG_TYPE_F32X4 | BRIG_TYPE_ARRAY, BRIG_TYPE_F64X2_ARRAY = BRIG_TYPE_F64X2 | BRIG_TYPE_ARRAY, }; typedef uint8_t BrigVariableModifier8_t; enum BrigVariableModifierMask { BRIG_VARIABLE_DEFINITION = 1, BRIG_VARIABLE_CONST = 2 }; typedef uint8_t BrigWidth8_t; enum BrigWidth { BRIG_WIDTH_NONE = 0, BRIG_WIDTH_1 = 1, BRIG_WIDTH_2 = 2, BRIG_WIDTH_4 = 3, BRIG_WIDTH_8 = 4, BRIG_WIDTH_16 = 5, BRIG_WIDTH_32 = 6, BRIG_WIDTH_64 = 7, BRIG_WIDTH_128 = 8, BRIG_WIDTH_256 = 9, BRIG_WIDTH_512 = 10, BRIG_WIDTH_1024 = 11, BRIG_WIDTH_2048 = 12, BRIG_WIDTH_4096 = 13, BRIG_WIDTH_8192 = 14, BRIG_WIDTH_16384 = 15, BRIG_WIDTH_32768 = 16, BRIG_WIDTH_65536 = 17, BRIG_WIDTH_131072 = 18, BRIG_WIDTH_262144 = 19, BRIG_WIDTH_524288 = 20, BRIG_WIDTH_1048576 = 21, BRIG_WIDTH_2097152 = 22, BRIG_WIDTH_4194304 = 23, BRIG_WIDTH_8388608 = 24, BRIG_WIDTH_16777216 = 25, BRIG_WIDTH_33554432 = 26, BRIG_WIDTH_67108864 = 27, BRIG_WIDTH_134217728 = 28, BRIG_WIDTH_268435456 = 29, BRIG_WIDTH_536870912 = 30, BRIG_WIDTH_1073741824 = 31, BRIG_WIDTH_2147483648 = 32, BRIG_WIDTH_WAVESIZE = 33, BRIG_WIDTH_ALL = 34, }; struct BrigUInt64 { uint32_t lo; uint32_t hi; }; struct BrigBase { uint16_t byteCount; BrigKind16_t kind; }; struct BrigData { uint32_t byteCount; uint8_t bytes[1]; }; struct BrigDirectiveArgBlock { BrigBase base; }; struct BrigDirectiveComment { BrigBase base; BrigDataOffsetString32_t name; }; struct BrigDirectiveControl { BrigBase base; BrigControlDirective16_t control; uint16_t reserved; BrigDataOffsetOperandList32_t 
operands; }; struct BrigDirectiveExecutable { BrigBase base; BrigDataOffsetString32_t name; uint16_t outArgCount; uint16_t inArgCount; BrigCodeOffset32_t firstInArg; BrigCodeOffset32_t firstCodeBlockEntry; BrigCodeOffset32_t nextModuleEntry; BrigExecutableModifier8_t modifier; BrigLinkage8_t linkage; uint16_t reserved; }; struct BrigDirectiveExtension { BrigBase base; BrigDataOffsetString32_t name; }; struct BrigDirectiveFbarrier { BrigBase base; BrigDataOffsetString32_t name; BrigVariableModifier8_t modifier; BrigLinkage8_t linkage; uint16_t reserved; }; struct BrigDirectiveLabel { BrigBase base; BrigDataOffsetString32_t name; }; struct BrigDirectiveLoc { BrigBase base; BrigDataOffsetString32_t filename; uint32_t line; uint32_t column; }; struct BrigDirectiveNone { BrigBase base; }; struct BrigDirectivePragma { BrigBase base; BrigDataOffsetOperandList32_t operands; }; struct BrigDirectiveVariable { BrigBase base; BrigDataOffsetString32_t name; BrigOperandOffset32_t init; BrigType16_t type; BrigSegment8_t segment; BrigAlignment8_t align; BrigUInt64 dim; BrigVariableModifier8_t modifier; BrigLinkage8_t linkage; BrigAllocation8_t allocation; uint8_t reserved; }; struct BrigDirectiveModule { BrigBase base; BrigDataOffsetString32_t name; BrigVersion32_t hsailMajor; BrigVersion32_t hsailMinor; BrigProfile8_t profile; BrigMachineModel8_t machineModel; BrigRound8_t defaultFloatRound; uint8_t reserved; }; struct BrigInstBase { BrigBase base; BrigOpcode16_t opcode; BrigType16_t type; BrigDataOffsetOperandList32_t operands; }; struct BrigInstAddr { BrigInstBase base; BrigSegment8_t segment; uint8_t reserved[3]; }; struct BrigInstAtomic { BrigInstBase base; BrigSegment8_t segment; BrigMemoryOrder8_t memoryOrder; BrigMemoryScope8_t memoryScope; BrigAtomicOperation8_t atomicOperation; uint8_t equivClass; uint8_t reserved[3]; }; struct BrigInstBasic { BrigInstBase base; }; struct BrigInstBr { BrigInstBase base; BrigWidth8_t width; uint8_t reserved[3]; }; struct BrigInstCmp { 
BrigInstBase base; BrigType16_t sourceType; BrigAluModifier8_t modifier; BrigCompareOperation8_t compare; BrigPack8_t pack; uint8_t reserved[3]; }; struct BrigInstCvt { BrigInstBase base; BrigType16_t sourceType; BrigAluModifier8_t modifier; BrigRound8_t round; }; struct BrigInstImage { BrigInstBase base; BrigType16_t imageType; BrigType16_t coordType; BrigImageGeometry8_t geometry; uint8_t equivClass; uint16_t reserved; }; struct BrigInstLane { BrigInstBase base; BrigType16_t sourceType; BrigWidth8_t width; uint8_t reserved; }; struct BrigInstMem { BrigInstBase base; BrigSegment8_t segment; BrigAlignment8_t align; uint8_t equivClass; BrigWidth8_t width; BrigMemoryModifier8_t modifier; uint8_t reserved[3]; }; struct BrigInstMemFence { BrigInstBase base; BrigMemoryOrder8_t memoryOrder; BrigMemoryScope8_t globalSegmentMemoryScope; BrigMemoryScope8_t groupSegmentMemoryScope; BrigMemoryScope8_t imageSegmentMemoryScope; }; struct BrigInstMod { BrigInstBase base; BrigAluModifier8_t modifier; BrigRound8_t round; BrigPack8_t pack; uint8_t reserved; }; struct BrigInstQueryImage { BrigInstBase base; BrigType16_t imageType; BrigImageGeometry8_t geometry; BrigImageQuery8_t query; }; struct BrigInstQuerySampler { BrigInstBase base; BrigSamplerQuery8_t query; uint8_t reserved[3]; }; struct BrigInstQueue { BrigInstBase base; BrigSegment8_t segment; BrigMemoryOrder8_t memoryOrder; uint16_t reserved; }; struct BrigInstSeg { BrigInstBase base; BrigSegment8_t segment; uint8_t reserved[3]; }; struct BrigInstSegCvt { BrigInstBase base; BrigType16_t sourceType; BrigSegment8_t segment; BrigSegCvtModifier8_t modifier; }; struct BrigInstSignal { BrigInstBase base; BrigType16_t signalType; BrigMemoryOrder8_t memoryOrder; BrigAtomicOperation8_t signalOperation; }; struct BrigInstSourceType { BrigInstBase base; BrigType16_t sourceType; uint16_t reserved; }; struct BrigOperandAddress { BrigBase base; BrigCodeOffset32_t symbol; BrigOperandOffset32_t reg; BrigUInt64 offset; }; struct 
BrigOperandAlign { BrigBase base; BrigAlignment8_t align; uint8_t reserved[3]; }; struct BrigOperandCodeList { BrigBase base; BrigDataOffsetCodeList32_t elements; }; struct BrigOperandCodeRef { BrigBase base; BrigCodeOffset32_t ref; }; struct BrigOperandConstantBytes { BrigBase base; BrigType16_t type; uint16_t reserved; BrigDataOffsetString32_t bytes; }; struct BrigOperandConstantOperandList { BrigBase base; BrigType16_t type; uint16_t reserved; BrigDataOffsetOperandList32_t elements; }; struct BrigOperandConstantImage { BrigBase base; BrigType16_t type; BrigImageGeometry8_t geometry; BrigImageChannelOrder8_t channelOrder; BrigImageChannelType8_t channelType; uint8_t reserved[3]; BrigUInt64 width; BrigUInt64 height; BrigUInt64 depth; BrigUInt64 array; }; struct BrigOperandOperandList { BrigBase base; BrigDataOffsetOperandList32_t elements; }; struct BrigOperandRegister { BrigBase base; BrigRegisterKind16_t regKind; uint16_t regNum; }; struct BrigOperandConstantSampler { BrigBase base; BrigType16_t type; BrigSamplerCoordNormalization8_t coord; BrigSamplerFilter8_t filter; BrigSamplerAddressing8_t addressing; uint8_t reserved[3]; }; struct BrigOperandString { BrigBase base; BrigDataOffsetString32_t string; }; struct BrigOperandWavesize { BrigBase base; }; typedef uint32_t BrigExceptions32_t; enum BrigExceptionsMask { BRIG_EXCEPTIONS_INVALID_OPERATION = 1 << 0, BRIG_EXCEPTIONS_DIVIDE_BY_ZERO = 1 << 1, BRIG_EXCEPTIONS_OVERFLOW = 1 << 2, BRIG_EXCEPTIONS_UNDERFLOW = 1 << 3, BRIG_EXCEPTIONS_INEXACT = 1 << 4, BRIG_EXCEPTIONS_FIRST_USER_DEFINED = 1 << 16 }; struct BrigSectionHeader { uint64_t byteCount; uint32_t headerByteCount; uint32_t nameLength; uint8_t name[1]; }; struct BrigModuleHeader { char identification[8]; BrigVersion32_t brigMajor; BrigVersion32_t brigMinor; uint64_t byteCount; uint8_t hash[64]; uint32_t reserved; uint32_t sectionCount; uint64_t sectionIndex; }; typedef BrigModuleHeader* BrigModule_t;

#ifdef __cplusplus
}
#endif /*__cplusplus*/

#endif // defined(INCLUDED_BRIG_H)
ROCR-Runtime-rocm-5.0.0/src/inc/amd_hsa_common.h000066400000000000000000000075101420110115200213030ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

// The following set of header files provides definitions for AMD GPU
// Architecture:
//   - amd_hsa_common.h
//   - amd_hsa_elf.h
//   - amd_hsa_kernel_code.h
//   - amd_hsa_queue.h
//   - amd_hsa_signal.h
//
// Refer to "HSA Application Binary Interface: AMD GPU Architecture" for more
// information.

#ifndef AMD_HSA_COMMON_H
#define AMD_HSA_COMMON_H

#include <stddef.h>
#include <stdint.h>

// Descriptive version of the HSA Application Binary Interface.
#define AMD_HSA_ABI_VERSION "AMD GPU Architecture v0.35 (June 25, 2015)"

// Alignment attribute that specifies a minimum alignment (in bytes) for
// variables of the specified type.
#if defined(__GNUC__)
#  define __ALIGNED__(x) __attribute__((aligned(x)))
#elif defined(_MSC_VER)
#  define __ALIGNED__(x) __declspec(align(x))
#elif defined(RC_INVOKED)
#  define __ALIGNED__(x)
#else
#  error
#endif

// Creates enumeration entries for packed types. Enumeration entries include
// bit shift amount, bit width, and bit mask.
#define AMD_HSA_BITS_CREATE_ENUM_ENTRIES(name, shift, width) \
  name##_SHIFT = (shift),                                    \
  name##_WIDTH = (width),                                    \
  name = (((1 << (width)) - 1) << (shift))

// Gets bits for specified mask from specified src packed instance.
#define AMD_HSA_BITS_GET(src, mask) \
  ((src & mask) >> mask ## _SHIFT)

// Sets val bits for specified mask in specified dst packed instance.
#define AMD_HSA_BITS_SET(dst, mask, val)  \
  dst &= (~(1 << mask##_SHIFT) & ~mask);  \
  dst |= (((val) << mask##_SHIFT) & mask)

#endif // AMD_HSA_COMMON_H
ROCR-Runtime-rocm-5.0.0/src/inc/amd_hsa_elf.h000066400000000000000000000365321420110115200205670ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

// Undefine the macro in case it is defined in the system elf.h.
#undef EM_AMDGPU

#ifndef AMD_HSA_ELF_H
#define AMD_HSA_ELF_H

// AMD GPU Specific ELF Header Enumeration Values.
//
// Values are copied from LLVM BinaryFormat/ELF.h. This file also contains
// code object V1 definitions, which are not part of the LLVM header. Code object
// V1 was only supported by the Finalizer, which is now deprecated and removed.
//
// TODO: Deprecate and remove V1 support and replace this header with the
// LLVM header.

namespace ELF {

// Machine architectures
// See current registered ELF machine architectures at:
// http://www.uxsglobal.com/developers/gabi/latest/ch4.eheader.html
enum {
  EM_AMDGPU = 224, // AMD GPU architecture
};

// OS ABI identification.
enum {
  ELFOSABI_AMDGPU_HSA = 64, // AMD HSA runtime
};

// AMDGPU OS ABI Version identification.
enum {
  // ELFABIVERSION_AMDGPU_HSA_V1 does not exist because OS ABI identification
  // was never defined for V1.
  ELFABIVERSION_AMDGPU_HSA_V2 = 0,
  ELFABIVERSION_AMDGPU_HSA_V3 = 1,
  ELFABIVERSION_AMDGPU_HSA_V4 = 2
};

// AMDGPU specific e_flags.
enum : unsigned {
  // Processor selection mask for EF_AMDGPU_MACH_* values.
  EF_AMDGPU_MACH = 0x0ff,

  // Not specified processor.
  EF_AMDGPU_MACH_NONE = 0x000,

  // AMDGCN-based processors.
  EF_AMDGPU_MACH_AMDGCN_GFX600 = 0x020,
  EF_AMDGPU_MACH_AMDGCN_GFX601 = 0x021,
  EF_AMDGPU_MACH_AMDGCN_GFX700 = 0x022,
  EF_AMDGPU_MACH_AMDGCN_GFX701 = 0x023,
  EF_AMDGPU_MACH_AMDGCN_GFX702 = 0x024,
  EF_AMDGPU_MACH_AMDGCN_GFX703 = 0x025,
  EF_AMDGPU_MACH_AMDGCN_GFX704 = 0x026,
  EF_AMDGPU_MACH_AMDGCN_RESERVED_0X27 = 0x027,
  EF_AMDGPU_MACH_AMDGCN_GFX801 = 0x028,
  EF_AMDGPU_MACH_AMDGCN_GFX802 = 0x029,
  EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
  EF_AMDGPU_MACH_AMDGCN_GFX810 = 0x02b,
  EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
  EF_AMDGPU_MACH_AMDGCN_GFX902 = 0x02d,
  EF_AMDGPU_MACH_AMDGCN_GFX904 = 0x02e,
  EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
  EF_AMDGPU_MACH_AMDGCN_GFX908 = 0x030,
  EF_AMDGPU_MACH_AMDGCN_GFX909 = 0x031,
  EF_AMDGPU_MACH_AMDGCN_GFX90C = 0x032,
  EF_AMDGPU_MACH_AMDGCN_GFX1010 = 0x033,
  EF_AMDGPU_MACH_AMDGCN_GFX1011 = 0x034,
  EF_AMDGPU_MACH_AMDGCN_GFX1012 = 0x035,
  EF_AMDGPU_MACH_AMDGCN_GFX1030 = 0x036,
  EF_AMDGPU_MACH_AMDGCN_GFX1031 = 0x037,
  EF_AMDGPU_MACH_AMDGCN_GFX1032 = 0x038,
  EF_AMDGPU_MACH_AMDGCN_GFX1033 = 0x039,
  EF_AMDGPU_MACH_AMDGCN_GFX602 = 0x03a,
  EF_AMDGPU_MACH_AMDGCN_GFX705 = 0x03b,
  EF_AMDGPU_MACH_AMDGCN_GFX805 = 0x03c,
  EF_AMDGPU_MACH_AMDGCN_GFX1035 = 0x03d,
  EF_AMDGPU_MACH_AMDGCN_GFX1034 = 0x03e,
  EF_AMDGPU_MACH_AMDGCN_GFX90A = 0x03f,
  EF_AMDGPU_MACH_AMDGCN_RESERVED_0X40 = 0x040,
  EF_AMDGPU_MACH_AMDGCN_RESERVED_0X41 = 0x041,
  EF_AMDGPU_MACH_AMDGCN_GFX1013 = 0x042,

  // First/last AMDGCN-based processors.
  EF_AMDGPU_MACH_AMDGCN_FIRST = EF_AMDGPU_MACH_AMDGCN_GFX600,
  EF_AMDGPU_MACH_AMDGCN_LAST = EF_AMDGPU_MACH_AMDGCN_GFX1013,

  // Indicates if the "xnack" target feature is enabled for all code contained
  // in the object.
  //
  // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V2.
  EF_AMDGPU_FEATURE_XNACK_V2 = 0x01,
  // Indicates if the trap handler is enabled for all code contained
  // in the object.
  //
  // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V2.
  EF_AMDGPU_FEATURE_TRAP_HANDLER_V2 = 0x02,

  // Indicates if the "xnack" target feature is enabled for all code contained
  // in the object.
  //
  // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V3.
  EF_AMDGPU_FEATURE_XNACK_V3 = 0x100,
  // Indicates if the "sramecc" target feature is enabled for all code
  // contained in the object.
  //
  // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V3.
  EF_AMDGPU_FEATURE_SRAMECC_V3 = 0x200,

  // XNACK selection mask for EF_AMDGPU_FEATURE_XNACK_* values.
  //
  // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V4.
  EF_AMDGPU_FEATURE_XNACK_V4 = 0x300,
  // XNACK is not supported.
  EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4 = 0x000,
  // XNACK is any/default/unspecified.
  EF_AMDGPU_FEATURE_XNACK_ANY_V4 = 0x100,
  // XNACK is off.
  EF_AMDGPU_FEATURE_XNACK_OFF_V4 = 0x200,
  // XNACK is on.
  EF_AMDGPU_FEATURE_XNACK_ON_V4 = 0x300,

  // SRAMECC selection mask for EF_AMDGPU_FEATURE_SRAMECC_* values.
  //
  // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V4.
  EF_AMDGPU_FEATURE_SRAMECC_V4 = 0xc00,
  // SRAMECC is not supported.
  EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4 = 0x000,
  // SRAMECC is any/default/unspecified.
  EF_AMDGPU_FEATURE_SRAMECC_ANY_V4 = 0x400,
  // SRAMECC is off.
  EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 = 0x800,
  // SRAMECC is on.
  EF_AMDGPU_FEATURE_SRAMECC_ON_V4 = 0xc00,
};

} // end namespace ELF

// ELF Section Header Flag Enumeration Values.
#define SHF_AMDGPU_HSA_GLOBAL (0x00100000 & SHF_MASKOS)
#define SHF_AMDGPU_HSA_READONLY (0x00200000 & SHF_MASKOS)
#define SHF_AMDGPU_HSA_CODE (0x00400000 & SHF_MASKOS)
#define SHF_AMDGPU_HSA_AGENT (0x00800000 & SHF_MASKOS)

//
typedef enum {
  AMDGPU_HSA_SEGMENT_GLOBAL_PROGRAM = 0,
  AMDGPU_HSA_SEGMENT_GLOBAL_AGENT = 1,
  AMDGPU_HSA_SEGMENT_READONLY_AGENT = 2,
  AMDGPU_HSA_SEGMENT_CODE_AGENT = 3,
  AMDGPU_HSA_SEGMENT_LAST,
} amdgpu_hsa_elf_segment_t;

// ELF Program Header Type Enumeration Values.
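The `e_flags` word packs the target processor and the feature settings together. As a hedged sketch, assuming a code object already identified as `ELFOSABI_AMDGPU_HSA` with `ELFABIVERSION_AMDGPU_HSA_V4`, the C functions below split it using the masks above. The constants are local copies because this header is C++-only (`namespace ELF`, typed enum), and the helper names `ef_mach`, `ef_xnack_v4` and `ef_sramecc_v4` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the ELF::EF_AMDGPU_* values defined above. */
enum {
  EF_MACH_MASK = 0x0ff,      /* EF_AMDGPU_MACH */
  EF_XNACK_V4_MASK = 0x300,  /* EF_AMDGPU_FEATURE_XNACK_V4 */
  EF_SRAMECC_V4_MASK = 0xc00 /* EF_AMDGPU_FEATURE_SRAMECC_V4 */
};

/* Processor selection: one of the EF_AMDGPU_MACH_* values. */
static unsigned ef_mach(uint32_t e_flags) { return e_flags & EF_MACH_MASK; }

/* In V4, XNACK and SRAMECC share one 2-bit encoding:
   0 = unsupported, 1 = any, 2 = off, 3 = on. */
static const char *ef_v4_setting(unsigned field) {
  static const char *const names[] = {"unsupported", "any", "off", "on"};
  return names[field & 0x3u];
}

/* XNACK lives in bits 8-9 of e_flags. */
static const char *ef_xnack_v4(uint32_t e_flags) {
  return ef_v4_setting((e_flags & EF_XNACK_V4_MASK) >> 8);
}

/* SRAMECC lives in bits 10-11 of e_flags. */
static const char *ef_sramecc_v4(uint32_t e_flags) {
  return ef_v4_setting((e_flags & EF_SRAMECC_V4_MASK) >> 10);
}
```

For example, `0x72f` decodes to `0x02f` (EF_AMDGPU_MACH_AMDGCN_GFX906) with XNACK on (`0x300`) and SRAMECC any (`0x400`).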
#define PT_AMDGPU_HSA_LOAD_GLOBAL_PROGRAM (PT_LOOS + AMDGPU_HSA_SEGMENT_GLOBAL_PROGRAM)
#define PT_AMDGPU_HSA_LOAD_GLOBAL_AGENT (PT_LOOS + AMDGPU_HSA_SEGMENT_GLOBAL_AGENT)
#define PT_AMDGPU_HSA_LOAD_READONLY_AGENT (PT_LOOS + AMDGPU_HSA_SEGMENT_READONLY_AGENT)
#define PT_AMDGPU_HSA_LOAD_CODE_AGENT (PT_LOOS + AMDGPU_HSA_SEGMENT_CODE_AGENT)

// ELF Symbol Type Enumeration Values.
#define STT_AMDGPU_HSA_KERNEL (STT_LOOS + 0)
#define STT_AMDGPU_HSA_INDIRECT_FUNCTION (STT_LOOS + 1)
#define STT_AMDGPU_HSA_METADATA (STT_LOOS + 2)

// ELF Symbol Binding Enumeration Values.
#define STB_AMDGPU_HSA_EXTERNAL (STB_LOOS + 0)

// ELF Symbol Other Information Creation/Retrieval.
#define ELF64_ST_AMDGPU_ALLOCATION(o) (((o) >> 2) & 0x3)
#define ELF64_ST_AMDGPU_FLAGS(o) ((o) >> 4)
#define ELF64_ST_AMDGPU_OTHER(f, a, v) (((f) << 4) + (((a) & 0x3) << 2) + ((v) & 0x3))

typedef enum {
  AMDGPU_HSA_SYMBOL_ALLOCATION_DEFAULT = 0,
  AMDGPU_HSA_SYMBOL_ALLOCATION_GLOBAL_PROGRAM = 1,
  AMDGPU_HSA_SYMBOL_ALLOCATION_GLOBAL_AGENT = 2,
  AMDGPU_HSA_SYMBOL_ALLOCATION_READONLY_AGENT = 3,
  AMDGPU_HSA_SYMBOL_ALLOCATION_LAST,
} amdgpu_hsa_symbol_allocation_t;

// ELF Symbol Allocation Enumeration Values.
#define STA_AMDGPU_HSA_DEFAULT AMDGPU_HSA_SYMBOL_ALLOCATION_DEFAULT
#define STA_AMDGPU_HSA_GLOBAL_PROGRAM AMDGPU_HSA_SYMBOL_ALLOCATION_GLOBAL_PROGRAM
#define STA_AMDGPU_HSA_GLOBAL_AGENT AMDGPU_HSA_SYMBOL_ALLOCATION_GLOBAL_AGENT
#define STA_AMDGPU_HSA_READONLY_AGENT AMDGPU_HSA_SYMBOL_ALLOCATION_READONLY_AGENT

typedef enum {
  AMDGPU_HSA_SYMBOL_FLAG_DEFAULT = 0,
  AMDGPU_HSA_SYMBOL_FLAG_CONST = 1,
  AMDGPU_HSA_SYMBOL_FLAG_LAST,
} amdgpu_hsa_symbol_flag_t;

// ELF Symbol Flag Enumeration Values.
#define STF_AMDGPU_HSA_CONST AMDGPU_HSA_SYMBOL_FLAG_CONST

// AMD GPU Relocation Type Enumeration Values.
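The `ELF64_ST_AMDGPU_*` macros pack a symbol's flags and allocation into the ELF `st_other` byte. The sketch below round-trips one value through local copies of those macros. It assumes the `v` argument carries the standard ELF visibility (`STV_*`), which gABI keeps in the low two bits of `st_other`; `make_st_other` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the ELF64_ST_AMDGPU_* macros above. */
#define ST_AMDGPU_ALLOCATION(o) (((o) >> 2) & 0x3)
#define ST_AMDGPU_FLAGS(o) ((o) >> 4)
#define ST_AMDGPU_OTHER(f, a, v) (((f) << 4) + (((a) & 0x3) << 2) + ((v) & 0x3))

/* Values copied from amdgpu_hsa_symbol_allocation_t and
   amdgpu_hsa_symbol_flag_t. */
enum { ALLOCATION_GLOBAL_AGENT = 2, FLAG_CONST = 1 };

/* Packs flags into bits 4-7, the allocation into bits 2-3 and the
   (assumed) visibility into bits 0-1. */
static uint8_t make_st_other(unsigned flags, unsigned allocation,
                             unsigned visibility) {
  assert(flags <= 0xfu); /* larger flags would not fit in one byte */
  return (uint8_t)ST_AMDGPU_OTHER(flags, allocation, visibility);
}
```

A constant symbol allocated in the global agent segment packs to `(1 << 4) + (2 << 2) = 24`, and the retrieval macros recover both parts.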
#define R_AMDGPU_NONE 0
#define R_AMDGPU_32_LOW 1
#define R_AMDGPU_32_HIGH 2
#define R_AMDGPU_64 3
#define R_AMDGPU_INIT_SAMPLER 4
#define R_AMDGPU_INIT_IMAGE 5
#define R_AMDGPU_RELATIVE64 13

// AMD GPU Note Type Enumeration Values.
#define NT_AMD_HSA_CODE_OBJECT_VERSION 1
#define NT_AMD_HSA_HSAIL 2
#define NT_AMD_HSA_ISA_VERSION 3
#define NT_AMD_HSA_PRODUCER 4
#define NT_AMD_HSA_PRODUCER_OPTIONS 5
#define NT_AMD_HSA_EXTENSION 6
#define NT_AMD_HSA_ISA_NAME 11
#define NT_AMD_HSA_HLDEBUG_DEBUG 101
#define NT_AMD_HSA_HLDEBUG_TARGET 102

// AMD GPU Metadata Kind Enumeration Values.
typedef uint16_t amdgpu_hsa_metadata_kind16_t;
typedef enum {
  AMDGPU_HSA_METADATA_KIND_NONE = 0,
  AMDGPU_HSA_METADATA_KIND_INIT_SAMP = 1,
  AMDGPU_HSA_METADATA_KIND_INIT_ROIMG = 2,
  AMDGPU_HSA_METADATA_KIND_INIT_WOIMG = 3,
  AMDGPU_HSA_METADATA_KIND_INIT_RWIMG = 4
} amdgpu_hsa_metadata_kind_t;

// AMD GPU Sampler Coordinate Normalization Enumeration Values.
typedef uint8_t amdgpu_hsa_sampler_coord8_t;
typedef enum {
  AMDGPU_HSA_SAMPLER_COORD_UNNORMALIZED = 0,
  AMDGPU_HSA_SAMPLER_COORD_NORMALIZED = 1
} amdgpu_hsa_sampler_coord_t;

// AMD GPU Sampler Filter Enumeration Values.
typedef uint8_t amdgpu_hsa_sampler_filter8_t;
typedef enum {
  AMDGPU_HSA_SAMPLER_FILTER_NEAREST = 0,
  AMDGPU_HSA_SAMPLER_FILTER_LINEAR = 1
} amdgpu_hsa_sampler_filter_t;

// AMD GPU Sampler Addressing Enumeration Values.
typedef uint8_t amdgpu_hsa_sampler_addressing8_t;
typedef enum {
  AMDGPU_HSA_SAMPLER_ADDRESSING_UNDEFINED = 0,
  AMDGPU_HSA_SAMPLER_ADDRESSING_CLAMP_TO_EDGE = 1,
  AMDGPU_HSA_SAMPLER_ADDRESSING_CLAMP_TO_BORDER = 2,
  AMDGPU_HSA_SAMPLER_ADDRESSING_REPEAT = 3,
  AMDGPU_HSA_SAMPLER_ADDRESSING_MIRRORED_REPEAT = 4
} amdgpu_hsa_sampler_addressing_t;

// AMD GPU Sampler Descriptor.
typedef struct amdgpu_hsa_sampler_descriptor_s {
  uint16_t size;
  amdgpu_hsa_metadata_kind16_t kind;
  amdgpu_hsa_sampler_coord8_t coord;
  amdgpu_hsa_sampler_filter8_t filter;
  amdgpu_hsa_sampler_addressing8_t addressing;
  uint8_t reserved1;
} amdgpu_hsa_sampler_descriptor_t;

// AMD GPU Image Geometry Enumeration Values.
typedef uint8_t amdgpu_hsa_image_geometry8_t;
typedef enum {
  AMDGPU_HSA_IMAGE_GEOMETRY_1D = 0,
  AMDGPU_HSA_IMAGE_GEOMETRY_2D = 1,
  AMDGPU_HSA_IMAGE_GEOMETRY_3D = 2,
  AMDGPU_HSA_IMAGE_GEOMETRY_1DA = 3,
  AMDGPU_HSA_IMAGE_GEOMETRY_2DA = 4,
  AMDGPU_HSA_IMAGE_GEOMETRY_1DB = 5,
  AMDGPU_HSA_IMAGE_GEOMETRY_2DDEPTH = 6,
  AMDGPU_HSA_IMAGE_GEOMETRY_2DADEPTH = 7
} amdgpu_hsa_image_geometry_t;

// AMD GPU Image Channel Order Enumeration Values.
typedef uint8_t amdgpu_hsa_image_channel_order8_t;
typedef enum {
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_A = 0,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_R = 1,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_RX = 2,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_RG = 3,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_RGX = 4,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_RA = 5,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_RGB = 6,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_RGBX = 7,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_RGBA = 8,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_BGRA = 9,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_ARGB = 10,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_ABGR = 11,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_SRGB = 12,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_SRGBX = 13,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_SRGBA = 14,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_SBGRA = 15,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_INTENSITY = 16,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_LUMINANCE = 17,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_DEPTH = 18,
  AMDGPU_HSA_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL = 19
} amdgpu_hsa_image_channel_order_t;

// AMD GPU Image Channel Type Enumeration Values.
typedef uint8_t amdgpu_hsa_image_channel_type8_t;
typedef enum {
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_SNORM_INT8 = 0,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_SNORM_INT16 = 1,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_UNORM_INT8 = 2,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_UNORM_INT16 = 3,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_UNORM_INT24 = 4,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_SHORT_555 = 5,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_SHORT_565 = 6,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_INT_101010 = 7,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_SIGNED_INT8 = 8,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_SIGNED_INT16 = 9,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_SIGNED_INT32 = 10,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8 = 11,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16 = 12,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32 = 13,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_HALF_FLOAT = 14,
  AMDGPU_HSA_IMAGE_CHANNEL_TYPE_FLOAT = 15
} amdgpu_hsa_image_channel_type_t;

// AMD GPU Image Descriptor.
typedef struct amdgpu_hsa_image_descriptor_s {
  uint16_t size;
  amdgpu_hsa_metadata_kind16_t kind;
  amdgpu_hsa_image_geometry8_t geometry;
  amdgpu_hsa_image_channel_order8_t channel_order;
  amdgpu_hsa_image_channel_type8_t channel_type;
  uint8_t reserved1;
  uint64_t width;
  uint64_t height;
  uint64_t depth;
  uint64_t array;
} amdgpu_hsa_image_descriptor_t;

typedef struct amdgpu_hsa_note_code_object_version_s {
  uint32_t major_version;
  uint32_t minor_version;
} amdgpu_hsa_note_code_object_version_t;

typedef struct amdgpu_hsa_note_hsail_s {
  uint32_t hsail_major_version;
  uint32_t hsail_minor_version;
  uint8_t profile;
  uint8_t machine_model;
  uint8_t default_float_round;
} amdgpu_hsa_note_hsail_t;

typedef struct amdgpu_hsa_note_isa_s {
  uint16_t vendor_name_size;
  uint16_t architecture_name_size;
  uint32_t major;
  uint32_t minor;
  uint32_t stepping;
  char vendor_and_architecture_name[1];
} amdgpu_hsa_note_isa_t;

typedef struct amdgpu_hsa_note_producer_s {
  uint16_t producer_name_size;
  uint16_t reserved;
  uint32_t producer_major_version;
  uint32_t producer_minor_version;
  char producer_name[1];
} amdgpu_hsa_note_producer_t;
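The sampler and image descriptors are small packed records built from the enumerations above. The sketch below mirrors both structs locally, fills them with enumerator values copied from this header, and checks that neither layout contains padding (8 and 40 bytes). It assumes that `size` records the descriptor's own byte size; `make_sampler` and `make_rgba8_2d` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of amdgpu_hsa_sampler_descriptor_t. */
typedef struct {
  uint16_t size;
  uint16_t kind;      /* amdgpu_hsa_metadata_kind16_t */
  uint8_t coord;      /* amdgpu_hsa_sampler_coord8_t */
  uint8_t filter;     /* amdgpu_hsa_sampler_filter8_t */
  uint8_t addressing; /* amdgpu_hsa_sampler_addressing8_t */
  uint8_t reserved1;
} sampler_desc;

/* Local mirror of amdgpu_hsa_image_descriptor_t. */
typedef struct {
  uint16_t size;
  uint16_t kind;
  uint8_t geometry;
  uint8_t channel_order;
  uint8_t channel_type;
  uint8_t reserved1;
  uint64_t width;
  uint64_t height;
  uint64_t depth;
  uint64_t array;
} image_desc;

/* Normalized coordinates, linear filtering, repeat addressing. */
static sampler_desc make_sampler(void) {
  sampler_desc d = {0};
  d.size = (uint16_t)sizeof d; /* assumption: size = descriptor bytes */
  d.kind = 1;                  /* AMDGPU_HSA_METADATA_KIND_INIT_SAMP */
  d.coord = 1;                 /* AMDGPU_HSA_SAMPLER_COORD_NORMALIZED */
  d.filter = 1;                /* AMDGPU_HSA_SAMPLER_FILTER_LINEAR */
  d.addressing = 3;            /* AMDGPU_HSA_SAMPLER_ADDRESSING_REPEAT */
  return d;
}

/* Read-only 2D RGBA image with 8-bit unsigned normalized channels;
   depth and array are left at zero. */
static image_desc make_rgba8_2d(uint64_t width, uint64_t height) {
  image_desc d = {0};
  d.size = (uint16_t)sizeof d;
  d.kind = 2;          /* AMDGPU_HSA_METADATA_KIND_INIT_ROIMG */
  d.geometry = 1;      /* AMDGPU_HSA_IMAGE_GEOMETRY_2D */
  d.channel_order = 8; /* AMDGPU_HSA_IMAGE_CHANNEL_ORDER_RGBA */
  d.channel_type = 2;  /* AMDGPU_HSA_IMAGE_CHANNEL_TYPE_UNORM_INT8 */
  d.width = width;
  d.height = height;
  return d;
}
```

The 8-byte header of the image descriptor matches the whole sampler descriptor, so `width` starts at offset 8 on common ABIs.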
typedef struct amdgpu_hsa_note_producer_options_s {
  uint16_t producer_options_size;
  char producer_options[1];
} amdgpu_hsa_note_producer_options_t;

typedef enum {
  AMDGPU_HSA_RODATA_GLOBAL_PROGRAM = 0,
  AMDGPU_HSA_RODATA_GLOBAL_AGENT,
  AMDGPU_HSA_RODATA_READONLY_AGENT,
  AMDGPU_HSA_DATA_GLOBAL_PROGRAM,
  AMDGPU_HSA_DATA_GLOBAL_AGENT,
  AMDGPU_HSA_DATA_READONLY_AGENT,
  AMDGPU_HSA_BSS_GLOBAL_PROGRAM,
  AMDGPU_HSA_BSS_GLOBAL_AGENT,
  AMDGPU_HSA_BSS_READONLY_AGENT,
  AMDGPU_HSA_SECTION_LAST,
} amdgpu_hsa_elf_section_t;

#endif // AMD_HSA_ELF_H

ROCR-Runtime-rocm-5.0.0/src/inc/amd_hsa_kernel_code.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
//   this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
//   notice, this list of conditions and the following disclaimers in
//   the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
//   nor the names of its contributors may be used to endorse or promote
//   products derived from this Software without specific prior written
//   permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef AMD_HSA_KERNEL_CODE_H
#define AMD_HSA_KERNEL_CODE_H

#include "amd_hsa_common.h"
#include "hsa.h"

// AMD Kernel Code Version Enumeration Values.
typedef uint32_t amd_kernel_code_version32_t;
enum amd_kernel_code_version_t {
  AMD_KERNEL_CODE_VERSION_MAJOR = 1,
  AMD_KERNEL_CODE_VERSION_MINOR = 1
};

// AMD Machine Kind Enumeration Values.
typedef uint16_t amd_machine_kind16_t;
enum amd_machine_kind_t {
  AMD_MACHINE_KIND_UNDEFINED = 0,
  AMD_MACHINE_KIND_AMDGPU = 1
};

// AMD Machine Version.
typedef uint16_t amd_machine_version16_t;

// AMD Float Round Mode Enumeration Values.
enum amd_float_round_mode_t {
  AMD_FLOAT_ROUND_MODE_NEAREST_EVEN = 0,
  AMD_FLOAT_ROUND_MODE_PLUS_INFINITY = 1,
  AMD_FLOAT_ROUND_MODE_MINUS_INFINITY = 2,
  AMD_FLOAT_ROUND_MODE_ZERO = 3
};

// AMD Float Denorm Mode Enumeration Values.
enum amd_float_denorm_mode_t {
  AMD_FLOAT_DENORM_MODE_FLUSH_SOURCE_OUTPUT = 0,
  AMD_FLOAT_DENORM_MODE_FLUSH_OUTPUT = 1,
  AMD_FLOAT_DENORM_MODE_FLUSH_SOURCE = 2,
  AMD_FLOAT_DENORM_MODE_NO_FLUSH = 3
};

// AMD Compute Program Resource Register One.
typedef uint32_t amd_compute_pgm_rsrc_one32_t;
enum amd_compute_pgm_rsrc_one_t {
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_PRIORITY, 10, 2),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_FLOAT_ROUND_MODE_32, 12, 2),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_FLOAT_ROUND_MODE_16_64, 14, 2),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_FLOAT_DENORM_MODE_32, 16, 2),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_FLOAT_DENORM_MODE_16_64, 18, 2),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_PRIV, 20, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_ENABLE_DX10_CLAMP, 21, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_DEBUG_MODE, 22, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_ENABLE_IEEE_MODE, 23, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_BULKY, 24, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_CDBG_USER, 25, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_ONE_RESERVED1, 26, 6)
};

// AMD System VGPR Workitem ID Enumeration Values.
enum amd_system_vgpr_workitem_id_t {
  AMD_SYSTEM_VGPR_WORKITEM_ID_X = 0,
  AMD_SYSTEM_VGPR_WORKITEM_ID_X_Y = 1,
  AMD_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2,
  AMD_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3
};

// AMD Compute Program Resource Register Two.
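Each `AMD_HSA_BITS_CREATE_ENUM_ENTRIES(name, shift, width)` entry describes one bit field of the 32-bit register. The macro and its accessors live in `amd_hsa_common.h`, which is not part of this excerpt, so the sketch below does not use them; it reads and writes fields with plain shifts instead, using the `compute_pgm_rsrc1` positions listed above. `bits_get`, `bits_set` and `make_rsrc1` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering `width` low bits. */
static uint32_t field_mask(unsigned width) {
  return (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
}

/* Read the field at (shift, width). */
static uint32_t bits_get(uint32_t reg, unsigned shift, unsigned width) {
  return (reg >> shift) & field_mask(width);
}

/* Replace the field at (shift, width) with `value`, truncated to width. */
static uint32_t bits_set(uint32_t reg, unsigned shift, unsigned width,
                         uint32_t value) {
  uint32_t mask = field_mask(width) << shift;
  return (reg & ~mask) | ((value << shift) & mask);
}

/* Builds a compute_pgm_rsrc1 value from four of the fields above:
   GRANULATED_WORKITEM_VGPR_COUNT (0, 6), GRANULATED_WAVEFRONT_SGPR_COUNT (6, 4),
   FLOAT_DENORM_MODE_32 (16, 2) and ENABLE_IEEE_MODE (23, 1). */
static uint32_t make_rsrc1(uint32_t vgpr_granules, uint32_t sgpr_granules,
                           uint32_t denorm_mode_32, uint32_t ieee_mode) {
  uint32_t reg = 0;
  reg = bits_set(reg, 0, 6, vgpr_granules);
  reg = bits_set(reg, 6, 4, sgpr_granules);
  reg = bits_set(reg, 16, 2, denorm_mode_32);
  reg = bits_set(reg, 23, 1, ieee_mode);
  return reg;
}
```

With `denorm_mode_32 = 3` (AMD_FLOAT_DENORM_MODE_NO_FLUSH), `make_rsrc1(7, 2, 3, 1)` gives `0x830087`; its untouched FLOAT_ROUND_MODE_32 field reads back as 0, i.e. AMD_FLOAT_ROUND_MODE_NEAREST_EVEN.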
typedef uint32_t amd_compute_pgm_rsrc_two32_t;
enum amd_compute_pgm_rsrc_two_t {
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_PRIVATE_SEGMENT_WAVE_BYTE_OFFSET, 0, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_USER_SGPR_COUNT, 1, 5),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_TRAP_HANDLER, 6, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_WORKGROUP_INFO, 10, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_VGPR_WORKITEM_ID, 11, 2),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_MEMORY_VIOLATION, 14, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_GRANULATED_LDS_SIZE, 15, 9),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_INT_DIVISION_BY_ZERO, 30, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_TWO_RESERVED1, 31, 1)
};

// AMD Element Byte Size Enumeration Values.
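The `ENABLE_VGPR_WORKITEM_ID` field of `compute_pgm_rsrc2` takes its values from `amd_system_vgpr_workitem_id_t`. The hypothetical decoder below summarizes a few `compute_pgm_rsrc2` fields, using the shift and width pairs from the entries above, and turns the workitem ID setting into a dimension count (or -1 for `AMD_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED`). The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the field at (shift, width) from a register value. */
#define RSRC_FIELD(reg, shift, width) (((reg) >> (shift)) & ((1u << (width)) - 1u))

typedef struct {
  unsigned user_sgpr_count;   /* USER_SGPR_COUNT: shift 1, width 5 */
  unsigned workgroup_id_dims; /* ENABLE_SGPR_WORKGROUP_ID_X/Y/Z: bits 7-9 */
  int workitem_id_dims;       /* ENABLE_VGPR_WORKITEM_ID: shift 11, width 2 */
} rsrc2_summary;

static rsrc2_summary decode_rsrc2(uint32_t rsrc2) {
  rsrc2_summary s;
  s.user_sgpr_count = RSRC_FIELD(rsrc2, 1, 5);
  /* Count how many of the X, Y and Z workgroup ID enables are set. */
  s.workgroup_id_dims = RSRC_FIELD(rsrc2, 7, 1) + RSRC_FIELD(rsrc2, 8, 1) +
                        RSRC_FIELD(rsrc2, 9, 1);
  /* amd_system_vgpr_workitem_id_t: 0 = X, 1 = X_Y, 2 = X_Y_Z, 3 = undefined. */
  unsigned id = RSRC_FIELD(rsrc2, 11, 2);
  s.workitem_id_dims = (id == 3u) ? -1 : (int)id + 1;
  return s;
}
```

For example, `0x990` sets USER_SGPR_COUNT to 8, enables the X and Y workgroup IDs, and selects AMD_SYSTEM_VGPR_WORKITEM_ID_X_Y.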
enum amd_element_byte_size_t {
  AMD_ELEMENT_BYTE_SIZE_2 = 0,
  AMD_ELEMENT_BYTE_SIZE_4 = 1,
  AMD_ELEMENT_BYTE_SIZE_8 = 2,
  AMD_ELEMENT_BYTE_SIZE_16 = 3
};

// AMD Kernel Code Properties.
typedef uint32_t amd_kernel_code_properties32_t;
enum amd_kernel_code_properties_t {
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER, 0, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_DISPATCH_PTR, 1, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_QUEUE_PTR, 2, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_KERNARG_SEGMENT_PTR, 3, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_DISPATCH_ID, 4, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, 6, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X, 7, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y, 8, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z, 9, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_RESERVED1, 10, 6),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_ENABLE_ORDERED_APPEND_GDS, 16, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_PRIVATE_ELEMENT_SIZE, 17, 2),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_IS_PTR64, 19, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_IS_DYNAMIC_CALLSTACK, 20, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_IS_DEBUG_ENABLED, 21, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_IS_XNACK_ENABLED, 22, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTIES_RESERVED2, 23, 9)
};

// AMD Power Of Two Enumeration Values.
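`PRIVATE_ELEMENT_SIZE` is a 2-bit field holding an `amd_element_byte_size_t`, whose codes 0 to 3 stand for 2, 4, 8 and 16 bytes, so the byte count is `2 << code`. The sketch below decodes that field and two single-bit properties from a `kernel_code_properties` value; the bit positions are copied from the enumeration above and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from amd_kernel_code_properties_t above. */
enum {
  PROP_ENABLE_SGPR_KERNARG_SEGMENT_PTR_SHIFT = 3, /* width 1 */
  PROP_PRIVATE_ELEMENT_SIZE_SHIFT = 17,           /* width 2 */
  PROP_IS_PTR64_SHIFT = 19                        /* width 1 */
};

/* amd_element_byte_size_t: 0 -> 2, 1 -> 4, 2 -> 8, 3 -> 16 bytes. */
static unsigned private_element_bytes(uint32_t props) {
  unsigned code = (props >> PROP_PRIVATE_ELEMENT_SIZE_SHIFT) & 0x3u;
  return 2u << code;
}

static int is_ptr64(uint32_t props) {
  return (int)((props >> PROP_IS_PTR64_SHIFT) & 1u);
}

static int uses_kernarg_segment_ptr(uint32_t props) {
  return (int)((props >> PROP_ENABLE_SGPR_KERNARG_SEGMENT_PTR_SHIFT) & 1u);
}
```

A value with bits 3, 17 and 19 set thus describes a 64-bit kernel that requests the kernarg segment pointer and uses 4-byte private elements.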
typedef uint8_t amd_powertwo8_t;
enum amd_powertwo_t {
  AMD_POWERTWO_1 = 0,
  AMD_POWERTWO_2 = 1,
  AMD_POWERTWO_4 = 2,
  AMD_POWERTWO_8 = 3,
  AMD_POWERTWO_16 = 4,
  AMD_POWERTWO_32 = 5,
  AMD_POWERTWO_64 = 6,
  AMD_POWERTWO_128 = 7,
  AMD_POWERTWO_256 = 8
};

// AMD Enabled Control Directive Enumeration Values.
typedef uint64_t amd_enabled_control_directive64_t;
enum amd_enabled_control_directive_t {
  AMD_ENABLED_CONTROL_DIRECTIVE_ENABLE_BREAK_EXCEPTIONS = 1,
  AMD_ENABLED_CONTROL_DIRECTIVE_ENABLE_DETECT_EXCEPTIONS = 2,
  AMD_ENABLED_CONTROL_DIRECTIVE_MAX_DYNAMIC_GROUP_SIZE = 4,
  AMD_ENABLED_CONTROL_DIRECTIVE_MAX_FLAT_GRID_SIZE = 8,
  AMD_ENABLED_CONTROL_DIRECTIVE_MAX_FLAT_WORKGROUP_SIZE = 16,
  AMD_ENABLED_CONTROL_DIRECTIVE_REQUIRED_DIM = 32,
  AMD_ENABLED_CONTROL_DIRECTIVE_REQUIRED_GRID_SIZE = 64,
  AMD_ENABLED_CONTROL_DIRECTIVE_REQUIRED_WORKGROUP_SIZE = 128,
  AMD_ENABLED_CONTROL_DIRECTIVE_REQUIRE_NO_PARTIAL_WORKGROUPS = 256
};

// AMD Exception Kind Enumeration Values.
typedef uint16_t amd_exception_kind16_t;
enum amd_exception_kind_t {
  AMD_EXCEPTION_KIND_INVALID_OPERATION = 1,
  AMD_EXCEPTION_KIND_DIVISION_BY_ZERO = 2,
  AMD_EXCEPTION_KIND_OVERFLOW = 4,
  AMD_EXCEPTION_KIND_UNDERFLOW = 8,
  AMD_EXCEPTION_KIND_INEXACT = 16
};

// AMD Control Directives.
#define AMD_CONTROL_DIRECTIVES_ALIGN_BYTES 64
#define AMD_CONTROL_DIRECTIVES_ALIGN __ALIGNED__(AMD_CONTROL_DIRECTIVES_ALIGN_BYTES)
typedef AMD_CONTROL_DIRECTIVES_ALIGN struct amd_control_directives_s {
  amd_enabled_control_directive64_t enabled_control_directives;
  uint16_t enable_break_exceptions;
  uint16_t enable_detect_exceptions;
  uint32_t max_dynamic_group_size;
  uint64_t max_flat_grid_size;
  uint32_t max_flat_workgroup_size;
  uint8_t required_dim;
  uint8_t reserved1[3];
  uint64_t required_grid_size[3];
  uint32_t required_workgroup_size[3];
  uint8_t reserved2[60];
} amd_control_directives_t;

// AMD Kernel Code.
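`amd_powertwo_t` stores the base-2 logarithm of a power of two (AMD_POWERTWO_1 = 0 through AMD_POWERTWO_256 = 8), and `enabled_control_directives` is a bitmask of `amd_enabled_control_directive_t` values. The sketch below converts in both directions and reads one directive. It assumes, as the names suggest, that a field of `amd_control_directives_t` is only meaningful when its directive bit is set; the helpers and the fallback parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* amd_powertwo8_t -> value, e.g. AMD_POWERTWO_64 (6) -> 64. */
static uint32_t powertwo_to_value(uint8_t p) { return 1u << p; }

/* value -> amd_powertwo8_t; returns -1 if v is not a power of two
   in [1, 256]. */
static int value_to_powertwo(uint32_t v) {
  for (int p = 0; p <= 8; ++p) {
    if ((1u << p) == v) return p;
  }
  return -1;
}

/* Copy of AMD_ENABLED_CONTROL_DIRECTIVE_MAX_FLAT_WORKGROUP_SIZE. */
enum { DIRECTIVE_MAX_FLAT_WORKGROUP_SIZE = 16 };

/* Use max_flat_workgroup_size only when its directive bit is enabled
   (assumption), otherwise fall back to a caller-supplied limit. */
static uint32_t effective_max_workgroup(uint64_t enabled, uint32_t field,
                                        uint32_t fallback) {
  return (enabled & DIRECTIVE_MAX_FLAT_WORKGROUP_SIZE) ? field : fallback;
}
```

The same conversion applies to the `amd_powertwo8_t` fields of `amd_kernel_code_t`, such as `wavefront_size` and the segment alignments.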
#define AMD_ISA_ALIGN_BYTES 256
#define AMD_KERNEL_CODE_ALIGN_BYTES 64
#define AMD_KERNEL_CODE_ALIGN __ALIGNED__(AMD_KERNEL_CODE_ALIGN_BYTES)

typedef AMD_KERNEL_CODE_ALIGN struct amd_kernel_code_s {
  amd_kernel_code_version32_t amd_kernel_code_version_major;
  amd_kernel_code_version32_t amd_kernel_code_version_minor;
  amd_machine_kind16_t amd_machine_kind;
  amd_machine_version16_t amd_machine_version_major;
  amd_machine_version16_t amd_machine_version_minor;
  amd_machine_version16_t amd_machine_version_stepping;
  int64_t kernel_code_entry_byte_offset;
  int64_t kernel_code_prefetch_byte_offset;
  uint64_t kernel_code_prefetch_byte_size;
  uint64_t max_scratch_backing_memory_byte_size;
  amd_compute_pgm_rsrc_one32_t compute_pgm_rsrc1;
  amd_compute_pgm_rsrc_two32_t compute_pgm_rsrc2;
  amd_kernel_code_properties32_t kernel_code_properties;
  uint32_t workitem_private_segment_byte_size;
  uint32_t workgroup_group_segment_byte_size;
  uint32_t gds_segment_byte_size;
  uint64_t kernarg_segment_byte_size;
  uint32_t workgroup_fbarrier_count;
  uint16_t wavefront_sgpr_count;
  uint16_t workitem_vgpr_count;
  uint16_t reserved_vgpr_first;
  uint16_t reserved_vgpr_count;
  uint16_t reserved_sgpr_first;
  uint16_t reserved_sgpr_count;
  uint16_t debug_wavefront_private_segment_offset_sgpr;
  uint16_t debug_private_segment_buffer_sgpr;
  amd_powertwo8_t kernarg_segment_alignment;
  amd_powertwo8_t group_segment_alignment;
  amd_powertwo8_t private_segment_alignment;
  amd_powertwo8_t wavefront_size;
  int32_t call_convention;
  uint8_t reserved1[12];
  uint64_t runtime_loader_kernel_symbol;
  amd_control_directives_t control_directives;
} amd_kernel_code_t;

// TODO: this struct should be completely gone once debugger designs/implements
// Debugger APIs.
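`kernel_code_entry_byte_offset` is a signed field. A minimal sketch, assuming the code object V2 convention described in the LLVM AMDGPU backend documentation (the offset is measured from the start of the `amd_kernel_code_t` object to the kernel's first instruction, and that code must be aligned to `AMD_ISA_ALIGN_BYTES`, 256), computes the entry address and checks its alignment. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ISA_ALIGN_BYTES 256u /* copy of AMD_ISA_ALIGN_BYTES */

/* Entry = address of the amd_kernel_code_t object plus the (possibly
   negative) kernel_code_entry_byte_offset. Unsigned wraparound makes
   negative offsets subtract correctly. */
static uint64_t kernel_entry_address(uint64_t kernel_object,
                                     int64_t entry_byte_offset) {
  return kernel_object + (uint64_t)entry_byte_offset;
}

/* True if the entry point meets the assumed 256-byte ISA alignment. */
static int entry_is_isa_aligned(uint64_t entry) {
  return (entry % ISA_ALIGN_BYTES) == 0;
}
```

An offset of 256 is a natural choice under this assumption, since the 256-byte `amd_kernel_code_t` header (128 bytes of fields followed by the 128-byte `amd_control_directives_t`) then sits directly before the code.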
typedef struct amd_runtime_loader_debug_info_s {
  const void* elf_raw;
  size_t elf_size;
  const char *kernel_name;
  const void *owning_segment;
} amd_runtime_loader_debug_info_t;

#endif // AMD_HSA_KERNEL_CODE_H

ROCR-Runtime-rocm-5.0.0/src/inc/amd_hsa_queue.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
//   this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
//   notice, this list of conditions and the following disclaimers in
//   the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
//   nor the names of its contributors may be used to endorse or promote
//   products derived from this Software without specific prior written
//   permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef AMD_HSA_QUEUE_H
#define AMD_HSA_QUEUE_H

#include "amd_hsa_common.h"
#include "hsa.h"

// AMD Queue Properties.
typedef uint32_t amd_queue_properties32_t;
enum amd_queue_properties_t {
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_QUEUE_PROPERTIES_ENABLE_TRAP_HANDLER, 0, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_QUEUE_PROPERTIES_IS_PTR64, 1, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_QUEUE_PROPERTIES_ENABLE_TRAP_HANDLER_DEBUG_SGPRS, 2, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_QUEUE_PROPERTIES_ENABLE_PROFILING, 3, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_QUEUE_PROPERTIES_USE_SCRATCH_ONCE, 4, 1),
  AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_QUEUE_PROPERTIES_RESERVED1, 5, 27)
};

// AMD Queue.
#define AMD_QUEUE_ALIGN_BYTES 64
#define AMD_QUEUE_ALIGN __ALIGNED__(AMD_QUEUE_ALIGN_BYTES)
typedef struct AMD_QUEUE_ALIGN amd_queue_s {
  hsa_queue_t hsa_queue;
  uint32_t reserved1[4];
  volatile uint64_t write_dispatch_id;
  uint32_t group_segment_aperture_base_hi;
  uint32_t private_segment_aperture_base_hi;
  uint32_t max_cu_id;
  uint32_t max_wave_id;
  volatile uint64_t max_legacy_doorbell_dispatch_id_plus_1;
  volatile uint32_t legacy_doorbell_lock;
  uint32_t reserved2[9];
  volatile uint64_t read_dispatch_id;
  uint32_t read_dispatch_id_field_base_byte_offset;
  uint32_t compute_tmpring_size;
  uint32_t scratch_resource_descriptor[4];
  uint64_t scratch_backing_memory_location;
  uint64_t scratch_backing_memory_byte_size;
  uint32_t scratch_wave64_lane_byte_size;
  amd_queue_properties32_t queue_properties;
  uint32_t reserved3[2];
  hsa_signal_t queue_inactive_signal;
  uint32_t reserved4[14];
} amd_queue_t;

#endif // AMD_HSA_QUEUE_H

ROCR-Runtime-rocm-5.0.0/src/inc/amd_hsa_signal.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
//   this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
//   notice, this list of conditions and the following disclaimers in
//   the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
//   nor the names of its contributors may be used to endorse or promote
//   products derived from this Software without specific prior written
//   permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef AMD_HSA_SIGNAL_H
#define AMD_HSA_SIGNAL_H

#include "amd_hsa_common.h"
#include "amd_hsa_queue.h"

// AMD Signal Kind Enumeration Values.
typedef int64_t amd_signal_kind64_t;
enum amd_signal_kind_t {
  AMD_SIGNAL_KIND_INVALID = 0,
  AMD_SIGNAL_KIND_USER = 1,
  AMD_SIGNAL_KIND_DOORBELL = -1,
  AMD_SIGNAL_KIND_LEGACY_DOORBELL = -2
};

// AMD Signal.
#define AMD_SIGNAL_ALIGN_BYTES 64
#define AMD_SIGNAL_ALIGN __ALIGNED__(AMD_SIGNAL_ALIGN_BYTES)
typedef struct AMD_SIGNAL_ALIGN amd_signal_s {
  amd_signal_kind64_t kind;
  union {
    volatile int64_t value;
    volatile uint32_t* legacy_hardware_doorbell_ptr;
    volatile uint64_t* hardware_doorbell_ptr;
  };
  uint64_t event_mailbox_ptr;
  uint32_t event_id;
  uint32_t reserved1;
  uint64_t start_ts;
  uint64_t end_ts;
  union {
    amd_queue_t* queue_ptr;
    uint64_t reserved2;
  };
  uint32_t reserved3[2];
} amd_signal_t;

#endif // AMD_HSA_SIGNAL_H

ROCR-Runtime-rocm-5.0.0/src/inc/hsa.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
//   this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
//   notice, this list of conditions and the following disclaimers in
//   the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
//   nor the names of its contributors may be used to endorse or promote
//   products derived from this Software without specific prior written
//   permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_INC_HSA_H_
#define HSA_RUNTIME_INC_HSA_H_

#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintXX_t */

#ifndef __cplusplus
#include <stdbool.h>  /* bool */
#endif /* __cplusplus */

// Placeholder for calling convention and import/export macros
#ifndef HSA_CALL
#define HSA_CALL
#endif

#ifndef HSA_EXPORT_DECORATOR
#ifdef __GNUC__
#define HSA_EXPORT_DECORATOR __attribute__ ((visibility ("default")))
#else
#define HSA_EXPORT_DECORATOR
#endif
#endif

#define HSA_API_EXPORT HSA_EXPORT_DECORATOR HSA_CALL
#define HSA_API_IMPORT HSA_CALL

#if !defined(HSA_API) && defined(HSA_EXPORT)
#define HSA_API HSA_API_EXPORT
#else
#define HSA_API HSA_API_IMPORT
#endif

// Detect and set large model builds.
#undef HSA_LARGE_MODEL
#if defined(__LP64__) || defined(_M_X64)
#define HSA_LARGE_MODEL
#endif

// Try to detect CPU endianness
#if !defined(LITTLEENDIAN_CPU) && !defined(BIGENDIAN_CPU)
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || \
    defined(_M_X64)
#define LITTLEENDIAN_CPU
#endif
#endif

#undef HSA_LITTLE_ENDIAN
#if defined(LITTLEENDIAN_CPU)
#define HSA_LITTLE_ENDIAN
#elif defined(BIGENDIAN_CPU)
#else
#error "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined"
#endif

#ifndef HSA_DEPRECATED
#define HSA_DEPRECATED
//#ifdef __GNUC__
//#define HSA_DEPRECATED __attribute__((deprecated))
//#else
//#define HSA_DEPRECATED __declspec(deprecated)
//#endif
#endif

#define HSA_VERSION_1_0 1

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

/** \defgroup status Runtime Notifications
 *  @{
 */

/**
 * @brief Status codes.
 */
typedef enum {
  /**
   * The function has been executed successfully.
   */
  HSA_STATUS_SUCCESS = 0x0,
  /**
   * A traversal over a list of elements has been interrupted by the
   * application before completing.
   */
  HSA_STATUS_INFO_BREAK = 0x1,
  /**
   * A generic error has occurred.
   */
  HSA_STATUS_ERROR = 0x1000,
  /**
   * One of the actual arguments does not meet a precondition stated in the
   * documentation of the corresponding formal argument.
   */
  HSA_STATUS_ERROR_INVALID_ARGUMENT = 0x1001,
  /**
   * The requested queue creation is not valid.
   */
  HSA_STATUS_ERROR_INVALID_QUEUE_CREATION = 0x1002,
  /**
   * The requested allocation is not valid.
   */
  HSA_STATUS_ERROR_INVALID_ALLOCATION = 0x1003,
  /**
   * The agent is invalid.
   */
  HSA_STATUS_ERROR_INVALID_AGENT = 0x1004,
  /**
   * The memory region is invalid.
   */
  HSA_STATUS_ERROR_INVALID_REGION = 0x1005,
  /**
   * The signal is invalid.
   */
  HSA_STATUS_ERROR_INVALID_SIGNAL = 0x1006,
  /**
   * The queue is invalid.
   */
  HSA_STATUS_ERROR_INVALID_QUEUE = 0x1007,
  /**
   * The HSA runtime failed to allocate the necessary resources. This error
   * may also occur when the HSA runtime needs to spawn threads or create
   * internal OS-specific events.
   */
  HSA_STATUS_ERROR_OUT_OF_RESOURCES = 0x1008,
  /**
   * The AQL packet is malformed.
   */
  HSA_STATUS_ERROR_INVALID_PACKET_FORMAT = 0x1009,
  /**
   * An error has been detected while releasing a resource.
   */
  HSA_STATUS_ERROR_RESOURCE_FREE = 0x100A,
  /**
   * An API other than ::hsa_init has been invoked while the reference count
   * of the HSA runtime is 0.
   */
  HSA_STATUS_ERROR_NOT_INITIALIZED = 0x100B,
  /**
   * The maximum reference count for the object has been reached.
   */
  HSA_STATUS_ERROR_REFCOUNT_OVERFLOW = 0x100C,
  /**
   * The arguments passed to a function are not compatible.
   */
  HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS = 0x100D,
  /**
   * The index is invalid.
   */
  HSA_STATUS_ERROR_INVALID_INDEX = 0x100E,
  /**
   * The instruction set architecture is invalid.
   */
  HSA_STATUS_ERROR_INVALID_ISA = 0x100F,
  /**
   * The instruction set architecture name is invalid.
   */
  HSA_STATUS_ERROR_INVALID_ISA_NAME = 0x1017,
  /**
   * The code object is invalid.
   */
  HSA_STATUS_ERROR_INVALID_CODE_OBJECT = 0x1010,
  /**
   * The executable is invalid.
   */
  HSA_STATUS_ERROR_INVALID_EXECUTABLE = 0x1011,
  /**
   * The executable is frozen.
   */
  HSA_STATUS_ERROR_FROZEN_EXECUTABLE = 0x1012,
  /**
   * There is no symbol with the given name.
   */
  HSA_STATUS_ERROR_INVALID_SYMBOL_NAME = 0x1013,
  /**
   * The variable is already defined.
   */
  HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED = 0x1014,
  /**
   * The variable is undefined.
   */
  HSA_STATUS_ERROR_VARIABLE_UNDEFINED = 0x1015,
  /**
   * An HSAIL operation resulted in a hardware exception.
   */
  HSA_STATUS_ERROR_EXCEPTION = 0x1016,
  /**
   * The code object symbol is invalid.
   */
  HSA_STATUS_ERROR_INVALID_CODE_SYMBOL = 0x1018,
  /**
   * The executable symbol is invalid.
   */
  HSA_STATUS_ERROR_INVALID_EXECUTABLE_SYMBOL = 0x1019,
  /**
   * The file descriptor is invalid.
   */
  HSA_STATUS_ERROR_INVALID_FILE = 0x1020,
  /**
   * The code object reader is invalid.
   */
  HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER = 0x1021,
  /**
   * The cache is invalid.
   */
  HSA_STATUS_ERROR_INVALID_CACHE = 0x1022,
  /**
   * The wavefront is invalid.
   */
  HSA_STATUS_ERROR_INVALID_WAVEFRONT = 0x1023,
  /**
   * The signal group is invalid.
   */
  HSA_STATUS_ERROR_INVALID_SIGNAL_GROUP = 0x1024,
  /**
   * The HSA runtime is not in the configuration state.
   */
  HSA_STATUS_ERROR_INVALID_RUNTIME_STATE = 0x1025,
  /**
   * The queue received an error that may require process termination.
   */
  HSA_STATUS_ERROR_FATAL = 0x1026
} hsa_status_t;

/**
 * @brief Query additional information about a status code.
 *
 * @param[in] status Status code.
 *
 * @param[out] status_string Pointer to a memory location where the HSA runtime
 * stores a NUL-terminated string that describes the error status.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p status is an invalid
 * status code, or @p status_string is NULL.
 */
hsa_status_t HSA_API hsa_status_string(
    hsa_status_t status,
    const char ** status_string);

/** @} */

/** \defgroup common Common Definitions
 *  @{
 */

/**
 * @brief Three-dimensional coordinate.
 */
typedef struct hsa_dim3_s {
  /**
   * X dimension.
   */
  uint32_t x;

  /**
   * Y dimension.
   */
  uint32_t y;

  /**
   * Z dimension.
   */
  uint32_t z;
} hsa_dim3_t;

/**
 * @brief Access permissions.
 */
typedef enum {
  /**
   * Read-only access.
   */
  HSA_ACCESS_PERMISSION_RO = 1,
  /**
   * Write-only access.
   */
  HSA_ACCESS_PERMISSION_WO = 2,
  /**
   * Read and write access.
   */
  HSA_ACCESS_PERMISSION_RW = 3
} hsa_access_permission_t;

/**
 * @brief POSIX file descriptor.
 */
typedef int hsa_file_t;

/** @} **/


/** \defgroup initshutdown Initialization and Shut Down
 *  @{
 */

/**
 * @brief Initialize the HSA runtime.
 *
 * @details Initializes the HSA runtime if it is not already initialized, and
 * increases the reference counter associated with the HSA runtime for the
 * current process. Invocation of any HSA function other than ::hsa_init results
 * in undefined behavior if the current HSA runtime reference counter is less
 * than one.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate
 * the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_REFCOUNT_OVERFLOW The HSA runtime reference
 * count reaches INT32_MAX.
 */
hsa_status_t HSA_API hsa_init();

/**
 * @brief Shut down the HSA runtime.
 *
 * @details Decreases the reference count of the HSA runtime instance. When the
 * reference count reaches 0, the HSA runtime is no longer considered valid,
 * but the application may call ::hsa_init to initialize the HSA runtime
 * again.
 *
 * Once the reference count of the HSA runtime reaches 0, all the resources
 * associated with it (queues, signals, agent information, etc.) are
 * considered invalid and any attempt to reference them in subsequent API calls
 * results in undefined behavior. When the reference count reaches 0, the HSA
 * runtime may release resources associated with it.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 */
hsa_status_t HSA_API hsa_shut_down();

/** @} **/

/** \defgroup agentinfo System and Agent Information
 *  @{
 */

/**
 * @brief Endianness. A convention used to interpret the bytes making up a data
 * word.
 */
typedef enum {
  /**
   * The least significant byte is stored in the smallest address.
   */
  HSA_ENDIANNESS_LITTLE = 0,
  /**
   * The most significant byte is stored in the smallest address.
   */
  HSA_ENDIANNESS_BIG = 1
} hsa_endianness_t;

/**
 * @brief Machine model. A machine model determines the size of certain data
 * types in the HSA runtime and in an agent.
 */
typedef enum {
  /**
   * Small machine model. Addresses use 32 bits.
   */
  HSA_MACHINE_MODEL_SMALL = 0,
  /**
   * Large machine model. Addresses use 64 bits.
   */
  HSA_MACHINE_MODEL_LARGE = 1
} hsa_machine_model_t;

/**
 * @brief Profile. A profile indicates a particular level of feature
 * support.
 * For example, in the base profile the application must use the HSA
 * runtime allocator to reserve shared virtual memory, while in the full profile
 * any host pointer can be shared across all the agents.
 */
typedef enum {
  /**
   * Base profile.
   */
  HSA_PROFILE_BASE = 0,
  /**
   * Full profile.
   */
  HSA_PROFILE_FULL = 1
} hsa_profile_t;

/**
 * @brief System attributes.
 */
typedef enum {
  /**
   * Major version of the HSA runtime specification supported by the
   * implementation. The type of this attribute is uint16_t.
   */
  HSA_SYSTEM_INFO_VERSION_MAJOR = 0,
  /**
   * Minor version of the HSA runtime specification supported by the
   * implementation. The type of this attribute is uint16_t.
   */
  HSA_SYSTEM_INFO_VERSION_MINOR = 1,
  /**
   * Current timestamp. The value of this attribute monotonically increases at a
   * constant rate. The type of this attribute is uint64_t.
   */
  HSA_SYSTEM_INFO_TIMESTAMP = 2,
  /**
   * Timestamp value increase rate, in Hz. The timestamp (clock) frequency is
   * in the range 1-400MHz. The type of this attribute is uint64_t.
   */
  HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY = 3,
  /**
   * Maximum duration of a signal wait operation. Expressed as a count based on
   * the timestamp frequency. The type of this attribute is uint64_t.
   */
  HSA_SYSTEM_INFO_SIGNAL_MAX_WAIT = 4,
  /**
   * Endianness of the system. The type of this attribute is ::hsa_endianness_t.
   */
  HSA_SYSTEM_INFO_ENDIANNESS = 5,
  /**
   * Machine model supported by the HSA runtime. The type of this attribute is
   * ::hsa_machine_model_t.
   */
  HSA_SYSTEM_INFO_MACHINE_MODEL = 6,
  /**
   * Bit-mask indicating which extensions are supported by the
   * implementation. An extension with an ID of @p i is supported if the bit at
   * position @p i is set. The type of this attribute is uint8_t[128].
   */
  HSA_SYSTEM_INFO_EXTENSIONS = 7,
  /**
   * String containing the ROCr build identifier.
   */
  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200,
  /**
   * Returns true if hsa_amd_svm_* APIs are supported by the driver. The type of
   * this attribute is bool.
   */
  HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED = 0x201,
  // TODO: Should this be per Agent?
  /**
   * Returns true if all Agents have access to system allocated memory (such as
   * that allocated by mmap, malloc, or new) by default.
   * If false, then system allocated memory may only be made SVM accessible to
   * an Agent by declaration of accessibility with hsa_amd_svm_set_attributes.
   * The type of this attribute is bool.
   */
  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202
} hsa_system_info_t;

/**
 * @brief Get the current value of a system attribute.
 *
 * @param[in] attribute Attribute to query.
 *
 * @param[out] value Pointer to an application-allocated buffer in which to
 * store the value of the attribute. If the buffer passed by the application is
 * not large enough to hold the value of @p attribute, the behavior is undefined.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid
 * system attribute, or @p value is NULL.
 */
hsa_status_t HSA_API hsa_system_get_info(
    hsa_system_info_t attribute,
    void* value);

/**
 * @brief HSA extensions.
 */
typedef enum {
  /**
   * Finalizer extension.
   */
  HSA_EXTENSION_FINALIZER = 0,
  /**
   * Images extension.
   */
  HSA_EXTENSION_IMAGES = 1,
  /**
   * Performance counter extension.
   */
  HSA_EXTENSION_PERFORMANCE_COUNTERS = 2,
  /**
   * Profiling events extension.
   */
  HSA_EXTENSION_PROFILING_EVENTS = 3,
  /**
   * Identifier of the last standard extension.
   */
  HSA_EXTENSION_STD_LAST = 3,
  /**
   * First AMD extension number.
   */
  HSA_AMD_FIRST_EXTENSION = 0x200,
  /**
   * Profiler extension.
   */
  HSA_EXTENSION_AMD_PROFILER = 0x200,
  /**
   * Loader extension.
   */
  HSA_EXTENSION_AMD_LOADER = 0x201,
  /**
   * AqlProfile extension.
   */
  HSA_EXTENSION_AMD_AQLPROFILE = 0x202,
  /**
   * Last AMD extension.
   */
  HSA_AMD_LAST_EXTENSION = 0x202
} hsa_extension_t;

/**
 * @brief Query the name of a given extension.
 *
 * @param[in] extension Extension identifier.
 * If the extension is not supported by the implementation (see
 * ::HSA_SYSTEM_INFO_EXTENSIONS), the behavior is undefined.
 *
 * @param[out] name Pointer to a memory location where the HSA runtime stores
 * the extension name. The extension name is a NUL-terminated string.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p extension is not a valid
 * extension, or @p name is NULL.
 */
hsa_status_t HSA_API hsa_extension_get_name(
    uint16_t extension,
    const char **name);

/**
 * @deprecated
 *
 * @brief Query if a given version of an extension is supported by the HSA
 * implementation.
 *
 * @param[in] extension Extension identifier.
 *
 * @param[in] version_major Major version number.
 *
 * @param[in] version_minor Minor version number.
 *
 * @param[out] result Pointer to a memory location where the HSA runtime stores
 * the result of the check. The result is true if the specified version of the
 * extension is supported, and false otherwise.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p extension is not a valid
 * extension, or @p result is NULL.
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_system_extension_supported(
    uint16_t extension,
    uint16_t version_major,
    uint16_t version_minor,
    bool* result);

/**
 * @brief Query if a given version of an extension is supported by the HSA
 * implementation. All minor versions from 0 up to the returned @p version_minor
 * must be supported by the implementation.
 *
 * @param[in] extension Extension identifier.
 *
 * @param[in] version_major Major version number.
 *
 * @param[out] version_minor Pointer to a memory location where the HSA runtime
 * stores a minor version number; all minor versions from 0 up to this value are
 * supported.
 *
 * @param[out] result Pointer to a memory location where the HSA runtime stores
 * the result of the check.
 * The result is true if the specified version of the
 * extension is supported, and false otherwise.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p extension is not a valid
 * extension, or @p version_minor is NULL, or @p result is NULL.
 */
hsa_status_t HSA_API hsa_system_major_extension_supported(
    uint16_t extension,
    uint16_t version_major,
    uint16_t *version_minor,
    bool* result);

/**
 * @deprecated
 *
 * @brief Retrieve the function pointers corresponding to a given version of an
 * extension. Portable applications are expected to invoke the extension API
 * using the returned function pointers.
 *
 * @details The application is responsible for verifying that the given version
 * of the extension is supported by the HSA implementation (see
 * ::hsa_system_extension_supported). If the given combination of extension,
 * major version, and minor version is not supported by the implementation, the
 * behavior is undefined.
 *
 * @param[in] extension Extension identifier.
 *
 * @param[in] version_major Major version number for which to retrieve the
 * function pointer table.
 *
 * @param[in] version_minor Minor version number for which to retrieve the
 * function pointer table.
 *
 * @param[out] table Pointer to an application-allocated function pointer table
 * that is populated by the HSA runtime. Must not be NULL. The memory associated
 * with table can be reused or freed after the function returns.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p extension is not a valid
 * extension, or @p table is NULL.
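 *
 * Example (a minimal sketch, not part of the HSA runtime API): before asking
 * for a function table, an application can check the uint8_t[128] mask
 * returned for ::HSA_SYSTEM_INFO_EXTENSIONS. The helper name
 * extension_bit_set is hypothetical, and the sketch assumes that the bit for
 * extension ID i is stored in byte i / 8 at bit position i % 8 (least
 * significant bit first).
 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Size of the extension mask: uint8_t[128], i.e. 1024 bits.
#define EXTENSION_MASK_BYTES 128u

// Hypothetical helper: returns true if the bit for extension `id` is set in
// `mask`. Assumes bit i lives in byte i / 8 at bit position i % 8.
static bool extension_bit_set(const uint8_t mask[EXTENSION_MASK_BYTES],
                              uint16_t id) {
  assert(mask != NULL);
  if (id >= EXTENSION_MASK_BYTES * 8u) {
    return false;  // IDs past the end of the mask cannot be advertised.
  }
  return ((mask[id / 8u] >> (id % 8u)) & 1u) != 0;
}
```
 *
 * Under that layout, ::HSA_EXTENSION_AMD_LOADER (0x201) maps to byte 64, bit 1.
 * The mask itself would be filled in by ::hsa_system_get_info. A set bit does
 * not say which versions of the extension are available, so the version check
 * described above is still needed.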
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_system_get_extension_table(
    uint16_t extension,
    uint16_t version_major,
    uint16_t version_minor,
    void *table);

/**
 * @brief Retrieve the function pointers corresponding to a given major version
 * of an extension. Portable applications are expected to invoke the extension
 * API using the returned function pointers.
 *
 * @details The application is responsible for verifying that the given major
 * version of the extension is supported by the HSA implementation (see
 * ::hsa_system_major_extension_supported). If the given combination of extension
 * and major version is not supported by the implementation, the behavior is
 * undefined. Additionally, if @p table_length does not leave room for a full
 * minor version, it is implementation-defined whether only some of the function
 * pointers for that minor version are written.
 *
 * @param[in] extension Extension identifier.
 *
 * @param[in] version_major Major version number for which to retrieve the
 * function pointer table.
 *
 * @param[in] table_length Size in bytes of the function pointer table to be
 * populated. The implementation will not write more than this many bytes to the
 * table.
 *
 * @param[out] table Pointer to an application-allocated function pointer table
 * that is populated by the HSA runtime. Must not be NULL. The memory associated
 * with table can be reused or freed after the function returns.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p extension is not a valid
 * extension, or @p table is NULL.
 */
hsa_status_t HSA_API hsa_system_get_major_extension_table(
    uint16_t extension,
    uint16_t version_major,
    size_t table_length,
    void *table);

/**
 * @brief Struct containing an opaque handle to an agent, a device that
 * participates in the HSA memory model.
 * An agent can submit AQL packets for execution, and
 * may also accept AQL packets for execution (agent dispatch packets or kernel
 * dispatch packets launching HSAIL-derived binaries).
 */
typedef struct hsa_agent_s {
  /**
   * Opaque handle. Two handles reference the same object of the enclosing type
   * if and only if they are equal.
   */
  uint64_t handle;
} hsa_agent_t;

/**
 * @brief Agent features.
 */
typedef enum {
  /**
   * The agent supports AQL packets of kernel dispatch type. If this
   * feature is enabled, the agent is also a kernel agent.
   */
  HSA_AGENT_FEATURE_KERNEL_DISPATCH = 1,
  /**
   * The agent supports AQL packets of agent dispatch type.
   */
  HSA_AGENT_FEATURE_AGENT_DISPATCH = 2
} hsa_agent_feature_t;

/**
 * @brief Hardware device type.
 */
typedef enum {
  /**
   * CPU device.
   */
  HSA_DEVICE_TYPE_CPU = 0,
  /**
   * GPU device.
   */
  HSA_DEVICE_TYPE_GPU = 1,
  /**
   * DSP device.
   */
  HSA_DEVICE_TYPE_DSP = 2
} hsa_device_type_t;

/**
 * @brief Default floating-point rounding mode.
 */
typedef enum {
  /**
   * Use a default floating-point rounding mode specified elsewhere.
   */
  HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT = 0,
  /**
   * Operations that specify the default floating-point mode are rounded to zero
   * by default.
   */
  HSA_DEFAULT_FLOAT_ROUNDING_MODE_ZERO = 1,
  /**
   * Operations that specify the default floating-point mode are rounded to the
   * nearest representable number, with ties broken by selecting the value with
   * an even least significant bit.
   */
  HSA_DEFAULT_FLOAT_ROUNDING_MODE_NEAR = 2
} hsa_default_float_rounding_mode_t;

/**
 * @brief Agent attributes.
 */
typedef enum {
  /**
   * Agent name. The type of this attribute is a NUL-terminated char[64]. The
   * name must be at most 63 characters long (not including the NUL terminator)
   * and all array elements not used for the name must be NUL.
   */
  HSA_AGENT_INFO_NAME = 0,
  /**
   * Name of vendor. The type of this attribute is a NUL-terminated char[64].
   * The name must be at most 63 characters long (not including the NUL
   * terminator) and all array elements not used for the name must be NUL.
   */
  HSA_AGENT_INFO_VENDOR_NAME = 1,
  /**
   * Agent capability. The type of this attribute is ::hsa_agent_feature_t.
   */
  HSA_AGENT_INFO_FEATURE = 2,
  /**
   * @deprecated Query ::HSA_ISA_INFO_MACHINE_MODELS for a given instruction set
   * architecture supported by the agent instead. If more than one ISA is
   * supported by the agent, the returned value corresponds to the first ISA
   * enumerated by ::hsa_agent_iterate_isas.
   *
   * Machine model supported by the agent. The type of this attribute is
   * ::hsa_machine_model_t.
   */
  HSA_AGENT_INFO_MACHINE_MODEL = 3,
  /**
   * @deprecated Query ::HSA_ISA_INFO_PROFILES for a given instruction set
   * architecture supported by the agent instead. If more than one ISA is
   * supported by the agent, the returned value corresponds to the first ISA
   * enumerated by ::hsa_agent_iterate_isas.
   *
   * Profile supported by the agent. The type of this attribute is
   * ::hsa_profile_t.
   */
  HSA_AGENT_INFO_PROFILE = 4,
  /**
   * @deprecated Query ::HSA_ISA_INFO_DEFAULT_FLOAT_ROUNDING_MODES for a given
   * instruction set architecture supported by the agent instead. If more than
   * one ISA is supported by the agent, the returned value corresponds to the
   * first ISA enumerated by ::hsa_agent_iterate_isas.
   *
   * Default floating-point rounding mode. The type of this attribute is
   * ::hsa_default_float_rounding_mode_t, but the value
   * ::HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT is not allowed.
   */
  HSA_AGENT_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 5,
  /**
   * @deprecated Query ::HSA_ISA_INFO_BASE_PROFILE_DEFAULT_FLOAT_ROUNDING_MODES
   * for a given instruction set architecture supported by the agent instead. If
   * more than one ISA is supported by the agent, the returned value corresponds
   * to the first ISA enumerated by ::hsa_agent_iterate_isas.
   *
   * A bit-mask of ::hsa_default_float_rounding_mode_t values, representing the
   * default floating-point rounding modes supported by the agent in the Base
   * profile. The type of this attribute is uint32_t. The default floating-point
   * rounding mode (::HSA_AGENT_INFO_DEFAULT_FLOAT_ROUNDING_MODE) bit must not
   * be set.
   */
  HSA_AGENT_INFO_BASE_PROFILE_DEFAULT_FLOAT_ROUNDING_MODES = 23,
  /**
   * @deprecated Query ::HSA_ISA_INFO_FAST_F16_OPERATION for a given instruction
   * set architecture supported by the agent instead. If more than one ISA is
   * supported by the agent, the returned value corresponds to the first ISA
   * enumerated by ::hsa_agent_iterate_isas.
   *
   * Flag indicating that the f16 HSAIL operation is at least as fast as the
   * f32 operation in the current agent. The value of this attribute is
   * undefined if the agent is not a kernel agent. The type of this
   * attribute is bool.
   */
  HSA_AGENT_INFO_FAST_F16_OPERATION = 24,
  /**
   * @deprecated Query ::HSA_WAVEFRONT_INFO_SIZE for a given wavefront and
   * instruction set architecture supported by the agent instead. If more than
   * one ISA is supported by the agent, the returned value corresponds to the
   * first ISA enumerated by ::hsa_agent_iterate_isas and the first wavefront
   * enumerated by ::hsa_isa_iterate_wavefronts for that ISA.
   *
   * Number of work-items in a wavefront. Must be a power of 2 in the range
   * [1,256]. The value of this attribute is undefined if the agent is not
   * a kernel agent. The type of this attribute is uint32_t.
   */
  HSA_AGENT_INFO_WAVEFRONT_SIZE = 6,
  /**
   * @deprecated Query ::HSA_ISA_INFO_WORKGROUP_MAX_DIM for a given instruction
   * set architecture supported by the agent instead. If more than one ISA is
   * supported by the agent, the returned value corresponds to the first ISA
   * enumerated by ::hsa_agent_iterate_isas.
   *
   * Maximum number of work-items in each dimension of a work-group. Each
   * maximum must be greater than 0. No maximum can exceed the value of
   * ::HSA_AGENT_INFO_WORKGROUP_MAX_SIZE.
   * The value of this attribute is
   * undefined if the agent is not a kernel agent. The type of this
   * attribute is uint16_t[3].
   */
  HSA_AGENT_INFO_WORKGROUP_MAX_DIM = 7,
  /**
   * @deprecated Query ::HSA_ISA_INFO_WORKGROUP_MAX_SIZE for a given instruction
   * set architecture supported by the agent instead. If more than one ISA is
   * supported by the agent, the returned value corresponds to the first ISA
   * enumerated by ::hsa_agent_iterate_isas.
   *
   * Maximum total number of work-items in a work-group. The value of this
   * attribute is undefined if the agent is not a kernel agent. The type
   * of this attribute is uint32_t.
   */
  HSA_AGENT_INFO_WORKGROUP_MAX_SIZE = 8,
  /**
   * @deprecated Query ::HSA_ISA_INFO_GRID_MAX_DIM for a given instruction set
   * architecture supported by the agent instead.
   *
   * Maximum number of work-items in each dimension of a grid. Each maximum must
   * be greater than 0, and must not be smaller than the corresponding value in
   * ::HSA_AGENT_INFO_WORKGROUP_MAX_DIM. No maximum can exceed the value of
   * ::HSA_AGENT_INFO_GRID_MAX_SIZE. The value of this attribute is undefined
   * if the agent is not a kernel agent. The type of this attribute is
   * ::hsa_dim3_t.
   */
  HSA_AGENT_INFO_GRID_MAX_DIM = 9,
  /**
   * @deprecated Query ::HSA_ISA_INFO_GRID_MAX_SIZE for a given instruction set
   * architecture supported by the agent instead. If more than one ISA is
   * supported by the agent, the returned value corresponds to the first ISA
   * enumerated by ::hsa_agent_iterate_isas.
   *
   * Maximum total number of work-items in a grid. The value of this attribute
   * is undefined if the agent is not a kernel agent. The type of this
   * attribute is uint32_t.
   */
  HSA_AGENT_INFO_GRID_MAX_SIZE = 10,
  /**
   * @deprecated Query ::HSA_ISA_INFO_FBARRIER_MAX_SIZE for a given instruction
   * set architecture supported by the agent instead. If more than one ISA is
   * supported by the agent, the returned value corresponds to the first ISA
   * enumerated by ::hsa_agent_iterate_isas.
   *
   * Maximum number of fbarriers per work-group. Must be at least 32. The value
   * of this attribute is undefined if the agent is not a kernel agent. The
   * type of this attribute is uint32_t.
   */
  HSA_AGENT_INFO_FBARRIER_MAX_SIZE = 11,
  /**
   * @deprecated The maximum number of queues is not statically determined.
   *
   * Maximum number of queues that can be active (created but not destroyed) at
   * one time in the agent. The type of this attribute is uint32_t.
   */
  HSA_AGENT_INFO_QUEUES_MAX = 12,
  /**
   * Minimum number of packets that a queue created in the agent
   * can hold. Must be a power of 2 greater than 0. Must not exceed
   * the value of ::HSA_AGENT_INFO_QUEUE_MAX_SIZE. The type of this
   * attribute is uint32_t.
   */
  HSA_AGENT_INFO_QUEUE_MIN_SIZE = 13,
  /**
   * Maximum number of packets that a queue created in the agent can
   * hold. Must be a power of 2 greater than 0. The type of this attribute
   * is uint32_t.
   */
  HSA_AGENT_INFO_QUEUE_MAX_SIZE = 14,
  /**
   * Type of a queue created in the agent. The type of this attribute is
   * ::hsa_queue_type32_t.
   */
  HSA_AGENT_INFO_QUEUE_TYPE = 15,
  /**
   * @deprecated NUMA information is not exposed anywhere else in the API.
   *
   * Identifier of the NUMA node associated with the agent. The type of this
   * attribute is uint32_t.
   */
  HSA_AGENT_INFO_NODE = 16,
  /**
   * Type of hardware device associated with the agent. The type of this
   * attribute is ::hsa_device_type_t.
   */
  HSA_AGENT_INFO_DEVICE = 17,
  /**
   * @deprecated Query ::hsa_agent_iterate_caches to retrieve information about
   * the caches present in a given agent.
   *
   * Array of data cache sizes (L1..L4). Each size is expressed in bytes. A size
   * of 0 for a particular level indicates that there is no cache information
   * for that level. The type of this attribute is uint32_t[4].
   */
  HSA_AGENT_INFO_CACHE_SIZE = 18,
  /**
   * @deprecated An agent may support multiple instruction set
   * architectures. See ::hsa_agent_iterate_isas.
   * If more than one ISA is
   * supported by the agent, the returned value corresponds to the first ISA
   * enumerated by ::hsa_agent_iterate_isas.
   *
   * Instruction set architecture of the agent. The type of this attribute
   * is ::hsa_isa_t.
   */
  HSA_AGENT_INFO_ISA = 19,
  /**
   * Bit-mask indicating which extensions are supported by the agent. An
   * extension with an ID of @p i is supported if the bit at position @p i is
   * set. The type of this attribute is uint8_t[128].
   */
  HSA_AGENT_INFO_EXTENSIONS = 20,
  /**
   * Major version of the HSA runtime specification supported by the
   * agent. The type of this attribute is uint16_t.
   */
  HSA_AGENT_INFO_VERSION_MAJOR = 21,
  /**
   * Minor version of the HSA runtime specification supported by the
   * agent. The type of this attribute is uint16_t.
   */
  HSA_AGENT_INFO_VERSION_MINOR = 22
} hsa_agent_info_t;

/**
 * @brief Get the current value of an attribute for a given agent.
 *
 * @param[in] agent A valid agent.
 *
 * @param[in] attribute Attribute to query.
 *
 * @param[out] value Pointer to an application-allocated buffer in which to
 * store the value of the attribute. If the buffer passed by the application is
 * not large enough to hold the value of @p attribute, the behavior is undefined.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid
 * agent attribute, or @p value is NULL.
 */
hsa_status_t HSA_API hsa_agent_get_info(
    hsa_agent_t agent,
    hsa_agent_info_t attribute,
    void* value);

/**
 * @brief Iterate over the available agents, and invoke an
 * application-defined callback on every iteration.
 *
 * @param[in] callback Callback to be invoked once per agent. The HSA
 * runtime passes two arguments to the callback: the agent and the
 * application data.
 * If @p callback returns a status other than
 * ::HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and
 * ::hsa_iterate_agents returns that status value.
 *
 * @param[in] data Application data that is passed to @p callback on every
 * iteration. May be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL.
 */
hsa_status_t HSA_API hsa_iterate_agents(
    hsa_status_t (*callback)(hsa_agent_t agent, void* data),
    void* data);

/*

// If we do not know the size of an attribute, we need to query it first
// Note: this API will not be in the spec unless needed
hsa_status_t HSA_API hsa_agent_get_info_size(
    hsa_agent_t agent,
    hsa_agent_info_t attribute,
    size_t* size);

// Set the value of an agent's attribute
// Note: this API will not be in the spec unless needed
hsa_status_t HSA_API hsa_agent_set_info(
    hsa_agent_t agent,
    hsa_agent_info_t attribute,
    void* value);

*/

/**
 * @brief Exception policies applied in the presence of hardware exceptions.
 */
typedef enum {
  /**
   * If a hardware exception is detected, a work-item signals an exception.
   */
  HSA_EXCEPTION_POLICY_BREAK = 1,
  /**
   * If a hardware exception is detected, a hardware status bit is set.
   */
  HSA_EXCEPTION_POLICY_DETECT = 2
} hsa_exception_policy_t;

/**
 * @deprecated Use ::hsa_isa_get_exception_policies for a given instruction set
 * architecture supported by the agent instead. If more than one ISA is
 * supported by the agent, this function uses the first value returned by
 * ::hsa_agent_iterate_isas.
 *
 * @brief Retrieve the exception policy support for a given combination of
 * agent and profile.
 *
 * @param[in] agent Agent.
 *
 * @param[in] profile Profile.
 *
 * @param[out] mask Pointer to a memory location where the HSA runtime stores a
 * mask of ::hsa_exception_policy_t values. Must not be NULL.
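 *
 * Example (a minimal sketch, not part of the HSA runtime API): because the
 * ::hsa_exception_policy_t enumerators are distinct powers of two, the
 * returned mask can be tested with a bitwise AND. POLICY_BREAK and
 * POLICY_DETECT are hypothetical local mirrors of ::HSA_EXCEPTION_POLICY_BREAK
 * and ::HSA_EXCEPTION_POLICY_DETECT so that the snippet compiles without this
 * header, and policies_supported is likewise hypothetical.
 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical mirrors of HSA_EXCEPTION_POLICY_BREAK (1) and
// HSA_EXCEPTION_POLICY_DETECT (2).
enum { POLICY_BREAK = 1, POLICY_DETECT = 2 };

// Returns true if every policy bit in `wanted` is also set in `mask`.
static bool policies_supported(uint16_t mask, uint16_t wanted) {
  assert(wanted != 0);  // Asking about no policy at all is a caller bug.
  return (mask & wanted) == wanted;
}
```
 *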
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p profile is not a valid
 * profile, or @p mask is NULL.
 *
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_agent_get_exception_policies(
    hsa_agent_t agent,
    hsa_profile_t profile,
    uint16_t *mask);

/**
 * @brief Cache handle.
 */
typedef struct hsa_cache_s {
  /**
   * Opaque handle. Two handles reference the same object of the enclosing type
   * if and only if they are equal.
   */
  uint64_t handle;
} hsa_cache_t;

/**
 * @brief Cache attributes.
 */
typedef enum {
  /**
   * The length of the cache name in bytes, not including the NUL terminator.
   * The type of this attribute is uint32_t.
   */
  HSA_CACHE_INFO_NAME_LENGTH = 0,
  /**
   * Human-readable description. The type of this attribute is a NUL-terminated
   * character array with the length equal to the value of
   * ::HSA_CACHE_INFO_NAME_LENGTH attribute.
   */
  HSA_CACHE_INFO_NAME = 1,
  /**
   * Cache level. An L1 cache must return a value of 1, an L2 must return a value
   * of 2, and so on. The type of this attribute is uint8_t.
   */
  HSA_CACHE_INFO_LEVEL = 2,
  /**
   * Cache size, in bytes. A value of 0 indicates that there is no size
   * information available. The type of this attribute is uint32_t.
   */
  HSA_CACHE_INFO_SIZE = 3
} hsa_cache_info_t;

/**
 * @brief Get the current value of an attribute for a given cache object.
 *
 * @param[in] cache Cache.
 *
 * @param[in] attribute Attribute to query.
 *
 * @param[out] value Pointer to an application-allocated buffer in which to
 * store the value of the attribute. If the buffer passed by the application is
 * not large enough to hold the value of @p attribute, the behavior is undefined.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_CACHE The cache is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid
 * cache attribute, or @p value is NULL.
 */
hsa_status_t HSA_API hsa_cache_get_info(
    hsa_cache_t cache,
    hsa_cache_info_t attribute,
    void* value);

/**
 * @brief Iterate over the memory caches of a given agent, and
 * invoke an application-defined callback on every iteration.
 *
 * @details Caches are visited in ascending order according to the value of the
 * ::HSA_CACHE_INFO_LEVEL attribute.
 *
 * @param[in] agent A valid agent.
 *
 * @param[in] callback Callback to be invoked once per cache that is present in
 * the agent. The HSA runtime passes two arguments to the callback: the cache
 * and the application data. If @p callback returns a status other than
 * ::HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and
 * that value is returned.
 *
 * @param[in] data Application data that is passed to @p callback on every
 * iteration. May be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL.
 */
hsa_status_t HSA_API hsa_agent_iterate_caches(
    hsa_agent_t agent,
    hsa_status_t (*callback)(hsa_cache_t cache, void* data),
    void* data);

/**
 * @deprecated
 *
 * @brief Query if a given version of an extension is supported by an agent.
 *
 * @param[in] extension Extension identifier.
 *
 * @param[in] agent Agent.
 *
 * @param[in] version_major Major version number.
 *
 * @param[in] version_minor Minor version number.
 *
 * @param[out] result Pointer to a memory location where the HSA runtime stores
 * the result of the check.
The result is true if the specified version of the * extension is supported, and false otherwise. The result must be false if * ::hsa_system_extension_supported returns false for the same extension * version. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p extension is not a valid * extension, or @p result is NULL. */ hsa_status_t HSA_API HSA_DEPRECATED hsa_agent_extension_supported( uint16_t extension, hsa_agent_t agent, uint16_t version_major, uint16_t version_minor, bool* result); /** * @brief Query if a given version of an extension is supported by an agent. All * minor versions from 0 up to the returned @p version_minor must be supported. * * @param[in] extension Extension identifier. * * @param[in] agent Agent. * * @param[in] version_major Major version number. * * @param[out] version_minor Minor version number. * * @param[out] result Pointer to a memory location where the HSA runtime stores * the result of the check. The result is true if the specified version of the * extension is supported, and false otherwise. The result must be false if * ::hsa_system_extension_supported returns false for the same extension * version. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p extension is not a valid * extension, or @p version_minor is NULL, or @p result is NULL. 
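Because every minor version from 0 through the reported @p version_minor must be supported, an application-side compatibility check reduces to a single comparison. This is a minimal sketch of that logic. `extension_minor_ok` is a hypothetical helper, not part of the HSA API. It takes the already-queried `result` and `version_minor` values instead of calling the runtime, so it builds without hsa.h.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper (not an HSA API). `supported` and `reported_minor` are
// the values hsa_agent_major_extension_supported wrote to `result` and
// `version_minor`; `required_minor` is the minor version the application needs.
// Since all minor versions 0..reported_minor are supported, one comparison
// is enough.
bool extension_minor_ok(bool supported, uint16_t reported_minor,
                        uint16_t required_minor) {
  return supported && required_minor <= reported_minor;
}
```

An application would call the runtime once per extension and major version, then reuse this check for each feature that needs a particular minor revision.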
 */
hsa_status_t HSA_API hsa_agent_major_extension_supported(
    uint16_t extension,
    hsa_agent_t agent,
    uint16_t version_major,
    uint16_t *version_minor,
    bool* result);

/** @} */

/** \defgroup signals Signals
 *  @{
 */

/**
 * @brief Signal handle.
 */
typedef struct hsa_signal_s {
  /**
   * Opaque handle. Two handles reference the same object of the enclosing type
   * if and only if they are equal. The value 0 is reserved.
   */
  uint64_t handle;
} hsa_signal_t;

/**
 * @brief Signal value. The value occupies 32 bits in small machine mode, and 64
 * bits in large machine mode.
 */
#ifdef HSA_LARGE_MODEL
  typedef int64_t hsa_signal_value_t;
#else
  typedef int32_t hsa_signal_value_t;
#endif

/**
 * @brief Create a signal.
 *
 * @param[in] initial_value Initial value of the signal.
 *
 * @param[in] num_consumers Size of @p consumers. A value of 0 indicates that
 * any agent might wait on the signal.
 *
 * @param[in] consumers List of agents that might consume (wait on) the
 * signal. If @p num_consumers is 0, this argument is ignored; otherwise, the
 * HSA runtime might use the list to optimize the handling of the signal
 * object. If an agent not listed in @p consumers waits on the returned
 * signal, the behavior is undefined. The memory associated with @p consumers
 * can be reused or freed after the function returns.
 *
 * @param[out] signal Pointer to a memory location where the HSA runtime will
 * store the newly created signal handle. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate
 * the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p signal is NULL, @p
 * num_consumers is greater than 0 but @p consumers is NULL, or @p consumers
 * contains duplicates.
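An application can check the ::HSA_STATUS_ERROR_INVALID_ARGUMENT conditions listed above before calling ::hsa_signal_create. This is a minimal sketch under stated assumptions:
- `agent_handle_t` is a stand-in for ::hsa_agent_t, modeled as an opaque 64-bit handle compared by equality.
- `signal_create_args_valid` is a hypothetical helper, not an HSA function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Stand-in for hsa_agent_t: two handles refer to the same agent if and only if
// they are equal.
typedef struct { uint64_t handle; } agent_handle_t;

// Hypothetical pre-check mirroring the documented INVALID_ARGUMENT cases of
// hsa_signal_create. The output pointer must be non-NULL. A non-empty consumer
// list must be non-NULL and must not contain duplicate agents.
bool signal_create_args_valid(uint32_t num_consumers,
                              const agent_handle_t *consumers,
                              const void *signal_out) {
  if (signal_out == NULL) return false;
  if (num_consumers == 0) return true;  // consumers is ignored in this case
  if (consumers == NULL) return false;
  // Quadratic duplicate scan; consumer lists are expected to be short.
  for (uint32_t i = 0; i < num_consumers; ++i) {
    for (uint32_t j = i + 1; j < num_consumers; ++j) {
      if (consumers[i].handle == consumers[j].handle) return false;
    }
  }
  return true;
}
```

A quadratic scan is chosen for clarity. For long consumer lists, sorting the handles first would be cheaper.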
*/ hsa_status_t HSA_API hsa_signal_create( hsa_signal_value_t initial_value, uint32_t num_consumers, const hsa_agent_t *consumers, hsa_signal_t *signal); /** * @brief Destroy a signal previously created by ::hsa_signal_create. * * @param[in] signal Signal. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_SIGNAL @p signal is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT The handle in @p signal is 0. */ hsa_status_t HSA_API hsa_signal_destroy( hsa_signal_t signal); /** * @brief Atomically read the current value of a signal. * * @param[in] signal Signal. * * @return Value of the signal. */ hsa_signal_value_t HSA_API hsa_signal_load_scacquire( hsa_signal_t signal); /** * @copydoc hsa_signal_load_scacquire */ hsa_signal_value_t HSA_API hsa_signal_load_relaxed( hsa_signal_t signal); /** * @deprecated Renamed as ::hsa_signal_load_scacquire. * * @copydoc hsa_signal_load_scacquire */ hsa_signal_value_t HSA_API HSA_DEPRECATED hsa_signal_load_acquire( hsa_signal_t signal); /** * @brief Atomically set the value of a signal. * * @details If the value of the signal is changed, all the agents waiting * on @p signal for which @p value satisfies their wait condition are awakened. * * @param[in] signal Signal. * * @param[in] value New signal value. */ void HSA_API hsa_signal_store_relaxed( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_store_relaxed */ void HSA_API hsa_signal_store_screlease( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_store_screlease. * * @copydoc hsa_signal_store_screlease */ void HSA_API HSA_DEPRECATED hsa_signal_store_release( hsa_signal_t signal, hsa_signal_value_t value); /** * @brief Atomically set the value of a signal without necessarily notifying the * agents waiting on it.
* * @details The agents waiting on @p signal may not wake up even when the new * value satisfies their wait condition. If the application wants to update the * signal and there is no need to notify any agent, invoking this function can * be more efficient than calling the non-silent counterpart. * * @param[in] signal Signal. * * @param[in] value New signal value. */ void HSA_API hsa_signal_silent_store_relaxed( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_silent_store_relaxed */ void HSA_API hsa_signal_silent_store_screlease( hsa_signal_t signal, hsa_signal_value_t value); /** * @brief Atomically set the value of a signal and return its previous value. * * @details If the value of the signal is changed, all the agents waiting * on @p signal for which @p value satisfies their wait condition are awakened. * * @param[in] signal Signal. If @p signal is a queue doorbell signal, the * behavior is undefined. * * @param[in] value New value. * * @return Value of the signal prior to the exchange. * */ hsa_signal_value_t HSA_API hsa_signal_exchange_scacq_screl( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_exchange_scacq_screl. * * @copydoc hsa_signal_exchange_scacq_screl */ hsa_signal_value_t HSA_API HSA_DEPRECATED hsa_signal_exchange_acq_rel( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_exchange_scacq_screl */ hsa_signal_value_t HSA_API hsa_signal_exchange_scacquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_exchange_scacquire. 
* * @copydoc hsa_signal_exchange_scacquire */ hsa_signal_value_t HSA_API HSA_DEPRECATED hsa_signal_exchange_acquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_exchange_scacq_screl */ hsa_signal_value_t HSA_API hsa_signal_exchange_relaxed( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_exchange_scacq_screl */ hsa_signal_value_t HSA_API hsa_signal_exchange_screlease( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_exchange_screlease. * * @copydoc hsa_signal_exchange_screlease */ hsa_signal_value_t HSA_API HSA_DEPRECATED hsa_signal_exchange_release( hsa_signal_t signal, hsa_signal_value_t value); /** * @brief Atomically set the value of a signal if the observed value is equal to * the expected value. The observed value is returned regardless of whether the * replacement was done. * * @details If the value of the signal is changed, all the agents waiting * on @p signal for which @p value satisfies their wait condition are awakened. * * @param[in] signal Signal. If @p signal is a queue * doorbell signal, the behavior is undefined. * * @param[in] expected Value to compare with. * * @param[in] value New value. * * @return Observed value of the signal. * */ hsa_signal_value_t HSA_API hsa_signal_cas_scacq_screl( hsa_signal_t signal, hsa_signal_value_t expected, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_cas_scacq_screl. * * @copydoc hsa_signal_cas_scacq_screl */ hsa_signal_value_t HSA_API HSA_DEPRECATED hsa_signal_cas_acq_rel( hsa_signal_t signal, hsa_signal_value_t expected, hsa_signal_value_t value); /** * @copydoc hsa_signal_cas_scacq_screl */ hsa_signal_value_t HSA_API hsa_signal_cas_scacquire( hsa_signal_t signal, hsa_signal_value_t expected, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_cas_scacquire. 
* * @copydoc hsa_signal_cas_scacquire */ hsa_signal_value_t HSA_API HSA_DEPRECATED hsa_signal_cas_acquire( hsa_signal_t signal, hsa_signal_value_t expected, hsa_signal_value_t value); /** * @copydoc hsa_signal_cas_scacq_screl */ hsa_signal_value_t HSA_API hsa_signal_cas_relaxed( hsa_signal_t signal, hsa_signal_value_t expected, hsa_signal_value_t value); /** * @copydoc hsa_signal_cas_scacq_screl */ hsa_signal_value_t HSA_API hsa_signal_cas_screlease( hsa_signal_t signal, hsa_signal_value_t expected, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_cas_screlease. * * @copydoc hsa_signal_cas_screlease */ hsa_signal_value_t HSA_API HSA_DEPRECATED hsa_signal_cas_release( hsa_signal_t signal, hsa_signal_value_t expected, hsa_signal_value_t value); /** * @brief Atomically increment the value of a signal by a given amount. * * @details If the value of the signal is changed, all the agents waiting on * @p signal for which @p value satisfies their wait condition are awakened. * * @param[in] signal Signal. If @p signal is a queue doorbell signal, the * behavior is undefined. * * @param[in] value Value to add to the value of the signal. * */ void HSA_API hsa_signal_add_scacq_screl( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_add_scacq_screl. * * @copydoc hsa_signal_add_scacq_screl */ void HSA_API HSA_DEPRECATED hsa_signal_add_acq_rel( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_add_scacq_screl */ void HSA_API hsa_signal_add_scacquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_add_scacquire. 
* * @copydoc hsa_signal_add_scacquire */ void HSA_API HSA_DEPRECATED hsa_signal_add_acquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_add_scacq_screl */ void HSA_API hsa_signal_add_relaxed( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_add_scacq_screl */ void HSA_API hsa_signal_add_screlease( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_add_screlease. * * @copydoc hsa_signal_add_screlease */ void HSA_API HSA_DEPRECATED hsa_signal_add_release( hsa_signal_t signal, hsa_signal_value_t value); /** * @brief Atomically decrement the value of a signal by a given amount. * * @details If the value of the signal is changed, all the agents waiting on * @p signal for which @p value satisfies their wait condition are awakened. * * @param[in] signal Signal. If @p signal is a queue doorbell signal, the * behavior is undefined. * * @param[in] value Value to subtract from the value of the signal. * */ void HSA_API hsa_signal_subtract_scacq_screl( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_subtract_scacq_screl. * * @copydoc hsa_signal_subtract_scacq_screl */ void HSA_API HSA_DEPRECATED hsa_signal_subtract_acq_rel( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_subtract_scacq_screl */ void HSA_API hsa_signal_subtract_scacquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_subtract_scacquire. 
* * @copydoc hsa_signal_subtract_scacquire */ void HSA_API HSA_DEPRECATED hsa_signal_subtract_acquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_subtract_scacq_screl */ void HSA_API hsa_signal_subtract_relaxed( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_subtract_scacq_screl */ void HSA_API hsa_signal_subtract_screlease( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_subtract_screlease. * * @copydoc hsa_signal_subtract_screlease */ void HSA_API HSA_DEPRECATED hsa_signal_subtract_release( hsa_signal_t signal, hsa_signal_value_t value); /** * @brief Atomically perform a bitwise AND operation between the value of a * signal and a given value. * * @details If the value of the signal is changed, all the agents waiting on * @p signal for which @p value satisfies their wait condition are awakened. * * @param[in] signal Signal. If @p signal is a queue doorbell signal, the * behavior is undefined. * * @param[in] value Value to AND with the value of the signal. * */ void HSA_API hsa_signal_and_scacq_screl( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_and_scacq_screl. * * @copydoc hsa_signal_and_scacq_screl */ void HSA_API HSA_DEPRECATED hsa_signal_and_acq_rel( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_and_scacq_screl */ void HSA_API hsa_signal_and_scacquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_and_scacquire. 
* * @copydoc hsa_signal_and_scacquire */ void HSA_API HSA_DEPRECATED hsa_signal_and_acquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_and_scacq_screl */ void HSA_API hsa_signal_and_relaxed( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_and_scacq_screl */ void HSA_API hsa_signal_and_screlease( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_and_screlease. * * @copydoc hsa_signal_and_screlease */ void HSA_API HSA_DEPRECATED hsa_signal_and_release( hsa_signal_t signal, hsa_signal_value_t value); /** * @brief Atomically perform a bitwise OR operation between the value of a * signal and a given value. * * @details If the value of the signal is changed, all the agents waiting on * @p signal for which @p value satisfies their wait condition are awakened. * * @param[in] signal Signal. If @p signal is a queue doorbell signal, the * behavior is undefined. * * @param[in] value Value to OR with the value of the signal. */ void HSA_API hsa_signal_or_scacq_screl( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_or_scacq_screl. * * @copydoc hsa_signal_or_scacq_screl */ void HSA_API HSA_DEPRECATED hsa_signal_or_acq_rel( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_or_scacq_screl */ void HSA_API hsa_signal_or_scacquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_or_scacquire. * * @copydoc hsa_signal_or_scacquire */ void HSA_API HSA_DEPRECATED hsa_signal_or_acquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_or_scacq_screl */ void HSA_API hsa_signal_or_relaxed( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_or_scacq_screl */ void HSA_API hsa_signal_or_screlease( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_or_screlease. 
* * @copydoc hsa_signal_or_screlease */ void HSA_API HSA_DEPRECATED hsa_signal_or_release( hsa_signal_t signal, hsa_signal_value_t value); /** * @brief Atomically perform a bitwise XOR operation between the value of a * signal and a given value. * * @details If the value of the signal is changed, all the agents waiting on * @p signal for which @p value satisfies their wait condition are awakened. * * @param[in] signal Signal. If @p signal is a queue doorbell signal, the * behavior is undefined. * * @param[in] value Value to XOR with the value of the signal. * */ void HSA_API hsa_signal_xor_scacq_screl( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_xor_scacq_screl. * * @copydoc hsa_signal_xor_scacq_screl */ void HSA_API HSA_DEPRECATED hsa_signal_xor_acq_rel( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_xor_scacq_screl */ void HSA_API hsa_signal_xor_scacquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_xor_scacquire. * * @copydoc hsa_signal_xor_scacquire */ void HSA_API HSA_DEPRECATED hsa_signal_xor_acquire( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_xor_scacq_screl */ void HSA_API hsa_signal_xor_relaxed( hsa_signal_t signal, hsa_signal_value_t value); /** * @copydoc hsa_signal_xor_scacq_screl */ void HSA_API hsa_signal_xor_screlease( hsa_signal_t signal, hsa_signal_value_t value); /** * @deprecated Renamed as ::hsa_signal_xor_screlease. * * @copydoc hsa_signal_xor_screlease */ void HSA_API HSA_DEPRECATED hsa_signal_xor_release( hsa_signal_t signal, hsa_signal_value_t value); /** * @brief Wait condition operator. */ typedef enum { /** * The two operands are equal. */ HSA_SIGNAL_CONDITION_EQ = 0, /** * The two operands are not equal. */ HSA_SIGNAL_CONDITION_NE = 1, /** * The first operand is less than the second operand. 
*/ HSA_SIGNAL_CONDITION_LT = 2, /** * The first operand is greater than or equal to the second operand. */ HSA_SIGNAL_CONDITION_GTE = 3 } hsa_signal_condition_t; /** * @brief State of the application thread during a signal wait. */ typedef enum { /** * The application thread may be rescheduled while waiting on the signal. */ HSA_WAIT_STATE_BLOCKED = 0, /** * The application thread stays active while waiting on a signal. */ HSA_WAIT_STATE_ACTIVE = 1 } hsa_wait_state_t; /** * @brief Wait until a signal value satisfies a specified condition, or a * certain amount of time has elapsed. * * @details A wait operation can spuriously resume at any time sooner than the * timeout (for example, due to system or other external factors) even when the * condition has not been met. * * The function is guaranteed to return if the signal value satisfies the * condition at some point in time during the wait, but the value returned to * the application might not satisfy the condition. The application must ensure * that signals are used in such a way that wait wakeup conditions are not * invalidated before dependent threads have woken up. * * When the wait operation internally loads the value of the passed signal, it * uses the memory order indicated in the function name. * * @param[in] signal Signal. * * @param[in] condition Condition used to compare the signal value with @p * compare_value. * * @param[in] compare_value Value to compare with. * * @param[in] timeout_hint Maximum duration of the wait. Specified in the same * unit as the system timestamp. The operation might block for a shorter or * longer time even if the condition is not met. A value of UINT64_MAX indicates * no maximum. * * @param[in] wait_state_hint Hint used by the application to indicate the * preferred waiting state. The actual waiting state is ultimately decided by * the HSA runtime and may not match the provided hint.
A value of * ::HSA_WAIT_STATE_ACTIVE may improve the latency of response to a signal * update by avoiding rescheduling overhead. * * @return Observed value of the signal, which might not satisfy the specified * condition. * */ hsa_signal_value_t HSA_API hsa_signal_wait_scacquire( hsa_signal_t signal, hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout_hint, hsa_wait_state_t wait_state_hint); /** * @copydoc hsa_signal_wait_scacquire */ hsa_signal_value_t HSA_API hsa_signal_wait_relaxed( hsa_signal_t signal, hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout_hint, hsa_wait_state_t wait_state_hint); /** * @deprecated Renamed as ::hsa_signal_wait_scacquire. * * @copydoc hsa_signal_wait_scacquire */ hsa_signal_value_t HSA_API HSA_DEPRECATED hsa_signal_wait_acquire( hsa_signal_t signal, hsa_signal_condition_t condition, hsa_signal_value_t compare_value, uint64_t timeout_hint, hsa_wait_state_t wait_state_hint); /** * @brief Group of signals. */ typedef struct hsa_signal_group_s { /** * Opaque handle. Two handles reference the same object of the enclosing type * if and only if they are equal. */ uint64_t handle; } hsa_signal_group_t; /** * @brief Create a signal group. * * @param[in] num_signals Number of elements in @p signals. Must not be 0. * * @param[in] signals List of signals in the group. The list must not contain * any repeated elements. Must not be NULL. * * @param[in] num_consumers Number of elements in @p consumers. Must not be 0. * * @param[in] consumers List of agents that might consume (wait on) the signal * group. The list must not contain repeated elements, and must be a subset of * the set of agents that are allowed to wait on all the signals in the * group. If an agent not listed in @p consumers waits on the returned group, * the behavior is undefined. The memory associated with @p consumers can be * reused or freed after the function returns. Must not be NULL. 
* * @param[out] signal_group Pointer to a memory location where the HSA runtime * stores the newly created signal group. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate * the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p num_signals is 0, @p signals * is NULL, @p num_consumers is 0, @p consumers is NULL, or @p signal_group is * NULL. */ hsa_status_t HSA_API hsa_signal_group_create( uint32_t num_signals, const hsa_signal_t *signals, uint32_t num_consumers, const hsa_agent_t *consumers, hsa_signal_group_t *signal_group); /** * @brief Destroy a signal group previously created by ::hsa_signal_group_create. * * @param[in] signal_group Signal group. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_SIGNAL_GROUP @p signal_group is invalid. */ hsa_status_t HSA_API hsa_signal_group_destroy( hsa_signal_group_t signal_group); /** * @brief Wait until the value of at least one of the signals in a signal group * satisfies its associated condition. * * @details The function is guaranteed to return if the value of at least one of * the signals in the group satisfies its associated condition at some point in * time during the wait, but the signal value returned to the application may no * longer satisfy the condition. The application must ensure that signals in the * group are used in such a way that wait wakeup conditions are not invalidated * before dependent threads have woken up. * * When this operation internally loads the value of a signal in the group, it * uses the memory order indicated in the function name. * * @param[in] signal_group Signal group. * * @param[in] conditions List of conditions.
Each condition, and the value at * the same index in @p compare_values, is used to compare the value of the * signal at that index in @p signal_group (the signal passed by the application * to ::hsa_signal_group_create at that particular index). The size of @p * conditions must not be smaller than the number of signals in @p signal_group; * any extra elements are ignored. Must not be NULL. * * @param[in] compare_values List of comparison values. The size of @p * compare_values must not be smaller than the number of signals in @p * signal_group; any extra elements are ignored. Must not be NULL. * * @param[in] wait_state_hint Hint used by the application to indicate the * preferred waiting state. The actual waiting state is decided by the HSA runtime * and may not match the provided hint. A value of ::HSA_WAIT_STATE_ACTIVE may * improve the latency of response to a signal update by avoiding rescheduling * overhead. * * @param[out] signal Signal in the group that satisfied the associated * condition. If several signals satisfied their condition, the function can * return any of those signals. Must not be NULL. * * @param[out] value Observed value for @p signal, which might no longer satisfy * the specified condition. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_INVALID_SIGNAL_GROUP @p signal_group is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p conditions is NULL, @p * compare_values is NULL, @p signal is NULL, or @p value is NULL. 
*/ hsa_status_t HSA_API hsa_signal_group_wait_any_scacquire( hsa_signal_group_t signal_group, const hsa_signal_condition_t *conditions, const hsa_signal_value_t *compare_values, hsa_wait_state_t wait_state_hint, hsa_signal_t *signal, hsa_signal_value_t *value); /** * @copydoc hsa_signal_group_wait_any_scacquire */ hsa_status_t HSA_API hsa_signal_group_wait_any_relaxed( hsa_signal_group_t signal_group, const hsa_signal_condition_t *conditions, const hsa_signal_value_t *compare_values, hsa_wait_state_t wait_state_hint, hsa_signal_t *signal, hsa_signal_value_t *value); /** @} */ /** \defgroup memory Memory * @{ */ /** * @brief A memory region represents a block of virtual memory with certain * properties. For example, the HSA runtime represents fine-grained memory in * the global segment using a region. A region might be associated with more * than one agent. */ typedef struct hsa_region_s { /** * Opaque handle. Two handles reference the same object of the enclosing type * if and only if they are equal. */ uint64_t handle; } hsa_region_t; /** @} */ /** \defgroup queue Queues * @{ */ /** * @brief Queue type. Intended to be used for dynamic queue protocol * determination. */ typedef enum { /** * Queue supports multiple producers. Use of multiproducer queue mechanics is * required. */ HSA_QUEUE_TYPE_MULTI = 0, /** * Queue only supports a single producer. In some scenarios, the application * may want to limit the submission of AQL packets to a single agent. Queues * that support a single producer may be more efficient than queues supporting * multiple producers. Use of multiproducer queue mechanics is not supported. */ HSA_QUEUE_TYPE_SINGLE = 1, /** * Queue supports multiple producers and cooperative dispatches. Cooperative * dispatches are able to use GWS synchronization. Queues of this type may be * limited in number. The runtime may return the same queue to serve multiple * ::hsa_queue_create calls when this type is given. 
Callers must inspect the
   * returned queue to discover queue size. Queues of this type are reference
   * counted and require a matching number of ::hsa_queue_destroy calls to
   * release. Use of multiproducer queue mechanics is required. See
   * ::HSA_AMD_AGENT_INFO_COOPERATIVE_QUEUES to query agent support for this
   * type.
   */
  HSA_QUEUE_TYPE_COOPERATIVE = 2
} hsa_queue_type_t;

/**
 * @brief A fixed-size type used to represent ::hsa_queue_type_t constants.
 */
typedef uint32_t hsa_queue_type32_t;

/**
 * @brief Queue features.
 */
typedef enum {
  /**
   * Queue supports kernel dispatch packets.
   */
  HSA_QUEUE_FEATURE_KERNEL_DISPATCH = 1,

  /**
   * Queue supports agent dispatch packets.
   */
  HSA_QUEUE_FEATURE_AGENT_DISPATCH = 2
} hsa_queue_feature_t;

/**
 * @brief User mode queue.
 *
 * @details The queue structure is read-only and allocated by the HSA runtime,
 * but agents can directly modify the contents of the buffer pointed to by @a
 * base_address, or use HSA runtime APIs to access the doorbell signal.
 *
 */
typedef struct hsa_queue_s {
  /**
   * Queue type.
   */
  hsa_queue_type32_t type;

  /**
   * Queue features mask. This is a bit-field of ::hsa_queue_feature_t
   * values. Applications should ignore any unknown set bits.
   */
  uint32_t features;

#ifdef HSA_LARGE_MODEL
  void* base_address;
#elif defined HSA_LITTLE_ENDIAN
  /**
   * Starting address of the HSA runtime-allocated buffer used to store the AQL
   * packets. Must be aligned to the size of an AQL packet.
   */
  void* base_address;
  /**
   * Reserved. Must be 0.
   */
  uint32_t reserved0;
#else
  uint32_t reserved0;
  void* base_address;
#endif

  /**
   * Signal object used by the application to indicate the ID of a packet that
   * is ready to be processed. The HSA runtime manages the doorbell signal. If
   * the application tries to replace or destroy this signal, the behavior is
   * undefined.
   *
   * If @a type is ::HSA_QUEUE_TYPE_SINGLE, the doorbell signal value must be
   * updated in a monotonically increasing fashion.
If @a type is * ::HSA_QUEUE_TYPE_MULTI, the doorbell signal value can be updated with any * value. */ hsa_signal_t doorbell_signal; /** * Maximum number of packets the queue can hold. Must be a power of 2. */ uint32_t size; /** * Reserved. Must be 0. */ uint32_t reserved1; /** * Queue identifier, which is unique over the lifetime of the application. */ uint64_t id; } hsa_queue_t; /** * @brief Create a user mode queue. * * @details The HSA runtime creates the queue structure, the underlying packet * buffer, the doorbell signal, and the write and read indexes. The initial * value of the write and read indexes is 0. The type of every packet in the * buffer is initialized to ::HSA_PACKET_TYPE_INVALID. * * The application should only rely on the error code returned to determine if * the queue is valid. * * @param[in] agent Agent on which to create the queue. * * @param[in] size Number of packets the queue is expected to * hold. Must be a power of 2 between 1 and the value of * ::HSA_AGENT_INFO_QUEUE_MAX_SIZE in @p agent. The size of the newly * created queue is the maximum of @p size and the value of * ::HSA_AGENT_INFO_QUEUE_MIN_SIZE in @p agent. * * @param[in] type Type of the queue, a bitwise OR of hsa_queue_type_t values. * If the value of ::HSA_AGENT_INFO_QUEUE_TYPE in @p agent is ::HSA_QUEUE_TYPE_SINGLE, * then @p type must also be ::HSA_QUEUE_TYPE_SINGLE. * * @param[in] callback Callback invoked by the HSA runtime for every * asynchronous event related to the newly created queue. May be NULL. The HSA * runtime passes three arguments to the callback: a code identifying the event * that triggered the invocation, a pointer to the queue where the event * originated, and the application data. * * @param[in] data Application data that is passed to @p callback on every * invocation. May be NULL. * * @param[in] private_segment_size Hint indicating the maximum * expected private segment usage per work-item, in bytes.
There may * be performance degradation if the application places a kernel * dispatch packet in the queue and the corresponding private segment * usage exceeds @p private_segment_size. If the application does not * want to specify any particular value for this argument, @p * private_segment_size must be UINT32_MAX. If the queue does not * support kernel dispatch packets, this argument is ignored. * * @param[in] group_segment_size Hint indicating the maximum expected * group segment usage per work-group, in bytes. There may be * performance degradation if the application places a kernel dispatch * packet in the queue and the corresponding group segment usage * exceeds @p group_segment_size. If the application does not want to * specify any particular value for this argument, @p * group_segment_size must be UINT32_MAX. If the queue does not * support kernel dispatch packets, this argument is ignored. * * @param[out] queue Memory location where the HSA runtime stores a pointer to * the newly created queue. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate * the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_QUEUE_CREATION @p agent does not * support queues of the given type. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p size is not a power of two, * @p size is 0, @p type is an invalid queue type, or @p queue is NULL. 
* */ hsa_status_t HSA_API hsa_queue_create( hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type, void (*callback)(hsa_status_t status, hsa_queue_t *source, void *data), void *data, uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t **queue); /** * @brief Create a queue for which the application or a kernel is responsible * for processing the AQL packets. * * @details The application can use this function to create queues where AQL * packets are not parsed by the packet processor associated with an agent, * but rather by a unit of execution running on that agent (for example, a * thread in the host application). * * The application is responsible for ensuring that all the producers and * consumers of the resulting queue can access the provided doorbell signal * and memory region. The application is also responsible for ensuring that the * unit of execution processing the queue packets supports the indicated * features (AQL packet types). * * When the queue is created, the HSA runtime allocates the packet buffer using * @p region, and the write and read indexes. The initial value of the write and * read indexes is 0, and the type of every packet in the buffer is initialized * to ::HSA_PACKET_TYPE_INVALID. The value of the @e size, @e type, @e features, * and @e doorbell_signal fields in the returned queue match the values passed * by the application. * * @param[in] region Memory region that the HSA runtime should use to allocate * the AQL packet buffer and any other queue metadata. * * @param[in] size Number of packets the queue is expected to hold. Must be a * power of 2 greater than 0. * * @param[in] type Queue type. * * @param[in] features Supported queue features. This is a bit-field of * ::hsa_queue_feature_t values. * * @param[in] doorbell_signal Doorbell signal that the HSA runtime must * associate with the returned queue. The signal handle must not be 0. 
* * @param[out] queue Memory location where the HSA runtime stores a pointer to * the newly created queue. The application should not rely on the value * returned for this argument but only on the status code to determine if the * queue is valid. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate * the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p size is not a power of two, @p * size is 0, @p type is an invalid queue type, the doorbell signal handle is * 0, or @p queue is NULL. * */ hsa_status_t HSA_API hsa_soft_queue_create( hsa_region_t region, uint32_t size, hsa_queue_type32_t type, uint32_t features, hsa_signal_t doorbell_signal, hsa_queue_t **queue); /** * @brief Destroy a user mode queue. * * @details When a queue is destroyed, the state of the AQL packets that have * not yet been fully processed (their completion phase has not finished) * becomes undefined. It is the responsibility of the application to ensure that * all pending queue operations are finished if their results are required. * * The resources allocated by the HSA runtime during queue creation (queue * structure, ring buffer, doorbell signal) are released. The queue should not * be accessed after being destroyed. * * @param[in] queue Pointer to a queue created using ::hsa_queue_create. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_QUEUE The queue is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p queue is NULL. */ hsa_status_t HSA_API hsa_queue_destroy( hsa_queue_t *queue); /** * @brief Inactivate a queue.
* * @details Inactivating the queue aborts any pending executions and prevents any * new packets from being processed. Any packets written to the queue after * it is inactivated are ignored by the packet processor. * * @param[in] queue Pointer to a queue. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_QUEUE The queue is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p queue is NULL. */ hsa_status_t HSA_API hsa_queue_inactivate( hsa_queue_t *queue); /** * @deprecated Renamed as ::hsa_queue_load_read_index_scacquire. * * @copydoc hsa_queue_load_read_index_scacquire */ uint64_t HSA_API HSA_DEPRECATED hsa_queue_load_read_index_acquire( const hsa_queue_t *queue); /** * @brief Atomically load the read index of a queue. * * @param[in] queue Pointer to a queue. * * @return Read index of the queue pointed to by @p queue. */ uint64_t HSA_API hsa_queue_load_read_index_scacquire( const hsa_queue_t *queue); /** * @copydoc hsa_queue_load_read_index_scacquire */ uint64_t HSA_API hsa_queue_load_read_index_relaxed( const hsa_queue_t *queue); /** * @deprecated Renamed as ::hsa_queue_load_write_index_scacquire. * * @copydoc hsa_queue_load_write_index_scacquire */ uint64_t HSA_API HSA_DEPRECATED hsa_queue_load_write_index_acquire( const hsa_queue_t *queue); /** * @brief Atomically load the write index of a queue. * * @param[in] queue Pointer to a queue. * * @return Write index of the queue pointed to by @p queue. */ uint64_t HSA_API hsa_queue_load_write_index_scacquire( const hsa_queue_t *queue); /** * @copydoc hsa_queue_load_write_index_scacquire */ uint64_t HSA_API hsa_queue_load_write_index_relaxed( const hsa_queue_t *queue); /** * @brief Atomically set the write index of a queue.
* * @details It is recommended that the application uses this function to update * the write index when there is a single agent submitting work to the queue * (the queue type is ::HSA_QUEUE_TYPE_SINGLE). * * @param[in] queue Pointer to a queue. * * @param[in] value Value to assign to the write index. * */ void HSA_API hsa_queue_store_write_index_relaxed( const hsa_queue_t *queue, uint64_t value); /** * @deprecated Renamed as ::hsa_queue_store_write_index_screlease. * * @copydoc hsa_queue_store_write_index_screlease */ void HSA_API HSA_DEPRECATED hsa_queue_store_write_index_release( const hsa_queue_t *queue, uint64_t value); /** * @copydoc hsa_queue_store_write_index_relaxed */ void HSA_API hsa_queue_store_write_index_screlease( const hsa_queue_t *queue, uint64_t value); /** * @deprecated Renamed as ::hsa_queue_cas_write_index_scacq_screl. * * @copydoc hsa_queue_cas_write_index_scacq_screl */ uint64_t HSA_API HSA_DEPRECATED hsa_queue_cas_write_index_acq_rel( const hsa_queue_t *queue, uint64_t expected, uint64_t value); /** * @brief Atomically set the write index of a queue if the observed value is * equal to the expected value. The application can inspect the returned value * to determine if the replacement was done. * * @param[in] queue Pointer to a queue. * * @param[in] expected Expected value. * * @param[in] value Value to assign to the write index if @p expected matches * the observed write index. Must be greater than @p expected. * * @return Previous value of the write index. */ uint64_t HSA_API hsa_queue_cas_write_index_scacq_screl( const hsa_queue_t *queue, uint64_t expected, uint64_t value); /** * @deprecated Renamed as ::hsa_queue_cas_write_index_scacquire. 
* * @copydoc hsa_queue_cas_write_index_scacquire */ uint64_t HSA_API HSA_DEPRECATED hsa_queue_cas_write_index_acquire( const hsa_queue_t *queue, uint64_t expected, uint64_t value); /** * @copydoc hsa_queue_cas_write_index_scacq_screl */ uint64_t HSA_API hsa_queue_cas_write_index_scacquire( const hsa_queue_t *queue, uint64_t expected, uint64_t value); /** * @copydoc hsa_queue_cas_write_index_scacq_screl */ uint64_t HSA_API hsa_queue_cas_write_index_relaxed( const hsa_queue_t *queue, uint64_t expected, uint64_t value); /** * @deprecated Renamed as ::hsa_queue_cas_write_index_screlease. * * @copydoc hsa_queue_cas_write_index_screlease */ uint64_t HSA_API HSA_DEPRECATED hsa_queue_cas_write_index_release( const hsa_queue_t *queue, uint64_t expected, uint64_t value); /** * @copydoc hsa_queue_cas_write_index_scacq_screl */ uint64_t HSA_API hsa_queue_cas_write_index_screlease( const hsa_queue_t *queue, uint64_t expected, uint64_t value); /** * @deprecated Renamed as ::hsa_queue_add_write_index_scacq_screl. * * @copydoc hsa_queue_add_write_index_scacq_screl */ uint64_t HSA_API HSA_DEPRECATED hsa_queue_add_write_index_acq_rel( const hsa_queue_t *queue, uint64_t value); /** * @brief Atomically increment the write index of a queue by an offset. * * @param[in] queue Pointer to a queue. * * @param[in] value Value to add to the write index. * * @return Previous value of the write index. */ uint64_t HSA_API hsa_queue_add_write_index_scacq_screl( const hsa_queue_t *queue, uint64_t value); /** * @deprecated Renamed as ::hsa_queue_add_write_index_scacquire. 
* * @copydoc hsa_queue_add_write_index_scacquire */ uint64_t HSA_API HSA_DEPRECATED hsa_queue_add_write_index_acquire( const hsa_queue_t *queue, uint64_t value); /** * @copydoc hsa_queue_add_write_index_scacq_screl */ uint64_t HSA_API hsa_queue_add_write_index_scacquire( const hsa_queue_t *queue, uint64_t value); /** * @copydoc hsa_queue_add_write_index_scacq_screl */ uint64_t HSA_API hsa_queue_add_write_index_relaxed( const hsa_queue_t *queue, uint64_t value); /** * @deprecated Renamed as ::hsa_queue_add_write_index_screlease. * * @copydoc hsa_queue_add_write_index_screlease */ uint64_t HSA_API HSA_DEPRECATED hsa_queue_add_write_index_release( const hsa_queue_t *queue, uint64_t value); /** * @copydoc hsa_queue_add_write_index_scacq_screl */ uint64_t HSA_API hsa_queue_add_write_index_screlease( const hsa_queue_t *queue, uint64_t value); /** * @brief Atomically set the read index of a queue. * * @details Modifications of the read index are not allowed and result in * undefined behavior if the queue is associated with an agent for which * only the corresponding packet processor is permitted to update the read * index. * * @param[in] queue Pointer to a queue. * * @param[in] value Value to assign to the read index. * */ void HSA_API hsa_queue_store_read_index_relaxed( const hsa_queue_t *queue, uint64_t value); /** * @deprecated Renamed as ::hsa_queue_store_read_index_screlease. * * @copydoc hsa_queue_store_read_index_screlease */ void HSA_API HSA_DEPRECATED hsa_queue_store_read_index_release( const hsa_queue_t *queue, uint64_t value); /** * @copydoc hsa_queue_store_read_index_relaxed */ void HSA_API hsa_queue_store_read_index_screlease( const hsa_queue_t *queue, uint64_t value); /** @} */ /** \defgroup aql Architected Queuing Language * @{ */ /** * @brief Packet type. */ typedef enum { /** * Vendor-specific packet. */ HSA_PACKET_TYPE_VENDOR_SPECIFIC = 0, /** * The packet has been processed in the past, but has not been reassigned to * the packet processor. 
A packet processor must not process a packet of this * type. All queues support this packet type. */ HSA_PACKET_TYPE_INVALID = 1, /** * Packet used by agents for dispatching jobs to kernel agents. Not all * queues support packets of this type (see ::hsa_queue_feature_t). */ HSA_PACKET_TYPE_KERNEL_DISPATCH = 2, /** * Packet used by agents to delay processing of subsequent packets, and to * express complex dependencies between multiple packets. All queues support * this packet type. */ HSA_PACKET_TYPE_BARRIER_AND = 3, /** * Packet used by agents for dispatching jobs to agents. Not all * queues support packets of this type (see ::hsa_queue_feature_t). */ HSA_PACKET_TYPE_AGENT_DISPATCH = 4, /** * Packet used by agents to delay processing of subsequent packets, and to * express complex dependencies between multiple packets. All queues support * this packet type. */ HSA_PACKET_TYPE_BARRIER_OR = 5 } hsa_packet_type_t; /** * @brief Scope of the memory fence operation associated with a packet. */ typedef enum { /** * No scope (no fence is applied). The packet relies on external fences to * ensure visibility of memory updates. */ HSA_FENCE_SCOPE_NONE = 0, /** * The fence is applied with agent scope for the global segment. */ HSA_FENCE_SCOPE_AGENT = 1, /** * The fence is applied across both agent and system scope for the global * segment. */ HSA_FENCE_SCOPE_SYSTEM = 2 } hsa_fence_scope_t; /** * @brief Sub-fields of the @a header field that is present in any AQL * packet. The offset (with respect to the address of @a header) of a sub-field * is identical to its enumeration constant. The width of each sub-field is * determined by the corresponding value in ::hsa_packet_header_width_t. The * offset and the width are expressed in bits. */ typedef enum { /** * Packet type. The value of this sub-field must be one of * ::hsa_packet_type_t. If the type is ::HSA_PACKET_TYPE_VENDOR_SPECIFIC, the * packet layout is vendor-specific. */ HSA_PACKET_HEADER_TYPE = 0, /** * Barrier bit. 
If the barrier bit is set, processing of the current * packet only begins when all preceding packets (within the same queue) are * complete. */ HSA_PACKET_HEADER_BARRIER = 8, /** * Acquire fence scope. The value of this sub-field determines the scope and * type of the memory fence operation applied before the packet enters the * active phase. An acquire fence ensures that any subsequent global segment * or image loads by any unit of execution that belongs to a dispatch that has * not yet entered the active phase on any queue of the same kernel agent * see any data previously released at the scopes specified by the acquire * fence. The value of this sub-field must be one of ::hsa_fence_scope_t. */ HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE = 9, /** * @deprecated Renamed as ::HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE. */ HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE = 9, /** * Release fence scope. The value of this sub-field determines the scope and * type of the memory fence operation applied after kernel completion but * before the packet is completed. A release fence makes any global segment or * image data that was stored by any unit of execution that belonged to a * dispatch that has completed the active phase on any queue of the same * kernel agent visible in all the scopes specified by the release fence. The * value of this sub-field must be one of ::hsa_fence_scope_t. */ HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE = 11, /** * @deprecated Renamed as ::HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE. */ HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE = 11 } hsa_packet_header_t; /** * @brief Width (in bits) of the sub-fields in ::hsa_packet_header_t. */ typedef enum { HSA_PACKET_HEADER_WIDTH_TYPE = 8, HSA_PACKET_HEADER_WIDTH_BARRIER = 1, HSA_PACKET_HEADER_WIDTH_SCACQUIRE_FENCE_SCOPE = 2, /** * @deprecated Use HSA_PACKET_HEADER_WIDTH_SCACQUIRE_FENCE_SCOPE.
*/ HSA_PACKET_HEADER_WIDTH_ACQUIRE_FENCE_SCOPE = 2, HSA_PACKET_HEADER_WIDTH_SCRELEASE_FENCE_SCOPE = 2, /** * @deprecated Use HSA_PACKET_HEADER_WIDTH_SCRELEASE_FENCE_SCOPE. */ HSA_PACKET_HEADER_WIDTH_RELEASE_FENCE_SCOPE = 2 } hsa_packet_header_width_t; /** * @brief Sub-fields of the kernel dispatch packet @a setup field. The offset * (with respect to the address of @a setup) of a sub-field is identical to its * enumeration constant. The width of each sub-field is determined by the * corresponding value in ::hsa_kernel_dispatch_packet_setup_width_t. The * offset and the width are expressed in bits. */ typedef enum { /** * Number of dimensions of the grid. Valid values are 1, 2, or 3. * */ HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS = 0 } hsa_kernel_dispatch_packet_setup_t; /** * @brief Width (in bits) of the sub-fields in * ::hsa_kernel_dispatch_packet_setup_t. */ typedef enum { HSA_KERNEL_DISPATCH_PACKET_SETUP_WIDTH_DIMENSIONS = 2 } hsa_kernel_dispatch_packet_setup_width_t; /** * @brief AQL kernel dispatch packet */ typedef struct hsa_kernel_dispatch_packet_s { /** * Packet header. Used to configure multiple packet parameters such as the * packet type. The parameters are described by ::hsa_packet_header_t. */ uint16_t header; /** * Dispatch setup parameters. Used to configure kernel dispatch parameters * such as the number of dimensions in the grid. The parameters are described * by ::hsa_kernel_dispatch_packet_setup_t. */ uint16_t setup; /** * X dimension of work-group, in work-items. Must be greater than 0. */ uint16_t workgroup_size_x; /** * Y dimension of work-group, in work-items. Must be greater than * 0. If the grid has 1 dimension, the only valid value is 1. */ uint16_t workgroup_size_y; /** * Z dimension of work-group, in work-items. Must be greater than * 0. If the grid has 1 or 2 dimensions, the only valid value is 1. */ uint16_t workgroup_size_z; /** * Reserved. Must be 0. */ uint16_t reserved0; /** * X dimension of grid, in work-items. 
Must be greater than 0. Must * not be smaller than @a workgroup_size_x. */ uint32_t grid_size_x; /** * Y dimension of grid, in work-items. Must be greater than 0. If the grid has * 1 dimension, the only valid value is 1. Must not be smaller than @a * workgroup_size_y. */ uint32_t grid_size_y; /** * Z dimension of grid, in work-items. Must be greater than 0. If the grid has * 1 or 2 dimensions, the only valid value is 1. Must not be smaller than @a * workgroup_size_z. */ uint32_t grid_size_z; /** * Size in bytes of private memory allocation request (per work-item). */ uint32_t private_segment_size; /** * Size in bytes of group memory allocation request (per work-group). Must not * be less than the sum of the group memory used by the kernel (and the * functions it calls directly or indirectly) and the dynamically allocated * group segment variables. */ uint32_t group_segment_size; /** * Opaque handle to a code object that includes implementation-defined * executable code for the kernel. */ uint64_t kernel_object;
#ifdef HSA_LARGE_MODEL
void* kernarg_address;
#elif defined HSA_LITTLE_ENDIAN
/** * Pointer to a buffer containing the kernel arguments. May be NULL. * * The buffer must be allocated using ::hsa_memory_allocate, and must not be * modified once the kernel dispatch packet is enqueued until the dispatch has * completed execution. */ void* kernarg_address; /** * Reserved. Must be 0. */ uint32_t reserved1;
#else
uint32_t reserved1; void* kernarg_address;
#endif
/** * Reserved. Must be 0. */ uint64_t reserved2; /** * Signal used to indicate completion of the job. The application can use the * special signal handle 0 to indicate that no signal is used. */ hsa_signal_t completion_signal; } hsa_kernel_dispatch_packet_t; /** * @brief Agent dispatch packet. */ typedef struct hsa_agent_dispatch_packet_s { /** * Packet header. Used to configure multiple packet parameters such as the * packet type. The parameters are described by ::hsa_packet_header_t.
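As a sketch of how the sub-fields of ::hsa_packet_header_t combine into this 16-bit value, the helper below shifts each field to its documented offset. The `EX_*` constants are local copies of values defined in this header (renamed so the example compiles on its own), and `ex_make_packet_header` is a hypothetical name, not part of this API.

```c
#include <assert.h>
#include <stdint.h>

// Local copies of values defined in this header (hsa_packet_header_t,
// hsa_packet_type_t, hsa_fence_scope_t), renamed so the sketch compiles
// without hsa.h.
enum {
  EX_HEADER_TYPE = 0,             // offset of the 8-bit type sub-field
  EX_HEADER_BARRIER = 8,          // offset of the 1-bit barrier sub-field
  EX_HEADER_SCACQUIRE_SCOPE = 9,  // offset of the 2-bit acquire fence scope
  EX_HEADER_SCRELEASE_SCOPE = 11  // offset of the 2-bit release fence scope
};
enum { EX_PACKET_TYPE_KERNEL_DISPATCH = 2, EX_PACKET_TYPE_AGENT_DISPATCH = 4 };
enum { EX_FENCE_SCOPE_NONE = 0, EX_FENCE_SCOPE_AGENT = 1, EX_FENCE_SCOPE_SYSTEM = 2 };

// Hypothetical helper: packs the header sub-fields at their documented bit
// offsets after checking that each value fits its documented width.
static uint16_t ex_make_packet_header(unsigned type, unsigned barrier,
                                      unsigned acquire_scope,
                                      unsigned release_scope) {
  assert(type < (1u << 8) && barrier < 2u);
  assert(acquire_scope < 4u && release_scope < 4u);
  return (uint16_t)((type << EX_HEADER_TYPE) |
                    (barrier << EX_HEADER_BARRIER) |
                    (acquire_scope << EX_HEADER_SCACQUIRE_SCOPE) |
                    (release_scope << EX_HEADER_SCRELEASE_SCOPE));
}
```

For example, a kernel dispatch packet with the barrier bit set and system-scope acquire and release fences yields 5378 (0x1502). In typical usage the application fills in every other field of the packet first and stores the header last with an atomic release store, so the packet processor never observes a valid packet type on a partially written packet.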
*/ uint16_t header; /** * Application-defined function to be performed by the destination agent. */ uint16_t type; /** * Reserved. Must be 0. */ uint32_t reserved0;
#ifdef HSA_LARGE_MODEL
void* return_address;
#elif defined HSA_LITTLE_ENDIAN
/** * Address where to store the function return values, if any. */ void* return_address; /** * Reserved. Must be 0. */ uint32_t reserved1;
#else
uint32_t reserved1; void* return_address;
#endif
/** * Function arguments. */ uint64_t arg[4]; /** * Reserved. Must be 0. */ uint64_t reserved2; /** * Signal used to indicate completion of the job. The application can use the * special signal handle 0 to indicate that no signal is used. */ hsa_signal_t completion_signal; } hsa_agent_dispatch_packet_t; /** * @brief Barrier-AND packet. */ typedef struct hsa_barrier_and_packet_s { /** * Packet header. Used to configure multiple packet parameters such as the * packet type. The parameters are described by ::hsa_packet_header_t. */ uint16_t header; /** * Reserved. Must be 0. */ uint16_t reserved0; /** * Reserved. Must be 0. */ uint32_t reserved1; /** * Array of dependent signal objects. Signals with a handle value of 0 are * allowed and are interpreted by the packet processor as satisfied * dependencies. */ hsa_signal_t dep_signal[5]; /** * Reserved. Must be 0. */ uint64_t reserved2; /** * Signal used to indicate completion of the job. The application can use the * special signal handle 0 to indicate that no signal is used. */ hsa_signal_t completion_signal; } hsa_barrier_and_packet_t; /** * @brief Barrier-OR packet. */ typedef struct hsa_barrier_or_packet_s { /** * Packet header. Used to configure multiple packet parameters such as the * packet type. The parameters are described by ::hsa_packet_header_t. */ uint16_t header; /** * Reserved. Must be 0. */ uint16_t reserved0; /** * Reserved. Must be 0. */ uint32_t reserved1; /** * Array of dependent signal objects.
Signals with a handle value of 0 are * allowed and are interpreted by the packet processor as unsatisfied * dependencies. */ hsa_signal_t dep_signal[5]; /** * Reserved. Must be 0. */ uint64_t reserved2; /** * Signal used to indicate completion of the job. The application can use the * special signal handle 0 to indicate that no signal is used. */ hsa_signal_t completion_signal; } hsa_barrier_or_packet_t; /** @} */ /** \addtogroup memory Memory * @{ */ /** * @brief Memory segments associated with a region. */ typedef enum { /** * Global segment. Used to hold data that is shared by all agents. */ HSA_REGION_SEGMENT_GLOBAL = 0, /** * Read-only segment. Used to hold data that remains constant during the * execution of a kernel. */ HSA_REGION_SEGMENT_READONLY = 1, /** * Private segment. Used to hold data that is local to a single work-item. */ HSA_REGION_SEGMENT_PRIVATE = 2, /** * Group segment. Used to hold data that is shared by the work-items of a * work-group. */ HSA_REGION_SEGMENT_GROUP = 3, /** * Kernarg segment. Used to store kernel arguments. */ HSA_REGION_SEGMENT_KERNARG = 4 } hsa_region_segment_t; /** * @brief Global region flags. */ typedef enum { /** * The application can use memory in the region to store kernel arguments, and * provide the values for the kernarg segment of a kernel dispatch. If this * flag is set, then ::HSA_REGION_GLOBAL_FLAG_FINE_GRAINED must be set. */ HSA_REGION_GLOBAL_FLAG_KERNARG = 1, /** * Updates to memory in this region are immediately visible to all the * agents under the terms of the HSA memory model. If this * flag is set, then ::HSA_REGION_GLOBAL_FLAG_COARSE_GRAINED must not be set. */ HSA_REGION_GLOBAL_FLAG_FINE_GRAINED = 2, /** * Updates to memory in this region can be performed by a single agent at * a time. If a different agent in the system is allowed to access the * region, the application must explicitly invoke ::hsa_memory_assign_agent * in order to transfer ownership to that agent for a particular buffer.
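The constraints stated for the global region flags can be checked mechanically on the bit-field reported for ::HSA_REGION_INFO_GLOBAL_FLAGS. A minimal sketch, assuming the ::hsa_region_global_flag_t values are copied locally under hypothetical `EX_*` names; `ex_global_flags_consistent` is likewise illustrative and not part of this API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Local copies of the hsa_region_global_flag_t values in this header.
enum {
  EX_REGION_GLOBAL_FLAG_KERNARG = 1,
  EX_REGION_GLOBAL_FLAG_FINE_GRAINED = 2,
  EX_REGION_GLOBAL_FLAG_COARSE_GRAINED = 4
};

// Hypothetical helper: checks the two documented constraints on the
// global flags bit-field of a region.
static bool ex_global_flags_consistent(uint32_t flags) {
  bool kernarg = (flags & EX_REGION_GLOBAL_FLAG_KERNARG) != 0;
  bool fine = (flags & EX_REGION_GLOBAL_FLAG_FINE_GRAINED) != 0;
  bool coarse = (flags & EX_REGION_GLOBAL_FLAG_COARSE_GRAINED) != 0;
  // KERNARG requires FINE_GRAINED.
  if (kernarg && !fine) return false;
  // FINE_GRAINED and COARSE_GRAINED are mutually exclusive.
  if (fine && coarse) return false;
  return true;
}
```

An application scanning regions with ::hsa_agent_iterate_regions would typically look for a global region whose flags include ::HSA_REGION_GLOBAL_FLAG_KERNARG when it needs memory for kernel arguments.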
*/ HSA_REGION_GLOBAL_FLAG_COARSE_GRAINED = 4 } hsa_region_global_flag_t; /** * @brief Attributes of a memory region. */ typedef enum { /** * Segment where memory in the region can be used. The type of this * attribute is ::hsa_region_segment_t. */ HSA_REGION_INFO_SEGMENT = 0, /** * Flag mask. The value of this attribute is undefined if the value of * ::HSA_REGION_INFO_SEGMENT is not ::HSA_REGION_SEGMENT_GLOBAL. The type of * this attribute is uint32_t, a bit-field of ::hsa_region_global_flag_t * values. */ HSA_REGION_INFO_GLOBAL_FLAGS = 1, /** * Size of this region, in bytes. The type of this attribute is size_t. */ HSA_REGION_INFO_SIZE = 2, /** * Maximum allocation size in this region, in bytes. Must not exceed the value * of ::HSA_REGION_INFO_SIZE. The type of this attribute is size_t. * * If the region is in the global or readonly segments, this is the maximum * size that the application can pass to ::hsa_memory_allocate. * * If the region is in the group segment, this is the maximum size (per * work-group) that can be requested for a given kernel dispatch. If the * region is in the private segment, this is the maximum size (per work-item) * that can be requested for a specific kernel dispatch, and must be at least * 256 bytes. */ HSA_REGION_INFO_ALLOC_MAX_SIZE = 4, /** * Maximum size (per work-group) of private memory that can be requested for a * specific kernel dispatch. Must be at least 65536 bytes. The type of this * attribute is uint32_t. The value of this attribute is undefined if the * region is not in the private segment. */ HSA_REGION_INFO_ALLOC_MAX_PRIVATE_WORKGROUP_SIZE = 8, /** * Indicates whether memory in this region can be allocated using * ::hsa_memory_allocate. The type of this attribute is bool. * * The value of this flag is always false for regions in the group and private * segments. */ HSA_REGION_INFO_RUNTIME_ALLOC_ALLOWED = 5, /** * Allocation granularity of buffers allocated by ::hsa_memory_allocate in * this region. 
The size of a buffer allocated in this region is a multiple of * the value of this attribute. The value of this attribute is only defined if * ::HSA_REGION_INFO_RUNTIME_ALLOC_ALLOWED is true for this region. The type * of this attribute is size_t. */ HSA_REGION_INFO_RUNTIME_ALLOC_GRANULE = 6, /** * Alignment of buffers allocated by ::hsa_memory_allocate in this region. The * value of this attribute is only defined if * ::HSA_REGION_INFO_RUNTIME_ALLOC_ALLOWED is true for this region, and must be * a power of 2. The type of this attribute is size_t. */ HSA_REGION_INFO_RUNTIME_ALLOC_ALIGNMENT = 7 } hsa_region_info_t; /** * @brief Get the current value of an attribute of a region. * * @param[in] region A valid region. * * @param[in] attribute Attribute to query. * * @param[out] value Pointer to an application-allocated buffer in which to store * the value of the attribute. If the buffer passed by the application is not * large enough to hold the value of @p attribute, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_REGION The region is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid * region attribute, or @p value is NULL. */ hsa_status_t HSA_API hsa_region_get_info( hsa_region_t region, hsa_region_info_t attribute, void* value); /** * @brief Iterate over the memory regions associated with a given agent, and * invoke an application-defined callback on every iteration. * * @param[in] agent A valid agent. * * @param[in] callback Callback to be invoked once per region that is * accessible from the agent. The HSA runtime passes two arguments to the * callback: the region and the application data.
If @p callback returns a * status other than ::HSA_STATUS_SUCCESS for a particular iteration, the * traversal stops and ::hsa_agent_iterate_regions returns that status value. * * @param[in] data Application data that is passed to @p callback on every * iteration. May be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL. */ hsa_status_t HSA_API hsa_agent_iterate_regions( hsa_agent_t agent, hsa_status_t (*callback)(hsa_region_t region, void* data), void* data); /** * @brief Allocate a block of memory in a given region. * * @param[in] region Region from which to allocate memory. The value of * ::HSA_REGION_INFO_RUNTIME_ALLOC_ALLOWED must be true for this region. * * @param[in] size Allocation size, in bytes. Must not be zero. This value is * rounded up to the nearest multiple of ::HSA_REGION_INFO_RUNTIME_ALLOC_GRANULE * in @p region. * * @param[out] ptr Pointer to the location in which to store the base address of * the allocated block. The returned base address is aligned to the value of * ::HSA_REGION_INFO_RUNTIME_ALLOC_ALIGNMENT in @p region. If the allocation * fails, the returned value is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate * the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_REGION The region is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ALLOCATION The host is not allowed to * allocate memory in @p region, or @p size is greater than the value of * ::HSA_REGION_INFO_ALLOC_MAX_SIZE in @p region. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p ptr is NULL, or @p size is 0.
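The size and alignment guarantees above translate into simple arithmetic. This sketch assumes `granule`, `alloc_max_size`, and `alignment` hold the region's ::HSA_REGION_INFO_RUNTIME_ALLOC_GRANULE, ::HSA_REGION_INFO_ALLOC_MAX_SIZE, and ::HSA_REGION_INFO_RUNTIME_ALLOC_ALIGNMENT values; the helper names and the `ex_alloc_check_t` codes are hypothetical and only mirror the documented status codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical result codes mirroring the documented status codes.
typedef enum {
  EX_ALLOC_OK,
  EX_ALLOC_INVALID_ARGUMENT,   // size is 0
  EX_ALLOC_INVALID_ALLOCATION  // size exceeds the region's maximum
} ex_alloc_check_t;

// Check a request against the size preconditions of hsa_memory_allocate.
static ex_alloc_check_t ex_check_request(size_t size, size_t alloc_max_size) {
  if (size == 0) return EX_ALLOC_INVALID_ARGUMENT;
  if (size > alloc_max_size) return EX_ALLOC_INVALID_ALLOCATION;
  return EX_ALLOC_OK;
}

// Round size up to the nearest multiple of granule. The granule is not
// required to be a power of two, so integer division is used.
static size_t ex_round_up_to_granule(size_t size, size_t granule) {
  assert(granule > 0);
  assert(size <= SIZE_MAX - (granule - 1));  // avoid overflow
  return (size + granule - 1) / granule * granule;
}

// True if ptr is a multiple of alignment, which must be a power of two.
static bool ex_is_aligned(const void *ptr, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

For a granule of 4096 bytes, a 4097-byte request therefore occupies 8192 bytes of the region.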
*/ hsa_status_t HSA_API hsa_memory_allocate(hsa_region_t region, size_t size, void** ptr); /** * @brief Deallocate a block of memory previously allocated using * ::hsa_memory_allocate. * * @param[in] ptr Pointer to a memory block. If @p ptr does not match a value * previously returned by ::hsa_memory_allocate, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. */ hsa_status_t HSA_API hsa_memory_free(void* ptr); /** * @brief Copy a block of memory from the location pointed to by @p src to the * memory block pointed to by @p dst. * * @param[out] dst Buffer to which the content is copied. If @p dst is in * coarse-grained memory, the copied data is only visible to the agent currently * assigned (::hsa_memory_assign_agent) to @p dst. * * @param[in] src A valid pointer to the source of data to be copied. The source * buffer must not overlap with the destination buffer. If the source buffer is * in coarse-grained memory then it must be assigned to an agent, from which the * data will be retrieved. * * @param[in] size Number of bytes to copy. If @p size is 0, no copy is * performed and the function returns success. Copying a number of bytes larger * than the size of the buffers pointed to by @p dst or @p src results in undefined * behavior. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT The source or destination * pointer is NULL. */ hsa_status_t HSA_API hsa_memory_copy( void *dst, const void *src, size_t size); /** * @brief Change the ownership of a global, coarse-grained buffer. * * @details The contents of a coarse-grained buffer are visible to an agent * only after ownership has been explicitly transferred to that agent.
Once the * operation completes, the previous owner can no longer access the data in the * buffer. * * An implementation of the HSA runtime is allowed, but not required, to change * the physical location of the buffer when ownership is transferred to a * different agent. In general, the application must not assume this * behavior. The virtual location (address) of the passed buffer is never * modified. * * @param[in] ptr Base address of a global buffer. The pointer must match an * address previously returned by ::hsa_memory_allocate. The size of the buffer * affected by the ownership change is identical to the size of that previous * allocation. If @p ptr points to a fine-grained global buffer, no operation is * performed and the function returns success. If @p ptr does not point to * global memory, the behavior is undefined. * * @param[in] agent Agent that becomes the owner of the buffer. The * application is responsible for ensuring that @p agent has access to the * region that contains the buffer. It is allowed to change ownership to an * agent that is already the owner of the buffer, with the same or different * access permissions. * * @param[in] access Access permissions requested for the new owner. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate * the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p ptr is NULL, or @p access is * not a valid access value. */ hsa_status_t HSA_API hsa_memory_assign_agent( void *ptr, hsa_agent_t agent, hsa_access_permission_t access); /** * * @brief Register a global, fine-grained buffer. * * @details Registering a buffer serves as an indication to the HSA runtime that * the memory might be accessed from a kernel agent other than the * host.
 * Registration is a performance hint that allows the HSA runtime
 * implementation to know ahead of time which buffers will be accessed by one
 * or more kernel agents.
 *
 * Registration is only recommended for buffers in the global segment that have
 * been allocated using an OS allocator instead of the HSA allocator
 * (::hsa_memory_allocate). Registering an OS-allocated buffer in the base
 * profile is equivalent to a no-op.
 *
 * Registrations should not overlap.
 *
 * @param[in] ptr A buffer in global, fine-grained memory. If a NULL pointer is
 * passed, no operation is performed. If the buffer has been allocated using
 * ::hsa_memory_allocate, or has already been registered, no operation is
 * performed.
 *
 * @param[in] size Requested registration size in bytes. A size of 0 is
 * only allowed if @p ptr is NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate
 * the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p size is 0 but @p ptr
 * is not NULL.
 */
hsa_status_t HSA_API hsa_memory_register(
    void *ptr,
    size_t size);

/**
 * @brief Deregister memory previously registered using ::hsa_memory_register.
 *
 * @details If the memory interval being deregistered does not match a previous
 * registration (start and end addresses), the behavior is undefined.
 *
 * @param[in] ptr A pointer to the base of the buffer to be deregistered. If
 * a NULL pointer is passed, no operation is performed.
 *
 * @param[in] size Size of the buffer to be deregistered, in bytes.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
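 *
 * @par Example
 * A minimal sketch of application-side bookkeeping that mirrors the documented
 * rules for ::hsa_memory_register and ::hsa_memory_deregister: a NULL pointer
 * is a no-op, a size of 0 is only valid with NULL, registering the same buffer
 * twice is a no-op, registrations should not overlap, and a deregistration
 * must match a previous registration exactly. The table type, MAX_REGS and the
 * track_register/track_deregister helpers are hypothetical and not part of the
 * HSA runtime. The sketch does not model the no-op for buffers allocated with
 * ::hsa_memory_allocate, and it treats only an identical interval as "already
 * registered".
 *
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical application-side bookkeeping; not part of the HSA runtime.
#define MAX_REGS 16

typedef struct {
  uintptr_t start;  // first byte of the registered interval
  uintptr_t end;    // one past the last byte
} reg_interval_t;

typedef struct {
  reg_interval_t items[MAX_REGS];
  size_t count;
} reg_table_t;

// Index of the registration whose bounds equal [start, end), or -1.
static long find_exact(const reg_table_t *t, uintptr_t start, uintptr_t end) {
  for (size_t i = 0; i < t->count; ++i) {
    if (t->items[i].start == start && t->items[i].end == end) return (long)i;
  }
  return -1;
}

// True if [start, end) intersects any existing registration.
static bool overlaps_existing(const reg_table_t *t, uintptr_t start,
                              uintptr_t end) {
  for (size_t i = 0; i < t->count; ++i) {
    if (start < t->items[i].end && t->items[i].start < end) return true;
  }
  return false;
}

// Models the preconditions of hsa_memory_register. Returns false where the
// application would be violating the documented rules.
static bool track_register(reg_table_t *t, void *ptr, size_t size) {
  if (ptr == NULL) return true;  // NULL: no operation is performed
  if (size == 0) return false;   // size 0 is only allowed with NULL
  uintptr_t start = (uintptr_t)ptr;
  uintptr_t end = start + size;
  if (find_exact(t, start, end) >= 0) return true;     // already registered
  if (overlaps_existing(t, start, end)) return false;  // must not overlap
  if (t->count == MAX_REGS) return false;              // sketch size limit
  t->items[t->count++] = (reg_interval_t){start, end};
  return true;
}

// Models hsa_memory_deregister: the interval must match a previous
// registration exactly; the runtime leaves any other case undefined.
static bool track_deregister(reg_table_t *t, void *ptr, size_t size) {
  if (ptr == NULL) return true;  // NULL: no operation is performed
  uintptr_t start = (uintptr_t)ptr;
  long i = find_exact(t, start, start + size);
  if (i < 0) return false;
  t->items[i] = t->items[--t->count];  // swap-remove the matched entry
  return true;
}
```
 *
 * Calling such helpers before the corresponding runtime functions lets an
 * application catch overlapping registrations and mismatched deregistrations
 * before they reach the runtime. The fixed-size table keeps the sketch short.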
 *
 */
hsa_status_t HSA_API hsa_memory_deregister(
    void *ptr,
    size_t size);

/** @} */

/** \defgroup instruction-set-architecture Instruction Set Architecture.
 *  @{
 */

/**
 * @brief Instruction set architecture.
 */
typedef struct hsa_isa_s {
  /**
   * Opaque handle. Two handles reference the same object of the enclosing type
   * if and only if they are equal.
   */
  uint64_t handle;
} hsa_isa_t;

/**
 * @brief Retrieve a reference to an instruction set architecture handle out of
 * a symbolic name.
 *
 * @param[in] name Vendor-specific name associated with a particular
 * instruction set architecture. @p name must start with the vendor name and a
 * colon (for example, "AMD:"). The rest of the name is vendor-specific. Must be
 * a NUL-terminated string.
 *
 * @param[out] isa Memory location where the HSA runtime stores the ISA handle
 * corresponding to the given name. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ISA_NAME The given name does not
 * correspond to any instruction set architecture.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to
 * allocate the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p name is NULL, or @p isa is
 * NULL.
 */
hsa_status_t HSA_API hsa_isa_from_name(
    const char *name,
    hsa_isa_t *isa);

/**
 * @brief Iterate over the instruction sets supported by the given agent, and
 * invoke an application-defined callback on every iteration. The iterator is
 * deterministic: if an agent supports several instruction set architectures,
 * they are traversed in the same order in every invocation of this function.
 *
 * @param[in] agent A valid agent.
 *
 * @param[in] callback Callback to be invoked once per instruction set
 * architecture. The HSA runtime passes two arguments to the callback: the
 * ISA and the application data.
 * If @p callback returns a status other than ::HSA_STATUS_SUCCESS for a
 * particular iteration, the traversal stops and that status value is
 * returned.
 *
 * @param[in] data Application data that is passed to @p callback on every
 * iteration. May be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL.
 */
hsa_status_t HSA_API hsa_agent_iterate_isas(
    hsa_agent_t agent,
    hsa_status_t (*callback)(hsa_isa_t isa, void *data),
    void *data);

/**
 * @brief Instruction set architecture attributes.
 */
typedef enum {
  /**
   * The length of the ISA name in bytes, not including the NUL terminator. The
   * type of this attribute is uint32_t.
   */
  HSA_ISA_INFO_NAME_LENGTH = 0,
  /**
   * Human-readable description. The type of this attribute is a character
   * array whose length equals the value of the ::HSA_ISA_INFO_NAME_LENGTH
   * attribute.
   */
  HSA_ISA_INFO_NAME = 1,
  /**
   * @deprecated
   *
   * Number of call conventions supported by the instruction set architecture.
   * Must be greater than zero. The type of this attribute is uint32_t.
   */
  HSA_ISA_INFO_CALL_CONVENTION_COUNT = 2,
  /**
   * @deprecated
   *
   * Number of work-items in a wavefront for a given call convention. Must be a
   * power of 2 in the range [1,256]. The type of this attribute is uint32_t.
   */
  HSA_ISA_INFO_CALL_CONVENTION_INFO_WAVEFRONT_SIZE = 3,
  /**
   * @deprecated
   *
   * Number of wavefronts per compute unit for a given call convention. In
   * practice, other factors (for example, the amount of group memory used by a
   * work-group) may further limit the number of wavefronts per compute
   * unit. The type of this attribute is uint32_t.
   */
  HSA_ISA_INFO_CALL_CONVENTION_INFO_WAVEFRONTS_PER_COMPUTE_UNIT = 4,
  /**
   * Machine models supported by the instruction set architecture.
   * The type of this attribute is a bool[2]. If the ISA supports the small
   * machine model, the element at index ::HSA_MACHINE_MODEL_SMALL is true. If
   * the ISA supports the large machine model, the element at index
   * ::HSA_MACHINE_MODEL_LARGE is true.
   */
  HSA_ISA_INFO_MACHINE_MODELS = 5,
  /**
   * Profiles supported by the instruction set architecture. The type of this
   * attribute is a bool[2]. If the ISA supports the base profile, the element
   * at index ::HSA_PROFILE_BASE is true. If the ISA supports the full profile,
   * the element at index ::HSA_PROFILE_FULL is true.
   */
  HSA_ISA_INFO_PROFILES = 6,
  /**
   * Default floating-point rounding modes supported by the instruction set
   * architecture. The type of this attribute is a bool[3]. The value at a given
   * index is true if the corresponding rounding mode in
   * ::hsa_default_float_rounding_mode_t is supported. At least one default mode
   * has to be supported.
   *
   * If the default mode is supported, then
   * ::HSA_ISA_INFO_BASE_PROFILE_DEFAULT_FLOAT_ROUNDING_MODES must report that
   * both the zero and the near rounding modes are supported.
   */
  HSA_ISA_INFO_DEFAULT_FLOAT_ROUNDING_MODES = 7,
  /**
   * Default floating-point rounding modes supported by the instruction set
   * architecture in the Base profile. The type of this attribute is a
   * bool[3]. The value at a given index is true if the corresponding rounding
   * mode in ::hsa_default_float_rounding_mode_t is supported. The value at
   * index ::HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT must be false. At least one
   * of the values at indexes ::HSA_DEFAULT_FLOAT_ROUNDING_MODE_ZERO or
   * ::HSA_DEFAULT_FLOAT_ROUNDING_MODE_NEAR must be true.
   */
  HSA_ISA_INFO_BASE_PROFILE_DEFAULT_FLOAT_ROUNDING_MODES = 8,
  /**
   * Flag indicating that the f16 HSAIL operation is at least as fast as the
   * f32 operation in the instruction set architecture. The type of this
   * attribute is bool.
   */
  HSA_ISA_INFO_FAST_F16_OPERATION = 9,
  /**
   * Maximum number of work-items in each dimension of a work-group.
   * Each maximum must be greater than 0. No maximum can exceed the value of
   * ::HSA_ISA_INFO_WORKGROUP_MAX_SIZE. The type of this attribute is
   * uint16_t[3].
   */
  HSA_ISA_INFO_WORKGROUP_MAX_DIM = 12,
  /**
   * Maximum total number of work-items in a work-group. The type
   * of this attribute is uint32_t.
   */
  HSA_ISA_INFO_WORKGROUP_MAX_SIZE = 13,
  /**
   * Maximum number of work-items in each dimension of a grid. Each maximum must
   * be greater than 0, and must not be smaller than the corresponding value in
   * ::HSA_ISA_INFO_WORKGROUP_MAX_DIM. No maximum can exceed the value of
   * ::HSA_ISA_INFO_GRID_MAX_SIZE. The type of this attribute is
   * ::hsa_dim3_t.
   */
  HSA_ISA_INFO_GRID_MAX_DIM = 14,
  /**
   * Maximum total number of work-items in a grid. The type of this
   * attribute is uint64_t.
   */
  HSA_ISA_INFO_GRID_MAX_SIZE = 16,
  /**
   * Maximum number of fbarriers per work-group. Must be at least 32. The
   * type of this attribute is uint32_t.
   */
  HSA_ISA_INFO_FBARRIER_MAX_SIZE = 17
} hsa_isa_info_t;

/**
 * @deprecated The concept of call convention has been deprecated. If the
 * application wants to query the value of an attribute for a given instruction
 * set architecture, use ::hsa_isa_get_info_alt instead. If the application
 * wants to query an attribute that is specific to a given combination of ISA
 * and wavefront, use ::hsa_wavefront_get_info.
 *
 * @brief Get the current value of an attribute for a given instruction set
 * architecture (ISA).
 *
 * @param[in] isa A valid instruction set architecture.
 *
 * @param[in] attribute Attribute to query.
 *
 * @param[in] index Call convention index. Used only for call convention
 * attributes, otherwise ignored. Must have a value between 0 (inclusive) and
 * the value of the attribute ::HSA_ISA_INFO_CALL_CONVENTION_COUNT (not
 * inclusive) in @p isa.
 *
 * @param[out] value Pointer to an application-allocated buffer in which to
 * store the value of the attribute.
 * If the buffer passed by the application is not large enough to hold the
 * value of @p attribute, the behavior is undefined.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ISA The instruction set architecture is
 * invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_INDEX The index is out of range.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid
 * instruction set architecture attribute, or @p value is NULL.
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_isa_get_info(
    hsa_isa_t isa,
    hsa_isa_info_t attribute,
    uint32_t index,
    void *value);

/**
 * @brief Get the current value of an attribute for a given instruction set
 * architecture (ISA).
 *
 * @param[in] isa A valid instruction set architecture.
 *
 * @param[in] attribute Attribute to query.
 *
 * @param[out] value Pointer to an application-allocated buffer in which to
 * store the value of the attribute. If the buffer passed by the application is
 * not large enough to hold the value of @p attribute, the behavior is
 * undefined.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ISA The instruction set architecture is
 * invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid
 * instruction set architecture attribute, or @p value is NULL.
 */
hsa_status_t HSA_API hsa_isa_get_info_alt(
    hsa_isa_t isa,
    hsa_isa_info_t attribute,
    void *value);

/**
 * @brief Retrieve the exception policy support for a given combination of
 * instruction set architecture and profile.
 *
 * @param[in] isa A valid instruction set architecture.
 *
 * @param[in] profile Profile.
 *
 * @param[out] mask Pointer to a memory location where the HSA runtime stores a
 * mask of ::hsa_exception_policy_t values.
 * Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ISA The instruction set architecture is
 * invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p profile is not a valid
 * profile, or @p mask is NULL.
 */
hsa_status_t HSA_API hsa_isa_get_exception_policies(
    hsa_isa_t isa,
    hsa_profile_t profile,
    uint16_t *mask);

/**
 * @brief Floating-point types.
 */
typedef enum {
  /**
   * 16-bit floating-point type.
   */
  HSA_FP_TYPE_16 = 1,
  /**
   * 32-bit floating-point type.
   */
  HSA_FP_TYPE_32 = 2,
  /**
   * 64-bit floating-point type.
   */
  HSA_FP_TYPE_64 = 4
} hsa_fp_type_t;

/**
 * @brief Flush to zero modes.
 */
typedef enum {
  /**
   * Flush to zero.
   */
  HSA_FLUSH_MODE_FTZ = 1,
  /**
   * Do not flush to zero.
   */
  HSA_FLUSH_MODE_NON_FTZ = 2
} hsa_flush_mode_t;

/**
 * @brief Round methods.
 */
typedef enum {
  /**
   * Single round method.
   */
  HSA_ROUND_METHOD_SINGLE = 1,
  /**
   * Double round method.
   */
  HSA_ROUND_METHOD_DOUBLE = 2
} hsa_round_method_t;

/**
 * @brief Retrieve the round method (single or double) used to implement the
 * floating-point multiply add instruction (mad) for a given combination of
 * instruction set architecture, floating-point type, and flush to zero
 * modifier.
 *
 * @param[in] isa Instruction set architecture.
 *
 * @param[in] fp_type Floating-point type.
 *
 * @param[in] flush_mode Flush to zero modifier.
 *
 * @param[out] round_method Pointer to a memory location where the HSA
 * runtime stores the round method used by the implementation. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ISA The instruction set architecture is
 * invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p fp_type is not a valid
 * floating-point type, or @p flush_mode is not a valid flush to zero modifier,
 * or @p round_method is NULL.
 */
hsa_status_t HSA_API hsa_isa_get_round_method(
    hsa_isa_t isa,
    hsa_fp_type_t fp_type,
    hsa_flush_mode_t flush_mode,
    hsa_round_method_t *round_method);

/**
 * @brief Wavefront handle.
 */
typedef struct hsa_wavefront_s {
  /**
   * Opaque handle. Two handles reference the same object of the enclosing type
   * if and only if they are equal.
   */
  uint64_t handle;
} hsa_wavefront_t;

/**
 * @brief Wavefront attributes.
 */
typedef enum {
  /**
   * Number of work-items in the wavefront. Must be a power of 2 in the range
   * [1,256]. The type of this attribute is uint32_t.
   */
  HSA_WAVEFRONT_INFO_SIZE = 0
} hsa_wavefront_info_t;

/**
 * @brief Get the current value of a wavefront attribute.
 *
 * @param[in] wavefront A wavefront.
 *
 * @param[in] attribute Attribute to query.
 *
 * @param[out] value Pointer to an application-allocated buffer in which to
 * store the value of the attribute. If the buffer passed by the application is
 * not large enough to hold the value of @p attribute, the behavior is
 * undefined.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_WAVEFRONT The wavefront is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid
 * wavefront attribute, or @p value is NULL.
 */
hsa_status_t HSA_API hsa_wavefront_get_info(
    hsa_wavefront_t wavefront,
    hsa_wavefront_info_t attribute,
    void *value);

/**
 * @brief Iterate over the different wavefronts supported by an instruction set
 * architecture, and invoke an application-defined callback on every iteration.
 *
 * @param[in] isa Instruction set architecture.
 *
 * @param[in] callback Callback to be invoked once per wavefront that is
 * supported by the instruction set architecture.
 * The HSA runtime passes two arguments to the callback: the wavefront handle
 * and the application data. If @p callback returns a status other than
 * ::HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and
 * that status value is returned.
 *
 * @param[in] data Application data that is passed to @p callback on every
 * iteration. May be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ISA The instruction set architecture is
 * invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL.
 */
hsa_status_t HSA_API hsa_isa_iterate_wavefronts(
    hsa_isa_t isa,
    hsa_status_t (*callback)(hsa_wavefront_t wavefront, void *data),
    void *data);

/**
 * @deprecated Use ::hsa_agent_iterate_isas to query which instruction set
 * architectures are supported by a given agent.
 *
 * @brief Check if the instruction set architecture of a code object can be
 * executed on an agent associated with another architecture.
 *
 * @param[in] code_object_isa Instruction set architecture associated with a
 * code object.
 *
 * @param[in] agent_isa Instruction set architecture associated with an agent.
 *
 * @param[out] result Pointer to a memory location where the HSA runtime stores
 * the result of the check. If the two architectures are compatible, the result
 * is true; if they are incompatible, the result is false.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ISA @p code_object_isa or @p agent_isa is
 * invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p result is NULL.
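 *
 * @par Example
 * A minimal sketch of the replacement pattern suggested by the deprecation
 * note: walk the ISAs an agent reports and stop at the first one that matches
 * the code object's ISA by returning a non-success status from the callback.
 * So that the snippet builds without the runtime, mock_status_t, mock_isa_t
 * and mock_iterate_isas are hypothetical stand-ins for ::hsa_status_t,
 * ::hsa_isa_t and ::hsa_agent_iterate_isas; mock_iterate_isas only reproduces
 * the documented rule that traversal stops at the first status other than
 * success and returns that status. With the real API, ::HSA_STATUS_INFO_BREAK
 * is the status conventionally used to stop a traversal early.
 *
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for hsa_status_t, hsa_isa_t and the callback type,
// so the sketch compiles without the HSA runtime.
typedef enum { MOCK_SUCCESS = 0, MOCK_INFO_BREAK = 1 } mock_status_t;
typedef struct { uint64_t handle; } mock_isa_t;
typedef mock_status_t (*isa_callback_t)(mock_isa_t isa, void *data);

// Stand-in for hsa_agent_iterate_isas over a fixed list: visits the ISAs in
// order, stops at the first callback result other than success, and returns
// that status.
static mock_status_t mock_iterate_isas(const mock_isa_t *isas, size_t n,
                                       isa_callback_t callback, void *data) {
  for (size_t i = 0; i < n; ++i) {
    mock_status_t status = callback(isas[i], data);
    if (status != MOCK_SUCCESS) return status;
  }
  return MOCK_SUCCESS;
}

// Search state passed through the callback's data pointer.
typedef struct {
  mock_isa_t wanted;  // ISA of the code object
  bool found;         // set once the agent reports the wanted ISA
} isa_search_t;

// Handles reference the same ISA if and only if they are equal.
static mock_status_t find_isa(mock_isa_t isa, void *data) {
  isa_search_t *search = (isa_search_t *)data;
  if (isa.handle == search->wanted.handle) {
    search->found = true;
    return MOCK_INFO_BREAK;  // stop the traversal early
  }
  return MOCK_SUCCESS;
}
```
 *
 * With the real runtime, the application would likely pass the agent, a
 * callback like find_isa, and a pointer to its search state to
 * ::hsa_agent_iterate_isas, then treat a returned break status as "the agent
 * supports this ISA".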
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_isa_compatible(
    hsa_isa_t code_object_isa,
    hsa_isa_t agent_isa,
    bool *result);

/** @} */

/** \defgroup executable Executable
 *  @{
 */

/**
 * @brief Code object reader handle. A code object reader is used to
 * load a code object from a file (when created using
 * ::hsa_code_object_reader_create_from_file), or from memory (if created using
 * ::hsa_code_object_reader_create_from_memory).
 */
typedef struct hsa_code_object_reader_s {
  /**
   * Opaque handle. Two handles reference the same object of the enclosing type
   * if and only if they are equal.
   */
  uint64_t handle;
} hsa_code_object_reader_t;

/**
 * @brief Create a code object reader to operate on a file.
 *
 * @param[in] file File descriptor. The file must have been opened by the
 * application with at least read permissions prior to calling this function.
 * The file must contain a vendor-specific code object.
 *
 * The file is owned and managed by the application; the lifetime of the file
 * descriptor must exceed that of any associated code object reader.
 *
 * @param[out] code_object_reader Memory location to store the newly created
 * code object reader handle. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_FILE @p file is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to
 * allocate the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p code_object_reader is NULL.
 */
hsa_status_t HSA_API hsa_code_object_reader_create_from_file(
    hsa_file_t file,
    hsa_code_object_reader_t *code_object_reader);

/**
 * @brief Create a code object reader to operate on memory.
 *
 * @param[in] code_object Memory buffer that contains a vendor-specific code
 * object.
 * The buffer is owned and managed by the application; the lifetime of
 * the buffer must exceed that of any associated code object reader.
 *
 * @param[in] size Size of the buffer pointed to by @p code_object. Must not be
 * 0.
 *
 * @param[out] code_object_reader Memory location to store the newly created
 * code object reader handle. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to
 * allocate the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p code_object is NULL, @p size
 * is zero, or @p code_object_reader is NULL.
 */
hsa_status_t HSA_API hsa_code_object_reader_create_from_memory(
    const void *code_object,
    size_t size,
    hsa_code_object_reader_t *code_object_reader);

/**
 * @brief Destroy a code object reader.
 *
 * @details The code object reader handle becomes invalid after completion of
 * this function. Any file or memory used to create the code object reader is
 * not closed, removed, or deallocated by this function.
 *
 * @param[in] code_object_reader Code object reader to destroy.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER @p code_object_reader
 * is invalid.
 */
hsa_status_t HSA_API hsa_code_object_reader_destroy(
    hsa_code_object_reader_t code_object_reader);

/**
 * @brief Struct containing an opaque handle to an executable, which contains
 * ISA for finalized kernels and indirect functions together with the allocated
 * global or readonly segment variables they reference.
 */
typedef struct hsa_executable_s {
  /**
   * Opaque handle. Two handles reference the same object of the enclosing type
   * if and only if they are equal.
   */
  uint64_t handle;
} hsa_executable_t;

/**
 * @brief Executable state.
 */
typedef enum {
  /**
   * Executable state, which allows the user to load code objects and define
   * external variables. Variable addresses, kernel code handles, and
   * indirect function code handles are not available in query operations until
   * the executable is frozen (zero is always returned).
   */
  HSA_EXECUTABLE_STATE_UNFROZEN = 0,
  /**
   * Executable state, which allows the user to query variable addresses,
   * kernel code handles, and indirect function code handles using query
   * operations. Loading new code objects, as well as defining external
   * variables, is not allowed in this state.
   */
  HSA_EXECUTABLE_STATE_FROZEN = 1
} hsa_executable_state_t;

/**
 * @deprecated Use ::hsa_executable_create_alt instead, which allows the
 * application to specify the default floating-point rounding mode of the
 * executable and assumes an unfrozen initial state.
 *
 * @brief Create an empty executable.
 *
 * @param[in] profile Profile used in the executable.
 *
 * @param[in] executable_state Executable state. If the state is
 * ::HSA_EXECUTABLE_STATE_FROZEN, the resulting executable is useless because no
 * code objects can be loaded, and no variables can be defined.
 *
 * @param[in] options Standard and vendor-specific options. Unknown options are
 * ignored. A standard option begins with the "-hsa_" prefix. Options beginning
 * with the "-hsa_ext_<extension_name>_" prefix are reserved for extensions. A
 * vendor-specific option begins with the "-<vendor_name>_" prefix. Must be a
 * NUL-terminated string. May be NULL.
 *
 * @param[out] executable Memory location where the HSA runtime stores the newly
 * created executable handle.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to
 * allocate the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p profile is invalid, or
 * @p executable is NULL.
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_executable_create(
    hsa_profile_t profile,
    hsa_executable_state_t executable_state,
    const char *options,
    hsa_executable_t *executable);

/**
 * @brief Create an empty executable.
 *
 * @param[in] profile Profile used in the executable.
 *
 * @param[in] default_float_rounding_mode Default floating-point rounding mode
 * used in the executable. Allowed rounding modes are near and zero (default is
 * not allowed).
 *
 * @param[in] options Standard and vendor-specific options. Unknown options are
 * ignored. A standard option begins with the "-hsa_" prefix. Options beginning
 * with the "-hsa_ext_<extension_name>_" prefix are reserved for extensions. A
 * vendor-specific option begins with the "-<vendor_name>_" prefix. Must be a
 * NUL-terminated string. May be NULL.
 *
 * @param[out] executable Memory location where the HSA runtime stores the newly
 * created executable handle. The initial state of the executable is
 * ::HSA_EXECUTABLE_STATE_UNFROZEN.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to
 * allocate the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p profile is invalid, or
 * @p executable is NULL.
 */
hsa_status_t HSA_API hsa_executable_create_alt(
    hsa_profile_t profile,
    hsa_default_float_rounding_mode_t default_float_rounding_mode,
    const char *options,
    hsa_executable_t *executable);

/**
 * @brief Destroy an executable.
 *
 * @details An executable handle becomes invalid after the executable has been
 * destroyed. Code object handles that were loaded into this executable are
 * still valid after the executable has been destroyed, and can be used as
 * intended.
 * Resources allocated outside the executable and associated with it (such as
 * external global or readonly variables) can be released after the executable
 * has been destroyed.
 *
 * An executable should not be destroyed while kernels are in flight.
 *
 * @param[in] executable Executable.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid.
 */
hsa_status_t HSA_API hsa_executable_destroy(
    hsa_executable_t executable);

/**
 * @brief Loaded code object handle.
 */
typedef struct hsa_loaded_code_object_s {
  /**
   * Opaque handle. Two handles reference the same object of the enclosing type
   * if and only if they are equal.
   */
  uint64_t handle;
} hsa_loaded_code_object_t;

/**
 * @brief Load a program code object into an executable.
 *
 * @details A program code object contains information about resources that are
 * accessible by all kernel agents that run the executable, and can be loaded
 * at most once into an executable.
 *
 * If the program code object uses extensions, the implementation must support
 * them for this operation to return successfully.
 *
 * @param[in] executable Executable.
 *
 * @param[in] code_object_reader A code object reader that holds the program
 * code object to load. If a code object reader is destroyed before all the
 * associated executables are destroyed, the behavior is undefined.
 *
 * @param[in] options Standard and vendor-specific options. Unknown options are
 * ignored. A standard option begins with the "-hsa_" prefix. Options beginning
 * with the "-hsa_ext_<extension_name>_" prefix are reserved for extensions. A
 * vendor-specific option begins with the "-<vendor_name>_" prefix. Must be a
 * NUL-terminated string. May be NULL.
 *
 * @param[out] loaded_code_object Pointer to a memory location where the HSA
 * runtime stores the loaded code object handle. May be NULL.
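 *
 * @par Example
 * A minimal sketch of the option-prefix rules described for @p options, written
 * as a classifier for a single option token: "-hsa_ext_" followed by an
 * extension name is reserved for extensions, "-hsa_" marks a standard option,
 * and "-" followed by a vendor name and "_" marks a vendor-specific option.
 * The option_kind_t type and the classify_option helper are hypothetical, not
 * runtime APIs. The sketch assumes the application has already split the
 * options string into tokens (the separator is not specified here), and it
 * does not validate extension or vendor names.
 *
```c
#include <string.h>

// Option categories from the documented prefix rules (hypothetical names).
typedef enum {
  OPT_STANDARD,   // begins with "-hsa_"
  OPT_EXTENSION,  // begins with "-hsa_ext_" (reserved for extensions)
  OPT_VENDOR,     // begins with "-" + vendor name + "_"
  OPT_UNKNOWN     // anything else; unknown options are ignored
} option_kind_t;

static int starts_with(const char *s, const char *prefix) {
  return strncmp(s, prefix, strlen(prefix)) == 0;
}

// Classifies one option token. The extension prefix is tested first because
// every extension option also begins with the standard "-hsa_" prefix.
static option_kind_t classify_option(const char *opt) {
  if (starts_with(opt, "-hsa_ext_")) return OPT_EXTENSION;
  if (starts_with(opt, "-hsa_")) return OPT_STANDARD;
  if (opt[0] == '-') {
    // Vendor prefix: a non-empty name between the leading '-' and a '_'.
    const char *underscore = strchr(opt + 1, '_');
    if (underscore != NULL && underscore > opt + 1) return OPT_VENDOR;
  }
  return OPT_UNKNOWN;
}
```
 *
 * Testing the longer extension prefix before the standard one keeps the two
 * categories disjoint; a real option parser would also need to know the
 * separator and any value syntax, which this comment does not define.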
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to
 * allocate the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_FROZEN_EXECUTABLE The executable is frozen.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER @p code_object_reader
 * is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS The program code object is
 * not compatible with the executable or the implementation (for example, the
 * code object uses an extension that is not supported by the implementation).
 */
hsa_status_t HSA_API hsa_executable_load_program_code_object(
    hsa_executable_t executable,
    hsa_code_object_reader_t code_object_reader,
    const char *options,
    hsa_loaded_code_object_t *loaded_code_object);

/**
 * @brief Load an agent code object into an executable.
 *
 * @details The agent code object contains all defined agent
 * allocation variables, functions, indirect functions, and kernels in a given
 * program for a given instruction set architecture.
 *
 * Any module linkage declaration must have been defined either by an external
 * variable definition or by loading a code object that has a symbol with
 * module linkage definition.
 *
 * The default floating-point rounding mode of the code object associated with
 * @p code_object_reader must match that of the executable
 * (::HSA_EXECUTABLE_INFO_DEFAULT_FLOAT_ROUNDING_MODE), or be default (in which
 * case the value of ::HSA_EXECUTABLE_INFO_DEFAULT_FLOAT_ROUNDING_MODE is used).
 * If the agent code object uses extensions, the implementation and the agent
 * must support them for this operation to return successfully.
 *
 * @param[in] executable Executable.
 *
 * @param[in] agent Agent for which to load the code object. A code object can
 * be loaded into an executable at most once for a given agent.
 * The instruction set architecture of the code object must be supported by
 * the agent.
 *
 * @param[in] code_object_reader A code object reader that holds the code object
 * to load. If a code object reader is destroyed before all the associated
 * executables are destroyed, the behavior is undefined.
 *
 * @param[in] options Standard and vendor-specific options. Unknown options are
 * ignored. A standard option begins with the "-hsa_" prefix. Options beginning
 * with the "-hsa_ext_<extension_name>_" prefix are reserved for extensions. A
 * vendor-specific option begins with the "-<vendor_name>_" prefix. Must be a
 * NUL-terminated string. May be NULL.
 *
 * @param[out] loaded_code_object Pointer to a memory location where the HSA
 * runtime stores the loaded code object handle. May be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to
 * allocate the required resources.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_FROZEN_EXECUTABLE The executable is frozen.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER @p code_object_reader
 * is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS The code object read by
 * @p code_object_reader is not compatible with the agent (for example, the
 * agent does not support the instruction set architecture of the code object),
 * the executable (for example, there is a default floating-point mode mismatch
 * between the two), or the implementation.
 */
hsa_status_t HSA_API hsa_executable_load_agent_code_object(
    hsa_executable_t executable,
    hsa_agent_t agent,
    hsa_code_object_reader_t code_object_reader,
    const char *options,
    hsa_loaded_code_object_t *loaded_code_object);

/**
 * @brief Freeze the executable.
 *
 * @details No modifications to the executable can be made after freezing: no
 * code objects can be loaded into the executable, and no external variables
 * can be defined. Freezing the executable does not prevent querying the
 * executable's attributes. The application must define all the external
 * variables in an executable before freezing it.
 *
 * @param[in] executable Executable.
 *
 * @param[in] options Standard and vendor-specific options. Unknown options are
 * ignored. A standard option begins with the "-hsa_" prefix. Options beginning
 * with the "-hsa_ext_<extension_name>_" prefix are reserved for extensions. A
 * vendor-specific option begins with the "-<vendor_name>_" prefix. Must be a
 * NUL-terminated string. May be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_VARIABLE_UNDEFINED One or more variables are
 * undefined in the executable.
 *
 * @retval ::HSA_STATUS_ERROR_FROZEN_EXECUTABLE @p executable is already frozen.
 */
hsa_status_t HSA_API hsa_executable_freeze(
    hsa_executable_t executable,
    const char *options);

/**
 * @brief Executable attributes.
 */
typedef enum {
  /**
   * Profile for which this executable was created. The type of this attribute
   * is ::hsa_profile_t.
   */
  HSA_EXECUTABLE_INFO_PROFILE = 1,
  /**
   * Executable state. The type of this attribute is ::hsa_executable_state_t.
   */
  HSA_EXECUTABLE_INFO_STATE = 2,
  /**
   * Default floating-point rounding mode specified when the executable was
   * created. The type of this attribute is ::hsa_default_float_rounding_mode_t.
   */
  HSA_EXECUTABLE_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 3
} hsa_executable_info_t;

/**
 * @brief Get the current value of an attribute for a given executable.
 *
 * @param[in] executable Executable.
 *
 * @param[in] attribute Attribute to query.
* * @param[out] value Pointer to an application-allocated buffer in which to store * the value of the attribute. If the buffer passed by the application is not * large enough to hold the value of @p attribute, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid * executable attribute, or @p value is NULL. */ hsa_status_t HSA_API hsa_executable_get_info( hsa_executable_t executable, hsa_executable_info_t attribute, void *value); /** * @brief Define an external global variable with program allocation. * * @details This function allows the application to provide the definition * of a variable in the global segment memory with program allocation. The * variable must be defined before loading a code object into an executable. * In addition, the loaded code objects must not define the variable. * * @param[in] executable Executable. Must not be in frozen state. * * @param[in] variable_name Name of the variable. The Programmer's Reference * Manual describes the standard name mangling scheme. * * @param[in] address Address where the variable is defined. This address must * be in global memory and can be read and written by any agent in the * system. The application cannot deallocate the buffer pointed to by @p address * before @p executable is destroyed. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to * allocate the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED The variable is * already defined.
* * @retval ::HSA_STATUS_ERROR_INVALID_SYMBOL_NAME There is no variable * named @p variable_name. * * @retval ::HSA_STATUS_ERROR_FROZEN_EXECUTABLE @p executable is frozen. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p variable_name is NULL. */ hsa_status_t HSA_API hsa_executable_global_variable_define( hsa_executable_t executable, const char *variable_name, void *address); /** * @brief Define an external global variable with agent allocation. * * @details This function allows the application to provide the definition * of a variable in the global segment memory with agent allocation. The * variable must be defined before loading a code object into an executable. * In addition, the loaded code objects must not define the variable. * * @param[in] executable Executable. Must not be in frozen state. * * @param[in] agent Agent for which the variable is being defined. * * @param[in] variable_name Name of the variable. The Programmer's Reference * Manual describes the standard name mangling scheme. * * @param[in] address Address where the variable is defined. This address must * have been previously allocated using ::hsa_memory_allocate in a global region * that is only visible to @p agent. The application cannot deallocate the * buffer pointed to by @p address before @p executable is destroyed. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to * allocate the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT @p agent is invalid. * * @retval ::HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED The variable is * already defined. * * @retval ::HSA_STATUS_ERROR_INVALID_SYMBOL_NAME There is no variable * named @p variable_name. * * @retval ::HSA_STATUS_ERROR_FROZEN_EXECUTABLE @p executable is frozen.
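
The define-then-freeze ordering described for these functions and for
::hsa_executable_freeze (external variables are defined while the executable is
unfrozen, freezing fails while any of them is undefined, and no definitions are
accepted afterwards) can be modeled as a small state machine. This is a minimal
sketch for illustration only: `toy_executable`, `toy_define`, `toy_freeze`, and
the `TOY_*` codes are hypothetical stand-ins, not HSA API. The sketch also assumes
that the set of external variable names is known up front, uses NULL to mean
"not yet defined", and picks its own order for the error checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical status codes mirroring the HSA statuses documented above.
typedef enum {
  TOY_SUCCESS = 0,
  TOY_ERROR_FROZEN,           // like HSA_STATUS_ERROR_FROZEN_EXECUTABLE
  TOY_ERROR_ALREADY_DEFINED,  // like HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED
  TOY_ERROR_UNKNOWN_NAME,     // like HSA_STATUS_ERROR_INVALID_SYMBOL_NAME
  TOY_ERROR_UNDEFINED,        // like HSA_STATUS_ERROR_VARIABLE_UNDEFINED
  TOY_ERROR_INVALID_ARGUMENT  // like HSA_STATUS_ERROR_INVALID_ARGUMENT
} toy_status_t;

#define TOY_MAX_VARS 8

// Toy executable: external variable names (assumed known up front) and the
// address each one was defined at; NULL means "not defined yet".
typedef struct {
  const char *names[TOY_MAX_VARS];
  void *addresses[TOY_MAX_VARS];
  size_t count;
  bool frozen;
} toy_executable;

toy_status_t toy_define(toy_executable *exec, const char *name, void *address) {
  // The toy also rejects a NULL address, because it uses NULL as "undefined".
  if (name == NULL || address == NULL) return TOY_ERROR_INVALID_ARGUMENT;
  if (exec->frozen) return TOY_ERROR_FROZEN;
  for (size_t i = 0; i < exec->count; ++i) {
    if (strcmp(exec->names[i], name) == 0) {
      if (exec->addresses[i] != NULL) return TOY_ERROR_ALREADY_DEFINED;
      exec->addresses[i] = address;
      return TOY_SUCCESS;
    }
  }
  return TOY_ERROR_UNKNOWN_NAME;
}

toy_status_t toy_freeze(toy_executable *exec) {
  if (exec->frozen) return TOY_ERROR_FROZEN;
  // Every external variable must be defined before freezing.
  for (size_t i = 0; i < exec->count; ++i) {
    if (exec->addresses[i] == NULL) return TOY_ERROR_UNDEFINED;
  }
  exec->frozen = true;
  return TOY_SUCCESS;
}
```

Defining every declared variable and then calling `toy_freeze` succeeds; any
later `toy_define` reports the frozen error, as ::HSA_STATUS_ERROR_FROZEN_EXECUTABLE
does for the real functions.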
* * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p variable_name is NULL. */ hsa_status_t HSA_API hsa_executable_agent_global_variable_define( hsa_executable_t executable, hsa_agent_t agent, const char *variable_name, void *address); /** * @brief Define an external readonly variable. * * @details This function allows the application to provide the definition * of a variable in the readonly segment memory. The variable must be defined * before loading a code object into an executable. In addition, the loaded code * objects must not define the variable. * * @param[in] executable Executable. Must not be in frozen state. * * @param[in] agent Agent for which the variable is being defined. * * @param[in] variable_name Name of the variable. The Programmer's Reference * Manual describes the standard name mangling scheme. * * @param[in] address Address where the variable is defined. This address must * have been previously allocated using ::hsa_memory_allocate in a readonly * region associated with @p agent. The application cannot deallocate the buffer * pointed to by @p address before @p executable is destroyed. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to * allocate the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT @p agent is invalid. * * @retval ::HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED The variable is * already defined. * * @retval ::HSA_STATUS_ERROR_INVALID_SYMBOL_NAME There is no variable * named @p variable_name. * * @retval ::HSA_STATUS_ERROR_FROZEN_EXECUTABLE @p executable is frozen.
* * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p variable_name is NULL. */ hsa_status_t HSA_API hsa_executable_readonly_variable_define( hsa_executable_t executable, hsa_agent_t agent, const char *variable_name, void *address); /** * @brief Validate an executable. Checks that all code objects have matching * machine model, profile, and default floating-point rounding mode. Checks that * all declarations have definitions. Checks declaration-definition * compatibility (see the HSA Programmer's Reference Manual for compatibility * rules). Invoking this function is equivalent to invoking * ::hsa_executable_validate_alt with no options. * * @param[in] executable Executable. Must be in frozen state. * * @param[out] result Memory location where the HSA runtime stores the * validation result. If the executable passes validation, the result is 0. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE @p executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p result is NULL. */ hsa_status_t HSA_API hsa_executable_validate( hsa_executable_t executable, uint32_t *result); /** * @brief Validate an executable. Checks that all code objects have matching * machine model, profile, and default floating-point rounding mode. Checks that * all declarations have definitions. Checks declaration-definition * compatibility (see the HSA Programmer's Reference Manual for compatibility * rules). * * @param[in] executable Executable. Must be in frozen state. * * @param[in] options Standard and vendor-specific options. Unknown options are * ignored. A standard option begins with the "-hsa_" prefix. Options beginning * with the "-hsa_ext__" prefix are reserved for extensions. A * vendor-specific option begins with the "-_" prefix. Must be a * NUL-terminated string. May be NULL.
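
The option-prefix rules above can be illustrated with a small classifier. This
sketch assumes, for illustration only, that the options string is a
whitespace-separated list (the header does not say how options are delimited).
It tests the extension prefix before the standard one because every extension
option also begins with "-hsa_", and, as a simplification, it treats any other
token starting with '-' as vendor-specific. `classify_option`, `count_options`,
and the `OPTION_*` names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical option categories, following the prefix rules in the text.
typedef enum {
  OPTION_STANDARD,   // begins with "-hsa_"
  OPTION_EXTENSION,  // begins with "-hsa_ext_" (reserved for extensions)
  OPTION_VENDOR,     // simplification: any other token beginning with '-'
  OPTION_UNKNOWN     // anything else; unknown options are ignored
} option_kind;

// True if the len-byte token s begins with prefix.
static int starts_with(const char *s, size_t len, const char *prefix) {
  size_t n = strlen(prefix);
  return len >= n && strncmp(s, prefix, n) == 0;
}

// Classifies one token of length len (the token need not be NUL-terminated).
option_kind classify_option(const char *tok, size_t len) {
  // Check the longer extension prefix first: it also starts with "-hsa_".
  if (starts_with(tok, len, "-hsa_ext_")) return OPTION_EXTENSION;
  if (starts_with(tok, len, "-hsa_")) return OPTION_STANDARD;
  if (len > 1 && tok[0] == '-') return OPTION_VENDOR;
  return OPTION_UNKNOWN;
}

// Counts the tokens of one kind in an options string; NULL means no options.
size_t count_options(const char *options, option_kind kind) {
  size_t count = 0;
  if (options == NULL) return 0;
  const char *p = options;
  while (*p != '\0') {
    while (*p != '\0' && isspace((unsigned char)*p)) ++p;   // skip separators
    const char *start = p;
    while (*p != '\0' && !isspace((unsigned char)*p)) ++p;  // find token end
    size_t len = (size_t)(p - start);
    if (len > 0 && classify_option(start, len) == kind) ++count;
  }
  return count;
}
```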
* * @param[out] result Memory location where the HSA runtime stores the * validation result. If the executable passes validation, the result is 0. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE @p executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p result is NULL. */ hsa_status_t HSA_API hsa_executable_validate_alt( hsa_executable_t executable, const char *options, uint32_t *result); /** * @brief Executable symbol handle. * * The lifetime of an executable symbol matches that of the executable * associated with it. An operation on a symbol whose associated executable has * been destroyed results in undefined behavior. */ typedef struct hsa_executable_symbol_s { /** * Opaque handle. Two handles reference the same object of the enclosing type * if and only if they are equal. */ uint64_t handle; } hsa_executable_symbol_t; /** * @deprecated Use ::hsa_executable_get_symbol_by_name instead. * * @brief Get the symbol handle for a given symbol name. * * @param[in] executable Executable. * * @param[in] module_name Module name. Must be NULL if the symbol has * program linkage. * * @param[in] symbol_name Symbol name. * * @param[in] agent Agent associated with the symbol. If the symbol is * independent of any agent (for example, a variable with program * allocation), this argument is ignored. * * @param[in] call_convention Call convention associated with the symbol. If the * symbol does not correspond to an indirect function, this argument is ignored. * * @param[out] symbol Memory location where the HSA runtime stores the symbol * handle. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized.
* * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_SYMBOL_NAME There is no symbol with a name * that matches @p symbol_name. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p symbol_name is NULL, or * @p symbol is NULL. */ hsa_status_t HSA_API HSA_DEPRECATED hsa_executable_get_symbol( hsa_executable_t executable, const char *module_name, const char *symbol_name, hsa_agent_t agent, int32_t call_convention, hsa_executable_symbol_t *symbol); /** * @brief Retrieve the symbol handle corresponding to a given symbol name. * * @param[in] executable Executable. * * @param[in] symbol_name Symbol name. Must be a NUL-terminated character * array. The Programmer's Reference Manual describes the standard name mangling * scheme. * * @param[in] agent Pointer to the agent for which the symbol with the given * name is defined. If the symbol corresponding to the given name has program * allocation, @p agent must be NULL. * * @param[out] symbol Memory location where the HSA runtime stores the symbol * handle. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_SYMBOL_NAME There is no symbol with a name * that matches @p symbol_name. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p symbol_name is NULL, or @p * symbol is NULL. */ hsa_status_t HSA_API hsa_executable_get_symbol_by_name( hsa_executable_t executable, const char *symbol_name, const hsa_agent_t *agent, hsa_executable_symbol_t *symbol); /** * @brief Symbol type. */ typedef enum { /** * Variable. */ HSA_SYMBOL_KIND_VARIABLE = 0, /** * Kernel. */ HSA_SYMBOL_KIND_KERNEL = 1, /** * Indirect function. */ HSA_SYMBOL_KIND_INDIRECT_FUNCTION = 2 } hsa_symbol_kind_t; /** * @brief Linkage type of a symbol.
*/ typedef enum { /** * Module linkage. */ HSA_SYMBOL_LINKAGE_MODULE = 0, /** * Program linkage. */ HSA_SYMBOL_LINKAGE_PROGRAM = 1 } hsa_symbol_linkage_t; /** * @brief Allocation type of a variable. */ typedef enum { /** * Agent allocation. */ HSA_VARIABLE_ALLOCATION_AGENT = 0, /** * Program allocation. */ HSA_VARIABLE_ALLOCATION_PROGRAM = 1 } hsa_variable_allocation_t; /** * @brief Memory segment associated with a variable. */ typedef enum { /** * Global memory segment. */ HSA_VARIABLE_SEGMENT_GLOBAL = 0, /** * Readonly memory segment. */ HSA_VARIABLE_SEGMENT_READONLY = 1 } hsa_variable_segment_t; /** * @brief Executable symbol attributes. */ typedef enum { /** * The kind of the symbol. The type of this attribute is ::hsa_symbol_kind_t. */ HSA_EXECUTABLE_SYMBOL_INFO_TYPE = 0, /** * The length of the symbol name in bytes, not including the NUL terminator. * The type of this attribute is uint32_t. */ HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH = 1, /** * The name of the symbol. The type of this attribute is a character array with * the length equal to the value of ::HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH * attribute. */ HSA_EXECUTABLE_SYMBOL_INFO_NAME = 2, /** * @deprecated * * The length in bytes (not including the NUL terminator) of the name of the * module to which this symbol belongs, if this symbol has module linkage; * otherwise 0 is returned. The type of this attribute is uint32_t. */ HSA_EXECUTABLE_SYMBOL_INFO_MODULE_NAME_LENGTH = 3, /** * @deprecated * * The module name to which this symbol belongs if this symbol has module * linkage, otherwise an empty string is returned. The type of this attribute * is a character array with the length equal to the value of * ::HSA_EXECUTABLE_SYMBOL_INFO_MODULE_NAME_LENGTH attribute. */ HSA_EXECUTABLE_SYMBOL_INFO_MODULE_NAME = 4, /** * @deprecated * * Agent associated with this symbol.
If the symbol is a variable, the * value of this attribute is only defined if * ::HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ALLOCATION is * ::HSA_VARIABLE_ALLOCATION_AGENT. The type of this attribute is ::hsa_agent_t. */ HSA_EXECUTABLE_SYMBOL_INFO_AGENT = 20, /** * The address of the variable. The value of this attribute is undefined if * the symbol is not a variable. The type of this attribute is uint64_t. * * If the executable's state is ::HSA_EXECUTABLE_STATE_UNFROZEN, then 0 is * returned. */ HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS = 21, /** * The linkage kind of the symbol. The type of this attribute is * ::hsa_symbol_linkage_t. */ HSA_EXECUTABLE_SYMBOL_INFO_LINKAGE = 5, /** * Indicates whether the symbol corresponds to a definition. The type of this * attribute is bool. */ HSA_EXECUTABLE_SYMBOL_INFO_IS_DEFINITION = 17, /** * @deprecated * * The allocation kind of the variable. The value of this attribute is * undefined if the symbol is not a variable. The type of this attribute is * ::hsa_variable_allocation_t. */ HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ALLOCATION = 6, /** * @deprecated * * The segment kind of the variable. The value of this attribute is undefined * if the symbol is not a variable. The type of this attribute is * ::hsa_variable_segment_t. */ HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SEGMENT = 7, /** * @deprecated * * Alignment of the symbol in memory. The value of this attribute is undefined * if the symbol is not a variable. The type of this attribute is uint32_t. * * The current alignment of the variable in memory may be greater than the * value specified in the source program variable declaration. */ HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ALIGNMENT = 8, /** * @deprecated * * Size of the variable. The value of this attribute is undefined if * the symbol is not a variable. The type of this attribute is uint32_t. * * A value of 0 is returned if the variable is an external variable and has an * unknown dimension.
*/ HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE = 9, /** * @deprecated * * Indicates whether the variable is constant. The value of this attribute is * undefined if the symbol is not a variable. The type of this attribute is * bool. */ HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_IS_CONST = 10, /** * Kernel object handle, used in the kernel dispatch packet. The value of this * attribute is undefined if the symbol is not a kernel. The type of this * attribute is uint64_t. * * If the state of the executable is ::HSA_EXECUTABLE_STATE_UNFROZEN, then 0 * is returned. */ HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT = 22, /** * Size of kernarg segment memory that is required to hold the values of the * kernel arguments, in bytes. Must be a multiple of 16. The value of this * attribute is undefined if the symbol is not a kernel. The type of this * attribute is uint32_t. */ HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE = 11, /** * Alignment (in bytes) of the buffer used to pass arguments to the kernel, * which is the maximum of 16 and the maximum alignment of any of the kernel * arguments. The value of this attribute is undefined if the symbol is not a * kernel. The type of this attribute is uint32_t. */ HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT = 12, /** * Size of static group segment memory required by the kernel (per * work-group), in bytes. The value of this attribute is undefined * if the symbol is not a kernel. The type of this attribute is uint32_t. * * The reported amount does not include any dynamically allocated group * segment memory that may be requested by the application when a kernel is * dispatched. */ HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE = 13, /** * Size of static private, spill, and arg segment memory required by * this kernel (per work-item), in bytes. The value of this attribute is * undefined if the symbol is not a kernel. The type of this attribute is * uint32_t. 
* * If the value of ::HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK is * true, the kernel may use more private memory than the reported value, and * the application must add the dynamic call stack usage to @a * private_segment_size when populating a kernel dispatch packet. */ HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE = 14, /** * Dynamic callstack flag. The value of this attribute is undefined if the * symbol is not a kernel. The type of this attribute is bool. * * If this flag is set (the value is true), the kernel uses a dynamically * sized call stack. This can happen if recursive calls, calls to indirect * functions, or the HSAIL alloca instruction are present in the kernel. */ HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK = 15, /** * @deprecated * * Call convention of the kernel. The value of this attribute is undefined if * the symbol is not a kernel. The type of this attribute is uint32_t. */ HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_CALL_CONVENTION = 18, /** * Indirect function object handle. The value of this attribute is undefined * if the symbol is not an indirect function, or the associated agent does * not support the Full Profile. The type of this attribute depends on the * machine model: the type is uint32_t for the small machine model, and uint64_t * for the large machine model. * * If the state of the executable is ::HSA_EXECUTABLE_STATE_UNFROZEN, then 0 * is returned. */ HSA_EXECUTABLE_SYMBOL_INFO_INDIRECT_FUNCTION_OBJECT = 23, /** * @deprecated * * Call convention of the indirect function. The value of this attribute is * undefined if the symbol is not an indirect function, or the associated * agent does not support the Full Profile. The type of this attribute is * uint32_t. */ HSA_EXECUTABLE_SYMBOL_INFO_INDIRECT_FUNCTION_CALL_CONVENTION = 16 } hsa_executable_symbol_info_t; /** * @brief Get the current value of an attribute for a given executable symbol. * * @param[in] executable_symbol Executable symbol.
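
Several kernel attributes queried with this function feed the sizes used when
populating a kernel dispatch packet. The sketch below encodes only the rules
stated for ::HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT (the
maximum of 16 and the largest argument alignment), the multiple-of-16 kernarg
size, and ::HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE combined with
::HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK. The helper names
`kernarg_alignment`, `round_up`, and `dispatch_private_size` are hypothetical,
and the call stack estimate is an application-supplied value that the runtime
does not report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Kernarg buffer alignment: the maximum of 16 and the largest alignment of
// any kernel argument.
uint32_t kernarg_alignment(const uint32_t *arg_aligns, size_t n) {
  uint32_t align = 16;
  for (size_t i = 0; i < n; ++i) {
    if (arg_aligns[i] > align) align = arg_aligns[i];
  }
  return align;
}

// Rounds value up to a multiple of align (a power of two), e.g. to pad a
// kernarg byte count to a multiple of 16.
uint32_t round_up(uint32_t value, uint32_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Private segment size for a dispatch packet: the reported static size, plus
// an application-estimated call stack allowance (hypothetical input) when the
// kernel uses a dynamically sized call stack.
uint32_t dispatch_private_size(uint32_t reported_size, bool dynamic_callstack,
                               uint32_t callstack_estimate) {
  return dynamic_callstack ? reported_size + callstack_estimate
                           : reported_size;
}
```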
* * @param[in] attribute Attribute to query. * * @param[out] value Pointer to an application-allocated buffer in which to store * the value of the attribute. If the buffer passed by the application is not * large enough to hold the value of @p attribute, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE_SYMBOL The executable symbol is * invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid * executable symbol attribute, or @p value is NULL. */ hsa_status_t HSA_API hsa_executable_symbol_get_info( hsa_executable_symbol_t executable_symbol, hsa_executable_symbol_info_t attribute, void *value); /** * @deprecated * * @brief Iterate over the symbols in an executable, and invoke an * application-defined callback on every iteration. * * @param[in] executable Executable. * * @param[in] callback Callback to be invoked once per executable symbol. The * HSA runtime passes three arguments to the callback: the executable, a symbol, * and the application data. If @p callback returns a status other than * ::HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and * ::hsa_executable_iterate_symbols returns that status value. * * @param[in] data Application data that is passed to @p callback on every * iteration. May be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL.
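
The traversal contract above (the callback runs once per symbol, and the first
non-success status stops the walk and becomes the return value) is commonly used
to search for one symbol and stop early; hsa.h defines ::HSA_STATUS_INFO_BREAK
for this kind of early exit. The sketch below models that contract without the
runtime. `toy_status`, `toy_symbol`, `toy_iterate`, `find_ctx`, and
`find_by_name` are hypothetical stand-ins, not HSA types or functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-ins for hsa_status_t and hsa_executable_symbol_t.
typedef enum { TOY_STATUS_SUCCESS = 0, TOY_STATUS_BREAK = 1 } toy_status;
typedef struct { uint64_t handle; const char *name; } toy_symbol;

typedef toy_status (*toy_callback)(toy_symbol symbol, void *data);

// Invokes callback once per symbol; stops at the first non-success status and
// returns it, otherwise returns success after visiting every symbol.
toy_status toy_iterate(const toy_symbol *symbols, size_t n,
                       toy_callback callback, void *data) {
  for (size_t i = 0; i < n; ++i) {
    toy_status status = callback(symbols[i], data);
    if (status != TOY_STATUS_SUCCESS) return status;
  }
  return TOY_STATUS_SUCCESS;
}

// Search context passed as the application data.
typedef struct {
  const char *wanted;  // name to look for
  toy_symbol found;    // filled in on a match
  size_t visited;      // number of callback invocations
} find_ctx;

toy_status find_by_name(toy_symbol symbol, void *data) {
  find_ctx *ctx = (find_ctx *)data;
  ctx->visited++;
  if (strcmp(symbol.name, ctx->wanted) == 0) {
    ctx->found = symbol;
    return TOY_STATUS_BREAK;  // non-success status ends the traversal early
  }
  return TOY_STATUS_SUCCESS;
}
```

A search that finds its target returns the break status to the caller, so the
caller can tell "found" apart from "visited everything without a match".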
*/ hsa_status_t HSA_API HSA_DEPRECATED hsa_executable_iterate_symbols( hsa_executable_t executable, hsa_status_t (*callback)(hsa_executable_t exec, hsa_executable_symbol_t symbol, void *data), void *data); /** * @brief Iterate over the kernels, indirect functions, and agent allocation * variables in an executable for a given agent, and invoke an application- * defined callback on every iteration. * * @param[in] executable Executable. * * @param[in] agent Agent. * * @param[in] callback Callback to be invoked once per executable symbol. The * HSA runtime passes four arguments to the callback: the executable, the agent, * a symbol, and the application data. If @p callback returns a status other than * ::HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and * ::hsa_executable_iterate_agent_symbols returns that status value. * * @param[in] data Application data that is passed to @p callback on every * iteration. May be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL. */ hsa_status_t HSA_API hsa_executable_iterate_agent_symbols( hsa_executable_t executable, hsa_agent_t agent, hsa_status_t (*callback)(hsa_executable_t exec, hsa_agent_t agent, hsa_executable_symbol_t symbol, void *data), void *data); /** * @brief Iterate over the program allocation variables in an executable, and * invoke an application-defined callback on every iteration. * * @param[in] executable Executable. * * @param[in] callback Callback to be invoked once per executable symbol. The * HSA runtime passes three arguments to the callback: the executable, a symbol, * and the application data.
If @p callback returns a status other than * ::HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and * ::hsa_executable_iterate_program_symbols returns that status value. * * @param[in] data Application data that is passed to @p callback on every * iteration. May be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL. */ hsa_status_t HSA_API hsa_executable_iterate_program_symbols( hsa_executable_t executable, hsa_status_t (*callback)(hsa_executable_t exec, hsa_executable_symbol_t symbol, void *data), void *data); /** @} */ /** \defgroup code-object Code Objects (deprecated). * @{ */ /** * @deprecated * * @brief Struct containing an opaque handle to a code object, which contains * ISA for finalized kernels and indirect functions together with information * about the global or readonly segment variables they reference. */ typedef struct hsa_code_object_s { /** * Opaque handle. Two handles reference the same object of the enclosing type * if and only if they are equal. */ uint64_t handle; } hsa_code_object_t; /** * @deprecated * * @brief Application data handle that is passed to the serialization * and deserialization functions. */ typedef struct hsa_callback_data_s { /** * Opaque handle. */ uint64_t handle; } hsa_callback_data_t; /** * @deprecated * * @brief Serialize a code object. Can be used for offline finalization, * install-time finalization, disk code caching, etc. * * @param[in] code_object Code object. * * @param[in] alloc_callback Callback function for memory allocation. Must not * be NULL. The HSA runtime passes three arguments to the callback: the * allocation size, the application data, and a pointer to a memory location * where the application stores the allocation result.
The HSA runtime invokes * @p alloc_callback once to allocate a buffer that contains the serialized * version of @p code_object. If the callback returns a status code other than * ::HSA_STATUS_SUCCESS, this function returns the same code. * * @param[in] callback_data Application data that is passed to @p * alloc_callback. May be NULL. * * @param[in] options Standard and vendor-specific options. Unknown options are * ignored. A standard option begins with the "-hsa_" prefix. Options beginning * with the "-hsa_ext__" prefix are reserved for extensions. A * vendor-specific option begins with the "-_" prefix. Must be a * NUL-terminated string. May be NULL. * * @param[out] serialized_code_object Memory location where the HSA runtime * stores a pointer to the serialized code object. Must not be NULL. * * @param[out] serialized_code_object_size Memory location where the HSA runtime * stores the size (in bytes) of @p serialized_code_object. The returned value * matches the allocation size passed by the HSA runtime to @p * alloc_callback. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to * allocate the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT @p code_object is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p alloc_callback, @p * serialized_code_object, or @p serialized_code_object_size are NULL. */ hsa_status_t HSA_API HSA_DEPRECATED hsa_code_object_serialize( hsa_code_object_t code_object, hsa_status_t (*alloc_callback)(size_t size, hsa_callback_data_t data, void **address), hsa_callback_data_t callback_data, const char *options, void **serialized_code_object, size_t *serialized_code_object_size); /** * @deprecated * * @brief Deserialize a code object. * * @param[in] serialized_code_object A serialized code object. 
Must not be NULL. * * @param[in] serialized_code_object_size The size (in bytes) of @p * serialized_code_object. Must not be 0. * * @param[in] options Standard and vendor-specific options. Unknown options are * ignored. A standard option begins with the "-hsa_" prefix. Options beginning * with the "-hsa_ext__" prefix are reserved for extensions. A * vendor-specific option begins with the "-_" prefix. Must be a * NUL-terminated string. May be NULL. * * @param[out] code_object Memory location where the HSA runtime stores the * deserialized code object. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to * allocate the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p serialized_code_object or @p * code_object is NULL, or @p serialized_code_object_size is 0. */ hsa_status_t HSA_API HSA_DEPRECATED hsa_code_object_deserialize( void *serialized_code_object, size_t serialized_code_object_size, const char *options, hsa_code_object_t *code_object); /** * @deprecated * * @brief Destroy a code object. * * @details The lifetime of a code object must exceed that of any executable * into which it has been loaded. If @p code_object is destroyed while an * executable that loaded it has not been destroyed, the behavior is undefined. * * @param[in] code_object Code object. The handle becomes invalid after it has * been destroyed. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT @p code_object is invalid. */ hsa_status_t HSA_API HSA_DEPRECATED hsa_code_object_destroy( hsa_code_object_t code_object); /** * @deprecated * * @brief Code object type.
*/ typedef enum { /** * Produces a code object that contains ISA for all kernels and indirect * functions in HSA source. */ HSA_CODE_OBJECT_TYPE_PROGRAM = 0 } hsa_code_object_type_t; /** * @deprecated * * @brief Code object attributes. */ typedef enum { /** * The version of the code object. The type of this attribute is a * NUL-terminated char[64]. The version string must be at most 63 characters * long (not including the NUL terminator) and all array elements not used for * the version string must be NUL. */ HSA_CODE_OBJECT_INFO_VERSION = 0, /** * Type of the code object. The type of this attribute is * ::hsa_code_object_type_t. */ HSA_CODE_OBJECT_INFO_TYPE = 1, /** * Instruction set architecture this code object is produced for. The type of * this attribute is ::hsa_isa_t. */ HSA_CODE_OBJECT_INFO_ISA = 2, /** * Machine model this code object is produced for. The type of this attribute * is ::hsa_machine_model_t. */ HSA_CODE_OBJECT_INFO_MACHINE_MODEL = 3, /** * Profile this code object is produced for. The type of this attribute is * ::hsa_profile_t. */ HSA_CODE_OBJECT_INFO_PROFILE = 4, /** * Default floating-point rounding mode used when the code object is * produced. The type of this attribute is * ::hsa_default_float_rounding_mode_t. */ HSA_CODE_OBJECT_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 5 } hsa_code_object_info_t; /** * @deprecated * * @brief Get the current value of an attribute for a given code object. * * @param[in] code_object Code object. * * @param[in] attribute Attribute to query. * * @param[out] value Pointer to an application-allocated buffer in which to store * the value of the attribute. If the buffer passed by the application is not * large enough to hold the value of @p attribute, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT @p code_object is invalid.
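
Because ::HSA_CODE_OBJECT_INFO_VERSION is a fixed char[64] whose unused elements
must all be NUL, an application can check a buffer filled by this function with
a helper like the one below. `version_buffer_is_well_formed` and
`VERSION_BUFFER_SIZE` are hypothetical names; the check encodes only the stated
constraints (at most 63 characters, then NUL in every remaining element).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VERSION_BUFFER_SIZE 64  // char[64], as documented for the attribute

// Returns true if buf holds a string of at most 63 characters and every
// element after the string is NUL.
bool version_buffer_is_well_formed(const char buf[VERSION_BUFFER_SIZE]) {
  // memchr (not strlen) so an unterminated buffer is never overread.
  const char *nul = memchr(buf, '\0', VERSION_BUFFER_SIZE);
  if (nul == NULL) return false;  // no terminator within 64 bytes
  for (size_t i = (size_t)(nul - buf); i < VERSION_BUFFER_SIZE; ++i) {
    if (buf[i] != '\0') return false;  // stray byte after the version string
  }
  return true;
}
```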
* * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid * code object attribute, or @p value is NULL. */ hsa_status_t HSA_API HSA_DEPRECATED hsa_code_object_get_info( hsa_code_object_t code_object, hsa_code_object_info_t attribute, void *value); /** * @deprecated * * @brief Load a code object into the executable. * * @details Every global or readonly variable that is external must be defined * before loading the code object. An internal global or readonly variable is * allocated when the code object being loaded references it and the variable * has not already been allocated. * * Any module linkage declaration must have been defined either by a variable * define call or by loading a code object that contains a definition of the * symbol with module linkage. * * @param[in] executable Executable. * * @param[in] agent Agent to load the code object for. The agent must support the * default floating-point rounding mode used by @p code_object. * * @param[in] code_object Code object to load. The lifetime of the code object * must exceed that of the executable: if @p code_object is destroyed before @p * executable, the behavior is undefined. * * @param[in] options Standard and vendor-specific options. Unknown options are * ignored. A standard option begins with the "-hsa_" prefix. Options beginning * with the "-hsa_ext__" prefix are reserved for extensions. A * vendor-specific option begins with the "-_" prefix. Must be a * NUL-terminated string. May be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to * allocate the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT @p code_object is invalid.
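
The allocation rule for internal global or readonly variables (storage is
created the first time a code object being loaded references the variable, and
reused by later loads) can be modeled as below. This is a toy sketch: `toy_var`,
`toy_load`, and the use of `calloc` as the allocator are assumptions for
illustration; the runtime's actual allocation strategy is not specified here.

```c
#include <stddef.h>
#include <stdlib.h>

// Toy internal variable: its size and, once allocated, its storage.
typedef struct {
  size_t size;
  void *storage;  // NULL until some loaded code object references it
} toy_var;

// "Loads" a code object that references the variables at the given indices.
// Each referenced variable is allocated only if it is not allocated yet.
// Returns the number of new allocations, or -1 if an allocation fails.
int toy_load(toy_var *vars, const size_t *refs, size_t nrefs) {
  int allocated = 0;
  for (size_t i = 0; i < nrefs; ++i) {
    toy_var *v = &vars[refs[i]];
    if (v->storage == NULL) {
      v->storage = calloc(1, v->size);
      if (v->storage == NULL) return -1;
      ++allocated;
    }
  }
  return allocated;
}
```

Loading a second code object that references an already allocated variable
reuses the existing storage instead of allocating again.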
 *
 * @retval ::HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS @p agent is not compatible
 * with @p code_object (for example, @p agent does not support the default
 * floating-point rounding mode specified by @p code_object), or @p code_object
 * is not compatible with @p executable (for example, @p code_object and @p
 * executable have different machine models or profiles).
 *
 * @retval ::HSA_STATUS_ERROR_FROZEN_EXECUTABLE @p executable is frozen.
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_executable_load_code_object(
    hsa_executable_t executable,
    hsa_agent_t agent,
    hsa_code_object_t code_object,
    const char *options);

/**
 * @deprecated
 *
 * @brief Code object symbol handle.
 *
 * The lifetime of a code object symbol matches that of the code object
 * associated with it. An operation on a symbol whose associated code object has
 * been destroyed results in undefined behavior.
 */
typedef struct hsa_code_symbol_s {
  /**
   * Opaque handle. Two handles reference the same object of the enclosing type
   * if and only if they are equal.
   */
  uint64_t handle;
} hsa_code_symbol_t;

/**
 * @deprecated
 *
 * @brief Get the symbol handle within a code object for a given symbol name.
 *
 * @param[in] code_object Code object.
 *
 * @param[in] symbol_name Symbol name.
 *
 * @param[out] symbol Memory location where the HSA runtime stores the symbol
 * handle.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT @p code_object is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_SYMBOL_NAME There is no symbol with a name
 * that matches @p symbol_name.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p symbol_name is NULL, or
 * @p symbol is NULL.
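
Example (a hypothetical sketch, not part of the HSA API): the lookup contract
documented above can be modeled without the runtime. `MockStatus`,
`MockCodeObject`, and `mock_get_symbol` are illustrative names introduced here;
the sketch only mirrors which status category is reported in each documented
situation (NULL inputs, unknown name, success). It does not reproduce the real
`hsa_status_t` values or the order in which the runtime performs its checks.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical status categories mirroring the retvals documented for
// hsa_code_object_get_symbol; these are NOT the real hsa_status_t values.
enum class MockStatus { kSuccess, kInvalidArgument, kInvalidSymbolName };

// Hypothetical stand-in for a code object: symbol name -> opaque handle.
struct MockCodeObject {
  std::map<std::string, uint64_t> symbols;
};

// NULL inputs are rejected, an unknown name reports an invalid symbol name,
// and on success the handle is written through the out-parameter.
MockStatus mock_get_symbol(const MockCodeObject& code_object,
                           const char* symbol_name, uint64_t* symbol) {
  if (symbol_name == nullptr || symbol == nullptr) {
    return MockStatus::kInvalidArgument;
  }
  const auto it = code_object.symbols.find(symbol_name);
  if (it == code_object.symbols.end()) {
    return MockStatus::kInvalidSymbolName;
  }
  *symbol = it->second;
  return MockStatus::kSuccess;
}
```

The out-parameter is only written on success, so a caller should check the
returned status before using the handle.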
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_code_object_get_symbol(
    hsa_code_object_t code_object,
    const char *symbol_name,
    hsa_code_symbol_t *symbol);

/**
 * @deprecated
 *
 * @brief Get the symbol handle within a code object for a given symbol name,
 * optionally qualified by a module name.
 *
 * @param[in] code_object Code object.
 *
 * @param[in] module_name Module name. Must be NULL if the symbol has
 * program linkage.
 *
 * @param[in] symbol_name Symbol name.
 *
 * @param[out] symbol Memory location where the HSA runtime stores the symbol
 * handle.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT @p code_object is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_SYMBOL_NAME There is no symbol with a name
 * that matches @p symbol_name.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p symbol_name is NULL, or
 * @p symbol is NULL.
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_code_object_get_symbol_from_name(
    hsa_code_object_t code_object,
    const char *module_name,
    const char *symbol_name,
    hsa_code_symbol_t *symbol);

/**
 * @deprecated
 *
 * @brief Code object symbol attributes.
 */
typedef enum {
  /**
   * The type of the symbol. The type of this attribute is ::hsa_symbol_kind_t.
   */
  HSA_CODE_SYMBOL_INFO_TYPE = 0,
  /**
   * The length of the symbol name in bytes, not including the NUL terminator.
   * The type of this attribute is uint32_t.
   */
  HSA_CODE_SYMBOL_INFO_NAME_LENGTH = 1,
  /**
   * The name of the symbol. The type of this attribute is a character array
   * whose length equals the value of the ::HSA_CODE_SYMBOL_INFO_NAME_LENGTH
   * attribute.
   */
  HSA_CODE_SYMBOL_INFO_NAME = 2,
  /**
   * The length in bytes (not including the NUL terminator) of the name of the
   * module to which this symbol belongs, if this symbol has module linkage;
   * otherwise 0 is returned. The type of this attribute is uint32_t.
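   *
Example (a hypothetical sketch): because the name attributes are character
arrays whose length is given by the corresponding length attribute, a caller
typically queries the length first, sizes its buffer from it, and then queries
the characters. `mock_symbol_get_info` below is an illustrative stand-in for
::hsa_code_symbol_get_info, assumed here to write exactly `length` characters
with no NUL terminator; `MockSymbol`, `MockSymbolInfo`, and
`read_string_attribute` are likewise hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical attribute ids; the values match hsa_code_symbol_info_t, but
// this is a stand-alone sketch, not the HSA header.
enum MockSymbolInfo : int {
  kNameLength = 1,        // HSA_CODE_SYMBOL_INFO_NAME_LENGTH
  kName = 2,              // HSA_CODE_SYMBOL_INFO_NAME
  kModuleNameLength = 3,  // HSA_CODE_SYMBOL_INFO_MODULE_NAME_LENGTH
  kModuleName = 4,        // HSA_CODE_SYMBOL_INFO_MODULE_NAME
};

// Hypothetical symbol; an empty module name models program linkage.
struct MockSymbol {
  std::string name;
  std::string module_name;
};

// Writes a uint32_t for the length attributes and exactly `length`
// characters (no NUL) for the name attributes.
void mock_symbol_get_info(const MockSymbol& s, int attribute, void* value) {
  const bool is_module =
      (attribute == kModuleNameLength || attribute == kModuleName);
  const std::string& str = is_module ? s.module_name : s.name;
  if (attribute == kNameLength || attribute == kModuleNameLength) {
    const uint32_t len = static_cast<uint32_t>(str.size());
    std::memcpy(value, &len, sizeof(len));
  } else {
    std::memcpy(value, str.data(), str.size());
  }
}

// Two-step query: read the length, size the buffer from it, then read the
// characters. The result is built from (pointer, length), not a NUL.
std::string read_string_attribute(const MockSymbol& s, int length_attr,
                                  int name_attr) {
  uint32_t len = 0;
  mock_symbol_get_info(s, length_attr, &len);
  if (len == 0) {
    return std::string();  // e.g. module name of a program-linkage symbol
  }
  std::vector<char> buffer(len);
  mock_symbol_get_info(s, name_attr, buffer.data());
  return std::string(buffer.data(), len);
}
```

For a symbol with program linkage the module-name length is 0, so the sketch
never reads the module-name attribute and returns an empty string.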
   */
  HSA_CODE_SYMBOL_INFO_MODULE_NAME_LENGTH = 3,
  /**
   * The name of the module to which this symbol belongs, if this symbol has
   * module linkage; otherwise an empty string is returned. The type of this
   * attribute is a character array whose length equals the value of the
   * ::HSA_CODE_SYMBOL_INFO_MODULE_NAME_LENGTH attribute.
   */
  HSA_CODE_SYMBOL_INFO_MODULE_NAME = 4,
  /**
   * The linkage kind of the symbol. The type of this attribute is
   * ::hsa_symbol_linkage_t.
   */
  HSA_CODE_SYMBOL_INFO_LINKAGE = 5,
  /**
   * Indicates whether the symbol corresponds to a definition. The type of this
   * attribute is bool.
   */
  HSA_CODE_SYMBOL_INFO_IS_DEFINITION = 17,
  /**
   * The allocation kind of the variable. The value of this attribute is
   * undefined if the symbol is not a variable. The type of this attribute is
   * ::hsa_variable_allocation_t.
   */
  HSA_CODE_SYMBOL_INFO_VARIABLE_ALLOCATION = 6,
  /**
   * The segment kind of the variable. The value of this attribute is
   * undefined if the symbol is not a variable. The type of this attribute is
   * ::hsa_variable_segment_t.
   */
  HSA_CODE_SYMBOL_INFO_VARIABLE_SEGMENT = 7,
  /**
   * Alignment of the symbol in memory. The value of this attribute is undefined
   * if the symbol is not a variable. The type of this attribute is uint32_t.
   *
   * The current alignment of the variable in memory may be greater than the
   * value specified in the source program variable declaration.
   */
  HSA_CODE_SYMBOL_INFO_VARIABLE_ALIGNMENT = 8,
  /**
   * Size of the variable. The value of this attribute is undefined if the
   * symbol is not a variable. The type of this attribute is uint32_t.
   *
   * A size of 0 is returned if the variable is an external variable and has an
   * unknown dimension.
   */
  HSA_CODE_SYMBOL_INFO_VARIABLE_SIZE = 9,
  /**
   * Indicates whether the variable is constant. The value of this attribute is
   * undefined if the symbol is not a variable. The type of this attribute is
   * bool.
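   *
Example (a hypothetical sketch): the variable attributes above are undefined
for kernels and indirect functions, so a caller should check
::HSA_CODE_SYMBOL_INFO_TYPE before trusting them. `MockSymbolKind`,
`MockSymbol`, `VariableInfo`, and `read_variable_info` are illustrative names,
not HSA types, and the sketch assumes a C++17 compiler for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical mirror of hsa_symbol_kind_t; these names are illustrative.
enum class MockSymbolKind { kVariable, kKernel, kIndirectFunction };

// Hypothetical symbol record. For non-variable symbols the variable fields
// hold arbitrary values, modeling "the value of this attribute is undefined".
struct MockSymbol {
  MockSymbolKind kind;
  uint32_t variable_size;
  uint32_t variable_alignment;
  bool variable_is_const;
};

struct VariableInfo {
  uint32_t size;       // 0 may mean an external variable of unknown dimension
  uint32_t alignment;  // may exceed the alignment declared in the source
  bool is_const;
};

// Returns the variable-only attributes only after confirming the symbol is a
// variable; kernels and indirect functions yield std::nullopt instead of
// exposing undefined values.
std::optional<VariableInfo> read_variable_info(const MockSymbol& s) {
  if (s.kind != MockSymbolKind::kVariable) {
    return std::nullopt;
  }
  return VariableInfo{s.variable_size, s.variable_alignment,
                      s.variable_is_const};
}
```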
   */
  HSA_CODE_SYMBOL_INFO_VARIABLE_IS_CONST = 10,
  /**
   * Size of kernarg segment memory that is required to hold the values of the
   * kernel arguments, in bytes. Must be a multiple of 16. The value of this
   * attribute is undefined if the symbol is not a kernel. The type of this
   * attribute is uint32_t.
   */
  HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE = 11,
  /**
   * Alignment (in bytes) of the buffer used to pass arguments to the kernel,
   * which is the maximum of 16 and the maximum alignment of any of the kernel
   * arguments. The value of this attribute is undefined if the symbol is not a
   * kernel. The type of this attribute is uint32_t.
   */
  HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT = 12,
  /**
   * Size of static group segment memory required by the kernel (per
   * work-group), in bytes. The value of this attribute is undefined
   * if the symbol is not a kernel. The type of this attribute is uint32_t.
   *
   * The reported amount does not include any dynamically allocated group
   * segment memory that may be requested by the application when a kernel is
   * dispatched.
   */
  HSA_CODE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE = 13,
  /**
   * Size of static private, spill, and arg segment memory required by
   * this kernel (per work-item), in bytes. The value of this attribute is
   * undefined if the symbol is not a kernel. The type of this attribute is
   * uint32_t.
   *
   * If the value of ::HSA_CODE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK is true,
   * the kernel may use more private memory than the reported value, and the
   * application must add the dynamic call stack usage to @a
   * private_segment_size when populating a kernel dispatch packet.
   */
  HSA_CODE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE = 14,
  /**
   * Dynamic callstack flag. The value of this attribute is undefined if the
   * symbol is not a kernel. The type of this attribute is bool.
   *
   * If this flag is set (the value is true), the kernel uses a dynamically
   * sized call stack.
   * This can happen if recursive calls, calls to indirect
   * functions, or the HSAIL alloca instruction are present in the kernel.
   */
  HSA_CODE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK = 15,
  /**
   * Call convention of the kernel. The value of this attribute is undefined if
   * the symbol is not a kernel. The type of this attribute is uint32_t.
   */
  HSA_CODE_SYMBOL_INFO_KERNEL_CALL_CONVENTION = 18,
  /**
   * Call convention of the indirect function. The value of this attribute is
   * undefined if the symbol is not an indirect function. The type of this
   * attribute is uint32_t.
   */
  HSA_CODE_SYMBOL_INFO_INDIRECT_FUNCTION_CALL_CONVENTION = 16
} hsa_code_symbol_info_t;

/**
 * @deprecated
 *
 * @brief Get the current value of an attribute for a given code symbol.
 *
 * @param[in] code_symbol Code symbol.
 *
 * @param[in] attribute Attribute to query.
 *
 * @param[out] value Pointer to an application-allocated buffer in which to
 * store the value of the attribute. If the buffer passed by the application is
 * not large enough to hold the value of @p attribute, the behavior is
 * undefined.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_CODE_SYMBOL The code symbol is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid
 * code symbol attribute, or @p value is NULL.
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_code_symbol_get_info(
    hsa_code_symbol_t code_symbol,
    hsa_code_symbol_info_t attribute,
    void *value);

/**
 * @deprecated
 *
 * @brief Iterate over the symbols in a code object, and invoke an
 * application-defined callback on every iteration.
 *
 * @param[in] code_object Code object.
 *
 * @param[in] callback Callback to be invoked once per code object symbol. The
 * HSA runtime passes three arguments to the callback: the code object, a
 * symbol, and the application data.
 * If @p callback returns a status other than
 * ::HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and
 * ::hsa_code_object_iterate_symbols returns that status value.
 *
 * @param[in] data Application data that is passed to @p callback on every
 * iteration. May be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT @p code_object is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL.
 */
hsa_status_t HSA_API HSA_DEPRECATED hsa_code_object_iterate_symbols(
    hsa_code_object_t code_object,
    hsa_status_t (*callback)(hsa_code_object_t code_object,
                             hsa_code_symbol_t symbol, void *data),
    void *data);

/** @} */

#ifdef __cplusplus
}  // end extern "C" block
#endif

#endif  // header guard

ROCR-Runtime-rocm-5.0.0/src/inc/hsa_api_trace.h

////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_INC_HSA_API_TRACE_H
#define HSA_RUNTIME_INC_HSA_API_TRACE_H

#include "hsa.h"
#ifdef AMD_INTERNAL_BUILD
#include "hsa_ext_image.h"
#include "hsa_ext_amd.h"
#include "hsa_ext_finalize.h"
#else
#include "inc/hsa_ext_image.h"
#include "inc/hsa_ext_amd.h"
#include "inc/hsa_ext_finalize.h"
#endif

#include <string.h>
#include <assert.h>
#include <stddef.h>

// Major Ids of the Api tables exported by Hsa Core Runtime
#define HSA_API_TABLE_MAJOR_VERSION               0x01
#define HSA_CORE_API_TABLE_MAJOR_VERSION          0x01
#define HSA_AMD_EXT_API_TABLE_MAJOR_VERSION       0x01
#define HSA_FINALIZER_API_TABLE_MAJOR_VERSION     0x01
#define HSA_IMAGE_API_TABLE_MAJOR_VERSION         0x01
#define HSA_AQLPROFILE_API_TABLE_MAJOR_VERSION    0x01

// Step Ids of the Api tables exported by Hsa Core Runtime
#define HSA_API_TABLE_STEP_VERSION                0x00
#define HSA_CORE_API_TABLE_STEP_VERSION           0x00
#define HSA_AMD_EXT_API_TABLE_STEP_VERSION        0x00
#define HSA_FINALIZER_API_TABLE_STEP_VERSION      0x00
#define HSA_IMAGE_API_TABLE_STEP_VERSION          0x00
#define HSA_AQLPROFILE_API_TABLE_STEP_VERSION     0x00

// Min function used to copy Api Tables
static inline uint32_t Min(const uint32_t a, const uint32_t b) {
  return (a > b) ? b : a;
}

// Declarations of APIs intended for use only by tools.
typedef void (*hsa_amd_queue_intercept_packet_writer)(const void* pkts, uint64_t pkt_count);
typedef void (*hsa_amd_queue_intercept_handler)(const void* pkts, uint64_t pkt_count,
                                                uint64_t user_pkt_index, void* data,
                                                hsa_amd_queue_intercept_packet_writer writer);
hsa_status_t hsa_amd_queue_intercept_register(hsa_queue_t* queue,
                                              hsa_amd_queue_intercept_handler callback,
                                              void* user_data);
hsa_status_t hsa_amd_queue_intercept_create(
    hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type,
    void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
    uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue);

typedef void (*hsa_amd_runtime_queue_notifier)(const hsa_queue_t* queue, hsa_agent_t agent,
                                               void* data);
hsa_status_t hsa_amd_runtime_queue_create_register(hsa_amd_runtime_queue_notifier callback,
                                                   void* user_data);

// Structure of Version used to identify an instance of Api table
// Must be the first member (offsetof == 0) of all API tables.
// This is the root of the table passing ABI.
struct ApiTableVersion {
  uint32_t major_id;
  uint32_t minor_id;
  uint32_t step_id;
  uint32_t reserved;
};

// Table to export HSA Finalizer Extension Apis
struct FinalizerExtTable {
  ApiTableVersion version;
  decltype(hsa_ext_program_create)* hsa_ext_program_create_fn;
  decltype(hsa_ext_program_destroy)* hsa_ext_program_destroy_fn;
  decltype(hsa_ext_program_add_module)* hsa_ext_program_add_module_fn;
  decltype(hsa_ext_program_iterate_modules)* hsa_ext_program_iterate_modules_fn;
  decltype(hsa_ext_program_get_info)* hsa_ext_program_get_info_fn;
  decltype(hsa_ext_program_finalize)* hsa_ext_program_finalize_fn;
};

// Table to export HSA Image Extension Apis
struct ImageExtTable {
  ApiTableVersion version;
  decltype(hsa_ext_image_get_capability)* hsa_ext_image_get_capability_fn;
  decltype(hsa_ext_image_data_get_info)* hsa_ext_image_data_get_info_fn;
  decltype(hsa_ext_image_create)* hsa_ext_image_create_fn;
  decltype(hsa_ext_image_import)* hsa_ext_image_import_fn;
  decltype(hsa_ext_image_export)* hsa_ext_image_export_fn;
  decltype(hsa_ext_image_copy)* hsa_ext_image_copy_fn;
  decltype(hsa_ext_image_clear)* hsa_ext_image_clear_fn;
  decltype(hsa_ext_image_destroy)* hsa_ext_image_destroy_fn;
  decltype(hsa_ext_sampler_create)* hsa_ext_sampler_create_fn;
  decltype(hsa_ext_sampler_destroy)* hsa_ext_sampler_destroy_fn;
  decltype(hsa_ext_image_get_capability_with_layout)* hsa_ext_image_get_capability_with_layout_fn;
  decltype(hsa_ext_image_data_get_info_with_layout)* hsa_ext_image_data_get_info_with_layout_fn;
  decltype(hsa_ext_image_create_with_layout)* hsa_ext_image_create_with_layout_fn;
};

// Table to export AMD Extension Apis
struct AmdExtTable {
  ApiTableVersion version;
  decltype(hsa_amd_coherency_get_type)* hsa_amd_coherency_get_type_fn;
  decltype(hsa_amd_coherency_set_type)* hsa_amd_coherency_set_type_fn;
  decltype(hsa_amd_profiling_set_profiler_enabled)* hsa_amd_profiling_set_profiler_enabled_fn;
  decltype(hsa_amd_profiling_async_copy_enable)* hsa_amd_profiling_async_copy_enable_fn;
  decltype(hsa_amd_profiling_get_dispatch_time)* hsa_amd_profiling_get_dispatch_time_fn;
  decltype(hsa_amd_profiling_get_async_copy_time)* hsa_amd_profiling_get_async_copy_time_fn;
  decltype(hsa_amd_profiling_convert_tick_to_system_domain)* hsa_amd_profiling_convert_tick_to_system_domain_fn;
  decltype(hsa_amd_signal_async_handler)* hsa_amd_signal_async_handler_fn;
  decltype(hsa_amd_async_function)* hsa_amd_async_function_fn;
  decltype(hsa_amd_signal_wait_any)* hsa_amd_signal_wait_any_fn;
  decltype(hsa_amd_queue_cu_set_mask)* hsa_amd_queue_cu_set_mask_fn;
  decltype(hsa_amd_memory_pool_get_info)* hsa_amd_memory_pool_get_info_fn;
  decltype(hsa_amd_agent_iterate_memory_pools)* hsa_amd_agent_iterate_memory_pools_fn;
  decltype(hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
  decltype(hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
  decltype(hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
  decltype(hsa_amd_agent_memory_pool_get_info)* hsa_amd_agent_memory_pool_get_info_fn;
  decltype(hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
  decltype(hsa_amd_memory_pool_can_migrate)* hsa_amd_memory_pool_can_migrate_fn;
  decltype(hsa_amd_memory_migrate)* hsa_amd_memory_migrate_fn;
  decltype(hsa_amd_memory_lock)* hsa_amd_memory_lock_fn;
  decltype(hsa_amd_memory_unlock)* hsa_amd_memory_unlock_fn;
  decltype(hsa_amd_memory_fill)* hsa_amd_memory_fill_fn;
  decltype(hsa_amd_interop_map_buffer)* hsa_amd_interop_map_buffer_fn;
  decltype(hsa_amd_interop_unmap_buffer)* hsa_amd_interop_unmap_buffer_fn;
  decltype(hsa_amd_image_create)* hsa_amd_image_create_fn;
  decltype(hsa_amd_pointer_info)* hsa_amd_pointer_info_fn;
  decltype(hsa_amd_pointer_info_set_userdata)* hsa_amd_pointer_info_set_userdata_fn;
  decltype(hsa_amd_ipc_memory_create)* hsa_amd_ipc_memory_create_fn;
  decltype(hsa_amd_ipc_memory_attach)* hsa_amd_ipc_memory_attach_fn;
  decltype(hsa_amd_ipc_memory_detach)* hsa_amd_ipc_memory_detach_fn;
  decltype(hsa_amd_signal_create)* hsa_amd_signal_create_fn;
  decltype(hsa_amd_ipc_signal_create)* hsa_amd_ipc_signal_create_fn;
  decltype(hsa_amd_ipc_signal_attach)* hsa_amd_ipc_signal_attach_fn;
  decltype(hsa_amd_register_system_event_handler)* hsa_amd_register_system_event_handler_fn;
  decltype(hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
  decltype(hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
  decltype(hsa_amd_queue_set_priority)* hsa_amd_queue_set_priority_fn;
  decltype(hsa_amd_memory_async_copy_rect)* hsa_amd_memory_async_copy_rect_fn;
  decltype(hsa_amd_runtime_queue_create_register)* hsa_amd_runtime_queue_create_register_fn;
  decltype(hsa_amd_memory_lock_to_pool)* hsa_amd_memory_lock_to_pool_fn;
  decltype(hsa_amd_register_deallocation_callback)* hsa_amd_register_deallocation_callback_fn;
  decltype(hsa_amd_deregister_deallocation_callback)* hsa_amd_deregister_deallocation_callback_fn;
  decltype(hsa_amd_signal_value_pointer)* hsa_amd_signal_value_pointer_fn;
  decltype(hsa_amd_svm_attributes_set)* hsa_amd_svm_attributes_set_fn;
  decltype(hsa_amd_svm_attributes_get)* hsa_amd_svm_attributes_get_fn;
  decltype(hsa_amd_svm_prefetch_async)* hsa_amd_svm_prefetch_async_fn;
  decltype(hsa_amd_queue_cu_get_mask)* hsa_amd_queue_cu_get_mask_fn;
};

// Table to export HSA Core Runtime Apis
struct CoreApiTable {
  ApiTableVersion version;
  decltype(hsa_init)* hsa_init_fn;
  decltype(hsa_shut_down)* hsa_shut_down_fn;
  decltype(hsa_system_get_info)* hsa_system_get_info_fn;
  decltype(hsa_system_extension_supported)* hsa_system_extension_supported_fn;
  decltype(hsa_system_get_extension_table)* hsa_system_get_extension_table_fn;
  decltype(hsa_iterate_agents)* hsa_iterate_agents_fn;
  decltype(hsa_agent_get_info)* hsa_agent_get_info_fn;
  decltype(hsa_queue_create)* hsa_queue_create_fn;
  decltype(hsa_soft_queue_create)* hsa_soft_queue_create_fn;
  decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
  decltype(hsa_queue_inactivate)* hsa_queue_inactivate_fn;
  decltype(hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
  decltype(hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
  decltype(hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
  decltype(hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
  decltype(hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
  decltype(hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
  decltype(hsa_queue_cas_write_index_scacq_screl)* hsa_queue_cas_write_index_scacq_screl_fn;
  decltype(hsa_queue_cas_write_index_scacquire)* hsa_queue_cas_write_index_scacquire_fn;
  decltype(hsa_queue_cas_write_index_relaxed)* hsa_queue_cas_write_index_relaxed_fn;
  decltype(hsa_queue_cas_write_index_screlease)* hsa_queue_cas_write_index_screlease_fn;
  decltype(hsa_queue_add_write_index_scacq_screl)* hsa_queue_add_write_index_scacq_screl_fn;
  decltype(hsa_queue_add_write_index_scacquire)* hsa_queue_add_write_index_scacquire_fn;
  decltype(hsa_queue_add_write_index_relaxed)* hsa_queue_add_write_index_relaxed_fn;
  decltype(hsa_queue_add_write_index_screlease)* hsa_queue_add_write_index_screlease_fn;
  decltype(hsa_queue_store_read_index_relaxed)* hsa_queue_store_read_index_relaxed_fn;
  decltype(hsa_queue_store_read_index_screlease)* hsa_queue_store_read_index_screlease_fn;
  decltype(hsa_agent_iterate_regions)* hsa_agent_iterate_regions_fn;
  decltype(hsa_region_get_info)* hsa_region_get_info_fn;
  decltype(hsa_agent_get_exception_policies)* hsa_agent_get_exception_policies_fn;
  decltype(hsa_agent_extension_supported)* hsa_agent_extension_supported_fn;
  decltype(hsa_memory_register)* hsa_memory_register_fn;
  decltype(hsa_memory_deregister)* hsa_memory_deregister_fn;
  decltype(hsa_memory_allocate)* hsa_memory_allocate_fn;
  decltype(hsa_memory_free)* hsa_memory_free_fn;
  decltype(hsa_memory_copy)* hsa_memory_copy_fn;
  decltype(hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
  decltype(hsa_signal_create)* hsa_signal_create_fn;
  decltype(hsa_signal_destroy)* hsa_signal_destroy_fn;
  decltype(hsa_signal_load_relaxed)* hsa_signal_load_relaxed_fn;
  decltype(hsa_signal_load_scacquire)* hsa_signal_load_scacquire_fn;
  decltype(hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
  decltype(hsa_signal_store_screlease)* hsa_signal_store_screlease_fn;
  decltype(hsa_signal_wait_relaxed)* hsa_signal_wait_relaxed_fn;
  decltype(hsa_signal_wait_scacquire)* hsa_signal_wait_scacquire_fn;
  decltype(hsa_signal_and_relaxed)* hsa_signal_and_relaxed_fn;
  decltype(hsa_signal_and_scacquire)* hsa_signal_and_scacquire_fn;
  decltype(hsa_signal_and_screlease)* hsa_signal_and_screlease_fn;
  decltype(hsa_signal_and_scacq_screl)* hsa_signal_and_scacq_screl_fn;
  decltype(hsa_signal_or_relaxed)* hsa_signal_or_relaxed_fn;
  decltype(hsa_signal_or_scacquire)* hsa_signal_or_scacquire_fn;
  decltype(hsa_signal_or_screlease)* hsa_signal_or_screlease_fn;
  decltype(hsa_signal_or_scacq_screl)* hsa_signal_or_scacq_screl_fn;
  decltype(hsa_signal_xor_relaxed)* hsa_signal_xor_relaxed_fn;
  decltype(hsa_signal_xor_scacquire)* hsa_signal_xor_scacquire_fn;
  decltype(hsa_signal_xor_screlease)* hsa_signal_xor_screlease_fn;
  decltype(hsa_signal_xor_scacq_screl)* hsa_signal_xor_scacq_screl_fn;
  decltype(hsa_signal_exchange_relaxed)* hsa_signal_exchange_relaxed_fn;
  decltype(hsa_signal_exchange_scacquire)* hsa_signal_exchange_scacquire_fn;
  decltype(hsa_signal_exchange_screlease)* hsa_signal_exchange_screlease_fn;
  decltype(hsa_signal_exchange_scacq_screl)* hsa_signal_exchange_scacq_screl_fn;
  decltype(hsa_signal_add_relaxed)* hsa_signal_add_relaxed_fn;
  decltype(hsa_signal_add_scacquire)* hsa_signal_add_scacquire_fn;
  decltype(hsa_signal_add_screlease)* hsa_signal_add_screlease_fn;
  decltype(hsa_signal_add_scacq_screl)* hsa_signal_add_scacq_screl_fn;
  decltype(hsa_signal_subtract_relaxed)* hsa_signal_subtract_relaxed_fn;
  decltype(hsa_signal_subtract_scacquire)* hsa_signal_subtract_scacquire_fn;
  decltype(hsa_signal_subtract_screlease)* hsa_signal_subtract_screlease_fn;
  decltype(hsa_signal_subtract_scacq_screl)* hsa_signal_subtract_scacq_screl_fn;
  decltype(hsa_signal_cas_relaxed)* hsa_signal_cas_relaxed_fn;
  decltype(hsa_signal_cas_scacquire)* hsa_signal_cas_scacquire_fn;
  decltype(hsa_signal_cas_screlease)* hsa_signal_cas_screlease_fn;
  decltype(hsa_signal_cas_scacq_screl)* hsa_signal_cas_scacq_screl_fn;

  //===--- Instruction Set Architecture -----------------------------------===//

  decltype(hsa_isa_from_name)* hsa_isa_from_name_fn;
  // Deprecated since v1.1.
  decltype(hsa_isa_get_info)* hsa_isa_get_info_fn;
  // Deprecated since v1.1.
  decltype(hsa_isa_compatible)* hsa_isa_compatible_fn;

  //===--- Code Objects (deprecated) --------------------------------------===//

  // Deprecated since v1.1.
  decltype(hsa_code_object_serialize)* hsa_code_object_serialize_fn;
  // Deprecated since v1.1.
  decltype(hsa_code_object_deserialize)* hsa_code_object_deserialize_fn;
  // Deprecated since v1.1.
  decltype(hsa_code_object_destroy)* hsa_code_object_destroy_fn;
  // Deprecated since v1.1.
  decltype(hsa_code_object_get_info)* hsa_code_object_get_info_fn;
  // Deprecated since v1.1.
  decltype(hsa_code_object_get_symbol)* hsa_code_object_get_symbol_fn;
  // Deprecated since v1.1.
  decltype(hsa_code_symbol_get_info)* hsa_code_symbol_get_info_fn;
  // Deprecated since v1.1.
  decltype(hsa_code_object_iterate_symbols)* hsa_code_object_iterate_symbols_fn;

  //===--- Executable -----------------------------------------------------===//

  // Deprecated since v1.1.
  decltype(hsa_executable_create)* hsa_executable_create_fn;
  decltype(hsa_executable_destroy)* hsa_executable_destroy_fn;
  // Deprecated since v1.1.
  decltype(hsa_executable_load_code_object)* hsa_executable_load_code_object_fn;
  decltype(hsa_executable_freeze)* hsa_executable_freeze_fn;
  decltype(hsa_executable_get_info)* hsa_executable_get_info_fn;
  decltype(hsa_executable_global_variable_define)* hsa_executable_global_variable_define_fn;
  decltype(hsa_executable_agent_global_variable_define)* hsa_executable_agent_global_variable_define_fn;
  decltype(hsa_executable_readonly_variable_define)* hsa_executable_readonly_variable_define_fn;
  decltype(hsa_executable_validate)* hsa_executable_validate_fn;
  // Deprecated since v1.1.
  decltype(hsa_executable_get_symbol)* hsa_executable_get_symbol_fn;
  decltype(hsa_executable_symbol_get_info)* hsa_executable_symbol_get_info_fn;
  // Deprecated since v1.1.
  decltype(hsa_executable_iterate_symbols)* hsa_executable_iterate_symbols_fn;

  //===--- Runtime Notifications ------------------------------------------===//

  decltype(hsa_status_string)* hsa_status_string_fn;

  // Start HSA v1.1 additions
  decltype(hsa_extension_get_name)* hsa_extension_get_name_fn;
  decltype(hsa_system_major_extension_supported)* hsa_system_major_extension_supported_fn;
  decltype(hsa_system_get_major_extension_table)* hsa_system_get_major_extension_table_fn;
  decltype(hsa_agent_major_extension_supported)* hsa_agent_major_extension_supported_fn;
  decltype(hsa_cache_get_info)* hsa_cache_get_info_fn;
  decltype(hsa_agent_iterate_caches)* hsa_agent_iterate_caches_fn;
  decltype(hsa_signal_silent_store_relaxed)* hsa_signal_silent_store_relaxed_fn;
  decltype(hsa_signal_silent_store_screlease)* hsa_signal_silent_store_screlease_fn;
  decltype(hsa_signal_group_create)* hsa_signal_group_create_fn;
  decltype(hsa_signal_group_destroy)* hsa_signal_group_destroy_fn;
  decltype(hsa_signal_group_wait_any_scacquire)* hsa_signal_group_wait_any_scacquire_fn;
  decltype(hsa_signal_group_wait_any_relaxed)* hsa_signal_group_wait_any_relaxed_fn;

  //===--- Instruction Set Architecture - HSA v1.1 additions --------------===//
  decltype(hsa_agent_iterate_isas)* hsa_agent_iterate_isas_fn;
  decltype(hsa_isa_get_info_alt)* hsa_isa_get_info_alt_fn;
  decltype(hsa_isa_get_exception_policies)* hsa_isa_get_exception_policies_fn;
  decltype(hsa_isa_get_round_method)* hsa_isa_get_round_method_fn;
  decltype(hsa_wavefront_get_info)* hsa_wavefront_get_info_fn;
  decltype(hsa_isa_iterate_wavefronts)* hsa_isa_iterate_wavefronts_fn;

  //===--- Code Objects (deprecated) - HSA v1.1 additions -----------------===//

  // Deprecated since v1.1.
  decltype(hsa_code_object_get_symbol_from_name)* hsa_code_object_get_symbol_from_name_fn;

  //===--- Executable - HSA v1.1 additions --------------------------------===//

  decltype(hsa_code_object_reader_create_from_file)* hsa_code_object_reader_create_from_file_fn;
  decltype(hsa_code_object_reader_create_from_memory)* hsa_code_object_reader_create_from_memory_fn;
  decltype(hsa_code_object_reader_destroy)* hsa_code_object_reader_destroy_fn;
  decltype(hsa_executable_create_alt)* hsa_executable_create_alt_fn;
  decltype(hsa_executable_load_program_code_object)* hsa_executable_load_program_code_object_fn;
  decltype(hsa_executable_load_agent_code_object)* hsa_executable_load_agent_code_object_fn;
  decltype(hsa_executable_validate_alt)* hsa_executable_validate_alt_fn;
  decltype(hsa_executable_get_symbol_by_name)* hsa_executable_get_symbol_by_name_fn;
  decltype(hsa_executable_iterate_agent_symbols)* hsa_executable_iterate_agent_symbols_fn;
  decltype(hsa_executable_iterate_program_symbols)* hsa_executable_iterate_program_symbols_fn;
};

// Table to export HSA Apis from Core Runtime, Amd Extensions
// Finalizer and Images
struct HsaApiTable {

  // Version of Hsa Api Table
  ApiTableVersion version;

  // Table of function pointers to HSA Core Runtime
  CoreApiTable* core_;

  // Table of function pointers to AMD extensions
  AmdExtTable* amd_ext_;

  // Table of function pointers to HSA Finalizer Extension
  FinalizerExtTable* finalizer_ext_;

  // Table of function pointers to HSA Image Extension
  ImageExtTable* image_ext_;
};

// Structure containing instances of different api tables
struct HsaApiTableContainer {
  HsaApiTable root;
  CoreApiTable core;
  AmdExtTable amd_ext;
  FinalizerExtTable finalizer_ext;
  ImageExtTable image_ext;

  // Default initialization of a container instance
  HsaApiTableContainer() {
    root.version.major_id = HSA_API_TABLE_MAJOR_VERSION;
    root.version.minor_id = sizeof(HsaApiTable);
    root.version.step_id = HSA_API_TABLE_STEP_VERSION;

    core.version.major_id = HSA_CORE_API_TABLE_MAJOR_VERSION;
    core.version.minor_id = sizeof(CoreApiTable);
    core.version.step_id = HSA_CORE_API_TABLE_STEP_VERSION;
    root.core_ = &core;

    amd_ext.version.major_id = HSA_AMD_EXT_API_TABLE_MAJOR_VERSION;
    amd_ext.version.minor_id = sizeof(AmdExtTable);
    amd_ext.version.step_id = HSA_AMD_EXT_API_TABLE_STEP_VERSION;
    root.amd_ext_ = &amd_ext;

    finalizer_ext.version.major_id = HSA_FINALIZER_API_TABLE_MAJOR_VERSION;
    finalizer_ext.version.minor_id = sizeof(FinalizerExtTable);
    finalizer_ext.version.step_id = HSA_FINALIZER_API_TABLE_STEP_VERSION;
    root.finalizer_ext_ = &finalizer_ext;

    image_ext.version.major_id = HSA_IMAGE_API_TABLE_MAJOR_VERSION;
    image_ext.version.minor_id = sizeof(ImageExtTable);
    image_ext.version.step_id = HSA_IMAGE_API_TABLE_STEP_VERSION;
    root.image_ext_ = &image_ext;
  }
};

// Api to copy function pointers of a table. Copies the (size -
// sizeof(ApiTableVersion)) bytes that follow the version header from src to
// dest; the version header of dest is left untouched.
static void inline copyApi(void* dest, void* src, size_t size) {
  assert(size >= sizeof(ApiTableVersion));
  memcpy((char*)dest + sizeof(ApiTableVersion),
         (char*)src + sizeof(ApiTableVersion),
         (size - sizeof(ApiTableVersion)));
}

// Copy Api child tables if valid.
static void inline copyElement(ApiTableVersion* dest, ApiTableVersion* src) {
  if (src->major_id && (dest->major_id == src->major_id)) {
    dest->step_id = src->step_id;
    dest->minor_id = Min(dest->minor_id, src->minor_id);
    copyApi(dest, src, dest->minor_id);
  } else {
    dest->major_id = 0;
    dest->minor_id = 0;
    dest->step_id = 0;
  }
}

// Copy function for all Api tables.
The function assumes the // user has initialized an instance of tables container correctly // for the Major, Minor and Stepping Ids of Root and Child Api tables. // The function will overwrite the value of Minor Id by taking the // minimum of source and destination parameters. It will also overwrite // the stepping Id with value from source parameter. static void inline copyTables(const HsaApiTable* src, HsaApiTable* dest) { // Verify Major Id of source and destination tables match if (dest->version.major_id != src->version.major_id) { dest->version.major_id = 0; dest->version.minor_id = 0; dest->version.step_id = 0; return; } // Initialize the stepping id and minor id of root table. For the // minor id which encodes struct size, take the minimum of source // and destination parameters dest->version.step_id = src->version.step_id; dest->version.minor_id = Min(dest->version.minor_id, src->version.minor_id); // Copy child tables if present if ((offsetof(HsaApiTable, core_) < dest->version.minor_id)) copyElement(&dest->core_->version, &src->core_->version); if ((offsetof(HsaApiTable, amd_ext_) < dest->version.minor_id)) copyElement(&dest->amd_ext_->version, &src->amd_ext_->version); if ((offsetof(HsaApiTable, finalizer_ext_) < dest->version.minor_id)) copyElement(&dest->finalizer_ext_->version, &src->finalizer_ext_->version); if ((offsetof(HsaApiTable, image_ext_) < dest->version.minor_id)) copyElement(&dest->image_ext_->version, &src->image_ext_->version); } #endif ROCR-Runtime-rocm-5.0.0/src/inc/hsa_ext_amd.h //////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA AMD extension. 
#ifndef HSA_RUNTIME_EXT_AMD_H_ #define HSA_RUNTIME_EXT_AMD_H_ #include "hsa.h" #include "hsa_ext_image.h" #define HSA_AMD_INTERFACE_VERSION_MAJOR 1 #define HSA_AMD_INTERFACE_VERSION_MINOR 0 #ifdef __cplusplus extern "C" { #endif /** \addtogroup aql Architected Queuing Language * @{ */ /** * @brief A fixed-size type used to represent ::hsa_signal_condition_t constants. */ typedef uint32_t hsa_signal_condition32_t; /** * @brief AMD vendor-specific packet type. */ typedef enum { /** * Packet used by agents to delay processing of subsequent packets until a * configurable condition is satisfied by an HSA signal. Only kernel dispatch * queues created from AMD GPU agents support this packet. */ HSA_AMD_PACKET_TYPE_BARRIER_VALUE = 2, } hsa_amd_packet_type_t; /** * @brief A fixed-size type used to represent ::hsa_amd_packet_type_t constants. */ typedef uint8_t hsa_amd_packet_type8_t; /** * @brief AMD vendor-specific AQL packet header. */ typedef struct hsa_amd_packet_header_s { /** * Packet header. Used to configure multiple packet parameters such as the * packet type. The parameters are described by ::hsa_packet_header_t. */ uint16_t header; /** * Format of the vendor-specific packet. */ hsa_amd_packet_type8_t AmdFormat; /** * Reserved. Must be 0. */ uint8_t reserved; } hsa_amd_vendor_packet_header_t; /** * @brief AMD barrier value packet. Halts packet processing and waits for * (signal_value & ::mask) ::cond ::value to be satisfied, where signal_value * is the value of the signal ::signal. */ typedef struct hsa_amd_barrier_value_packet_s { /** * AMD vendor-specific packet header. */ hsa_amd_vendor_packet_header_t header; /** * Reserved. Must be 0. */ uint32_t reserved0; /** * Dependent signal object. A signal with a handle value of 0 is * allowed and is interpreted by the packet processor as a satisfied * dependency. */ hsa_signal_t signal; /** * Value to compare against. */ hsa_signal_value_t value; /** * Bit mask to be combined by bitwise AND with ::signal's value.
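The wait condition (signal_value & mask) cond value can be sketched as a plain C predicate. This is a minimal illustration, not the packet processor's implementation: `barrier_value_satisfied` and the `COND_EQ`/`COND_NE`/`COND_LT`/`COND_GTE` constants are hypothetical names, and the constants are assumed to mirror the EQ, NE, LT, GTE ordering of ::hsa_signal_condition_t in hsa.h. The signal handle is passed as a raw integer so the sketch compiles without the runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Assumed local mirror of hsa_signal_condition_t (EQ=0, NE=1, LT=2, GTE=3).
enum { COND_EQ = 0, COND_NE = 1, COND_LT = 2, COND_GTE = 3 };

// Hypothetical helper: true when a barrier-value packet with these fields
// would allow packet processing to continue.
static bool barrier_value_satisfied(uint64_t signal_handle, int64_t signal_value,
                                    int64_t mask, int64_t value, uint32_t cond) {
  if (signal_handle == 0) return true;   // handle 0 counts as a satisfied dependency
  int64_t masked = signal_value & mask;  // bitwise AND with the packet's mask
  switch (cond) {
    case COND_EQ:  return masked == value;
    case COND_NE:  return masked != value;
    case COND_LT:  return masked < value;
    case COND_GTE: return masked >= value;
    default:
      assert(!"unknown condition");
      return false;
  }
}
```

Masking first lets a single 64-bit signal carry several independent fields, with each barrier packet waiting only on the bits it cares about.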
*/ hsa_signal_value_t mask; /** * Comparison operation. See ::hsa_signal_condition_t. */ hsa_signal_condition32_t cond; /** * Reserved. Must be 0. */ uint32_t reserved1; /** * Reserved. Must be 0. */ uint64_t reserved2; /** * Reserved. Must be 0. */ uint64_t reserved3; /** * Signal used to indicate completion of the job. The application can use the * special signal handle 0 to indicate that no signal is used. */ hsa_signal_t completion_signal; } hsa_amd_barrier_value_packet_t; /** @} */ /** * @brief Enumeration constants added to ::hsa_status_t. * * @remark Additions to hsa_status_t. */ enum { /** * The memory pool is invalid. */ HSA_STATUS_ERROR_INVALID_MEMORY_POOL = 40, /** * Agent accessed memory beyond the maximum legal address. */ HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION = 41, /** * Agent executed an invalid shader instruction. */ HSA_STATUS_ERROR_ILLEGAL_INSTRUCTION = 42, /** * Agent attempted to access an inaccessible address. * See hsa_amd_register_system_event_handler and * HSA_AMD_GPU_MEMORY_FAULT_EVENT for more information on illegal accesses. */ HSA_STATUS_ERROR_MEMORY_FAULT = 43, /** * The CU mask was successfully set but the mask attempted to enable a CU * which was disabled for the process. CUs disabled for the process remain * disabled. */ HSA_STATUS_CU_MASK_REDUCED = 44, }; /** * @brief Agent attributes. */ typedef enum hsa_amd_agent_info_s { /** * Chip identifier. The type of this attribute is uint32_t. */ HSA_AMD_AGENT_INFO_CHIP_ID = 0xA000, /** * Size of a cacheline in bytes. The type of this attribute is uint32_t. */ HSA_AMD_AGENT_INFO_CACHELINE_SIZE = 0xA001, /** * The number of compute units available in the agent. The type of this * attribute is uint32_t. */ HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT = 0xA002, /** * The maximum clock frequency of the agent in MHz. The type of this * attribute is uint32_t. */ HSA_AMD_AGENT_INFO_MAX_CLOCK_FREQUENCY = 0xA003, /** * Internal driver node identifier. The type of this attribute is uint32_t.
*/ HSA_AMD_AGENT_INFO_DRIVER_NODE_ID = 0xA004, /** * Maximum number of watch points on memory address ranges to generate exception * events when the watched addresses are accessed. The type of this * attribute is uint32_t. */ HSA_AMD_AGENT_INFO_MAX_ADDRESS_WATCH_POINTS = 0xA005, /** * Agent BDF_ID, named LocationID in thunk. The type of this attribute is * uint32_t. */ HSA_AMD_AGENT_INFO_BDFID = 0xA006, /** * Memory interface width, the return value type is uint32_t. * This attribute is deprecated. */ HSA_AMD_AGENT_INFO_MEMORY_WIDTH = 0xA007, /** * Maximum memory clock, the return value type is uint32_t. */ HSA_AMD_AGENT_INFO_MEMORY_MAX_FREQUENCY = 0xA008, /** * Board name of the agent, populated from the MarketingName of the KFD node. * The value is an ASCII string of 64 chars. */ HSA_AMD_AGENT_INFO_PRODUCT_NAME = 0xA009, /** * Maximum number of waves possible in a compute unit. * The type of this attribute is uint32_t. */ HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU = 0xA00A, /** * Number of SIMDs per compute unit (CU). * The type of this attribute is uint32_t. */ HSA_AMD_AGENT_INFO_NUM_SIMDS_PER_CU = 0xA00B, /** * Number of shader engines (SE) in the GPU. * The type of this attribute is uint32_t. */ HSA_AMD_AGENT_INFO_NUM_SHADER_ENGINES = 0xA00C, /** * Number of shader arrays per shader engine in the GPU. * The type of this attribute is uint32_t. */ HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE = 0xA00D, /** * Address of the HDP flush registers. Use of these registers does not conform to the HSA memory * model and should be treated with caution. * The type of this attribute is hsa_amd_hdp_flush_t. */ HSA_AMD_AGENT_INFO_HDP_FLUSH = 0xA00E, /** * PCIe domain for the agent. Pairs with HSA_AMD_AGENT_INFO_BDFID * to give the full physical location of the agent. * The type of this attribute is uint32_t. */ HSA_AMD_AGENT_INFO_DOMAIN = 0xA00F, /** * Queries for support of cooperative queues. See ::HSA_QUEUE_TYPE_COOPERATIVE. * The type of this attribute is bool.
*/ HSA_AMD_AGENT_INFO_COOPERATIVE_QUEUES = 0xA010, /** * Queries the UUID of an agent. The value is an ASCII string with a maximum * of 21 chars including NUL. The string value consists of two parts: header * and body. The header identifies the device type (GPU, CPU, DSP) while the body * encodes the UUID as a 16-digit hex string. * * Agents that do not support UUID will return the string "GPU-XX" or * "CPU-XX" or "DSP-XX" depending upon their device type ::hsa_device_type_t. */ HSA_AMD_AGENT_INFO_UUID = 0xA011, /** * Queries for the ASIC revision of an agent. The value is an integer that * increments for each revision. This can be used by user-level software to * change how it operates, depending on the hardware version. This allows * selective workarounds for hardware errata. * The type of this attribute is uint32_t. */ HSA_AMD_AGENT_INFO_ASIC_REVISION = 0xA012, /** * Queries whether or not the host can directly access SVM memory that is * physically resident in the agent's local memory. * The type of this attribute is bool. */ HSA_AMD_AGENT_INFO_SVM_DIRECT_HOST_ACCESS = 0xA013, /** * Some processors support more CUs than can reliably be used in a cooperative * dispatch. This queries the count of CUs which are fully enabled for * cooperative dispatch. */ HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT = 0xA014 } hsa_amd_agent_info_t; typedef struct hsa_amd_hdp_flush_s { uint32_t* HDP_MEM_FLUSH_CNTL; uint32_t* HDP_REG_FLUSH_CNTL; } hsa_amd_hdp_flush_t; /** * @brief Region attributes. */ typedef enum hsa_amd_region_info_s { /** * Determines whether the host can access the region. The type of this attribute * is bool. */ HSA_AMD_REGION_INFO_HOST_ACCESSIBLE = 0xA000, /** * Base address of the region in flat address space. */ HSA_AMD_REGION_INFO_BASE = 0xA001, /** * Memory interface width, the return value type is uint32_t. * This attribute is deprecated. Use HSA_AMD_AGENT_INFO_MEMORY_WIDTH. */ HSA_AMD_REGION_INFO_BUS_WIDTH = 0xA002, /** * Maximum memory clock, the return value type is uint32_t.
* This attribute is deprecated. Use HSA_AMD_AGENT_INFO_MEMORY_MAX_FREQUENCY. */ HSA_AMD_REGION_INFO_MAX_CLOCK_FREQUENCY = 0xA003 } hsa_amd_region_info_t; /** * @brief Coherency attributes of the fine-grain region. */ typedef enum hsa_amd_coherency_type_s { /** * Coherent region. */ HSA_AMD_COHERENCY_TYPE_COHERENT = 0, /** * Non-coherent region. */ HSA_AMD_COHERENCY_TYPE_NONCOHERENT = 1 } hsa_amd_coherency_type_t; /** * @brief Get the coherency type of the fine-grain region of an agent. * * @param[in] agent A valid agent. * * @param[out] type Pointer to a memory location where the HSA runtime will * store the coherency type of the fine-grain region. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p type is NULL. */ hsa_status_t HSA_API hsa_amd_coherency_get_type(hsa_agent_t agent, hsa_amd_coherency_type_t* type); /** * @brief Set the coherency type of the fine-grain region of an agent. * Deprecated. This is supported on KV platforms. For backward compatibility, * calls on other platforms will spuriously succeed. * * @param[in] agent A valid agent. * * @param[in] type The coherency type to be set. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p type is invalid. */ hsa_status_t HSA_API hsa_amd_coherency_set_type(hsa_agent_t agent, hsa_amd_coherency_type_t type); /** * @brief Structure containing profiling dispatch time information. * * Times are reported as ticks in the domain of the HSA system clock. * The HSA system clock tick and frequency are obtained via hsa_system_get_info.
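As a sketch of turning these ticks into elapsed time, the helper below divides the tick delta by the system timestamp frequency, assumed to be the value returned by hsa_system_get_info with HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY (reported in Hz). `dispatch_time_sketch_t` and `ticks_to_ns` are hypothetical names; the struct only mirrors the start and end fields declared here, so the sketch compiles without the runtime.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical local mirror of hsa_amd_profiling_dispatch_time_t.
typedef struct {
  uint64_t start;  // system-clock tick when processing started
  uint64_t end;    // system-clock tick when processing completed
} dispatch_time_sketch_t;

// Converts a tick interval to nanoseconds, given the timestamp frequency in Hz.
// Whole seconds and the remainder are scaled separately so that the
// intermediate products do not overflow for long intervals.
static uint64_t ticks_to_ns(const dispatch_time_sketch_t* t, uint64_t freq_hz) {
  assert(freq_hz != 0 && t->end >= t->start);
  const uint64_t ns_per_s = 1000000000ull;
  uint64_t delta = t->end - t->start;
  return (delta / freq_hz) * ns_per_s + (delta % freq_hz) * ns_per_s / freq_hz;
}
```

The remainder term stays exact as long as the frequency times 1e9 fits in 64 bits, which holds for any frequency below roughly 18 GHz.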
*/ typedef struct hsa_amd_profiling_dispatch_time_s { /** * Dispatch packet processing start time. */ uint64_t start; /** * Dispatch packet completion time. */ uint64_t end; } hsa_amd_profiling_dispatch_time_t; /** * @brief Structure containing profiling async copy time information. * * Times are reported as ticks in the domain of the HSA system clock. * The HSA system clock tick and frequency are obtained via hsa_system_get_info. */ typedef struct hsa_amd_profiling_async_copy_time_s { /** * Async copy processing start time. */ uint64_t start; /** * Async copy completion time. */ uint64_t end; } hsa_amd_profiling_async_copy_time_t; /** * @brief Enable or disable profiling capability of a queue. * * @param[in] queue A valid queue. * * @param[in] enable 1 to enable profiling. 0 to disable profiling. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_QUEUE The queue is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p queue is NULL. */ hsa_status_t HSA_API hsa_amd_profiling_set_profiler_enabled(hsa_queue_t* queue, int enable); /** * @brief Enable or disable asynchronous memory copy profiling. * * @details The runtime will provide the copy processing start timestamp and * completion timestamp of each call to hsa_amd_memory_async_copy if the * async copy profiling is enabled prior to the call to * hsa_amd_memory_async_copy. The completion signal object is used to * hold the last async copy start and end timestamps. The client can retrieve * these timestamps via a call to hsa_amd_profiling_get_async_copy_time. * * @param[in] enable True to enable profiling. False to disable profiling. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized.
* * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES Failed on allocating resources * needed to profile the asynchronous copy. */ hsa_status_t HSA_API hsa_amd_profiling_async_copy_enable(bool enable); /** * @brief Retrieve packet processing time stamps. * * @param[in] agent The agent with which the signal was last used. For * instance, if the profiled dispatch packet is dispatched onto queue Q, * which was created on agent A, then this parameter must be A. * * @param[in] signal A signal used as the completion signal of the dispatch * packet to retrieve time stamps from. This dispatch packet must have been * issued to a queue with profiling enabled and have already completed. Also * the signal must not have yet been used in any other packet following the * completion of the profiled dispatch packet. * * @param[out] time Packet processing timestamps in the HSA system clock * domain. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_SIGNAL The signal is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p time is NULL. */ hsa_status_t HSA_API hsa_amd_profiling_get_dispatch_time( hsa_agent_t agent, hsa_signal_t signal, hsa_amd_profiling_dispatch_time_t* time); /** * @brief Retrieve asynchronous copy timestamps. * * @details Async copy profiling is enabled via call to * hsa_amd_profiling_async_copy_enable. * * @param[in] signal A signal used as the completion signal of the call to * hsa_amd_memory_async_copy. * * @param[out] time Async copy processing timestamps in the HSA system clock * domain. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_SIGNAL The signal is invalid. 
* * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p time is NULL. */ hsa_status_t HSA_API hsa_amd_profiling_get_async_copy_time( hsa_signal_t signal, hsa_amd_profiling_async_copy_time_t* time); /** * @brief Computes the frequency ratio and offset between the agent clock and * HSA system clock and converts the agent's tick to HSA system domain tick. * * @param[in] agent The agent used to retrieve the agent_tick. It is the user's * responsibility to make sure the tick number is from this agent; otherwise, * the behavior is undefined. * * @param[in] agent_tick The tick count retrieved from the specified @p agent. * * @param[out] system_tick The translated HSA system domain clock counter tick. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p system_tick is NULL. */ hsa_status_t HSA_API hsa_amd_profiling_convert_tick_to_system_domain(hsa_agent_t agent, uint64_t agent_tick, uint64_t* system_tick); /** * @brief Signal attribute flags. */ typedef enum { /** * Signal will only be consumed by AMD GPUs. Limits signal consumption to * AMD GPU agents only. Ignored if @p num_consumers is not zero (all agents). */ HSA_AMD_SIGNAL_AMD_GPU_ONLY = 1, /** * Signal may be used for interprocess communication. * IPC signals can be read, written, and waited on from any process. * Profiling using an IPC-enabled signal is only supported in a single process * at a time. Producing profiling data in one process and consuming it in * another process is undefined. */ HSA_AMD_SIGNAL_IPC = 2, } hsa_amd_signal_attribute_t; /** * @brief Create a signal with specific attributes. * * @param[in] initial_value Initial value of the signal. * * @param[in] num_consumers Size of @p consumers. A value of 0 indicates that * any agent might wait on the signal.
* * @param[in] consumers List of agents that might consume (wait on) the * signal. If @p num_consumers is 0, this argument is ignored; otherwise, the * HSA runtime might use the list to optimize the handling of the signal * object. If an agent not listed in @p consumers waits on the returned * signal, the behavior is undefined. The memory associated with @p consumers * can be reused or freed after the function returns. * * @param[in] attributes Requested signal attributes. Multiple signal attributes * may be requested by combining them with bitwise OR. Requesting no attributes * (@p attributes == 0) results in the same signal as would have been obtained * via hsa_signal_create. * * @param[out] signal Pointer to a memory location where the HSA runtime will * store the newly created signal handle. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate * the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p signal is NULL, @p * num_consumers is greater than 0 but @p consumers is NULL, or @p consumers * contains duplicates. */ hsa_status_t HSA_API hsa_amd_signal_create(hsa_signal_value_t initial_value, uint32_t num_consumers, const hsa_agent_t* consumers, uint64_t attributes, hsa_signal_t* signal); /** * @brief Returns a pointer to the value of a signal. * * Use of this API does not modify the lifetime of ::signal and any * hsa_signal_value_t retrieved by this API has lifetime equal to that of * ::signal. * * This API is intended for partial interoperability with non-HSA-compatible * devices and should not be used where HSA interfaces are available. * * Use of the signal value must comply with the use restrictions of ::signal. * Use may result in data races if the operations performed are not platform * atomic.
Use with HSA_AMD_SIGNAL_AMD_GPU_ONLY or HSA_AMD_SIGNAL_IPC * attributed signals is required. * * @param[in] signal Signal handle to extract the signal value pointer from. * * @param[out] value_ptr Location where the extracted signal value pointer will be placed. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_SIGNAL @p signal is not a valid hsa_signal_t. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p value_ptr is NULL. */ hsa_status_t hsa_amd_signal_value_pointer(hsa_signal_t signal, volatile hsa_signal_value_t** value_ptr); /** * @brief Asynchronous signal handler function type. * * @details Type definition of the callback function to be used with * hsa_amd_signal_async_handler. This callback is invoked when the associated * signal satisfies the associated condition. The callback receives the value of the signal * which satisfied the associated wait condition and a user-provided value. If * the callback returns true, then the callback will be called again if the * associated signal and condition are satisfied again. If the callback returns * false, then it will not be called again. * * @param[in] value Contains the value of the signal observed by * hsa_amd_signal_async_handler which caused the signal handler to be invoked. * * @param[in] arg Contains the user-provided value given when the signal handler * was registered with hsa_amd_signal_async_handler. * * @retval true resumes monitoring the signal with this handler (as if calling * hsa_amd_signal_async_handler again with identical parameters) * * @retval false stops monitoring the signal with this handler (handler will * not be called again for this signal) * */ typedef bool (*hsa_amd_signal_handler)(hsa_signal_value_t value, void* arg); /** * @brief Register asynchronous signal handler function.
* * @details Allows registering a callback function and user provided value with * a signal and wait condition. The callback will be invoked if the associated * signal and wait condition are satisfied. Callbacks will be invoked serially * but in an arbitrary order so callbacks should be independent of each other. * After being invoked a callback may continue to wait for its associated signal * and condition and, possibly, be invoked again. Or the callback may stop * waiting. If the callback returns true then it will continue waiting and may * be called again. If false then the callback will not wait again and will not * be called again for the associated signal and condition. It is possible to * register the same callback multiple times with the same or different signals * and/or conditions. Each registration of the callback will be treated entirely * independently. * * @param[in] signal hsa signal to be asynchronously monitored * * @param[in] cond condition value to monitor for * * @param[in] value signal value used in condition expression * * @param[in] handler asynchronous signal handler invoked when signal's * condition is met * * @param[in] arg user provided value which is provided to handler when handler * is invoked * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_SIGNAL signal is not a valid hsa_signal_t * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT handler is invalid (NULL) * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime is out of * resources or blocking signals are not supported by the HSA driver component. 
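A minimal sketch of a handler that relies on the re-arm semantics described above: it records each observed value and returns false after a fixed number of invocations, which stops monitoring. `handler_state_t` and `counting_handler` are hypothetical, and hsa_signal_value_t is assumed to be int64_t (the large-model definition in hsa.h). The registration call appears only as a comment so the sketch compiles without the runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical per-registration state passed through the `arg` pointer.
typedef struct {
  int calls;           // how many times the handler has run
  int max_calls;       // stop monitoring after this many invocations
  int64_t last_value;  // most recent signal value observed
} handler_state_t;

// Same shape as hsa_amd_signal_handler, assuming int64_t signal values.
// Registration would look like this (not compiled here):
//   hsa_amd_signal_async_handler(sig, HSA_SIGNAL_CONDITION_LT, 1,
//                                counting_handler, &state);
static bool counting_handler(int64_t value, void* arg) {
  handler_state_t* st = (handler_state_t*)arg;
  assert(st != NULL);
  st->calls++;
  st->last_value = value;
  // Returning true re-arms the handler; false stops monitoring this signal.
  return st->calls < st->max_calls;
}
```

Because handlers run on the runtime's asynchronous event handling thread, an application thread that reads `handler_state_t` back would need its own synchronization; the sketch leaves that out.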
* */ hsa_status_t HSA_API hsa_amd_signal_async_handler(hsa_signal_t signal, hsa_signal_condition_t cond, hsa_signal_value_t value, hsa_amd_signal_handler handler, void* arg); /** * @brief Call a function asynchronously. * * @details Provides access to the runtime's asynchronous event handling thread * for general asynchronous functions. Functions queued this way are executed * in the same manner as if they were a signal handler whose signal is * satisfied. * * @param[in] callback asynchronous function to be invoked * * @param[in] arg user-provided value which is passed to @p callback when it * is invoked * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is invalid (NULL) * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime is out of * resources or blocking signals are not supported by the HSA driver component. * */ hsa_status_t HSA_API hsa_amd_async_function(void (*callback)(void* arg), void* arg); /** * @brief Wait for any signal-condition pair to be satisfied. * * @details Allows waiting for any of several signal-condition pairs to be * satisfied. The function returns the index into the list of signals of the * first satisfying signal-condition pair. The satisfying signal's value is * returned in @p satisfying_value unless @p satisfying_value is NULL. This * function provides only relaxed memory semantics. */ uint32_t HSA_API hsa_amd_signal_wait_any(uint32_t signal_count, hsa_signal_t* signals, hsa_signal_condition_t* conds, hsa_signal_value_t* values, uint64_t timeout_hint, hsa_wait_state_t wait_hint, hsa_signal_value_t* satisfying_value); /** * @brief Query image limits. * * @param[in] agent A valid agent. * * @param[in] attribute HSA image info attribute to query.
* * @param[out] value Pointer to an application-allocated buffer where to store * the value of the attribute. If the buffer passed by the application is not * large enough to hold the value of @p attribute, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_QUEUE @p value is NULL or @p attribute < * HSA_EXT_AGENT_INFO_IMAGE_1D_MAX_ELEMENTS or @p attribute > * HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS. * */ hsa_status_t HSA_API hsa_amd_image_get_info_max_dim(hsa_agent_t agent, hsa_agent_info_t attribute, void* value); /** * @brief Set a queue's CU affinity mask. * * @details Enables the queue to run on only selected CUs. The given mask is * combined by bitwise AND with any device wide mask in HSA_CU_MASK before * being applied. * If num_cu_mask_count is 0 then the request is interpreted as a request to * enable all CUs and no cu_mask array need be given. * * @param[in] queue A pointer to HSA queue. * * @param[in] num_cu_mask_count Size of CUMask bit array passed in, in bits. * * @param[in] cu_mask Bit-vector representing the CU mask. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_CU_MASK_REDUCED The function was successfully executed * but the given mask attempted to enable a CU which was disabled by * HSA_CU_MASK. CUs disabled by HSA_CU_MASK remain disabled. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_QUEUE @p queue is NULL or invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p num_cu_mask_count is not * a multiple of 32 or @p num_cu_mask_count is not 0 and cu_mask is NULL. 
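The mask layout can be sketched as follows, assuming CU n maps to bit (n % 32) of 32-bit word (n / 32); the header only says the mask is a bit-vector, so treat that mapping as an assumption. The hypothetical `cu_mask_set_range` builds a mask for a contiguous range of CUs, and the hypothetical `cu_mask_would_reduce` mimics the condition under which ::HSA_STATUS_CU_MASK_REDUCED is documented to be returned, by checking for requested bits that a device-wide mask has cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Sets bits [first_cu, first_cu + count) in a CU bit-vector of num_bits bits.
// Assumes CU n lives in bit (n % 32) of word (n / 32).
static void cu_mask_set_range(uint32_t* mask, uint32_t num_bits,
                              uint32_t first_cu, uint32_t count) {
  assert(num_bits % 32 == 0 && first_cu + count <= num_bits);
  memset(mask, 0, (num_bits / 32) * sizeof(uint32_t));
  for (uint32_t cu = first_cu; cu < first_cu + count; ++cu)
    mask[cu / 32] |= 1u << (cu % 32);
}

// True if the requested mask enables any CU that the device-wide mask has
// disabled, i.e. the AND with the device mask would drop requested CUs.
static bool cu_mask_would_reduce(const uint32_t* requested,
                                 const uint32_t* device_wide, uint32_t num_bits) {
  for (uint32_t w = 0; w < num_bits / 32; ++w)
    if (requested[w] & ~device_wide[w]) return true;
  return false;
}
```

Keeping num_bits a multiple of 32 matches the documented requirement on @p num_cu_mask_count.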
* */ hsa_status_t HSA_API hsa_amd_queue_cu_set_mask(const hsa_queue_t* queue, uint32_t num_cu_mask_count, const uint32_t* cu_mask); /** * @brief Retrieve a queue's CU affinity mask. * * @details Returns the first num_cu_mask_count bits of a queue's CU mask. * Ensure that num_cu_mask_count is at least as large as * HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT to retrieve the entire mask. * * @param[in] queue A pointer to HSA queue. * * @param[in] num_cu_mask_count Size of CUMask bit array passed in, in bits. * * @param[out] cu_mask Bit-vector representing the CU mask. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_QUEUE @p queue is NULL or invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p num_cu_mask_count is 0, not * a multiple of 32 or @p cu_mask is NULL. * */ hsa_status_t HSA_API hsa_amd_queue_cu_get_mask(const hsa_queue_t* queue, uint32_t num_cu_mask_count, uint32_t* cu_mask); /** * @brief Memory segments associated with a memory pool. */ typedef enum { /** * Global segment. Used to hold data that is shared by all agents. */ HSA_AMD_SEGMENT_GLOBAL = 0, /** * Read-only segment. Used to hold data that remains constant during the * execution of a kernel. */ HSA_AMD_SEGMENT_READONLY = 1, /** * Private segment. Used to hold data that is local to a single work-item. */ HSA_AMD_SEGMENT_PRIVATE = 2, /** * Group segment. Used to hold data that is shared by the work-items of a * work-group. */ HSA_AMD_SEGMENT_GROUP = 3, } hsa_amd_segment_t; /** * @brief A memory pool encapsulates physical storage on an agent * along with a memory access model. * * @details A memory pool encapsulates a physical partition of an agent's * memory system along with a memory access model. 
Division of a single * memory system into separate pools allows querying each partition's access * path properties (see ::hsa_amd_agent_memory_pool_get_info). Allocations * from a pool are preferentially bound to that pool's physical partition. * Binding to the pool's preferential physical partition may not be * possible or persistent depending on the system's memory policy * and/or state which is beyond the scope of HSA APIs. * * For example, a multi-node NUMA memory system may be represented by multiple * pools with each pool providing size and access path information for the * partition it represents. Allocations from a pool are preferentially bound * to the pool's partition (which in this example is a NUMA node) while * following its memory access model. The actual placement may vary or migrate * due to the system's NUMA policy and state, which is beyond the scope of * HSA APIs. */ typedef struct hsa_amd_memory_pool_s { /** * Opaque handle. */ uint64_t handle; } hsa_amd_memory_pool_t; typedef enum hsa_amd_memory_pool_global_flag_s { /** * The application can use allocations in the memory pool to store kernel * arguments, and provide the values for the kernarg segment of * a kernel dispatch. */ HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_KERNARG_INIT = 1, /** * Updates to memory in this pool conform to the HSA memory consistency model. * If this flag is set, then ::HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_COARSE_GRAINED * must not be set. */ HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_FINE_GRAINED = 2, /** * Writes to memory in this pool can be performed by a single agent at a time. */ HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_COARSE_GRAINED = 4 } hsa_amd_memory_pool_global_flag_t; /** * @brief Memory pool features. */ typedef enum { /** * Segment where the memory pool resides. The type of this attribute is * ::hsa_amd_segment_t. */ HSA_AMD_MEMORY_POOL_INFO_SEGMENT = 0, /** * Flag mask.
The value of this attribute is undefined if the value of * ::HSA_AMD_MEMORY_POOL_INFO_SEGMENT is not ::HSA_AMD_SEGMENT_GLOBAL. The type * of this attribute is uint32_t, a bit-field of * ::hsa_amd_memory_pool_global_flag_t values. */ HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS = 1, /** * Size of this pool, in bytes. The type of this attribute is size_t. */ HSA_AMD_MEMORY_POOL_INFO_SIZE = 2, /** * Indicates whether memory in this pool can be allocated using * ::hsa_amd_memory_pool_allocate. The type of this attribute is bool. * * The value of this flag is always false for memory pools in the group and * private segments. */ HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALLOWED = 5, /** * Allocation granularity of buffers allocated by * ::hsa_amd_memory_pool_allocate in this memory pool. The size of a buffer allocated in this pool is a * multiple of the value of this attribute. The value of this attribute is * only defined if ::HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALLOWED is true for * this pool. The type of this attribute is size_t. */ HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_GRANULE = 6, /** * Alignment of buffers allocated by ::hsa_amd_memory_pool_allocate in this * pool. The value of this attribute is only defined if * ::HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALLOWED is true for this pool, and * must be a power of 2. The type of this attribute is size_t. */ HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALIGNMENT = 7, /** * This memory pool can be made directly accessible by all the agents in the * system (::hsa_amd_agent_memory_pool_get_info does not return * ::HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED for any agent). The type of this * attribute is bool. */ HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL = 15, /** * Maximum aggregate allocation size in bytes. The type of this attribute * is size_t. */ HSA_AMD_MEMORY_POOL_INFO_ALLOC_MAX_SIZE = 16, } hsa_amd_memory_pool_info_t; /** * @brief Get the current value of an attribute of a memory pool.
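 * Example: a buffer returned by ::hsa_amd_memory_pool_allocate occupies the
 * requested size rounded up to a multiple of
 * ::HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_GRANULE, and its base address is a
 * multiple of ::HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALIGNMENT (a power of 2).
 * The runtime-independent sketch below computes both. The helper names are
 * hypothetical and not part of the ROCr API; in practice the granule and
 * alignment values would be read from a pool with this query.
 *
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not a ROCr API): size reserved for a request of `size`
// bytes, i.e. `size` rounded up to the next multiple of `granule`.
// Assumes granule > 0 and ignores overflow for sizes close to SIZE_MAX.
static inline size_t example_round_to_granule(size_t size, size_t granule) {
  return ((size + granule - 1) / granule) * granule;
}

// Hypothetical helper (not a ROCr API): true if `ptr` is a multiple of
// `alignment`, which must be a power of 2 as the alignment attribute is.
static inline bool example_is_aligned(const void* ptr, size_t alignment) {
  return ((uintptr_t)ptr & (uintptr_t)(alignment - 1)) == 0;
}
```
 *
 * With a 4096-byte granule, for instance, a 4097-byte request reserves 8192
 * bytes, while a 4096-byte request reserves exactly 4096 bytes.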
* * @param[in] memory_pool A valid memory pool. * * @param[in] attribute Attribute to query. * * @param[out] value Pointer to an application-allocated buffer in which to store * the value of the attribute. If the buffer passed by the application is not * large enough to hold the value of @p attribute, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * */ hsa_status_t HSA_API hsa_amd_memory_pool_get_info(hsa_amd_memory_pool_t memory_pool, hsa_amd_memory_pool_info_t attribute, void* value); /** * @brief Iterate over the memory pools associated with a given agent, and * invoke an application-defined callback on every iteration. * * @details An agent can directly access buffers located in some memory pool, or * be enabled to access them by the application (see ::hsa_amd_agents_allow_access), * yet that memory pool may not be returned by this function for that given * agent. * * A memory pool of fine-grained type must be associated only with the host. * * @param[in] agent A valid agent. * * @param[in] callback Callback to be invoked on the same thread that called * ::hsa_amd_agent_iterate_memory_pools, serially, once per memory pool that is * associated with the agent. The HSA runtime passes two arguments to the * callback: the memory pool, and the application data. If @p callback * returns a status other than ::HSA_STATUS_SUCCESS for a particular iteration, * the traversal stops and ::hsa_amd_agent_iterate_memory_pools returns that status * value. * * @param[in] data Application data that is passed to @p callback on every * iteration. May be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL.
*/ hsa_status_t HSA_API hsa_amd_agent_iterate_memory_pools( hsa_agent_t agent, hsa_status_t (*callback)(hsa_amd_memory_pool_t memory_pool, void* data), void* data); /** * @brief Allocate a block of memory (or buffer) in the specified pool. * * @param[in] memory_pool Memory pool from which to allocate memory. The memory * pool must have the ::HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALLOWED flag set. * * @param[in] size Allocation size, in bytes. Must not be zero. This value is * rounded up to the nearest multiple of * ::HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_GRANULE in @p memory_pool. * * @param[in] flags A bit-field that is used to specify allocation * directives. Reserved parameter, must be 0. * * @param[out] ptr Pointer to the location in which to store the base virtual * address of the allocated block. The returned base address is aligned to the value of * ::HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALIGNMENT in @p memory_pool. If the * allocation fails, the returned value is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES No memory is available. * * @retval ::HSA_STATUS_ERROR_INVALID_MEMORY_POOL The memory pool is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ALLOCATION The host is not allowed to * allocate memory in @p memory_pool, or @p size is greater than * the value of ::HSA_AMD_MEMORY_POOL_INFO_ALLOC_MAX_SIZE in @p memory_pool. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p ptr is NULL, or @p size is 0, * or @p flags is not 0. * */ hsa_status_t HSA_API hsa_amd_memory_pool_allocate(hsa_amd_memory_pool_t memory_pool, size_t size, uint32_t flags, void** ptr); /** * @brief Deallocate a block of memory previously allocated using * ::hsa_amd_memory_pool_allocate. * * @param[in] ptr Pointer to a memory block.
If @p ptr does not match a value * previously returned by ::hsa_amd_memory_pool_allocate, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * */ hsa_status_t HSA_API hsa_amd_memory_pool_free(void* ptr); /** * @brief Asynchronously copy a block of memory from the location pointed to by * @p src on the @p src_agent to the memory block pointed to by @p dst on the @p * dst_agent. * Because the DMA engines used may not be in the same coherency domain, the caller must ensure * that buffers are system-level coherent. In general, this requires the sending device to have * released the buffer to system scope prior to executing the copy API, and the receiving device * must execute a system scope acquire fence prior to use of the destination buffer. * * @param[out] dst Buffer to which the content is copied. * * @param[in] dst_agent Agent associated with @p dst. The agent must be able to directly * access both the source and destination buffers in their current locations. * * @param[in] src A valid pointer to the source of data to be copied. The source * buffer must not overlap with the destination buffer; otherwise the copy will succeed * but the contents of @p dst are undefined. * * @param[in] src_agent Agent associated with @p src. The agent must be able to directly * access both the source and destination buffers in their current locations. * * @param[in] size Number of bytes to copy. If @p size is 0, no copy is * performed and the function returns success. Copying a number of bytes larger * than the size of the buffers pointed to by @p dst or @p src results in undefined * behavior. * * @param[in] num_dep_signals Number of dependent signals. Can be 0. * * @param[in] dep_signals List of signals that must be waited on before the copy * operation starts. The copy will start after every signal has been observed with * the value 0.
The dependent signals must not include the completion signal of an hsa_amd_memory_async_copy * operation that will be issued in the future, as that can result in a deadlock. If @p num_dep_signals is 0, this * argument is ignored. * * @param[in] completion_signal Signal used to indicate completion of the copy * operation. When the copy operation is finished, the value of the signal is * decremented. The runtime indicates that an error has occurred during the copy * operation by setting the value of the completion signal to a negative * number. The signal handle must not be 0. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. The * application is responsible for checking for asynchronous error conditions * (see the description of @p completion_signal). * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_SIGNAL @p completion_signal is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT The source or destination * pointers are NULL, or the completion signal is 0. */ hsa_status_t HSA_API hsa_amd_memory_async_copy(void* dst, hsa_agent_t dst_agent, const void* src, hsa_agent_t src_agent, size_t size, uint32_t num_dep_signals, const hsa_signal_t* dep_signals, hsa_signal_t completion_signal); /* [Provisional API] Pitched memory descriptor. All elements must be 4-byte aligned. Pitch and slice are in bytes. */ typedef struct hsa_pitched_ptr_s { void* base; size_t pitch; size_t slice; } hsa_pitched_ptr_t; /* [Provisional API] Copy direction flag. */ typedef enum { hsaHostToHost = 0, hsaHostToDevice = 1, hsaDeviceToHost = 2, hsaDeviceToDevice = 3 } hsa_amd_copy_direction_t; /* [Provisional API] SDMA 3D memory copy API. The same requirements must be met by src and dst as in hsa_amd_memory_async_copy. Both src and dst must be directly accessible to the copy_agent during the copy, and the src and dst rects must not overlap.
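Example (a runtime-independent sketch, assuming the conventional pitched
layout): with pitch and slice in bytes, the byte at offset (x, y, z) of a
pitched buffer, where x counts bytes, y rows and z layers, lies at
base + z*slice + y*pitch + x. The struct and helper below are hypothetical
stand-ins that mirror the fields of hsa_pitched_ptr_t; they are not part of
the ROCr API.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in mirroring the fields of hsa_pitched_ptr_t.
typedef struct {
  void* base;
  size_t pitch;  // bytes per row
  size_t slice;  // bytes per layer
} example_pitched_ptr;

// Address of offset (x, y, z): x in bytes, y in rows, z in layers.
// Assumes the conventional layout base + z*slice + y*pitch + x.
static inline void* example_pitched_address(example_pitched_ptr p, size_t x,
                                            size_t y, size_t z) {
  return (uint8_t*)p.base + z * p.slice + y * p.pitch + x;
}
```

With a pitch of 64 bytes and a slice of 1024 bytes, for instance, offset
(8, 2, 3) lands 3*1024 + 2*64 + 8 = 3208 bytes past base.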
CPU agents are not supported. The API requires SDMA and will return an error if SDMA is not available. Offsets and range specify x in bytes, and y and z in rows and layers, respectively. */ hsa_status_t HSA_API hsa_amd_memory_async_copy_rect( const hsa_pitched_ptr_t* dst, const hsa_dim3_t* dst_offset, const hsa_pitched_ptr_t* src, const hsa_dim3_t* src_offset, const hsa_dim3_t* range, hsa_agent_t copy_agent, hsa_amd_copy_direction_t dir, uint32_t num_dep_signals, const hsa_signal_t* dep_signals, hsa_signal_t completion_signal); /** * @brief Types of access to a memory pool from a given agent. */ typedef enum { /** * The agent cannot directly access any buffer in the memory pool. */ HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED = 0, /** * The agent can directly access a buffer located in the pool; the application * does not need to invoke ::hsa_amd_agents_allow_access. */ HSA_AMD_MEMORY_POOL_ACCESS_ALLOWED_BY_DEFAULT = 1, /** * The agent can directly access a buffer located in the pool, but only if the * application has previously requested access to that buffer using * ::hsa_amd_agents_allow_access. */ HSA_AMD_MEMORY_POOL_ACCESS_DISALLOWED_BY_DEFAULT = 2 } hsa_amd_memory_pool_access_t; /** * @brief Type of bus/link between an agent and a memory pool. */ typedef enum { /** * HyperTransport bus type. */ HSA_AMD_LINK_INFO_TYPE_HYPERTRANSPORT = 0, /** * QPI bus type. */ HSA_AMD_LINK_INFO_TYPE_QPI = 1, /** * PCIe bus type. */ HSA_AMD_LINK_INFO_TYPE_PCIE = 2, /** * InfiniBand bus type. */ HSA_AMD_LINK_INFO_TYPE_INFINBAND = 3, /** * xGMI link type. */ HSA_AMD_LINK_INFO_TYPE_XGMI = 4 } hsa_amd_link_info_type_t; /** * @brief Link properties when accessing the memory pool from the specified * agent. */ typedef struct hsa_amd_memory_pool_link_info_s { /** * Minimum transfer latency (rounded to ns). */ uint32_t min_latency; /** * Maximum transfer latency (rounded to ns). */ uint32_t max_latency; /** * Minimum link interface bandwidth in MB/s.
*/ uint32_t min_bandwidth; /** * Maximum link interface bandwidth in MB/s. */ uint32_t max_bandwidth; /** * Support for 32-bit atomic transactions. */ bool atomic_support_32bit; /** * Support for 64-bit atomic transactions. */ bool atomic_support_64bit; /** * Support for cache-coherent transactions. */ bool coherent_support; /** * The type of bus/link. */ hsa_amd_link_info_type_t link_type; /** * NUMA distance of the memory pool relative to the querying agent. */ uint32_t numa_distance; } hsa_amd_memory_pool_link_info_t; /** * @brief Properties of the relationship between an agent and a memory pool. */ typedef enum { /** * Access to buffers located in the memory pool. The type of this attribute * is ::hsa_amd_memory_pool_access_t. * * An agent can always directly access buffers currently located in a memory * pool that is associated (the memory_pool is one of the values returned by * ::hsa_amd_agent_iterate_memory_pools on the agent) with that agent. If the * buffer is currently located in a memory pool that is not associated with * the agent, and the value returned by this function for the given * combination of agent and memory pool is not * HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED, the application still needs to invoke * ::hsa_amd_agents_allow_access in order to gain direct access to the buffer. * * If the given agent can directly access buffers in the pool, the result is not * HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED. If the memory pool is associated with * the agent, or it is of fine-grained type, the result must not be * HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED. If the memory pool is not associated * with the agent, and does not reside in the global segment, the result must * be HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED. */ HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS = 0, /** * Number of links to hop when accessing the memory pool from the specified * agent.
The value of this attribute is zero if the memory pool is associated * with the agent, or if the access type is * HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED. The type of this attribute is * uint32_t. */ HSA_AMD_AGENT_MEMORY_POOL_INFO_NUM_LINK_HOPS = 1, /** * Details of each link hop when accessing the memory pool starting from the * specified agent. The type of this attribute is an array of size * HSA_AMD_AGENT_MEMORY_POOL_INFO_NUM_LINK_HOPS with each element containing * ::hsa_amd_memory_pool_link_info_t. */ HSA_AMD_AGENT_MEMORY_POOL_INFO_LINK_INFO = 2 } hsa_amd_agent_memory_pool_info_t; /** * @brief Get the current value of an attribute of the relationship between an * agent and a memory pool. * * @param[in] agent Agent. * * @param[in] memory_pool Memory pool. * * @param[in] attribute Attribute to query. * * @param[out] value Pointer to an application-allocated buffer in which to store * the value of the attribute. If the buffer passed by the application is not * large enough to hold the value of @p attribute, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * */ hsa_status_t HSA_API hsa_amd_agent_memory_pool_get_info( hsa_agent_t agent, hsa_amd_memory_pool_t memory_pool, hsa_amd_agent_memory_pool_info_t attribute, void* value); /** * @brief Enable direct access to a buffer from a given set of agents. * * @details * * Upon return, only the listed agents and the agent associated with the * buffer's memory pool have direct access to @p ptr. * * Any agent that has access to the buffer before and after the call to * ::hsa_amd_agents_allow_access will also have access while * ::hsa_amd_agents_allow_access is in progress. * * The caller is responsible for ensuring that each agent in the list * can access the memory pool containing @p ptr * (using ::hsa_amd_agent_memory_pool_get_info with the ::HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS attribute); * otherwise an error code is returned.
* * @param[in] num_agents Size of @p agents. * * @param[in] agents List of agents. If @p num_agents is 0, this argument is * ignored. * * @param[in] flags A list of bit-fields used to specify access * information on a per-agent basis. This is currently reserved and must be NULL. * * @param[in] ptr A buffer previously allocated using ::hsa_amd_memory_pool_allocate. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p num_agents is 0, or @p agents * is NULL, or @p flags is not NULL, or access is requested for agent(s) that * cannot access the pool from which @p ptr was allocated. * */ hsa_status_t HSA_API hsa_amd_agents_allow_access(uint32_t num_agents, const hsa_agent_t* agents, const uint32_t* flags, const void* ptr); /** * @brief Query whether buffers currently located in some memory pool can be * relocated to a destination memory pool. * * @details If the returned value is true, a migration of a buffer to @p * dst_memory_pool using ::hsa_amd_memory_migrate may nevertheless fail due to * resource limitations. * * @param[in] src_memory_pool Source memory pool. * * @param[in] dst_memory_pool Destination memory pool. * * @param[out] result Pointer to a memory location where the result of the query * is stored. Must not be NULL. If buffers currently located in @p * src_memory_pool can be relocated to @p dst_memory_pool, the result is * true. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_MEMORY_POOL One of the memory pools is * invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p result is NULL.
*/ hsa_status_t HSA_API hsa_amd_memory_pool_can_migrate(hsa_amd_memory_pool_t src_memory_pool, hsa_amd_memory_pool_t dst_memory_pool, bool* result); /** * @brief Relocate a buffer to a new memory pool. * * @details When a buffer is migrated, its virtual address remains the same but * its physical contents are moved to the indicated memory pool. * * After migration, only the agent associated with the destination pool will have access. * * The caller is also responsible for ensuring that the allocation in the * source memory pool where the buffer is currently located can be migrated to the * specified destination memory pool (i.e. ::hsa_amd_memory_pool_can_migrate returns true * for the source and destination memory pools); otherwise the behavior is undefined. * * The caller must ensure that the buffer is not accessed while it is migrated. * * @param[in] ptr Buffer to be relocated. The buffer must have been released to system scope * prior to calling this API. The buffer will be released to system scope upon completion. * * @param[in] memory_pool Memory pool in which to place the buffer. * * @param[in] flags A bit-field that is used to specify migration * information. Must be zero. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_MEMORY_POOL The destination memory pool is * invalid. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES There is a failure in * allocating the necessary resources. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p flags is not 0. */ hsa_status_t HSA_API hsa_amd_memory_migrate(const void* ptr, hsa_amd_memory_pool_t memory_pool, uint32_t flags); /** * * @brief Pin a host pointer allocated by a C/C++ or OS allocator (i.e. ordinary system DRAM) and * return a new pointer accessible by the @p agents.
If the @p host_ptr overlaps with previously * locked memory, then the overlap area is kept locked (i.e. multiple mappings are permitted). In * this case, the same input @p host_ptr may give different locked @p agent_ptr values and, when it does, * they are not necessarily coherent (i.e. accessing either @p agent_ptr is not equivalent). * Accesses to @p agent_ptr are coarse-grained. * * @param[in] host_ptr A buffer allocated by a C/C++ or OS allocator. * * @param[in] size The size, in bytes, to be locked. * * @param[in] agents Array of agent handles to gain access to the @p host_ptr. * If this parameter is NULL and @p num_agent is 0, all agents * in the platform will gain access to the @p host_ptr. * * @param[in] num_agent Number of agents in @p agents. * * @param[out] agent_ptr Pointer to the location in which to store the new address. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES There is a failure in * allocating the necessary resources. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT One or more agents in @p agents are * invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p size is 0, or @p host_ptr or * @p agent_ptr is NULL, or @p agents is not NULL but @p num_agent is 0, or @p agents * is NULL but @p num_agent is not 0. */ hsa_status_t HSA_API hsa_amd_memory_lock(void* host_ptr, size_t size, hsa_agent_t* agents, int num_agent, void** agent_ptr); /** * * @brief Pin a host pointer allocated by a C/C++ or OS allocator (i.e. ordinary system DRAM) and * return a new pointer accessible by the @p agents. If the @p host_ptr overlaps with previously * locked memory, then the overlap area is kept locked (i.e. multiple mappings are permitted). * In this case, the same input @p host_ptr may give different locked @p agent_ptr values and, when it * does, they are not necessarily coherent (i.e. accessing either @p agent_ptr is not equivalent).
* Accesses to the memory via @p agent_ptr have the same access properties as memory allocated from * @p pool as determined by ::hsa_amd_memory_pool_get_info and ::hsa_amd_agent_memory_pool_get_info * (e.g. coarse/fine grain, platform atomic support, link info). Physical composition and placement * of the memory (e.g. page size, NUMA binding) are not changed. * * @param[in] host_ptr A buffer allocated by a C/C++ or OS allocator. * * @param[in] size The size, in bytes, to be locked. * * @param[in] agents Array of agent handles to gain access to the @p host_ptr. * If this parameter is NULL and @p num_agent is 0, all agents * in the platform will gain access to the @p host_ptr. * * @param[in] num_agent Number of agents in @p agents. * * @param[in] pool Global memory pool owned by a CPU agent. * * @param[in] flags A bit-field that is used to specify allocation * directives. Reserved parameter, must be 0. * * @param[out] agent_ptr Pointer to the location in which to store the new address. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES There is a failure in * allocating the necessary resources. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT One or more agents in @p agents are * invalid or cannot access @p pool. * * @retval ::HSA_STATUS_ERROR_INVALID_MEMORY_POOL @p pool is invalid or not owned * by a CPU agent. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p size is 0, or @p host_ptr or * @p agent_ptr is NULL, or @p agents is not NULL but @p num_agent is 0, or @p agents * is NULL but @p num_agent is not 0, or @p flags is not 0. */ hsa_status_t HSA_API hsa_amd_memory_lock_to_pool(void* host_ptr, size_t size, hsa_agent_t* agents, int num_agent, hsa_amd_memory_pool_t pool, uint32_t flags, void** agent_ptr); /** * * @brief Unpin the host pointer previously pinned via ::hsa_amd_memory_lock or * ::hsa_amd_memory_lock_to_pool.
* * @details The behavior is undefined if the host pointer being unpinned does not * match a previously pinned address or if the host pointer was already deallocated. * * @param[in] host_ptr A buffer allocated by a C/C++ or OS allocator that was * pinned previously via ::hsa_amd_memory_lock or ::hsa_amd_memory_lock_to_pool. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. */ hsa_status_t HSA_API hsa_amd_memory_unlock(void* host_ptr); /** * @brief Sets the first @p count uint32_t elements of the block of memory pointed to by * @p ptr to the specified @p value. * * @param[in] ptr Pointer to the block of memory to fill. * * @param[in] value Value to be set. * * @param[in] count Number of uint32_t elements to be set to the value. * * @retval HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p ptr is NULL or * not 4-byte aligned. * * @retval HSA_STATUS_ERROR_INVALID_ALLOCATION if the given memory * region was not allocated with HSA runtime APIs. * */ hsa_status_t HSA_API hsa_amd_memory_fill(void* ptr, uint32_t value, size_t count); /** * @brief Maps an interop object into the HSA flat address space and establishes * memory residency. The metadata pointer is valid during the lifetime of the * map (until hsa_amd_interop_unmap_buffer is called). * Multiple calls to hsa_amd_interop_map_buffer with the same interop_handle * result in multiple mappings with potentially different addresses and * different metadata pointers. Concurrent operations on these addresses are * not coherent. Memory must be fenced to system scope to ensure consistency * between mappings and with any views of this buffer in the originating * software stack.
* * @param[in] num_agents Number of agents which require access to the memory * * @param[in] agents List of accessing agents. * * @param[in] interop_handle Handle of interop buffer (a dmabuf handle on Linux) * * @param[in] flags Reserved, must be 0 * * @param[out] size Size in bytes of the mapped object * * @param[out] ptr Base address of the mapped object * * @param[out] metadata_size Size of metadata in bytes, may be NULL * * @param[out] metadata Pointer to metadata, may be NULL * * @retval HSA_STATUS_SUCCESS if successfully mapped * * @retval HSA_STATUS_ERROR_NOT_INITIALIZED if HSA is not initialized * * @retval HSA_STATUS_ERROR_OUT_OF_RESOURCES if there is a failure in allocating * necessary resources * * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT all other errors */ hsa_status_t HSA_API hsa_amd_interop_map_buffer(uint32_t num_agents, hsa_agent_t* agents, int interop_handle, uint32_t flags, size_t* size, void** ptr, size_t* metadata_size, const void** metadata); /** * @brief Removes a previously mapped interop object from HSA's flat address space. * Ends the lifetime of the mapping's associated metadata pointer. */ hsa_status_t HSA_API hsa_amd_interop_unmap_buffer(void* ptr); /** * @brief Encodes an opaque vendor specific image format. The length of data * depends on the underlying format. This structure must not be copied, as its * true length cannot be determined. */ typedef struct hsa_amd_image_descriptor_s { /* Version number of the descriptor */ uint32_t version; /* Vendor and device PCI IDs for the format as VENDOR_ID<<16|DEVICE_ID. */ uint32_t deviceID; /* Start of vendor specific data. */ uint32_t data[1]; } hsa_amd_image_descriptor_t; /** * @brief Creates an image from an opaque vendor specific image format. * Does not modify the data at @p image_data. Intended initially for * accessing interop images.
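 * Example: the deviceID field of ::hsa_amd_image_descriptor_t stores the PCI
 * vendor ID in its upper 16 bits and the device ID in its lower 16 bits
 * (VENDOR_ID<<16|DEVICE_ID). The hypothetical helpers below, which are not
 * part of the ROCr API, sketch the packing and unpacking.
 *
```c
#include <stdint.h>

// Hypothetical helper (not a ROCr API): pack as VENDOR_ID<<16|DEVICE_ID.
static inline uint32_t example_make_device_id(uint16_t vendor_id,
                                              uint16_t device_id) {
  return ((uint32_t)vendor_id << 16) | device_id;
}

// Hypothetical helper: recover the PCI vendor ID (upper 16 bits).
static inline uint16_t example_vendor_of(uint32_t packed) {
  return (uint16_t)(packed >> 16);
}

// Hypothetical helper: recover the PCI device ID (lower 16 bits).
static inline uint16_t example_device_of(uint32_t packed) {
  return (uint16_t)(packed & 0xFFFFu);
}
```
 *
 * For instance, packing vendor ID 0x1002 (AMD) with a placeholder device ID
 * of 0x1234 yields 0x10021234.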
* * @param[in] agent Agent on which to create the image * * @param[in] image_descriptor Image descriptor (::hsa_ext_image_descriptor_t) of the image to create * * @param[in] image_layout Vendor specific image format (::hsa_amd_image_descriptor_t) * * @param[in] image_data Pointer to image backing store * * @param[in] access_permission Access permissions for the image object * * @param[out] image Created image object. * * @retval HSA_STATUS_SUCCESS Image created successfully * * @retval HSA_STATUS_ERROR_NOT_INITIALIZED if HSA is not initialized * * @retval HSA_STATUS_ERROR_OUT_OF_RESOURCES if there is a failure in allocating * necessary resources * * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT Bad or mismatched descriptor, * null image_data, or mismatched access_permission. */ hsa_status_t HSA_API hsa_amd_image_create( hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor, const hsa_amd_image_descriptor_t *image_layout, const void *image_data, hsa_access_permission_t access_permission, hsa_ext_image_t *image ); /** * @brief Denotes the type of memory in a pointer info query. */ typedef enum { /* Memory is not known to the HSA driver. Unallocated or unlocked system memory. */ HSA_EXT_POINTER_TYPE_UNKNOWN = 0, /* Memory was allocated with an HSA memory allocator. */ HSA_EXT_POINTER_TYPE_HSA = 1, /* System memory which has been locked for use with an HSA agent. Memory of this type is normal malloc'd memory and is always accessible to the CPU. Pointer info queries may not include CPU agents in the accessible agents list, as the CPU has implicit access. */ HSA_EXT_POINTER_TYPE_LOCKED = 2, /* Memory originated in a graphics component and is shared with ROCr. */ HSA_EXT_POINTER_TYPE_GRAPHICS = 3, /* Memory has been shared with the local process via ROCr IPC APIs. */ HSA_EXT_POINTER_TYPE_IPC = 4 } hsa_amd_pointer_type_t; /** * @brief Describes a memory allocation known to ROCr. * Within a ROCr major version, this structure can only grow. */ typedef struct hsa_amd_pointer_info_s { /* Size in bytes of this structure. Used for version control within a major ROCr revision.
Set to sizeof(hsa_amd_pointer_info_t) prior to calling hsa_amd_pointer_info. If the runtime supports an older version of pointer info, then size will be smaller on return. Members starting after the returned value of size will not be updated by hsa_amd_pointer_info. */ uint32_t size; /* The type of allocation referenced. */ hsa_amd_pointer_type_t type; /* Base address at which non-host agents may access the allocation. */ void* agentBaseAddress; /* Base address at which the host agent may access the allocation. */ void* hostBaseAddress; /* Size of the allocation, in bytes. */ size_t sizeInBytes; /* Application-provided value. */ void* userData; /* Reports an agent which "owns" (i.e. has preferred access to) the pool in which the allocation was made. When multiple agents share equal access to a pool (e.g. multiple CPU agents or multi-die GPU boards), any such agent may be returned. */ hsa_agent_t agentOwner; /* Contains a bitfield of hsa_amd_memory_pool_global_flag_t values. Reports the effective global flags bitmask for the allocation. This field is not meaningful if the type of the allocation is HSA_EXT_POINTER_TYPE_UNKNOWN. */ uint32_t global_flags; } hsa_amd_pointer_info_t; /** * @brief Retrieves information about the allocation referenced by the given * pointer. Optionally returns the number and list of agents which can * directly access the allocation. * * @param[in] ptr Pointer which references the allocation to retrieve info for. * * @param[in, out] info Pointer to structure to be filled with allocation info. * Data member size must be set to the size of the structure prior to calling * hsa_amd_pointer_info. On return, size will be set to the size of the * pointer info structure supported by the runtime, if smaller. Members * beyond the returned value of size will not be updated by the API. * Must not be NULL. * * @param[in] alloc Function pointer to an allocator used to allocate the * @p accessible array. If NULL, @p accessible will not be returned.
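 * Example: because members beyond the returned size are not updated, a caller
 * can check whether a given member was filled in by testing that its offset
 * plus its size does not exceed the size reported on return. The sketch below
 * uses a hypothetical stand-in struct instead of ::hsa_amd_pointer_info_t so
 * that it compiles without the runtime; with the real struct, the same check
 * would use offsetof(hsa_amd_pointer_info_t, member).
 *
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a size-versioned struct such as
// hsa_amd_pointer_info_t: the first member records the struct size.
typedef struct {
  uint32_t size;         // set by the caller, possibly reduced by the runtime
  uint32_t type;
  size_t size_in_bytes;  // a later member an older runtime might not fill
} example_info;

// Before the query: advertise the full structure size the caller knows about.
static inline void example_info_init(example_info* info) {
  info->size = (uint32_t)sizeof(*info);
}

// After the query: a member was written only if it lies entirely within the
// first `reported_size` bytes, i.e. offset + member size <= reported size.
static inline bool example_member_valid(uint32_t reported_size, size_t offset,
                                        size_t member_size) {
  return offset + member_size <= reported_size;
}
```
 *
 * For instance, if an older runtime reported a size of 8, then
 * example_member_valid(8, offsetof(example_info, size_in_bytes), sizeof(size_t))
 * is false, so size_in_bytes should not be read.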
* * @param[out] num_agents_accessible Receives the count of agents in * @p accessible. If NULL, @p accessible will not be returned. * * @param[out] accessible Receives a pointer to the array, allocated by @p alloc, * holding the list of agents which may directly access the allocation. * May be NULL. * * @retval HSA_STATUS_SUCCESS Info retrieved successfully * * @retval HSA_STATUS_ERROR_NOT_INITIALIZED if HSA is not initialized * * @retval HSA_STATUS_ERROR_OUT_OF_RESOURCES if there is a failure in allocating * necessary resources * * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT NULL in @p ptr or @p info. */ hsa_status_t HSA_API hsa_amd_pointer_info(const void* ptr, hsa_amd_pointer_info_t* info, void* (*alloc)(size_t), uint32_t* num_agents_accessible, hsa_agent_t** accessible); /** * @brief Associates an arbitrary pointer with an allocation known to ROCr. * The pointer can be retrieved with hsa_amd_pointer_info in the userData field. * * @param[in] ptr Pointer to the first byte of an allocation known to ROCr * with which to associate @p userdata. * * @param[in] userdata Arbitrary pointer to associate with the allocation. * * @retval HSA_STATUS_SUCCESS @p userdata successfully stored. * * @retval HSA_STATUS_ERROR_NOT_INITIALIZED if HSA is not initialized * * @retval HSA_STATUS_ERROR_OUT_OF_RESOURCES if there is a failure in allocating * necessary resources * * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p ptr is not known to ROCr. */ hsa_status_t HSA_API hsa_amd_pointer_info_set_userdata(const void* ptr, void* userdata); /** * @brief 256-bit process-independent identifier for a ROCr shared memory * allocation. */ typedef struct hsa_amd_ipc_memory_s { uint32_t handle[8]; } hsa_amd_ipc_memory_t; /** * @brief Prepares an allocation for interprocess sharing and creates a * handle of type hsa_amd_ipc_memory_t uniquely identifying the allocation. A * handle is valid while the allocation it references remains accessible in * any process.
 * In general, applications should confirm that a shared memory
 * region has been attached (via hsa_amd_ipc_memory_attach) in the remote
 * process prior to releasing that memory in the local process.
 * Repeated calls for the same allocation may, but are not required to, return
 * unique handles.
 *
 * @param[in] ptr Pointer to memory allocated via ROCr APIs to prepare for
 * sharing.
 *
 * @param[in] len Length in bytes of the allocation to share.
 *
 * @param[out] handle Process independent identifier referencing the shared
 * allocation.
 *
 * @retval HSA_STATUS_SUCCESS allocation is prepared for interprocess sharing.
 *
 * @retval HSA_STATUS_ERROR_NOT_INITIALIZED if HSA is not initialized
 *
 * @retval HSA_STATUS_ERROR_OUT_OF_RESOURCES if there is a failure in allocating
 * necessary resources
 *
 * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p ptr does not point to the
 * first byte of an allocation made through ROCr, @p len is not the full length
 * of the allocation, or @p handle is NULL.
 */
hsa_status_t HSA_API hsa_amd_ipc_memory_create(void* ptr, size_t len,
                                               hsa_amd_ipc_memory_t* handle);

/**
 * @brief Imports shared memory into the local process and makes it accessible
 * by the given agents. If a shared memory handle is attached multiple times
 * in a process, each attach may return a different address. Each returned
 * address is refcounted and requires a matching number of calls to
 * hsa_amd_ipc_memory_detach to release the shared memory mapping.
 *
 * @param[in] handle Pointer to the identifier for the shared memory.
 *
 * @param[in] len Length of the shared memory to import.
 * Reserved. Must be the full length of the shared allocation in this version.
 *
 * @param[in] num_agents Count of agents in @p mapping_agents.
 * May be zero if all agents are to be allowed access.
 *
 * @param[in] mapping_agents List of agents to access the shared memory.
 * Ignored if @p num_agents is zero.
 *
 * @param[out] mapped_ptr Receives a process local pointer to the shared memory.
 *
 * @retval HSA_STATUS_SUCCESS if memory is successfully imported.
 *
 * @retval HSA_STATUS_ERROR_NOT_INITIALIZED if HSA is not initialized
 *
 * @retval HSA_STATUS_ERROR_OUT_OF_RESOURCES if there is a failure in allocating
 * necessary resources
 *
 * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p handle is not valid, @p len is
 * incorrect, @p mapped_ptr is NULL, or some agent for which access was
 * requested cannot access the shared memory.
 */
hsa_status_t HSA_API hsa_amd_ipc_memory_attach(
    const hsa_amd_ipc_memory_t* handle, size_t len,
    uint32_t num_agents,
    const hsa_agent_t* mapping_agents,
    void** mapped_ptr);

/**
 * @brief Decrements the reference count for the shared memory mapping and
 * releases access to shared memory imported with hsa_amd_ipc_memory_attach.
 *
 * @param[in] mapped_ptr Pointer to the first byte of a shared allocation
 * imported with hsa_amd_ipc_memory_attach.
 *
 * @retval HSA_STATUS_SUCCESS if @p mapped_ptr was imported with
 * hsa_amd_ipc_memory_attach.
 *
 * @retval HSA_STATUS_ERROR_NOT_INITIALIZED if HSA is not initialized
 *
 * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p mapped_ptr was not imported
 * with hsa_amd_ipc_memory_attach.
 */
hsa_status_t HSA_API hsa_amd_ipc_memory_detach(void* mapped_ptr);

/**
 * @brief 256-bit process independent identifier for a ROCr IPC signal.
 */
typedef hsa_amd_ipc_memory_t hsa_amd_ipc_signal_t;

/**
 * @brief Obtains an interprocess sharing handle for a signal. The handle is
 * valid while the signal it references remains valid in any process. In
 * general, applications should confirm that the signal has been attached (via
 * hsa_amd_ipc_signal_attach) in the remote process prior to destroying that
 * signal in the local process.
 * Repeated calls for the same signal may, but are not required to, return
 * unique handles.
 *
 * @param[in] signal Signal created with attribute HSA_AMD_SIGNAL_IPC.
 *
 * @param[out] handle Process independent identifier referencing the shared
 * signal.
 *
 * @retval HSA_STATUS_SUCCESS @p handle is ready to use for interprocess sharing.
 *
 * @retval HSA_STATUS_ERROR_NOT_INITIALIZED if HSA is not initialized
 *
 * @retval HSA_STATUS_ERROR_OUT_OF_RESOURCES if there is a failure in allocating
 * necessary resources
 *
 * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p signal is not a valid signal
 * created with attribute HSA_AMD_SIGNAL_IPC, or @p handle is NULL.
 */
hsa_status_t HSA_API hsa_amd_ipc_signal_create(hsa_signal_t signal,
                                               hsa_amd_ipc_signal_t* handle);

/**
 * @brief Imports an IPC capable signal into the local process. If an IPC
 * signal handle is attached multiple times in a process, each attach may return
 * a different signal handle. Each returned signal handle is refcounted and
 * requires a matching number of calls to hsa_signal_destroy to release the
 * shared signal.
 *
 * @param[in] handle Pointer to the identifier for the shared signal.
 *
 * @param[out] signal Receives a process local signal handle to the shared signal.
 *
 * @retval HSA_STATUS_SUCCESS if the signal is successfully imported.
 *
 * @retval HSA_STATUS_ERROR_NOT_INITIALIZED if HSA is not initialized
 *
 * @retval HSA_STATUS_ERROR_OUT_OF_RESOURCES if there is a failure in allocating
 * necessary resources
 *
 * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p handle is not valid.
 */
hsa_status_t HSA_API hsa_amd_ipc_signal_attach(const hsa_amd_ipc_signal_t* handle,
                                               hsa_signal_t* signal);

/**
 * @brief GPU system event type.
 */
typedef enum hsa_amd_event_type_s {
  /*
  AMD GPU memory fault.
  */
  HSA_AMD_GPU_MEMORY_FAULT_EVENT = 0,
} hsa_amd_event_type_t;

/**
 * @brief Flags denoting the cause of a memory fault.
 */
typedef enum {
  // Page not present or supervisor privilege.
  HSA_AMD_MEMORY_FAULT_PAGE_NOT_PRESENT = 1 << 0,
  // Write access to a read-only page.
  HSA_AMD_MEMORY_FAULT_READ_ONLY = 1 << 1,
  // Execute access to a page marked NX.
  HSA_AMD_MEMORY_FAULT_NX = 1 << 2,
  // GPU attempted access to a host only page.
  HSA_AMD_MEMORY_FAULT_HOST_ONLY = 1 << 3,
  // DRAM ECC failure.
  HSA_AMD_MEMORY_FAULT_DRAMECC = 1 << 4,
  // Can't determine the exact fault address.
  HSA_AMD_MEMORY_FAULT_IMPRECISE = 1 << 5,
  // SRAM ECC failure (i.e., registers; no fault address).
  HSA_AMD_MEMORY_FAULT_SRAMECC = 1 << 6,
  // GPU reset following unspecified hang.
  HSA_AMD_MEMORY_FAULT_HANG = 1 << 31
} hsa_amd_memory_fault_reason_t;

/**
 * @brief AMD GPU memory fault event data.
 */
typedef struct hsa_amd_gpu_memory_fault_info_s {
  /*
  The agent where the memory fault occurred.
  */
  hsa_agent_t agent;
  /*
  Virtual address accessed.
  */
  uint64_t virtual_address;
  /*
  Bit field encoding the memory access failure reasons. There could be multiple
  bits set for one fault. Bits are defined in hsa_amd_memory_fault_reason_t.
  */
  uint32_t fault_reason_mask;
} hsa_amd_gpu_memory_fault_info_t;

/**
 * @brief AMD GPU event data passed to event handler.
 */
typedef struct hsa_amd_event_s {
  /*
  The event type.
  */
  hsa_amd_event_type_t event_type;
  union {
    /*
    The memory fault info, only valid when @p event_type is
    HSA_AMD_GPU_MEMORY_FAULT_EVENT.
    */
    hsa_amd_gpu_memory_fault_info_t memory_fault;
  };
} hsa_amd_event_t;

typedef hsa_status_t (*hsa_amd_system_event_callback_t)(const hsa_amd_event_t* event, void* data);

/**
 * @brief Registers an AMD GPU event handler.
 *
 * @param[in] callback Callback to be invoked when an event is triggered.
 * The HSA runtime passes two arguments to the callback: @p event
 * is defined per event by the HSA runtime, and @p data is the user data.
 *
 * @param[in] data User data that is passed to @p callback. May be NULL.
 *
 * @retval HSA_STATUS_SUCCESS The handler has been registered successfully.
 *
 * @retval HSA_STATUS_ERROR An event handler has already been registered.
 *
 * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p event is invalid.
 */
hsa_status_t HSA_API
hsa_amd_register_system_event_handler(hsa_amd_system_event_callback_t callback,
                                      void* data);

/**
 * @brief Per-queue dispatch and wavefront scheduling priority.
 */
typedef enum hsa_amd_queue_priority_s {
  /*
  Below normal/high priority compute and all graphics
  */
  HSA_AMD_QUEUE_PRIORITY_LOW = 0,
  /*
  Above low priority compute, below high priority compute and all graphics
  */
  HSA_AMD_QUEUE_PRIORITY_NORMAL = 1,
  /*
  Above low/normal priority compute and all graphics
  */
  HSA_AMD_QUEUE_PRIORITY_HIGH = 2,
} hsa_amd_queue_priority_t;

/**
 * @brief Modifies the dispatch and wavefront scheduling priority for a
 * given compute queue. The default is HSA_AMD_QUEUE_PRIORITY_NORMAL.
 *
 * @param[in] queue Compute queue to apply new priority to.
 *
 * @param[in] priority Priority to associate with queue.
 *
 * @retval HSA_STATUS_SUCCESS if priority was changed successfully.
 *
 * @retval HSA_STATUS_ERROR_INVALID_QUEUE if queue is not a valid
 * compute queue handle.
 *
 * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT if priority is not a valid
 * value from hsa_amd_queue_priority_t.
 */
hsa_status_t HSA_API hsa_amd_queue_set_priority(hsa_queue_t* queue,
                                                hsa_amd_queue_priority_t priority);

/**
 * @brief Deallocation notifier function type.
 */
typedef void (*hsa_amd_deallocation_callback_t)(void* ptr, void* user_data);

/**
 * @brief Registers a deallocation notifier monitoring for release of agent
 * accessible address @p ptr. If successful, @p callback will be invoked when
 * @p ptr is no longer accessible to any agent.
 *
 * Notification callbacks are automatically deregistered when they are invoked.
 *
 * Note: The current version supports notifications of address release
 * originating from ::hsa_amd_memory_pool_free. Support for other address
 * release APIs will follow.
 *
 * @param[in] ptr Agent accessible address to monitor for deallocation. Passed
 * to @p callback.
 *
 * @param[in] callback Notifier to be invoked when @p ptr is released from
 * agent accessibility.
 *
 * @param[in] user_data User provided value passed to @p callback. May be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The notifier was registered successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ALLOCATION @p ptr does not refer to a valid
 * agent accessible address.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL or @p ptr is NULL.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES if there is a failure in allocating
 * necessary resources
 */
hsa_status_t HSA_API hsa_amd_register_deallocation_callback(void* ptr,
                                                            hsa_amd_deallocation_callback_t callback,
                                                            void* user_data);

/**
 * @brief Removes a deallocation notifier previously registered with
 * ::hsa_amd_register_deallocation_callback. Arguments must be identical to
 * those given in ::hsa_amd_register_deallocation_callback.
 *
 * @param[in] ptr Agent accessible address which was monitored for deallocation.
 *
 * @param[in] callback Notifier to be removed.
 *
 * @retval ::HSA_STATUS_SUCCESS The notifier has been removed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT The given notifier was not registered.
 */
hsa_status_t HSA_API hsa_amd_deregister_deallocation_callback(void* ptr,
                                                              hsa_amd_deallocation_callback_t callback);

typedef enum hsa_amd_svm_model_s {
  /**
   * Updates to memory with this attribute conform to the HSA memory consistency
   * model.
   */
  HSA_AMD_SVM_GLOBAL_FLAG_FINE_GRAINED = 0,
  /**
   * Writes to memory with this attribute can be performed by a single agent
   * at a time.
   */
  HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED = 1,
  /**
   * Memory region queried contains subregions with both
   * HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED and
   * HSA_AMD_SVM_GLOBAL_FLAG_FINE_GRAINED attributes.
   *
   * This attribute cannot be used in hsa_amd_svm_attributes_set. It is a
   * possible return from hsa_amd_svm_attributes_get indicating that the query
   * region contains both coarse- and fine-grained memory.
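   *
   * @par Example (sketch)
   * The reason this value exists can be modeled in a few lines of standalone
   * C. This is a conceptual sketch, not ROCr code. It assumes that a range
   * query reports the shared model when every subregion agrees and
   * INDETERMINATE otherwise. The names sketch_svm_model_t and
   * sketch_combine_models are hypothetical. The enum values are copied from
   * hsa_amd_svm_model_t so that the sketch builds without the HSA headers.
   *
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical mirror of hsa_amd_svm_model_t; values are copied from the
// enum above so the sketch builds without the HSA headers.
typedef enum {
  SKETCH_SVM_FINE_GRAINED = 0,    // HSA_AMD_SVM_GLOBAL_FLAG_FINE_GRAINED
  SKETCH_SVM_COARSE_GRAINED = 1,  // HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED
  SKETCH_SVM_INDETERMINATE = 2    // HSA_AMD_SVM_GLOBAL_FLAG_INDETERMINATE
} sketch_svm_model_t;

// Collapses the models of consecutive subregions into one value for the whole
// range: the shared model when every subregion agrees, else INDETERMINATE.
static sketch_svm_model_t sketch_combine_models(
    const sketch_svm_model_t* models, size_t count) {
  assert(models != NULL && count > 0);
  sketch_svm_model_t result = models[0];
  for (size_t i = 1; i < count; ++i) {
    if (models[i] != result) return SKETCH_SVM_INDETERMINATE;
  }
  return result;
}
```
   *
   * For example, a range whose first page is fine grained and whose second
   * page is coarse grained collapses to SKETCH_SVM_INDETERMINATE.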
   */
  HSA_AMD_SVM_GLOBAL_FLAG_INDETERMINATE = 2
} hsa_amd_svm_model_t;

typedef enum hsa_amd_svm_attribute_s {
  // Memory model attribute.
  // Type of this attribute is hsa_amd_svm_model_t.
  HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG = 0,
  // Marks the range read only. This allows multiple physical copies to be
  // placed local to each accessing device.
  // Type of this attribute is bool.
  HSA_AMD_SVM_ATTRIB_READ_ONLY = 1,
  // Automatic migrations should attempt to keep the memory within the XGMI hive
  // containing accessible agents.
  // Type of this attribute is bool.
  HSA_AMD_SVM_ATTRIB_HIVE_LOCAL = 2,
  // Page granularity to migrate at once. Page granularity is specified as
  // log2(page_count).
  // Type of this attribute is uint64_t.
  HSA_AMD_SVM_ATTRIB_MIGRATION_GRANULARITY = 3,
  // Physical location to prefer when automatic migration occurs.
  // Set to the null agent handle (handle == 0) to indicate there
  // is no preferred location.
  // Type of this attribute is hsa_agent_t.
  HSA_AMD_SVM_ATTRIB_PREFERRED_LOCATION = 4,
  // This attribute cannot be used in ::hsa_amd_svm_attributes_set (see
  // ::hsa_amd_svm_prefetch_async).
  // Queries the physical location of the most recent prefetch command.
  // If the prefetch location has not been set or is not uniform across the
  // address range, then the returned hsa_agent_t::handle will be 0.
  // Querying this attribute will return the destination agent of the most
  // recent ::hsa_amd_svm_prefetch_async targeting the address range. If
  // multiple async prefetches have been issued targeting the region and the
  // most recently issued prefetch has completed, then the query will return
  // the location of the most recently completed prefetch.
  // Type of this attribute is hsa_agent_t.
  HSA_AMD_SVM_ATTRIB_PREFETCH_LOCATION = 5,
  // Optimizes with the anticipation that the majority of operations to the
  // range will be read operations.
  // Type of this attribute is bool.
  HSA_AMD_SVM_ATTRIB_READ_MOSTLY = 6,
  // This attribute cannot be used in ::hsa_amd_svm_attributes_get.
  // Enables an agent for access to the range. Access may incur a page fault
  // and associated memory migration. Either this or
  // HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE is required prior to SVM
  // access if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is false.
  // Type of this attribute is hsa_agent_t.
  HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE = 0x200,
  // This attribute cannot be used in ::hsa_amd_svm_attributes_get.
  // Enables an agent for access to the range without page faults. Access
  // will not incur a page fault and will not cause access-based migration.
  // Either this or HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE is required prior to
  // SVM access if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is false.
  // Type of this attribute is hsa_agent_t.
  HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE = 0x201,
  // This attribute cannot be used in ::hsa_amd_svm_attributes_get.
  // Denies an agent access to the memory range. Access will cause a terminal
  // segfault.
  // Type of this attribute is hsa_agent_t.
  HSA_AMD_SVM_ATTRIB_AGENT_NO_ACCESS = 0x202,
  // This attribute cannot be used in ::hsa_amd_svm_attributes_set.
  // Returns the access attribute associated with the agent.
  // The agent to query must be set in the attribute value field.
  // The attribute enum will be replaced with the agent's current access
  // attribute for the address range.
  // TODO: Clarify KFD return value for non-uniform access attribute.
  // Type of this attribute is hsa_agent_t.
  HSA_AMD_SVM_ATTRIB_ACCESS_QUERY = 0x203,
} hsa_amd_svm_attribute_t;

// List type for hsa_amd_svm_attributes_set/get.
typedef struct hsa_amd_svm_attribute_pair_s {
  // hsa_amd_svm_attribute_t value.
  uint64_t attribute;
  // Attribute value. Bit values should be interpreted according to the type
  // given in the associated attribute description.
  uint64_t value;
} hsa_amd_svm_attribute_pair_t;

/**
 * @brief Sets SVM memory attributes.
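 *
 * @par Example (sketch)
 * The attribute list is an array of (attribute, value) pairs. The value bits
 * are read according to each attribute's documented type. The standalone
 * sketch below builds such a list. It uses a hypothetical mirror type
 * (sketch_svm_attr_pair_t) and constants copied from the enums above instead
 * of the HSA headers. It assumes that hsa_agent_t-typed attributes carry the
 * agent's 64-bit handle in the value field. It also takes a caller-supplied
 * page size for the log2(page_count) encoding of
 * HSA_AMD_SVM_ATTRIB_MIGRATION_GRANULARITY.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical mirror of hsa_amd_svm_attribute_pair_t plus a few IDs copied
// from hsa_amd_svm_attribute_t and hsa_amd_svm_model_t, so the sketch builds
// without the HSA headers.
typedef struct {
  uint64_t attribute;
  uint64_t value;
} sketch_svm_attr_pair_t;

enum {
  SKETCH_ATTRIB_GLOBAL_FLAG = 0,            // HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG
  SKETCH_ATTRIB_MIGRATION_GRANULARITY = 3,  // ..._MIGRATION_GRANULARITY
  SKETCH_ATTRIB_PREFERRED_LOCATION = 4,     // ..._PREFERRED_LOCATION
  SKETCH_ATTRIB_AGENT_ACCESSIBLE = 0x200,   // ..._AGENT_ACCESSIBLE
  SKETCH_MODEL_COARSE_GRAINED = 1           // HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED
};

// Encodes a migration chunk size as log2(page_count). Both sizes must be
// powers of two and chunk_bytes must be at least page_bytes.
static uint64_t sketch_granularity_log2(uint64_t chunk_bytes,
                                        uint64_t page_bytes) {
  assert(page_bytes != 0 && (page_bytes & (page_bytes - 1)) == 0);
  assert(chunk_bytes >= page_bytes && (chunk_bytes & (chunk_bytes - 1)) == 0);
  uint64_t pages = chunk_bytes / page_bytes;
  uint64_t shift = 0;
  while (pages > 1) {
    pages >>= 1;
    ++shift;
  }
  return shift;
}

static void sketch_set_pair(sketch_svm_attr_pair_t* pair, uint64_t attribute,
                            uint64_t value) {
  pair->attribute = attribute;
  pair->value = value;
}

// Writes four pairs: coarse-grained model, migration granularity, preferred
// location, and access for the agent whose handle is agent_handle.
// Returns the number of pairs written.
static size_t sketch_build_svm_attrs(sketch_svm_attr_pair_t out[4],
                                     uint64_t agent_handle,
                                     uint64_t chunk_bytes,
                                     uint64_t page_bytes) {
  sketch_set_pair(&out[0], SKETCH_ATTRIB_GLOBAL_FLAG,
                  SKETCH_MODEL_COARSE_GRAINED);
  sketch_set_pair(&out[1], SKETCH_ATTRIB_MIGRATION_GRANULARITY,
                  sketch_granularity_log2(chunk_bytes, page_bytes));
  sketch_set_pair(&out[2], SKETCH_ATTRIB_PREFERRED_LOCATION, agent_handle);
  sketch_set_pair(&out[3], SKETCH_ATTRIB_AGENT_ACCESSIBLE, agent_handle);
  return 4;
}
```
 *
 * With a 2 MiB migration chunk and 4 KiB pages, the granularity value is
 * log2(512) = 9. To pass the list to this function, substitute the real
 * hsa_amd_svm_attribute_pair_t type and HSA_AMD_SVM_ATTRIB_* constants for
 * the sketch names.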
 *
 * If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT returns false, then enabling
 * access to an Agent via this API (setting HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE
 * or HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE) is required prior to SVM
 * memory access by that Agent.
 *
 * Attributes HSA_AMD_SVM_ATTRIB_ACCESS_QUERY and HSA_AMD_SVM_ATTRIB_PREFETCH_LOCATION
 * may not be used with this API.
 *
 * @param[in] ptr Start of the address range. Will be aligned down to the
 * nearest page boundary.
 *
 * @param[in] size Size of the address range in bytes. Will be aligned up to
 * the nearest page boundary.
 *
 * @param[in] attribute_list List of attributes to set for the address range.
 *
 * @param[in] attribute_count Length of @p attribute_list.
 */
hsa_status_t hsa_amd_svm_attributes_set(void* ptr, size_t size,
                                        hsa_amd_svm_attribute_pair_t* attribute_list,
                                        size_t attribute_count);

/**
 * @brief Gets SVM memory attributes.
 *
 * Attributes HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE,
 * HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE and
 * HSA_AMD_SVM_ATTRIB_AGENT_NO_ACCESS may not be used with this API.
 *
 * Note that attribute HSA_AMD_SVM_ATTRIB_ACCESS_QUERY takes as input an
 * hsa_agent_t and returns the current access type through its attribute field.
 *
 * @param[in] ptr Start of the address range. Will be aligned down to the
 * nearest page boundary.
 *
 * @param[in] size Size of the address range in bytes. Will be aligned up to
 * the nearest page boundary.
 *
 * @param[in, out] attribute_list List of attributes to query for the address
 * range. Results are returned in place.
 *
 * @param[in] attribute_count Length of @p attribute_list.
 */
hsa_status_t hsa_amd_svm_attributes_get(void* ptr, size_t size,
                                        hsa_amd_svm_attribute_pair_t* attribute_list,
                                        size_t attribute_count);

/**
 * @brief Asynchronously migrates memory to an agent.
 *
 * Schedules memory migration to @p agent when @p dep_signals have been observed
 * equal to zero. @p completion_signal will decrement when the migration is
 * complete.
 *
 * @param[in] ptr Start of the address range. Will be aligned down to the
 * nearest page boundary.
 *
 * @param[in] size Size of the address range in bytes. Will be aligned up to
 * the nearest page boundary.
 *
 * @param[in] agent Agent to migrate to.
 *
 * @param[in] num_dep_signals Number of dependent signals. Can be 0.
 *
 * @param[in] dep_signals List of signals that must be waited on before the
 * migration operation starts. The migration will start after every signal has
 * been observed with the value 0. If @p num_dep_signals is 0, this argument is
 * ignored.
 *
 * @param[in] completion_signal Signal used to indicate completion of the
 * migration operation. When the migration operation is finished, the value of
 * the signal is decremented. The runtime indicates that an error has occurred
 * during the migration by setting the value of the completion signal to a
 * negative number. If no completion signal is required, this handle may be null.
 */
hsa_status_t hsa_amd_svm_prefetch_async(void* ptr, size_t size, hsa_agent_t agent,
                                        uint32_t num_dep_signals,
                                        const hsa_signal_t* dep_signals,
                                        hsa_signal_t completion_signal);

#ifdef __cplusplus
}  // end extern "C" block
#endif

#endif  // header guard
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_INC_HSA_EXT_FINALIZE_H_
#define HSA_RUNTIME_INC_HSA_EXT_FINALIZE_H_

#include "hsa.h"

#undef HSA_API
#ifdef HSA_EXPORT_FINALIZER
#define HSA_API HSA_API_EXPORT
#else
#define HSA_API HSA_API_IMPORT
#endif

#ifdef __cplusplus
extern "C" {
#endif  // __cplusplus

struct BrigModuleHeader;
typedef struct BrigModuleHeader* BrigModule_t;

/** \defgroup ext-alt-finalizer-extensions Finalization Extensions
 *  @{
 */

/**
 * @brief Enumeration constants added to ::hsa_status_t by this extension.
 */
enum {
  /**
   * The HSAIL program is invalid.
   */
  HSA_EXT_STATUS_ERROR_INVALID_PROGRAM = 0x2000,
  /**
   * The HSAIL module is invalid.
   */
  HSA_EXT_STATUS_ERROR_INVALID_MODULE = 0x2001,
  /**
   * The machine model or profile of the HSAIL module does not match the
   * machine model or profile of the HSAIL program.
   */
  HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE = 0x2002,
  /**
   * The HSAIL module is already a part of the HSAIL program.
   */
  HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED = 0x2003,
  /**
   * Compatibility mismatch between symbol declaration and symbol definition.
   */
  HSA_EXT_STATUS_ERROR_SYMBOL_MISMATCH = 0x2004,
  /**
   * The finalization encountered an error while finalizing a kernel or
   * indirect function.
   */
  HSA_EXT_STATUS_ERROR_FINALIZATION_FAILED = 0x2005,
  /**
   * Mismatch between a directive in the control directive structure and in
   * the HSAIL kernel.
   */
  HSA_EXT_STATUS_ERROR_DIRECTIVE_MISMATCH = 0x2006
};

/** @} */

/** \defgroup ext-alt-finalizer-program Finalization Program
 *  @{
 */

/**
 * @brief HSAIL (BRIG) module. The HSA Programmer's Reference Manual contains
 * the definition of the BrigModule_t type.
 */
typedef BrigModule_t hsa_ext_module_t;

/**
 * @brief An opaque handle to an HSAIL program, which groups a set of HSAIL
 * modules that collectively define functions and variables used by kernels and
 * indirect functions.
 */
typedef struct hsa_ext_program_s {
  /**
   * Opaque handle.
   */
  uint64_t handle;
} hsa_ext_program_t;

/**
 * @brief Create an empty HSAIL program.
 *
 * @param[in] machine_model Machine model used in the HSAIL program.
 *
 * @param[in] profile Profile used in the HSAIL program.
 *
 * @param[in] default_float_rounding_mode Default float rounding mode used in
 * the HSAIL program.
 *
 * @param[in] options Vendor-specific options. May be NULL.
 *
 * @param[out] program Memory location where the HSA runtime stores the newly
 * created HSAIL program handle.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES There is a failure to allocate
 * resources required for the operation.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p machine_model is invalid,
 * @p profile is invalid, @p default_float_rounding_mode is invalid, or
 * @p program is NULL.
 */
hsa_status_t HSA_API hsa_ext_program_create(
    hsa_machine_model_t machine_model,
    hsa_profile_t profile,
    hsa_default_float_rounding_mode_t default_float_rounding_mode,
    const char *options,
    hsa_ext_program_t *program);

/**
 * @brief Destroy an HSAIL program.
 *
 * @details The HSAIL program handle becomes invalid after it has been
 * destroyed. Code object handles produced by ::hsa_ext_program_finalize are
 * still valid after the HSAIL program has been destroyed, and can be used as
 * intended. Resources allocated outside and associated with the HSAIL program
 * (such as HSAIL modules that are added to the HSAIL program) can be released
 * after the HSAIL program has been destroyed.
 *
 * @param[in] program HSAIL program.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_INVALID_PROGRAM The HSAIL program is
 * invalid.
 */
hsa_status_t HSA_API hsa_ext_program_destroy(
    hsa_ext_program_t program);

/**
 * @brief Add an HSAIL module to an existing HSAIL program.
 *
 * @details The HSA runtime does not perform a deep copy of the HSAIL module
 * upon addition. Instead, it stores a pointer to the HSAIL module. The
 * ownership of the HSAIL module belongs to the application, which must ensure
 * that @p module is not released before destroying the HSAIL program.
 *
 * The HSAIL module is successfully added to the HSAIL program if @p module is
 * valid, if all the declarations and definitions for the same symbol are
 * compatible, and if @p module specifies a machine model and profile that
 * match the HSAIL program.
 *
 * @param[in] program HSAIL program.
 *
 * @param[in] module HSAIL module. The application can add the same HSAIL module
 * to @p program at most once. The HSAIL module must specify the same machine
 * model and profile as @p program. If the floating-point rounding mode of
 * @p module is not default, then it should match that of @p program.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES There is a failure to allocate
 * resources required for the operation.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_INVALID_PROGRAM The HSAIL program is invalid.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_INVALID_MODULE The HSAIL module is invalid.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE The machine model of
 * @p module does not match the machine model of @p program, or the profile of
 * @p module does not match the profile of @p program.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED The HSAIL module is
 * already a part of the HSAIL program.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_SYMBOL_MISMATCH Symbol declaration and symbol
 * definition compatibility mismatch. See the symbol compatibility rules in the
 * HSA Programmer's Reference Manual.
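 *
 * @par Example (sketch)
 * The acceptance rules above can be expressed as a small standalone check.
 * This is a hypothetical model for illustration, not the runtime's
 * implementation. The names sketch_props_t, sketch_check_add_module and the
 * result codes are invented, and the order in which the rules are checked is
 * an assumption. No status code is documented for a rounding-mode mismatch, so
 * the sketch reports it with its own SKETCH_ADD_ROUNDING_MISMATCH value. The
 * runtime stores a pointer rather than a copy, so the sketch assumes that a
 * repeated addition is recognized by pointer identity.
 *
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for the properties the rules above compare. Plain
// ints are used instead of the HSA enums so the sketch builds standalone.
typedef struct {
  int machine_model;
  int profile;
  int rounding_mode;  // SKETCH_ROUNDING_DEFAULT means "not specified"
} sketch_props_t;

// Hypothetical value standing for a module's "default" rounding mode.
enum { SKETCH_ROUNDING_DEFAULT = 0 };

typedef enum {
  SKETCH_ADD_OK = 0,
  SKETCH_ADD_ALREADY_INCLUDED,  // cf. HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED
  SKETCH_ADD_INCOMPATIBLE,      // cf. HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE
  SKETCH_ADD_ROUNDING_MISMATCH  // "should match"; no status code is documented
} sketch_add_result_t;

// Applies the documented acceptance rules in a fixed (assumed) order.
// `added` holds the modules already in the program, compared by pointer.
static sketch_add_result_t sketch_check_add_module(
    const sketch_props_t* program, const void* const* added,
    size_t added_count, const void* module,
    const sketch_props_t* module_props) {
  assert(program != NULL && module_props != NULL);
  for (size_t i = 0; i < added_count; ++i) {
    if (added[i] == module) return SKETCH_ADD_ALREADY_INCLUDED;
  }
  if (module_props->machine_model != program->machine_model ||
      module_props->profile != program->profile) {
    return SKETCH_ADD_INCOMPATIBLE;
  }
  if (module_props->rounding_mode != SKETCH_ROUNDING_DEFAULT &&
      module_props->rounding_mode != program->rounding_mode) {
    return SKETCH_ADD_ROUNDING_MISMATCH;
  }
  return SKETCH_ADD_OK;
}
```
 *
 * A module that leaves its rounding mode at the default is accepted regardless
 * of the program's rounding mode, provided the machine model and profile match.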
 */
hsa_status_t HSA_API hsa_ext_program_add_module(
    hsa_ext_program_t program,
    hsa_ext_module_t module);

/**
 * @brief Iterate over the HSAIL modules in a program, and invoke an
 * application-defined callback on every iteration.
 *
 * @param[in] program HSAIL program.
 *
 * @param[in] callback Callback to be invoked once per HSAIL module in the
 * program. The HSA runtime passes three arguments to the callback: the program,
 * a HSAIL module, and the application data. If @p callback returns a status
 * other than ::HSA_STATUS_SUCCESS for a particular iteration, the traversal
 * stops and ::hsa_ext_program_iterate_modules returns that status value.
 *
 * @param[in] data Application data that is passed to @p callback on every
 * iteration. May be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_INVALID_PROGRAM The program is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL.
 */
hsa_status_t HSA_API hsa_ext_program_iterate_modules(
    hsa_ext_program_t program,
    hsa_status_t (*callback)(hsa_ext_program_t program,
                             hsa_ext_module_t module,
                             void* data),
    void* data);

/**
 * @brief HSAIL program attributes.
 */
typedef enum {
  /**
   * Machine model specified when the HSAIL program was created. The type
   * of this attribute is ::hsa_machine_model_t.
   */
  HSA_EXT_PROGRAM_INFO_MACHINE_MODEL = 0,
  /**
   * Profile specified when the HSAIL program was created. The type of
   * this attribute is ::hsa_profile_t.
   */
  HSA_EXT_PROGRAM_INFO_PROFILE = 1,
  /**
   * Default float rounding mode specified when the HSAIL program was
   * created. The type of this attribute is ::hsa_default_float_rounding_mode_t.
   */
  HSA_EXT_PROGRAM_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 2
} hsa_ext_program_info_t;

/**
 * @brief Get the current value of an attribute for a given HSAIL program.
 *
 * @param[in] program HSAIL program.
 *
 * @param[in] attribute Attribute to query.
 *
 * @param[out] value Pointer to an application-allocated buffer in which to
 * store the value of the attribute. If the buffer passed by the application is
 * not large enough to hold the value of @p attribute, the behaviour is
 * undefined.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_INVALID_PROGRAM The HSAIL program is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid
 * HSAIL program attribute, or @p value is NULL.
 */
hsa_status_t HSA_API hsa_ext_program_get_info(
    hsa_ext_program_t program,
    hsa_ext_program_info_t attribute,
    void *value);

/**
 * @brief Finalizer-determined call convention.
 */
typedef enum {
  /**
   * Finalizer-determined call convention.
   */
  HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO = -1
} hsa_ext_finalizer_call_convention_t;

/**
 * @brief Control directives specify low-level information about the
 * finalization process.
 */
typedef struct hsa_ext_control_directives_s {
  /**
   * Bitset indicating which control directives are enabled. The bit assigned to
   * a control directive is determined by the corresponding value in
   * BrigControlDirective.
   *
   * If a control directive is disabled, its corresponding field value (if any)
   * must be 0. Control directives that are only present or absent (such as
   * partial workgroups) have no corresponding field, as the presence of the bit
   * in this mask is sufficient.
   */
  uint64_t control_directives_mask;
  /**
   * Bitset of HSAIL exceptions that must have the BREAK policy enabled. The bit
   * assigned to an HSAIL exception is determined by the corresponding value
   * in BrigExceptionsMask. If the kernel contains an enablebreakexceptions
   * control directive, the finalizer uses the union of the two masks.
   */
  uint16_t break_exceptions_mask;
  /**
   * Bitset of HSAIL exceptions that must have the DETECT policy enabled.
   * The bit assigned to an HSAIL exception is determined by the corresponding
   * value in BrigExceptionsMask. If the kernel contains an
   * enabledetectexceptions control directive, the finalizer uses the union of
   * the two masks.
   */
  uint16_t detect_exceptions_mask;
  /**
   * Maximum size (in bytes) of dynamic group memory that will be allocated by
   * the application for any dispatch of the kernel. If the kernel contains a
   * maxdynamicsize control directive, the two values should match.
   */
  uint32_t max_dynamic_group_size;
  /**
   * Maximum number of grid work-items that will be used by the application to
   * launch the kernel. If the kernel contains a maxflatgridsize control
   * directive, the value of @a max_flat_grid_size must not be greater than the
   * value of the directive, and takes precedence.
   *
   * The value specified for maximum absolute grid size must be greater than or
   * equal to the product of the values specified by @a required_grid_size.
   *
   * If the bit at position BRIG_CONTROL_MAXFLATGRIDSIZE is set in @a
   * control_directives_mask, this field must be greater than 0.
   */
  uint64_t max_flat_grid_size;
  /**
   * Maximum number of work-group work-items that will be used by the
   * application to launch the kernel. If the kernel contains a
   * maxflatworkgroupsize control directive, the value of @a
   * max_flat_workgroup_size must not be greater than the value of the
   * directive, and takes precedence.
   *
   * The value specified for maximum absolute work-group size must be greater
   * than or equal to the product of the values specified by
   * @a required_workgroup_size.
   *
   * If the bit at position BRIG_CONTROL_MAXFLATWORKGROUPSIZE is set in @a
   * control_directives_mask, this field must be greater than 0.
   */
  uint32_t max_flat_workgroup_size;
  /**
   * Reserved. Must be 0.
   */
  uint32_t reserved1;
  /**
   * Grid size that will be used by the application in any dispatch of the
   * kernel. If the kernel contains a requiredgridsize control directive, the
   * dimensions should match.
   *
   * The specified grid size must be consistent with @a required_workgroup_size
   * and @a required_dim. Also, the product of the three dimensions must not
   * exceed @a max_flat_grid_size. These invariants need to hold only if all the
   * corresponding control directives are enabled.
   *
   * If the bit at position BRIG_CONTROL_REQUIREDGRIDSIZE is set in @a
   * control_directives_mask, the three dimension values must be greater than 0.
   */
  uint64_t required_grid_size[3];

  /**
   * Work-group size that will be used by the application in any dispatch of the
   * kernel. If the kernel contains a requiredworkgroupsize control directive,
   * the dimensions should match.
   *
   * The specified work-group size must be consistent with @a required_grid_size
   * and @a required_dim. Also, the product of the three dimensions must not
   * exceed @a max_flat_workgroup_size. These invariants need to hold only if
   * all the corresponding control directives are enabled.
   *
   * If the bit at position BRIG_CONTROL_REQUIREDWORKGROUPSIZE is set in @a
   * control_directives_mask, the three dimension values must be greater than 0.
   */
  hsa_dim3_t required_workgroup_size;

  /**
   * Number of dimensions that will be used by the application to launch the
   * kernel. If the kernel contains a requireddim control directive, the two
   * values should match.
   *
   * The specified dimensions must be consistent with @a required_grid_size and
   * @a required_workgroup_size. This invariant needs to hold only if all the
   * corresponding control directives are enabled.
   *
   * If the bit at position BRIG_CONTROL_REQUIREDDIM is set in @a
   * control_directives_mask, this field must be 1, 2, or 3.
   */
  uint8_t required_dim;

  /**
   * Reserved. Must be 0.
   */
  uint8_t reserved2[75];
} hsa_ext_control_directives_t;

/**
 * @brief Finalize an HSAIL program for a given instruction set architecture.
 *
 * @details Finalize all of the kernels and indirect functions that belong to
 * the same HSAIL program for a specific instruction set architecture (ISA).
 * The transitive closure of all functions specified by call or scall must be
 * defined. Kernels and indirect functions that are being finalized must be
 * defined. Kernels and indirect functions that are referenced in kernels and
 * indirect functions being finalized may or may not be defined, but must be
 * declared. All the global/readonly segment variables that are referenced in
 * kernels and indirect functions being finalized may or may not be defined, but
 * must be declared.
 *
 * @param[in] program HSAIL program.
 *
 * @param[in] isa Instruction set architecture to finalize for.
 *
 * @param[in] call_convention Call convention to use for the finalization. Must
 * have a value between ::HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO (inclusive)
 * and the value of the attribute ::HSA_ISA_INFO_CALL_CONVENTION_COUNT in @p
 * isa (exclusive).
 *
 * @param[in] control_directives Low-level control directives that influence
 * the finalization process.
 *
 * @param[in] options Vendor-specific options. May be NULL.
 *
 * @param[in] code_object_type Type of code object to produce.
 *
 * @param[out] code_object Code object generated by the Finalizer, which
 * contains the machine code for the kernels and indirect functions in the HSAIL
 * program. The code object is independent of the HSAIL module that was used to
 * generate it.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to
 * allocate the resources required for the operation.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_INVALID_PROGRAM The HSAIL program is
 * invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ISA @p isa is invalid.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_DIRECTIVE_MISMATCH A directive in the
 * control directive structure does not match the same directive in the HSAIL
 * kernel, or the same directive is used with a different value in one of the
 * functions used by this kernel.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_FINALIZATION_FAILED The Finalizer
 * encountered an error while compiling a kernel or an indirect function.
 */
hsa_status_t HSA_API hsa_ext_program_finalize(
    hsa_ext_program_t program,
    hsa_isa_t isa,
    int32_t call_convention,
    hsa_ext_control_directives_t control_directives,
    const char *options,
    hsa_code_object_type_t code_object_type,
    hsa_code_object_t *code_object);

/** @} */

#define hsa_ext_finalizer_1_00

typedef struct hsa_ext_finalizer_1_00_pfn_s {
  hsa_status_t (*hsa_ext_program_create)(
      hsa_machine_model_t machine_model,
      hsa_profile_t profile,
      hsa_default_float_rounding_mode_t default_float_rounding_mode,
      const char *options,
      hsa_ext_program_t *program);

  hsa_status_t (*hsa_ext_program_destroy)(hsa_ext_program_t program);

  hsa_status_t (*hsa_ext_program_add_module)(hsa_ext_program_t program,
                                             hsa_ext_module_t module);

  hsa_status_t (*hsa_ext_program_iterate_modules)(
      hsa_ext_program_t program,
      hsa_status_t (*callback)(hsa_ext_program_t program,
                               hsa_ext_module_t module, void *data),
      void *data);

  hsa_status_t (*hsa_ext_program_get_info)(
      hsa_ext_program_t program,
      hsa_ext_program_info_t attribute,
      void *value);

  hsa_status_t (*hsa_ext_program_finalize)(
      hsa_ext_program_t program,
      hsa_isa_t isa,
      int32_t call_convention,
      hsa_ext_control_directives_t control_directives,
      const char *options,
      hsa_code_object_type_t code_object_type,
      hsa_code_object_t *code_object);
} hsa_ext_finalizer_1_00_pfn_t;

#ifdef __cplusplus
}  // extern "C" block
#endif  // __cplusplus

#endif  // HSA_RUNTIME_INC_HSA_EXT_FINALIZE_H_
ROCR-Runtime-rocm-5.0.0/src/inc/hsa_ext_image.h
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_EXT_IMAGE_H
#define HSA_EXT_IMAGE_H

#include "hsa.h"

#undef HSA_API
#ifdef HSA_EXPORT_IMAGES
#define HSA_API HSA_API_EXPORT
#else
#define HSA_API HSA_API_IMPORT
#endif

#ifdef __cplusplus
extern "C" {
#endif /*__cplusplus*/

/** \defgroup ext-images Images and Samplers
 *  @{
 */

/**
 * @brief Enumeration constants added to ::hsa_status_t by this extension.
 *
 * @remark Additions to hsa_status_t
 */
enum {
  /**
   * Image format is not supported.
   */
  HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED = 0x3000,
  /**
   * Image size is not supported.
   */
  HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED = 0x3001,
  /**
   * Image pitch is not supported or invalid.
   */
  HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED = 0x3002,
  /**
   * Sampler descriptor is not supported or invalid.
   */
  HSA_EXT_STATUS_ERROR_SAMPLER_DESCRIPTOR_UNSUPPORTED = 0x3003
};

/**
 * @brief Enumeration constants added to ::hsa_agent_info_t by this
 * extension.
 *
 * @remark Additions to hsa_agent_info_t
 */
enum {
  /**
   * Maximum number of elements in 1D images. Must be at least 16384. The type
   * of this attribute is size_t.
   */
  HSA_EXT_AGENT_INFO_IMAGE_1D_MAX_ELEMENTS = 0x3000,
  /**
   * Maximum number of elements in 1DA images. Must be at least 16384. The type
   * of this attribute is size_t.
   */
  HSA_EXT_AGENT_INFO_IMAGE_1DA_MAX_ELEMENTS = 0x3001,
  /**
   * Maximum number of elements in 1DB images. Must be at least 65536. The type
   * of this attribute is size_t.
   */
  HSA_EXT_AGENT_INFO_IMAGE_1DB_MAX_ELEMENTS = 0x3002,
  /**
   * Maximum dimensions (width, height) of 2D images, in image elements. The X
   * and Y maximums must be at least 16384. The type of this attribute is
   * size_t[2].
   */
  HSA_EXT_AGENT_INFO_IMAGE_2D_MAX_ELEMENTS = 0x3003,
  /**
   * Maximum dimensions (width, height) of 2DA images, in image elements. The X
   * and Y maximums must be at least 16384. The type of this attribute is
   * size_t[2].
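This attribute has type size_t[2], as do the 2D, 2DDEPTH and 2DADEPTH variants, so its value occupies two size_t elements: width first, then height. The minimal sketch below pre-checks a 2DA image against such a value together with the HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS limit. image_2da_fits is a hypothetical helper. The limits are passed in as plain parameters rather than queried from an agent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// max_wh holds a size_t[2] attribute value: [0] = max width, [1] = max height.
// max_layers holds the size_t value of the array max-layers attribute.
// Hypothetical helper; both limits are assumed to be known already.
static bool image_2da_fits(const size_t max_wh[2], size_t max_layers,
                           size_t width, size_t height, size_t layers) {
  return width <= max_wh[0] && height <= max_wh[1] && layers <= max_layers;
}
```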
   */
  HSA_EXT_AGENT_INFO_IMAGE_2DA_MAX_ELEMENTS = 0x3004,
  /**
   * Maximum dimensions (width, height) of 2DDEPTH images, in image
   * elements. The X and Y maximums must be at least 16384. The type of this
   * attribute is size_t[2].
   */
  HSA_EXT_AGENT_INFO_IMAGE_2DDEPTH_MAX_ELEMENTS = 0x3005,
  /**
   * Maximum dimensions (width, height) of 2DADEPTH images, in image
   * elements. The X and Y maximums must be at least 16384. The type of this
   * attribute is size_t[2].
   */
  HSA_EXT_AGENT_INFO_IMAGE_2DADEPTH_MAX_ELEMENTS = 0x3006,
  /**
   * Maximum dimensions (width, height, depth) of 3D images, in image
   * elements. The maximum along any dimension must be at least 2048. The type
   * of this attribute is size_t[3].
   */
  HSA_EXT_AGENT_INFO_IMAGE_3D_MAX_ELEMENTS = 0x3007,
  /**
   * Maximum number of image layers in an image array. Must be at least 2048.
   * The type of this attribute is size_t.
   */
  HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS = 0x3008,
  /**
   * Maximum number of read-only image handles that can be created for an agent
   * at any one time. Must be at least 128. The type of this attribute is
   * size_t.
   */
  HSA_EXT_AGENT_INFO_MAX_IMAGE_RD_HANDLES = 0x3009,
  /**
   * Maximum number of write-only and read-write image handles (combined) that
   * can be created for an agent at any one time. Must be at least 64. The type
   * of this attribute is size_t.
   */
  HSA_EXT_AGENT_INFO_MAX_IMAGE_RORW_HANDLES = 0x300A,
  /**
   * Maximum number of sampler handles that can be created for an agent at any
   * one time. Must be at least 16. The type of this attribute is size_t.
   */
  HSA_EXT_AGENT_INFO_MAX_SAMPLER_HANDLERS = 0x300B,
  /**
   * Image pitch alignment. The agent only supports linear image data
   * layouts with a row pitch that is a multiple of this value. Must be
   * a power of 2. The type of this attribute is size_t.
   */
  HSA_EXT_AGENT_INFO_IMAGE_LINEAR_ROW_PITCH_ALIGNMENT = 0x300C
};

/**
 * @brief Image handle, populated by ::hsa_ext_image_create or
 * ::hsa_ext_image_create_with_layout.
 * Image handles are only unique within an agent, not across agents.
 *
 */
typedef struct hsa_ext_image_s {
  /**
   * Opaque handle. For a given agent, two handles reference the same object of
   * the enclosing type if and only if they are equal.
   */
  uint64_t handle;
} hsa_ext_image_t;

/**
 * @brief Geometry associated with the image. This specifies the
 * number of image dimensions and whether the image is an image
 * array. See the Image Geometry section in the HSA
 * Programming Reference Manual for the definition of each
 * geometry. The enumeration values match the BRIG type @p
 * hsa_ext_brig_image_geometry_t.
 */
typedef enum {
  /**
   * One-dimensional image addressed by width coordinate.
   */
  HSA_EXT_IMAGE_GEOMETRY_1D = 0,
  /**
   * Two-dimensional image addressed by width and height coordinates.
   */
  HSA_EXT_IMAGE_GEOMETRY_2D = 1,
  /**
   * Three-dimensional image addressed by width, height, and depth coordinates.
   */
  HSA_EXT_IMAGE_GEOMETRY_3D = 2,
  /**
   * Array of one-dimensional images with the same size and format. 1D arrays
   * are addressed by width and index coordinates.
   */
  HSA_EXT_IMAGE_GEOMETRY_1DA = 3,
  /**
   * Array of two-dimensional images with the same size and format. 2D arrays
   * are addressed by width, height, and index coordinates.
   */
  HSA_EXT_IMAGE_GEOMETRY_2DA = 4,
  /**
   * One-dimensional image addressed by width coordinate. It has
   * specific restrictions compared to ::HSA_EXT_IMAGE_GEOMETRY_1D. An
   * image with an opaque image data layout will always use a linear
   * image data layout, and one with an explicit image data layout
   * must specify ::HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR.
   */
  HSA_EXT_IMAGE_GEOMETRY_1DB = 5,
  /**
   * Two-dimensional depth image addressed by width and height coordinates.
   */
  HSA_EXT_IMAGE_GEOMETRY_2DDEPTH = 6,
  /**
   * Array of two-dimensional depth images with the same size and format. 2D
   * arrays are addressed by width, height, and index coordinates.
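Taken together, the geometries in this enumeration decide which fields of hsa_ext_image_descriptor_t are meaningful:

- height is used by 2D, 3D, 2DA, 2DDEPTH and 2DADEPTH;
- depth is used only by 3D;
- array_size is used by 1DA, 2DA and 2DADEPTH.

Unused fields must be 0. The following sketch encodes those rules. The GEOM_* constants are local copies of the values above, and descriptor_fields_valid is a hypothetical helper, not part of the API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Local copies of the geometry values in hsa_ext_image_geometry_t.
enum {
  GEOM_1D = 0, GEOM_2D = 1, GEOM_3D = 2, GEOM_1DA = 3,
  GEOM_2DA = 4, GEOM_1DB = 5, GEOM_2DDEPTH = 6, GEOM_2DADEPTH = 7
};

static bool uses_height(int g) {
  return g == GEOM_2D || g == GEOM_3D || g == GEOM_2DA ||
         g == GEOM_2DDEPTH || g == GEOM_2DADEPTH;
}

static bool uses_depth(int g) { return g == GEOM_3D; }

static bool uses_array_size(int g) {
  return g == GEOM_1DA || g == GEOM_2DA || g == GEOM_2DADEPTH;
}

// Hypothetical check: every field the geometry does not use must be 0.
// It does not validate the values of the fields that are used.
static bool descriptor_fields_valid(int g, size_t height, size_t depth,
                                    size_t array_size) {
  if (!uses_height(g) && height != 0) return false;
  if (!uses_depth(g) && depth != 0) return false;
  if (!uses_array_size(g) && array_size != 0) return false;
  return true;
}
```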
   */
  HSA_EXT_IMAGE_GEOMETRY_2DADEPTH = 7
} hsa_ext_image_geometry_t;

/**
 * @brief Channel type associated with the elements of an image. See
 * the Channel Type section in the HSA Programming Reference
 * Manual for the definition of each channel type. The
 * enumeration values and definition match the BRIG type @p
 * hsa_ext_brig_image_channel_type_t.
 */
typedef enum {
  HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT8 = 0,
  HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT16 = 1,
  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8 = 2,
  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT16 = 3,
  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT24 = 4,
  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555 = 5,
  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565 = 6,
  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010 = 7,
  HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT8 = 8,
  HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT16 = 9,
  HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT32 = 10,
  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8 = 11,
  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16 = 12,
  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32 = 13,
  HSA_EXT_IMAGE_CHANNEL_TYPE_HALF_FLOAT = 14,
  HSA_EXT_IMAGE_CHANNEL_TYPE_FLOAT = 15
} hsa_ext_image_channel_type_t;

/**
 * @brief A fixed-size type used to represent ::hsa_ext_image_channel_type_t
 * constants.
 */
typedef uint32_t hsa_ext_image_channel_type32_t;

/**
 * @brief Channel order associated with the elements of an image. See
 * the Channel Order section in the HSA Programming Reference
 * Manual for the definition of each channel order. The
 * enumeration values match the BRIG type @p
 * hsa_ext_brig_image_channel_order_t.
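Descriptor comparisons elsewhere in this header treat an s-form channel order as matching its corresponding non-s-form order. The minimal sketch below shows that comparison. It assumes each s-form order corresponds to the order whose name lacks the leading S: SRGB with RGB, SRGBX with RGBX, SRGBA with RGBA, and SBGRA with BGRA. The ORDER_* constants copy values from the enumeration that follows, and strip_s_form and channel_orders_match are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

// Local copies of the relevant hsa_ext_image_channel_order_t values.
enum {
  ORDER_RGB = 6, ORDER_RGBX = 7, ORDER_RGBA = 8, ORDER_BGRA = 9,
  ORDER_SRGB = 12, ORDER_SRGBX = 13, ORDER_SRGBA = 14, ORDER_SBGRA = 15
};

// Map an s-form order to its assumed non-s-form counterpart; any other
// order is returned unchanged.
static unsigned strip_s_form(unsigned order) {
  switch (order) {
    case ORDER_SRGB:  return ORDER_RGB;
    case ORDER_SRGBX: return ORDER_RGBX;
    case ORDER_SRGBA: return ORDER_RGBA;
    case ORDER_SBGRA: return ORDER_BGRA;
    default:          return order;
  }
}

// Two channel orders match for descriptor comparison if they are equal
// once any s-form has been stripped.
static bool channel_orders_match(unsigned a, unsigned b) {
  return strip_s_form(a) == strip_s_form(b);
}
```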
 */
typedef enum {
  HSA_EXT_IMAGE_CHANNEL_ORDER_A = 0,
  HSA_EXT_IMAGE_CHANNEL_ORDER_R = 1,
  HSA_EXT_IMAGE_CHANNEL_ORDER_RX = 2,
  HSA_EXT_IMAGE_CHANNEL_ORDER_RG = 3,
  HSA_EXT_IMAGE_CHANNEL_ORDER_RGX = 4,
  HSA_EXT_IMAGE_CHANNEL_ORDER_RA = 5,
  HSA_EXT_IMAGE_CHANNEL_ORDER_RGB = 6,
  HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX = 7,
  HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA = 8,
  HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA = 9,
  HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB = 10,
  HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR = 11,
  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB = 12,
  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX = 13,
  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA = 14,
  HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA = 15,
  HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY = 16,
  HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE = 17,
  HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH = 18,
  HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL = 19
} hsa_ext_image_channel_order_t;

/**
 * @brief A fixed-size type used to represent ::hsa_ext_image_channel_order_t
 * constants.
 */
typedef uint32_t hsa_ext_image_channel_order32_t;

/**
 * @brief Image format.
 */
typedef struct hsa_ext_image_format_s {
  /**
   * Channel type.
   */
  hsa_ext_image_channel_type32_t channel_type;

  /**
   * Channel order.
   */
  hsa_ext_image_channel_order32_t channel_order;
} hsa_ext_image_format_t;

/**
 * @brief Implementation-independent image descriptor.
 */
typedef struct hsa_ext_image_descriptor_s {
  /**
   * Image geometry.
   */
  hsa_ext_image_geometry_t geometry;
  /**
   * Width of the image, in components.
   */
  size_t width;
  /**
   * Height of the image, in components. Only used if the geometry is
   * ::HSA_EXT_IMAGE_GEOMETRY_2D, ::HSA_EXT_IMAGE_GEOMETRY_3D,
   * ::HSA_EXT_IMAGE_GEOMETRY_2DA, ::HSA_EXT_IMAGE_GEOMETRY_2DDEPTH, or
   * ::HSA_EXT_IMAGE_GEOMETRY_2DADEPTH, and must be 0 otherwise.
   */
  size_t height;
  /**
   * Depth of the image, in components. Only used if the geometry is
   * ::HSA_EXT_IMAGE_GEOMETRY_3D, and must be 0 otherwise.
   */
  size_t depth;
  /**
   * Number of image layers in the image array.
   * Only used if the geometry is ::HSA_EXT_IMAGE_GEOMETRY_1DA,
   * ::HSA_EXT_IMAGE_GEOMETRY_2DA, or ::HSA_EXT_IMAGE_GEOMETRY_2DADEPTH, and
   * must be 0 otherwise.
   */
  size_t array_size;
  /**
   * Image format.
   */
  hsa_ext_image_format_t format;
} hsa_ext_image_descriptor_t;

/**
 * @brief Image capability.
 */
typedef enum {
  /**
   * Images of this geometry, format, and layout are not supported by
   * the agent.
   */
  HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED = 0x0,
  /**
   * Read-only images of this geometry, format, and layout are
   * supported by the agent.
   */
  HSA_EXT_IMAGE_CAPABILITY_READ_ONLY = 0x1,
  /**
   * Write-only images of this geometry, format, and layout are
   * supported by the agent.
   */
  HSA_EXT_IMAGE_CAPABILITY_WRITE_ONLY = 0x2,
  /**
   * Read-write images of this geometry, format, and layout are
   * supported by the agent.
   */
  HSA_EXT_IMAGE_CAPABILITY_READ_WRITE = 0x4,
  /**
   * @deprecated Images of this geometry, format, and layout can be accessed from
   * read-modify-write atomic operations in the agent.
   */
  HSA_EXT_IMAGE_CAPABILITY_READ_MODIFY_WRITE = 0x8,
  /**
   * Images of this geometry, format, and layout are guaranteed to
   * have a consistent data layout regardless of how they are
   * accessed by the associated agent.
   */
  HSA_EXT_IMAGE_CAPABILITY_ACCESS_INVARIANT_DATA_LAYOUT = 0x10
} hsa_ext_image_capability_t;

/**
 * @brief Image data layout.
 *
 * @details An image data layout describes how image data is arranged in
 * memory, for example its tiling and the organization of its channels. Some
 * image data layouts may only apply to specific image geometries, formats,
 * and access permissions. Different agents may support different
 * image layout identifiers, including vendor-specific layouts. Note
 * that an agent may not support the same image data layout for
 * different access permissions to images with the same image
 * geometry, size, and format. If multiple agents support the same
 * image data layout, each agent can use its own image handle to
 * reference the same image data.
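For the linear layout described under HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR, the byte address of an element follows directly from the row and slice pitches. The default pitches are the ones given in the parameter descriptions of hsa_ext_image_data_get_info_with_layout. The sketch below restates those rules, with some assumptions:

- All function names are hypothetical.
- element_bytes (the image element byte size) is supplied by the caller rather than derived from a format.
- linear_row_alignment stands for the value of HSA_EXT_AGENT_INFO_IMAGE_LINEAR_ROW_PITCH_ALIGNMENT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Geometry values copied from hsa_ext_image_geometry_t.
enum { GEOM_3D = 2, GEOM_1DA = 3, GEOM_2DA = 4, GEOM_2DADEPTH = 7 };

// Default row pitch: image width times image element byte size.
static size_t default_row_pitch(size_t width, size_t element_bytes) {
  return width * element_bytes;
}

// Default slice pitch, per the image_data_slice_pitch description.
static size_t default_slice_pitch(int geometry, size_t row_pitch,
                                  size_t height) {
  switch (geometry) {
    case GEOM_3D:
    case GEOM_2DA:
    case GEOM_2DADEPTH:
      return row_pitch * height;
    case GEOM_1DA:
      return row_pitch;
    default:
      return 0;
  }
}

// A row pitch (0 means "use the default") is acceptable if the value used is
// at least the default, a multiple of the element size and, for the linear
// layout, a multiple of the agent's linear row pitch alignment.
static bool row_pitch_valid(size_t pitch, size_t width, size_t element_bytes,
                            size_t linear_row_alignment) {
  size_t dflt = default_row_pitch(width, element_bytes);
  if (pitch == 0) pitch = dflt;
  return pitch >= dflt && pitch % element_bytes == 0 &&
         pitch % linear_row_alignment == 0;
}

// Byte offset of element (x, y) in slice or layer s under the linear layout:
// slices are spaced by slice_pitch, rows by row_pitch, elements by
// element_bytes.
static size_t linear_offset(size_t x, size_t y, size_t s,
                            size_t element_bytes, size_t row_pitch,
                            size_t slice_pitch) {
  return s * slice_pitch + y * row_pitch + x * element_bytes;
}
```

Consider a 100-element-wide RGBA image with 8-bit channels (400-byte default row) and a 256-byte linear row pitch alignment. The default pitch fails the alignment check, so an explicit pitch such as 512 would be needed.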
 */
typedef enum {
  /**
   * An implementation-specific opaque image data layout which can
   * vary depending on the agent, geometry, image format, image size,
   * and access permissions.
   */
  HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE = 0x0,
  /**
   * The image data layout is specified by the following rules in
   * ascending byte address order. For a 3D image, 2DA image array,
   * or 1DA image array, the image data is stored as a linear sequence
   * of adjacent 2D image slices, 2D images, or 1D images
   * respectively, spaced according to the slice pitch. Each 2D image
   * is stored as a linear sequence of adjacent image rows, spaced
   * according to the row pitch. Each 1D or 1DB image is stored as a
   * single image row. Each image row is stored as a linear sequence
   * of image elements. Each image element is stored as a linear
   * sequence of image components specified by the left-to-right
   * channel order definition. Each image component is stored using
   * the memory type specified by the channel type.
   *
   * The 1DB image geometry always uses the linear image data layout.
   */
  HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR = 0x1
} hsa_ext_image_data_layout_t;

/**
 * @brief Retrieve the supported image capabilities for a given combination of
 * agent, geometry, and image format for an image created with an opaque image
 * data layout.
 *
 * @param[in] agent Agent to be associated with the image handle.
 *
 * @param[in] geometry Geometry.
 *
 * @param[in] image_format Pointer to an image format. Must not be NULL.
 *
 * @param[out] capability_mask Pointer to a memory location where the HSA
 * runtime stores a bit-mask of supported image capability
 * (::hsa_ext_image_capability_t) values. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p image_format is
 * NULL, or @p capability_mask is NULL.
 */
hsa_status_t HSA_API hsa_ext_image_get_capability(
    hsa_agent_t agent,
    hsa_ext_image_geometry_t geometry,
    const hsa_ext_image_format_t *image_format,
    uint32_t *capability_mask);

/**
 * @brief Retrieve the supported image capabilities for a given combination of
 * agent, geometry, image format, and image layout for an image created with
 * an explicit image data layout.
 *
 * @param[in] agent Agent to be associated with the image handle.
 *
 * @param[in] geometry Geometry.
 *
 * @param[in] image_format Pointer to an image format. Must not be NULL.
 *
 * @param[in] image_data_layout The image data layout.
 * It is invalid to use ::HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE; use
 * ::hsa_ext_image_get_capability instead.
 *
 * @param[out] capability_mask Pointer to a memory location where the HSA
 * runtime stores a bit-mask of supported image capability
 * (::hsa_ext_image_capability_t) values. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p image_format is
 * NULL, @p image_data_layout is ::HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE,
 * or @p capability_mask is NULL.
 */
hsa_status_t HSA_API hsa_ext_image_get_capability_with_layout(
    hsa_agent_t agent,
    hsa_ext_image_geometry_t geometry,
    const hsa_ext_image_format_t *image_format,
    hsa_ext_image_data_layout_t image_data_layout,
    uint32_t *capability_mask);

/**
 * @brief Agent-specific image size and alignment requirements, populated by
 * ::hsa_ext_image_data_get_info and ::hsa_ext_image_data_get_info_with_layout.
 */
typedef struct hsa_ext_image_data_info_s {
  /**
   * Image data size, in bytes.
   */
  size_t size;

  /**
   * Image data alignment, in bytes. Must always be a power of 2.
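Because the alignment is always a power of 2, an offset can be rounded up to it with a single mask operation. The minimal sketch below assumes an application sub-allocates image data from a larger pool. The pool's base address must itself be aligned to at least the requested alignment, so that aligned offsets yield aligned addresses. is_pow2, align_up and pool_reserve are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// True if v is a non-zero power of 2, as required of the alignment field.
static bool is_pow2(size_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Round offset up to the next multiple of a power-of-2 alignment.
static size_t align_up(size_t offset, size_t alignment) {
  assert(is_pow2(alignment));
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical pool sub-allocator: reserve size bytes at the given
// alignment, starting the search at offset *cursor within a pool of
// pool_size bytes. Returns the start offset and advances *cursor, or
// returns SIZE_MAX (leaving *cursor unchanged) if the pool is exhausted.
static size_t pool_reserve(size_t *cursor, size_t pool_size, size_t size,
                           size_t alignment) {
  size_t start = align_up(*cursor, alignment);
  if (start > pool_size || size > pool_size - start) return SIZE_MAX;
  *cursor = start + size;
  return start;
}
```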
   */
  size_t alignment;
} hsa_ext_image_data_info_t;

/**
 * @brief Retrieve the image data requirements for a given combination of
 * agent, image descriptor, and access permission for an image created with an
 * opaque image data layout.
 *
 * @details The optimal image data size and alignment requirements may
 * vary depending on the image attributes specified in @p
 * image_descriptor, the @p access_permission, and the @p agent. Also,
 * different implementations of the HSA runtime may return different
 * requirements for the same input values.
 *
 * The implementation must return the same image data requirements for
 * different access permissions with matching image descriptors as long
 * as ::hsa_ext_image_get_capability reports
 * ::HSA_EXT_IMAGE_CAPABILITY_ACCESS_INVARIANT_DATA_LAYOUT. Image
 * descriptors match if they have the same values, with the exception
 * that s-form channel orders match the corresponding non-s-form
 * channel order and vice versa.
 *
 * @param[in] agent Agent to be associated with the image handle.
 *
 * @param[in] image_descriptor Pointer to an image descriptor. Must not be NULL.
 *
 * @param[in] access_permission Access permission of the image when
 * accessed by @p agent. The access permission defines how the agent
 * is allowed to access the image and must match the corresponding
 * HSAIL image handle type. The @p agent must support the image format
 * specified in @p image_descriptor for the given @p
 * access_permission.
 *
 * @param[out] image_data_info Memory location where the runtime stores the
 * size and alignment requirements. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED The @p
 * agent does not support the image format specified by @p
 * image_descriptor with the specified @p access_permission.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED The agent
 * does not support the image dimensions specified by @p
 * image_descriptor with the specified @p access_permission.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p image_descriptor is NULL, @p
 * access_permission is not a valid access permission value, or @p
 * image_data_info is NULL.
 */
hsa_status_t HSA_API hsa_ext_image_data_get_info(
    hsa_agent_t agent,
    const hsa_ext_image_descriptor_t *image_descriptor,
    hsa_access_permission_t access_permission,
    hsa_ext_image_data_info_t *image_data_info);

/**
 * @brief Retrieve the image data requirements for a given combination of
 * image descriptor, access permission, image data layout, image data row pitch,
 * and image data slice pitch for an image created with an explicit image
 * data layout.
 *
 * @details The image data size and alignment requirements may vary
 * depending on the image attributes specified in @p image_descriptor,
 * the @p access_permission, and the image layout. However, different
 * implementations of the HSA runtime will return the same
 * requirements for the same input values.
 *
 * The implementation must return the same image data requirements for
 * different access permissions with matching image descriptors and
 * matching image layouts as long as
 * ::hsa_ext_image_get_capability_with_layout reports
 * ::HSA_EXT_IMAGE_CAPABILITY_ACCESS_INVARIANT_DATA_LAYOUT. Image
 * descriptors match if they have the same values, with the exception
 * that s-form channel orders match the corresponding non-s-form
 * channel order and vice versa. Image layouts match if they are the
 * same image data layout and use the same image row and slice pitch
 * values.
 *
 * @param[in] agent Agent to be associated with the image handle.
 *
 * @param[in] image_descriptor Pointer to an image descriptor. Must not be NULL.
 *
 * @param[in] access_permission Access permission of the image when
 * accessed by an agent. The access permission defines how the agent
 * is allowed to access the image and must match the corresponding
 * HSAIL image handle type.
 *
 * @param[in] image_data_layout The image data layout to use.
 * It is invalid to use ::HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE; use
 * ::hsa_ext_image_data_get_info instead.
 *
 * @param[in] image_data_row_pitch The size in bytes for a single row
 * of the image in the image data. If 0 is specified then the default
 * row pitch value is used: image width * image element byte size.
 * The value used must be greater than or equal to the default row
 * pitch, and be a multiple of the image element byte size. For the
 * linear image layout it must also be a multiple of the image linear
 * row pitch alignment for the agents that will access the image data
 * using image instructions.
 *
 * @param[in] image_data_slice_pitch The size in bytes of a single
 * slice of a 3D image, or the size in bytes of each image layer in an
 * image array in the image data. If 0 is specified then the default
 * slice pitch value is used: row pitch * height if geometry is
 * ::HSA_EXT_IMAGE_GEOMETRY_3D, ::HSA_EXT_IMAGE_GEOMETRY_2DA, or
 * ::HSA_EXT_IMAGE_GEOMETRY_2DADEPTH; row pitch if geometry is
 * ::HSA_EXT_IMAGE_GEOMETRY_1DA; and 0 otherwise. The value used must
 * be 0 if the default slice pitch is 0, be greater than or equal to
 * the default slice pitch, and be a multiple of the row pitch.
 *
 * @param[out] image_data_info Memory location where the runtime stores the
 * size and alignment requirements. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED The image
 * format specified by @p image_descriptor is not supported for the
 * @p access_permission and @p image_data_layout specified.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED The image
 * dimensions specified by @p image_descriptor are not supported for
 * the @p access_permission and @p image_data_layout specified.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED The row and
 * slice pitch specified by @p image_data_row_pitch and @p
 * image_data_slice_pitch are invalid or not supported.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p image_descriptor is
 * NULL, @p image_data_layout is ::HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE,
 * or @p image_data_info is NULL.
 */
hsa_status_t HSA_API hsa_ext_image_data_get_info_with_layout(
    hsa_agent_t agent,
    const hsa_ext_image_descriptor_t *image_descriptor,
    hsa_access_permission_t access_permission,
    hsa_ext_image_data_layout_t image_data_layout,
    size_t image_data_row_pitch,
    size_t image_data_slice_pitch,
    hsa_ext_image_data_info_t *image_data_info);

/**
 * @brief Creates an agent-specific image handle to an image with an
 * opaque image data layout.
 *
 * @details Images with an opaque image data layout created with
 * different access permissions but matching image descriptors and
 * the same agent can share the same image data if
 * ::HSA_EXT_IMAGE_CAPABILITY_ACCESS_INVARIANT_DATA_LAYOUT is reported
 * by ::hsa_ext_image_get_capability for the image format specified in
 * the image descriptor. Image descriptors match if they have the same
 * values, with the exception that s-form channel orders match the
 * corresponding non-s-form channel order and vice versa.
 *
 * If necessary, an application can use image operations (import,
 * export, copy, clear) to prepare the image for the intended use
 * regardless of the access permissions.
 *
 * @param[in] agent Agent to be associated with the image handle created.
 *
 * @param[in] image_descriptor Pointer to an image descriptor. Must not be NULL.
 *
 * @param[in] image_data Image data buffer that must have been allocated
 * according to the size and alignment requirements dictated by
 * ::hsa_ext_image_data_get_info.
 * Must not be NULL.
 *
 * Any previous memory contents are preserved upon creation. The application is
 * responsible for ensuring that the lifetime of the image data exceeds that of
 * all the associated images.
 *
 * @param[in] access_permission Access permission of the image when
 * accessed by @p agent. The access permission defines how the agent
 * is allowed to access the image using the image handle created and
 * must match the corresponding HSAIL image handle type. The agent
 * must support the image format specified in @p image_descriptor for
 * the given @p access_permission.
 *
 * @param[out] image Pointer to a memory location where the HSA runtime stores
 * the newly created image handle. Must not be NULL.
 *
 * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
 *
 * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been
 * initialized.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED The agent
 * does not have the capability to support the image format contained
 * in @p image_descriptor using the specified @p access_permission.
 *
 * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED The agent
 * does not support the image dimensions specified by @p
 * image_descriptor using the specified @p access_permission.
 *
 * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to
 * allocate the required resources, or the agent does not support the
 * creation of more image handles with the given @p access_permission.
 *
 * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p image_descriptor is NULL, @p
 * image_data is NULL, @p image_data does not have a valid alignment,
 * @p access_permission is not a valid access permission
 * value, or @p image is NULL.
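Whether two handles with different access permissions can share one image data buffer depends on the capability mask. The minimal sketch below assumes the mask was already obtained from hsa_ext_image_get_capability for the relevant geometry and format. The CAP_* constants copy values from hsa_ext_image_capability_t, and can_share_image_data is a hypothetical helper. It does not check that the two image descriptors match; the caller must ensure that separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Bit values copied from hsa_ext_image_capability_t.
enum {
  CAP_READ_ONLY = 0x1,
  CAP_WRITE_ONLY = 0x2,
  CAP_READ_WRITE = 0x4,
  CAP_ACCESS_INVARIANT_DATA_LAYOUT = 0x10
};

// Hypothetical helper: true if every access mode in access_bits is supported
// and the layout is access-invariant, i.e. handles created with those access
// permissions (and matching descriptors) may share one image data buffer.
static bool can_share_image_data(uint32_t capability_mask,
                                 uint32_t access_bits) {
  const uint32_t needed = access_bits | CAP_ACCESS_INVARIANT_DATA_LAYOUT;
  return (capability_mask & needed) == needed;
}
```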
*/ hsa_status_t HSA_API hsa_ext_image_create( hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor, const void *image_data, hsa_access_permission_t access_permission, hsa_ext_image_t *image); /** * @brief Creates an agent specific image handle to an image with an explicit * image data layout. * * @details Images with an explicit image data layout created with * different access permissions but matching image descriptors and * matching image layouts can share the same image data if * ::HSA_EXT_IMAGE_CAPABILITY_ACCESS_INVARIANT_DATA_LAYOUT is reported * by ::hsa_ext_image_get_capability_with_layout for the image format * specified in the image descriptor and the specified image data * layout. Image descriptors match if they have the same values, with * the exception that s-form channel orders match the corresponding * non-s-form channel order and vice versa. Image layouts match if * they are the same image data layout and use the same row and * slice pitch values. * * If necessary, an application can use image operations (import, export, copy, * clear) to prepare the image for the intended use regardless of the access * permissions. * * @param[in] agent Agent to be associated with the image handle created. * * @param[in] image_descriptor Pointer to an image descriptor. Must not be NULL. * * @param[in] image_data Image data buffer that must have been allocated * according to the size and alignment requirements dictated by * ::hsa_ext_image_data_get_info_with_layout. Must not be NULL. * * Any previous memory contents are preserved upon creation. The application is * responsible for ensuring that the lifetime of the image data exceeds that of * all the associated images. * * @param[in] access_permission Access permission of the image when * accessed by the agent. The access permission defines how the agent * is allowed to access the image and must match the corresponding * HSAIL image handle type.
The agent must support the image format * specified in @p image_descriptor for the given @p access_permission * and @p image_data_layout. * * @param[in] image_data_layout The image data layout to use for the * @p image_data. It is invalid to use * ::HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE; use ::hsa_ext_image_create * instead. * * @param[in] image_data_row_pitch The size in bytes for a single row * of the image in the image data. If 0 is specified then the default * row pitch value is used: image width * image element byte size. * The value used must be greater than or equal to the default row * pitch, and be a multiple of the image element byte size. For the * linear image layout it must also be a multiple of the image linear * row pitch alignment for the agents that will access the image data * using image instructions. * * @param[in] image_data_slice_pitch The size in bytes of a single * slice of a 3D image, or the size in bytes of each image layer in an * image array in the image data. If 0 is specified then the default * slice pitch value is used: row pitch * height if geometry is * ::HSA_EXT_IMAGE_GEOMETRY_3D, ::HSA_EXT_IMAGE_GEOMETRY_2DA, or * ::HSA_EXT_IMAGE_GEOMETRY_2DADEPTH; row pitch if geometry is * ::HSA_EXT_IMAGE_GEOMETRY_1DA; and 0 otherwise. The value used must * be 0 if the default slice pitch is 0, be greater than or equal to * the default slice pitch, and be a multiple of the row pitch. * * @param[out] image Pointer to a memory location where the HSA runtime stores * the newly created image handle. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. 
* * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED The agent does * not have the capability to support the image format contained in the image * descriptor using the specified @p access_permission and @p image_data_layout. * * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED The agent * does not support the image dimensions specified by @p * image_descriptor using the specified @p access_permission and @p * image_data_layout. * * @retval ::HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED The agent does * not support the row and slice pitch specified by @p image_data_row_pitch * and @p image_data_slice_pitch, or the values are invalid. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate * the required resources (for example, the agent does not * support the creation of more image handles with the given @p access_permission). * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p image_descriptor is NULL, @p * image_data is NULL, @p image_data does not have a valid alignment, * @p image_data_layout is ::HSA_EXT_IMAGE_DATA_LAYOUT_OPAQUE, * or @p image is NULL. */ hsa_status_t HSA_API hsa_ext_image_create_with_layout( hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor, const void *image_data, hsa_access_permission_t access_permission, hsa_ext_image_data_layout_t image_data_layout, size_t image_data_row_pitch, size_t image_data_slice_pitch, hsa_ext_image_t *image); /** * @brief Destroy an image handle previously created using ::hsa_ext_image_create or * ::hsa_ext_image_create_with_layout. * * @details Destroying the image handle does not free the associated image data * or modify its contents. The application should not destroy an image handle while * there are references to it queued for execution or currently being used in a * kernel dispatch. * * @param[in] agent Agent associated with the image handle. * * @param[in] image Image handle to destroy. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully.
* * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. */ hsa_status_t HSA_API hsa_ext_image_destroy( hsa_agent_t agent, hsa_ext_image_t image); /** * @brief Copies a portion of one image (the source) to another image (the * destination). * * @details The source and destination image formats should be the * same, with the exception that s-form channel orders match the * corresponding non-s-form channel order and vice versa. For example, * it is allowed to copy a source image with a channel order of * HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB to a destination image with a * channel order of HSA_EXT_IMAGE_CHANNEL_ORDER_RGB. * * The source and destination images do not have to be of the same geometry and * appropriate scaling is performed by the HSA runtime. It is possible to copy * subregions between any combination of source and destination geometries, provided * that the dimensions of the subregions are the same. For example, it is * allowed to copy a rectangular region from a 2D image to a slice of a 3D * image. * * If the source and destination image data overlap, or the combination of * offset and range references an out-of-bounds element in any of the images, * the behavior is undefined. * * @param[in] agent Agent associated with both the source and destination image handles. * * @param[in] src_image Image handle of source image. The agent associated with the source * image handle must be identical to that of the destination image. * * @param[in] src_offset Pointer to the offset within the source image where to * copy the data from. Must not be NULL. * * @param[in] dst_image Image handle of destination image. * * @param[in] dst_offset Pointer to the offset within the destination * image where to copy the data. Must not be NULL. * * @param[in] range Dimensions of the image portion to be copied.
The HSA * runtime computes the size of the image data to be copied using this * argument. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p src_offset is * NULL, @p dst_offset is NULL, or @p range is NULL. */ hsa_status_t HSA_API hsa_ext_image_copy( hsa_agent_t agent, hsa_ext_image_t src_image, const hsa_dim3_t* src_offset, hsa_ext_image_t dst_image, const hsa_dim3_t* dst_offset, const hsa_dim3_t* range); /** * @brief Image region. */ typedef struct hsa_ext_image_region_s { /** * Offset within an image (in coordinates). */ hsa_dim3_t offset; /** * Dimension size of the image range (in coordinates). The x, y, and z dimensions * correspond to width, height, and depth or index respectively. */ hsa_dim3_t range; } hsa_ext_image_region_t; /** * @brief Import linearly organized image data from memory directly to an * image handle. * * @details This operation updates the image data referenced by the image handle * from the source memory. The size of the data imported from memory is * implicitly derived from the image region. * * It is the application's responsibility to avoid out-of-bounds memory accesses. * * The source memory and the destination image data memory must not * overlap. Overlapping of any of the source and destination image * data memory within the import operation produces undefined results. * * @param[in] agent Agent associated with the image handle. * * @param[in] src_memory Source memory. Must not be NULL. * * @param[in] src_row_pitch The size in bytes of a single row of the image in the * source memory. If the value is smaller than the destination image region * width * image element byte size, then region width * image element byte * size is used.
* * @param[in] src_slice_pitch The size in bytes of a single 2D slice of a 3D image, * or the size in bytes of each image layer in an image array in the source memory. * If the geometry is ::HSA_EXT_IMAGE_GEOMETRY_1DA and the value is smaller than the * value used for @p src_row_pitch, then the value used for @p src_row_pitch is used. * If the geometry is ::HSA_EXT_IMAGE_GEOMETRY_3D, ::HSA_EXT_IMAGE_GEOMETRY_2DA, or * ::HSA_EXT_IMAGE_GEOMETRY_2DADEPTH and the value is smaller than the value used for * @p src_row_pitch * destination image region height, then the value used for * @p src_row_pitch * destination image region height is used. * Otherwise, the value is not used. * * @param[in] dst_image Image handle of destination image. * * @param[in] image_region Pointer to the image region to be updated. Must not * be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p src_memory is NULL, or @p * image_region is NULL. * */ hsa_status_t HSA_API hsa_ext_image_import( hsa_agent_t agent, const void *src_memory, size_t src_row_pitch, size_t src_slice_pitch, hsa_ext_image_t dst_image, const hsa_ext_image_region_t *image_region); /** * @brief Export the image data to linearly organized memory. * * @details The operation updates the destination memory with the image data of * @p src_image. The size of the data exported to memory is implicitly derived * from the image region. * * It is the application's responsibility to avoid out-of-bounds memory accesses. * * The destination memory and the source image data memory must not * overlap. Overlapping of any of the source and destination image * data memory within the export operation produces undefined results. * * @param[in] agent Agent associated with the image handle.
* * @param[in] src_image Image handle of source image. * * @param[in] dst_memory Destination memory. Must not be NULL. * * @param[in] dst_row_pitch The size in bytes of a single row of the image in the * destination memory. If the value is smaller than the source image region * width * image element byte size, then region width * image element byte * size is used. * * @param[in] dst_slice_pitch The size in bytes of a single 2D slice of a 3D image, * or the size in bytes of each image layer in an image array in the destination memory. * If the geometry is ::HSA_EXT_IMAGE_GEOMETRY_1DA and the value is smaller than the * value used for @p dst_row_pitch, then the value used for @p dst_row_pitch is used. * If the geometry is ::HSA_EXT_IMAGE_GEOMETRY_3D, ::HSA_EXT_IMAGE_GEOMETRY_2DA, or * ::HSA_EXT_IMAGE_GEOMETRY_2DADEPTH and the value is smaller than the value used for * @p dst_row_pitch * source image region height, then the value used for * @p dst_row_pitch * source image region height is used. * Otherwise, the value is not used. * * @param[in] image_region Pointer to the image region to be exported. Must not * be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p dst_memory is NULL, or @p * image_region is NULL. */ hsa_status_t HSA_API hsa_ext_image_export( hsa_agent_t agent, hsa_ext_image_t src_image, void *dst_memory, size_t dst_row_pitch, size_t dst_slice_pitch, const hsa_ext_image_region_t *image_region); /** * @brief Clear a region of an image so that every image element has * the specified value. * * @param[in] agent Agent associated with the image handle. * * @param[in] image Image handle for image to be cleared. * * @param[in] data The value to which to set each image element being * cleared.
It is specified as an array of image component values. The * number of array elements must match the number of access components * for the image channel order. The type of each array element must * match the image access type of the image channel type. When the * value is used to set the value of an image element, the conversion * method corresponding to the image channel type is used. See the * Channel Order section and Channel Type section in * the HSA Programming Reference Manual for more * information. Must not be NULL. * * @param[in] image_region Pointer to the image region to clear. Must not be * NULL. If the region references an out-of-bounds element, the behavior is * undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p data is NULL, or @p * image_region is NULL. */ hsa_status_t HSA_API hsa_ext_image_clear( hsa_agent_t agent, hsa_ext_image_t image, const void* data, const hsa_ext_image_region_t *image_region); /** * @brief Sampler handle. Samplers are populated by * ::hsa_ext_sampler_create. Sampler handles are only unique within an * agent, not across agents. */ typedef struct hsa_ext_sampler_s { /** * Opaque handle. For a given agent, two handles reference the same object of * the enclosing type if and only if they are equal. */ uint64_t handle; } hsa_ext_sampler_t; /** * @brief Sampler address modes. The sampler address mode describes * the processing of out-of-range image coordinates. See the * Addressing Mode section in the HSA Programming Reference * Manual for definitions of each address mode. The values * match the BRIG type @p hsa_ext_brig_sampler_addressing_t. */ typedef enum { /** * Out-of-range coordinates are not handled.
*/ HSA_EXT_SAMPLER_ADDRESSING_MODE_UNDEFINED = 0, /** * Clamp out-of-range coordinates to the image edge. */ HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_EDGE = 1, /** * Clamp out-of-range coordinates to the image border color. */ HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_BORDER = 2, /** * Wrap out-of-range coordinates back into the valid coordinate * range so the image appears as repeated tiles. */ HSA_EXT_SAMPLER_ADDRESSING_MODE_REPEAT = 3, /** * Mirror out-of-range coordinates back into the valid coordinate * range so the image appears as repeated tiles with every other * tile a reflection. */ HSA_EXT_SAMPLER_ADDRESSING_MODE_MIRRORED_REPEAT = 4 } hsa_ext_sampler_addressing_mode_t; /** * @brief A fixed-size type used to represent ::hsa_ext_sampler_addressing_mode_t constants. */ typedef uint32_t hsa_ext_sampler_addressing_mode32_t; /** * @brief Sampler coordinate normalization modes. See the * Coordinate Normalization Mode section in the HSA * Programming Reference Manual for definitions of each * coordinate normalization mode. The values match the BRIG type @p * hsa_ext_brig_sampler_coord_normalization_t. */ typedef enum { /** * Coordinates are used to directly address an image element. */ HSA_EXT_SAMPLER_COORDINATE_MODE_UNNORMALIZED = 0, /** * Coordinates are scaled by the image dimension size before being * used to address an image element. */ HSA_EXT_SAMPLER_COORDINATE_MODE_NORMALIZED = 1 } hsa_ext_sampler_coordinate_mode_t; /** * @brief A fixed-size type used to represent ::hsa_ext_sampler_coordinate_mode_t constants. */ typedef uint32_t hsa_ext_sampler_coordinate_mode32_t; /** * @brief Sampler filter modes. See the Filter Mode section * in the HSA Programming Reference Manual for definitions * of each filter mode. The enumeration values match the BRIG type @p * hsa_ext_brig_sampler_filter_t. */ typedef enum { /** * Filter to the image element nearest (in Manhattan distance) to the * specified coordinate.
*/ HSA_EXT_SAMPLER_FILTER_MODE_NEAREST = 0, /** * Filter to the image element calculated by combining the elements in a 2x2 * square block or 2x2x2 cube block around the specified coordinate. The * elements are combined using linear interpolation. */ HSA_EXT_SAMPLER_FILTER_MODE_LINEAR = 1 } hsa_ext_sampler_filter_mode_t; /** * @brief A fixed-size type used to represent ::hsa_ext_sampler_filter_mode_t constants. */ typedef uint32_t hsa_ext_sampler_filter_mode32_t; /** * @brief Implementation independent sampler descriptor. */ typedef struct hsa_ext_sampler_descriptor_s { /** * Sampler coordinate mode describes the normalization of image coordinates. */ hsa_ext_sampler_coordinate_mode32_t coordinate_mode; /** * Sampler filter type describes the type of sampling performed. */ hsa_ext_sampler_filter_mode32_t filter_mode; /** * Sampler address mode describes the processing of out-of-range image * coordinates. */ hsa_ext_sampler_addressing_mode32_t address_mode; } hsa_ext_sampler_descriptor_t; /** * @brief Create an agent specific sampler handle for a given agent * independent sampler descriptor and agent. * * @param[in] agent Agent to be associated with the sampler handle created. * * @param[in] sampler_descriptor Pointer to a sampler descriptor. Must not be * NULL. * * @param[out] sampler Memory location where the HSA runtime stores the newly * created sampler handle. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. * * @retval ::HSA_EXT_STATUS_ERROR_SAMPLER_DESCRIPTOR_UNSUPPORTED The * @p agent does not have the capability to support the properties * specified by @p sampler_descriptor or it is invalid. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to allocate * the required resources. 
* * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p sampler_descriptor is NULL, or * @p sampler is NULL. */ hsa_status_t HSA_API hsa_ext_sampler_create( hsa_agent_t agent, const hsa_ext_sampler_descriptor_t *sampler_descriptor, hsa_ext_sampler_t *sampler); /** * @brief Destroy a sampler handle previously created using ::hsa_ext_sampler_create. * * @details The sampler handle should not be destroyed while there are * references to it queued for execution or currently being used in a * kernel dispatch. * * @param[in] agent Agent associated with the sampler handle. * * @param[in] sampler Sampler handle to destroy. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_AGENT The agent is invalid. */ hsa_status_t HSA_API hsa_ext_sampler_destroy( hsa_agent_t agent, hsa_ext_sampler_t sampler); #define hsa_ext_images_1_00 /** * @brief The function pointer table for the images v1.00 extension. Can be returned by ::hsa_system_get_extension_table or ::hsa_system_get_major_extension_table. 
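Example (a hedged sketch): the ::hsa_ext_image_import and ::hsa_ext_image_export entries in this table take row and slice pitches that are raised to a minimum when they are too small, as documented for those functions. The code below restates those rules. `geometry_sketch_t` is a hypothetical stand-in for ::hsa_ext_image_geometry_t whose values are not meant to match the real enumeration, and returning 0 for "slice pitch not used" is a convention of this sketch only.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for the geometry names used in the documentation.
typedef enum {
  GEOM_1D, GEOM_2D, GEOM_3D, GEOM_1DA, GEOM_2DA,
  GEOM_1DB, GEOM_2DDEPTH, GEOM_2DADEPTH
} geometry_sketch_t;

// Row pitch actually used by import/export: never smaller than
// region width * image element byte size.
static size_t effective_row_pitch(size_t row_pitch, size_t region_width,
                                  size_t element_size) {
  size_t min_pitch = region_width * element_size;
  return row_pitch < min_pitch ? min_pitch : row_pitch;
}

// Slice pitch actually used, given the row pitch actually used.
// Returns 0 when the geometry does not use a slice pitch.
static size_t effective_slice_pitch(geometry_sketch_t geometry,
                                    size_t slice_pitch, size_t used_row_pitch,
                                    size_t region_height) {
  size_t min_pitch;
  switch (geometry) {
    case GEOM_1DA:
      min_pitch = used_row_pitch;  // one row per array layer
      break;
    case GEOM_3D:
    case GEOM_2DA:
    case GEOM_2DADEPTH:
      min_pitch = used_row_pitch * region_height;  // rows of one slice/layer
      break;
    default:
      return 0;  // "Otherwise, the value is not used."
  }
  return slice_pitch < min_pitch ? min_pitch : slice_pitch;
}
```

Passing 0 for both pitches therefore yields a tightly packed layout for the region, which is a convenient default when the host buffer has no padding.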
*/ typedef struct hsa_ext_images_1_00_pfn_s { hsa_status_t (*hsa_ext_image_get_capability)( hsa_agent_t agent, hsa_ext_image_geometry_t geometry, const hsa_ext_image_format_t *image_format, uint32_t *capability_mask); hsa_status_t (*hsa_ext_image_data_get_info)( hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor, hsa_access_permission_t access_permission, hsa_ext_image_data_info_t *image_data_info); hsa_status_t (*hsa_ext_image_create)( hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor, const void *image_data, hsa_access_permission_t access_permission, hsa_ext_image_t *image); hsa_status_t (*hsa_ext_image_destroy)( hsa_agent_t agent, hsa_ext_image_t image); hsa_status_t (*hsa_ext_image_copy)( hsa_agent_t agent, hsa_ext_image_t src_image, const hsa_dim3_t* src_offset, hsa_ext_image_t dst_image, const hsa_dim3_t* dst_offset, const hsa_dim3_t* range); hsa_status_t (*hsa_ext_image_import)( hsa_agent_t agent, const void *src_memory, size_t src_row_pitch, size_t src_slice_pitch, hsa_ext_image_t dst_image, const hsa_ext_image_region_t *image_region); hsa_status_t (*hsa_ext_image_export)( hsa_agent_t agent, hsa_ext_image_t src_image, void *dst_memory, size_t dst_row_pitch, size_t dst_slice_pitch, const hsa_ext_image_region_t *image_region); hsa_status_t (*hsa_ext_image_clear)( hsa_agent_t agent, hsa_ext_image_t image, const void* data, const hsa_ext_image_region_t *image_region); hsa_status_t (*hsa_ext_sampler_create)( hsa_agent_t agent, const hsa_ext_sampler_descriptor_t *sampler_descriptor, hsa_ext_sampler_t *sampler); hsa_status_t (*hsa_ext_sampler_destroy)( hsa_agent_t agent, hsa_ext_sampler_t sampler); } hsa_ext_images_1_00_pfn_t; #define hsa_ext_images_1 /** * @brief The function pointer table for the images v1 extension. Can be returned by ::hsa_system_get_extension_table or ::hsa_system_get_major_extension_table. 
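Example (a hedged sketch): the v1 table adds the `_with_layout` entry points, whose row and slice pitch arguments follow the defaults and constraints documented for ::hsa_ext_image_create_with_layout. The function below resolves a requested pair of pitches the way that documentation describes, reading "row pitch" in the default slice pitch as the row pitch actually in use. `geometry_sketch_t` is hypothetical, and the agent-specific linear row pitch alignment is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for the geometry names used in the documentation.
typedef enum {
  GEOM_1D, GEOM_2D, GEOM_3D, GEOM_1DA, GEOM_2DA,
  GEOM_1DB, GEOM_2DDEPTH, GEOM_2DADEPTH
} geometry_sketch_t;

// Resolve requested pitches (0 selects the default) and check the
// documented constraints. Returns false if the resolved values are invalid.
static bool resolve_layout_pitches(geometry_sketch_t geometry, size_t width,
                                   size_t height, size_t element_size,
                                   size_t row_pitch_in, size_t slice_pitch_in,
                                   size_t *row_pitch_out,
                                   size_t *slice_pitch_out) {
  if (element_size == 0 || width == 0) return false;

  // Default row pitch: image width * image element byte size.
  size_t default_row = width * element_size;
  size_t row = row_pitch_in ? row_pitch_in : default_row;
  if (row < default_row || row % element_size != 0) return false;

  // Default slice pitch depends on the geometry.
  size_t default_slice;
  switch (geometry) {
    case GEOM_3D: case GEOM_2DA: case GEOM_2DADEPTH:
      default_slice = row * height;
      break;
    case GEOM_1DA:
      default_slice = row;
      break;
    default:
      default_slice = 0;
      break;
  }
  size_t slice = slice_pitch_in ? slice_pitch_in : default_slice;
  if (default_slice == 0) {
    if (slice != 0) return false;  // must be 0 when the default is 0
  } else if (slice < default_slice || slice % row != 0) {
    return false;
  }
  *row_pitch_out = row;
  *slice_pitch_out = slice;
  return true;
}
```

A `false` result roughly corresponds to the cases the runtime is documented to report as ::HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED; the runtime may reject further values based on agent-specific alignment.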
*/ typedef struct hsa_ext_images_1_pfn_s { hsa_status_t (*hsa_ext_image_get_capability)( hsa_agent_t agent, hsa_ext_image_geometry_t geometry, const hsa_ext_image_format_t *image_format, uint32_t *capability_mask); hsa_status_t (*hsa_ext_image_data_get_info)( hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor, hsa_access_permission_t access_permission, hsa_ext_image_data_info_t *image_data_info); hsa_status_t (*hsa_ext_image_create)( hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor, const void *image_data, hsa_access_permission_t access_permission, hsa_ext_image_t *image); hsa_status_t (*hsa_ext_image_destroy)( hsa_agent_t agent, hsa_ext_image_t image); hsa_status_t (*hsa_ext_image_copy)( hsa_agent_t agent, hsa_ext_image_t src_image, const hsa_dim3_t* src_offset, hsa_ext_image_t dst_image, const hsa_dim3_t* dst_offset, const hsa_dim3_t* range); hsa_status_t (*hsa_ext_image_import)( hsa_agent_t agent, const void *src_memory, size_t src_row_pitch, size_t src_slice_pitch, hsa_ext_image_t dst_image, const hsa_ext_image_region_t *image_region); hsa_status_t (*hsa_ext_image_export)( hsa_agent_t agent, hsa_ext_image_t src_image, void *dst_memory, size_t dst_row_pitch, size_t dst_slice_pitch, const hsa_ext_image_region_t *image_region); hsa_status_t (*hsa_ext_image_clear)( hsa_agent_t agent, hsa_ext_image_t image, const void* data, const hsa_ext_image_region_t *image_region); hsa_status_t (*hsa_ext_sampler_create)( hsa_agent_t agent, const hsa_ext_sampler_descriptor_t *sampler_descriptor, hsa_ext_sampler_t *sampler); hsa_status_t (*hsa_ext_sampler_destroy)( hsa_agent_t agent, hsa_ext_sampler_t sampler); hsa_status_t (*hsa_ext_image_get_capability_with_layout)( hsa_agent_t agent, hsa_ext_image_geometry_t geometry, const hsa_ext_image_format_t *image_format, hsa_ext_image_data_layout_t image_data_layout, uint32_t *capability_mask); hsa_status_t (*hsa_ext_image_data_get_info_with_layout)( hsa_agent_t agent, const 
hsa_ext_image_descriptor_t *image_descriptor, hsa_access_permission_t access_permission, hsa_ext_image_data_layout_t image_data_layout, size_t image_data_row_pitch, size_t image_data_slice_pitch, hsa_ext_image_data_info_t *image_data_info); hsa_status_t (*hsa_ext_image_create_with_layout)( hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor, const void *image_data, hsa_access_permission_t access_permission, hsa_ext_image_data_layout_t image_data_layout, size_t image_data_row_pitch, size_t image_data_slice_pitch, hsa_ext_image_t *image); } hsa_ext_images_1_pfn_t; /** @} */ #ifdef __cplusplus } // end extern "C" block #endif /*__cplusplus*/ #endif ROCR-Runtime-rocm-5.0.0/src/inc/hsa_ven_amd_aqlprofile.h //////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2017-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef OPENSRC_HSA_RUNTIME_INC_HSA_VEN_AMD_AQLPROFILE_H_ #define OPENSRC_HSA_RUNTIME_INC_HSA_VEN_AMD_AQLPROFILE_H_ #include #include "hsa.h" #define HSA_AQLPROFILE_VERSION_MAJOR 2 #define HSA_AQLPROFILE_VERSION_MINOR 0 #ifdef __cplusplus extern "C" { #endif // __cplusplus //////////////////////////////////////////////////////////////////////////////// // Library version uint32_t hsa_ven_amd_aqlprofile_version_major(); uint32_t hsa_ven_amd_aqlprofile_version_minor(); /////////////////////////////////////////////////////////////////////// // Library API: // The library provides helper methods for instantiating // the profile context object and for populating the start // and stop AQL packets. The profile object contains a list of profiling // events and the descriptors of the buffers needed for profiling: // a command buffer and an output data buffer. Every library method // returns a status code that indicates whether an error occurred. // The library also provides methods for querying the required buffer // attributes, validating events, and retrieving the profiling // output data.
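Example (a hedged sketch): the events list is an array of (block name, block instance index, counter id) triples, declared below as `hsa_ven_amd_aqlprofile_event_t`. Assuming a counter should be collected on every instance of a block such as TCC, one event per instance can be generated as follows. `event_sketch_t` is a hypothetical mirror of the real struct, and the instance count and counter id passed to it are illustrative; real values must come from the target GPU and its perfcounter guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical mirror of hsa_ven_amd_aqlprofile_event_t; the block name is
// a plain integer here so the sketch does not need the real header.
typedef struct {
  uint32_t block_name;   // e.g. the value of ..._BLOCK_NAME_TCC (12)
  uint32_t block_index;  // instance within the block set: 0 for TCC0, ...
  uint32_t counter_id;   // counter select value from the GFXIP guides
} event_sketch_t;

// Fill `events` with one entry per instance of `block_name`, all selecting
// the same counter. Writes at most `capacity` entries and returns the count.
static size_t events_for_all_instances(uint32_t block_name,
                                       uint32_t instance_count,
                                       uint32_t counter_id,
                                       event_sketch_t *events,
                                       size_t capacity) {
  size_t n = instance_count < capacity ? instance_count : capacity;
  for (size_t i = 0; i < n; ++i) {
    events[i].block_name = block_name;
    events[i].block_index = (uint32_t)i;
    events[i].counter_id = counter_id;
  }
  return n;
}
```

Each generated event can then be checked with hsa_ven_amd_aqlprofile_validate_event before it is placed in the profile object.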
// // Returned status: // hsa_status_t: the HSA status codes defined in the hsa.h header are used // // Supported profiling features: // // Supported profiling events typedef enum { HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC = 0, HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_TRACE = 1, } hsa_ven_amd_aqlprofile_event_type_t; // Supported performance counter (PMC) blocks // All instances of a block share the same block ID; for example, // each instance of the TCC block set (TCC0, TCC1, …, TCCN) // has the block ID HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_TCC. typedef enum { HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_CPC = 0, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_CPF = 1, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GDS = 2, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GRBM = 3, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GRBMSE = 4, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SPI = 5, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SQ = 6, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SQCS = 7, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SRBM = 8, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SX = 9, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_TA = 10, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_TCA = 11, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_TCC = 12, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_TCP = 13, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_TD = 14, // Memory related blocks HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_MCARB = 15, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_MCHUB = 16, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_MCMCBVM = 17, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_MCSEQ = 18, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_MCVML2 = 19, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_MCXBAR = 20, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_ATC = 21, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_ATCL2 = 22, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GCEA = 23, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_RPB = 24, // System blocks HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SDMA = 25, // GFX10 added blocks HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GL1A = 26, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GL1C = 27, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GL2A = 28, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GL2C = 29, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GCR =
30, HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_GUS = 31, HSA_VEN_AMD_AQLPROFILE_BLOCKS_NUMBER } hsa_ven_amd_aqlprofile_block_name_t; // PMC event object structure // The ‘counter_id’ value is the counter select value specified in the // “Performance Counters Selection” chapter of the GFXIP perfcounter // user guides. typedef struct { hsa_ven_amd_aqlprofile_block_name_t block_name; uint32_t block_index; uint32_t counter_id; } hsa_ven_amd_aqlprofile_event_t; // Check whether an event is valid for the specific GPU hsa_status_t hsa_ven_amd_aqlprofile_validate_event( hsa_agent_t agent, // HSA handle for the profiling GPU const hsa_ven_amd_aqlprofile_event_t* event, // [in] Pointer to the event to validate bool* result); // [out] True if the event is valid, false otherwise // Profiling parameters // All parameters are generic; if a parameter is not applicable to a specific // profile configuration, an error status is returned. typedef enum { // Trace applicable parameters HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_COMPUTE_UNIT_TARGET = 0, HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_VM_ID_MASK = 1, HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_MASK = 2, HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_TOKEN_MASK = 3, HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_TOKEN_MASK2 = 4, HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_SE_MASK = 5, HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_SAMPLE_RATE = 6, HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_K_CONCURRENT = 7, } hsa_ven_amd_aqlprofile_parameter_name_t; // Profile parameter object typedef struct { hsa_ven_amd_aqlprofile_parameter_name_t parameter_name; uint32_t value; } hsa_ven_amd_aqlprofile_parameter_t; // // Profile context object: // The library provides a profile object structure which contains // the events array, a buffer for the profiling start/stop commands // and a buffer for the output data. // The buffers are specified by the buffer descriptors and allocated // by the application.
The buffer allocation attributes, the command // buffer size, the PMC output buffer size, and the profiling output // data can be obtained with the generic get profile info helper _get_info. // // Buffer descriptor typedef struct { void* ptr; uint32_t size; } hsa_ven_amd_aqlprofile_descriptor_t; // Profile context object structure; contains the profiling events list and // the descriptors of the buffers needed for profiling: a command buffer and // an output data buffer typedef struct { hsa_agent_t agent; // GFXIP handle hsa_ven_amd_aqlprofile_event_type_t type; // Events type const hsa_ven_amd_aqlprofile_event_t* events; // Events array uint32_t event_count; // Events count const hsa_ven_amd_aqlprofile_parameter_t* parameters; // Parameters array uint32_t parameter_count; // Parameters count hsa_ven_amd_aqlprofile_descriptor_t output_buffer; // Output buffer hsa_ven_amd_aqlprofile_descriptor_t command_buffer; // PM4 commands } hsa_ven_amd_aqlprofile_profile_t; // // AQL packets populating methods: // Helper methods that populate the application-provided START and // STOP AQL packets, which the application must submit before and // after the profiled GPU task packets, respectively.
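The vendor-specific packet declared next has exactly 27 `pm4_command` entries. The sketch below mirrors that packet and checks its size against the 64-byte size of every AQL packet, which shows why the count is 27. The `mirror_*` types are hypothetical local copies written by hand so the example compiles without the ROCm headers. Real code should include this header and use `hsa_ext_amd_aql_pm4_packet_t` directly.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical hand-written mirrors of hsa_signal_t (from hsa.h) and of
// hsa_ext_amd_aql_pm4_packet_t; assumed to match the real layout.
typedef struct { uint64_t handle; } mirror_signal_t;

typedef struct {
  uint16_t header;                    // AQL header, set by the application
  uint16_t pm4_command[27];           // PM4 payload, set by _start/_stop/_read
  mirror_signal_t completion_signal;  // set by the application
} mirror_pm4_packet_t;

// Every AQL packet slot in a queue is 64 bytes: 2 bytes of header plus
// 27 * 2 bytes of PM4 payload end at offset 56, where the 8-byte signal fits.
_Static_assert(sizeof(mirror_pm4_packet_t) == 64, "AQL packets are 64 bytes");
_Static_assert(offsetof(mirror_pm4_packet_t, completion_signal) == 56,
               "the signal occupies the last 8 bytes");

// Number of bytes available for PM4 commands in one packet.
size_t pm4_payload_bytes(void) {
  return sizeof(((mirror_pm4_packet_t*)0)->pm4_command);
}
```

The header and payload end at offset 56, which is already 8-byte aligned. The completion signal therefore fills the last 8 bytes with no padding.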
// // AQL Vendor Specific packet which carries a PM4 command typedef struct { uint16_t header; uint16_t pm4_command[27]; hsa_signal_t completion_signal; } hsa_ext_amd_aql_pm4_packet_t; // Method to populate the provided AQL packet with profiling start commands // Only the 'pm4_command' fields of the packet are set; the application // is responsible for setting the Vendor Specific header type and a completion signal hsa_status_t hsa_ven_amd_aqlprofile_start( hsa_ven_amd_aqlprofile_profile_t* profile, // [in/out] profile context object hsa_ext_amd_aql_pm4_packet_t* aql_start_packet); // [out] profile start AQL packet // Method to populate the provided AQL packet with profiling stop commands // Only the 'pm4_command' fields of the packet are set; the application // is responsible for setting the Vendor Specific header type and a completion signal hsa_status_t hsa_ven_amd_aqlprofile_stop( const hsa_ven_amd_aqlprofile_profile_t* profile, // [in] profile context object hsa_ext_amd_aql_pm4_packet_t* aql_stop_packet); // [out] profile stop AQL packet // Method to populate the provided AQL packet with profiling read commands // Only the 'pm4_command' fields of the packet are set; the application // is responsible for setting the Vendor Specific header type and a completion signal hsa_status_t hsa_ven_amd_aqlprofile_read( const hsa_ven_amd_aqlprofile_profile_t* profile, // [in] profile context object hsa_ext_amd_aql_pm4_packet_t* aql_read_packet); // [out] profile read AQL packet // Legacy devices, PM4 profiling packet size const unsigned HSA_VEN_AMD_AQLPROFILE_LEGACY_PM4_PACKET_SIZE = 192; // Legacy devices, converting the profiling AQL packet to PM4 packet blob hsa_status_t hsa_ven_amd_aqlprofile_legacy_get_pm4( const hsa_ext_amd_aql_pm4_packet_t* aql_packet, // [in] AQL packet void* data); // [out] PM4 packet blob // // Get profile info: // Generic method for getting various profile info, including profile buffer // attributes such as the command buffer size, and the profiling PMC results.
// All counters are assumed to be 64-bit values. // // Profile generic output data: typedef struct { uint32_t sample_id; // PMC sample or trace buffer index union { struct { hsa_ven_amd_aqlprofile_event_t event; // PMC event uint64_t result; // PMC result } pmc_data; hsa_ven_amd_aqlprofile_descriptor_t trace_data; // Trace output data descriptor }; } hsa_ven_amd_aqlprofile_info_data_t; // ID query type typedef struct { const char* name; uint32_t id; uint32_t instance_count; } hsa_ven_amd_aqlprofile_id_query_t; // Profile attributes typedef enum { HSA_VEN_AMD_AQLPROFILE_INFO_COMMAND_BUFFER_SIZE = 0, // get_info returns uint32_t value HSA_VEN_AMD_AQLPROFILE_INFO_PMC_DATA_SIZE = 1, // get_info returns uint32_t value HSA_VEN_AMD_AQLPROFILE_INFO_PMC_DATA = 2, // get_info returns PMC uint64_t value // in info_data object HSA_VEN_AMD_AQLPROFILE_INFO_TRACE_DATA = 3, // get_info returns trace buffer ptr/size // in info_data object // HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_COUNTERS = 4, // get_info returns number of block counters HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_ID = 5, // get_info returns the block ID and instance count // for a name string, using _id_query_t // HSA_VEN_AMD_AQLPROFILE_INFO_ENABLE_CMD = 6, // get_info returns size/pointer for // counters enable command buffer HSA_VEN_AMD_AQLPROFILE_INFO_DISABLE_CMD = 7, // get_info returns size/pointer for // counters disable command buffer } hsa_ven_amd_aqlprofile_info_type_t; // Definition of output data iterator callback typedef hsa_status_t (*hsa_ven_amd_aqlprofile_data_callback_t)( hsa_ven_amd_aqlprofile_info_type_t info_type, // [in] data type, PMC or trace data hsa_ven_amd_aqlprofile_info_data_t* info_data, // [in] info_data object void* callback_data); // [in/out] data passed to the callback // Method for getting the profile info hsa_status_t hsa_ven_amd_aqlprofile_get_info( const hsa_ven_amd_aqlprofile_profile_t* profile, // [in] profile context object hsa_ven_amd_aqlprofile_info_type_t attribute, // [in] requested profile attribute
void* value); // [in/out] returned value // Method for iterating the events output data hsa_status_t hsa_ven_amd_aqlprofile_iterate_data( const hsa_ven_amd_aqlprofile_profile_t* profile, // [in] profile context object hsa_ven_amd_aqlprofile_data_callback_t callback, // [in] callback to iterate the output data void* data); // [in/out] data passed to the callback // Return the error string hsa_status_t hsa_ven_amd_aqlprofile_error_string( const char** str); // [out] pointer to the error string /** * @brief Extension version. */ #define hsa_ven_amd_aqlprofile_VERSION_MAJOR 1 #define hsa_ven_amd_aqlprofile_LIB(suff) "libhsa-amd-aqlprofile" suff ".so" #ifdef HSA_LARGE_MODEL static const char kAqlProfileLib[] = hsa_ven_amd_aqlprofile_LIB("64"); #else static const char kAqlProfileLib[] = hsa_ven_amd_aqlprofile_LIB(""); #endif /** * @brief Extension function table. */ typedef struct hsa_ven_amd_aqlprofile_1_00_pfn_s { uint32_t (*hsa_ven_amd_aqlprofile_version_major)(); uint32_t (*hsa_ven_amd_aqlprofile_version_minor)(); hsa_status_t (*hsa_ven_amd_aqlprofile_error_string)( const char** str); hsa_status_t (*hsa_ven_amd_aqlprofile_validate_event)( hsa_agent_t agent, const hsa_ven_amd_aqlprofile_event_t* event, bool* result); hsa_status_t (*hsa_ven_amd_aqlprofile_start)( hsa_ven_amd_aqlprofile_profile_t* profile, hsa_ext_amd_aql_pm4_packet_t* aql_start_packet); hsa_status_t (*hsa_ven_amd_aqlprofile_stop)( const hsa_ven_amd_aqlprofile_profile_t* profile, hsa_ext_amd_aql_pm4_packet_t* aql_stop_packet); hsa_status_t (*hsa_ven_amd_aqlprofile_read)( const hsa_ven_amd_aqlprofile_profile_t* profile, hsa_ext_amd_aql_pm4_packet_t* aql_read_packet); hsa_status_t (*hsa_ven_amd_aqlprofile_legacy_get_pm4)( const hsa_ext_amd_aql_pm4_packet_t* aql_packet, void* data); hsa_status_t (*hsa_ven_amd_aqlprofile_get_info)( const hsa_ven_amd_aqlprofile_profile_t* profile, hsa_ven_amd_aqlprofile_info_type_t attribute, void* value); hsa_status_t (*hsa_ven_amd_aqlprofile_iterate_data)( const
hsa_ven_amd_aqlprofile_profile_t* profile, hsa_ven_amd_aqlprofile_data_callback_t callback, void* data); } hsa_ven_amd_aqlprofile_1_00_pfn_t; typedef hsa_ven_amd_aqlprofile_1_00_pfn_t hsa_ven_amd_aqlprofile_pfn_t; #ifdef __cplusplus } #endif // __cplusplus #endif // OPENSRC_HSA_RUNTIME_INC_HSA_VEN_AMD_AQLPROFILE_H_ ROCR-Runtime-rocm-5.0.0/src/inc/hsa_ven_amd_loader.h000066400000000000000000000630421420110115200221330ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. 
// // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// // HSA AMD extension for additional loader functionality. #ifndef HSA_VEN_AMD_LOADER_H #define HSA_VEN_AMD_LOADER_H #include "hsa.h" #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ /** * @brief Queries equivalent host address for given @p device_address, and * records it in @p host_address. * * * @details Contents of memory pointed to by @p host_address would be identical * to contents of memory pointed to by @p device_address. Only difference * between the two is host accessibility: @p host_address is always accessible * from host, @p device_address might not be accessible from host. * * If @p device_address already points to host accessible memory, then the value * of @p device_address is simply copied into @p host_address. * * The lifetime of @p host_address is the same as the lifetime of @p * device_address, and both lifetimes are limited by the lifetime of the * executable that is managing these addresses. * * * @param[in] device_address Device address to query equivalent host address * for. * * @param[out] host_address Pointer to application-allocated buffer to record * queried equivalent host address in. * * * @retval HSA_STATUS_SUCCESS Function is executed successfully. * * @retval HSA_STATUS_ERROR_NOT_INITIALIZED Runtime is not initialized. * * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p device_address is invalid or * null, or @p host_address is null. 
*/ hsa_status_t hsa_ven_amd_loader_query_host_address( const void *device_address, const void **host_address); /** * @brief The storage type of the code object that is backing loaded memory * segment. */ typedef enum { /** * Loaded memory segment is not backed by any code object (anonymous), as the * case would be with BSS (uninitialized data). */ HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE = 0, /** * Loaded memory segment is backed by the code object that is stored in the * file. */ HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_FILE = 1, /** * Loaded memory segment is backed by the code object that is stored in the * memory. */ HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY = 2 } hsa_ven_amd_loader_code_object_storage_type_t; /** * @brief Loaded memory segment descriptor. * * * @details Loaded memory segment descriptor describes underlying loaded memory * segment. Loaded memory segment is created/allocated by the executable during * the loading of the code object that is backing underlying memory segment. * * The lifetime of underlying memory segment is limited by the lifetime of the * executable that is managing underlying memory segment. */ typedef struct hsa_ven_amd_loader_segment_descriptor_s { /** * Agent underlying memory segment is allocated on. If the code object that is * backing underlying memory segment is program code object, then 0. */ hsa_agent_t agent; /** * Executable that is managing this underlying memory segment. */ hsa_executable_t executable; /** * Storage type of the code object that is backing underlying memory segment. 
*/ hsa_ven_amd_loader_code_object_storage_type_t code_object_storage_type; /** * If the storage type of the code object that is backing underlying memory * segment is: * - HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE, then null; * - HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_FILE, then null-terminated * filepath to the code object; * - HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY, then host * accessible pointer to the first byte of the code object. */ const void *code_object_storage_base; /** * If the storage type of the code object that is backing underlying memory * segment is: * - HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE, then 0; * - HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_FILE, then the length of * the filepath to the code object (including null-terminating character); * - HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY, then the size, in * bytes, of the memory occupied by the code object. */ size_t code_object_storage_size; /** * If the storage type of the code object that is backing underlying memory * segment is: * - HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE, then 0; * - other, then offset, in bytes, from the beginning of the code object to * the first byte in the code object data is copied from. */ size_t code_object_storage_offset; /** * Starting address of the underlying memory segment. */ const void *segment_base; /** * Size, in bytes, of the underlying memory segment. */ size_t segment_size; } hsa_ven_amd_loader_segment_descriptor_t; /** * @brief Either queries loaded memory segment descriptors, or total number of * loaded memory segment descriptors. * * * @details If @p segment_descriptors is not null and @p num_segment_descriptors * points to number that exactly matches total number of loaded memory segment * descriptors, then queries loaded memory segment descriptors, and records them * in @p segment_descriptors. 
If @p segment_descriptors is null and @p * num_segment_descriptors points to zero, then queries total number of loaded * memory segment descriptors, and records it in @p num_segment_descriptors. In * all other cases returns appropriate error code (see below). * * The caller of this function is responsible for the allocation/deallocation * and the lifetime of @p segment_descriptors and @p num_segment_descriptors. * * The lifetime of loaded memory segments that are described by queried loaded * memory segment descriptors is limited by the lifetime of the executable that * is managing loaded memory segments. * * Queried loaded memory segment descriptors are always self-consistent: they * describe a complete set of loaded memory segments that are being backed by * fully loaded code objects that are present at the time (i.e. this function * is blocked until all executable manipulations are fully complete). * * * @param[out] segment_descriptors Pointer to application-allocated buffer to * record queried loaded memory segment descriptors in. Can be null if @p * num_segment_descriptors points to zero. * * @param[in,out] num_segment_descriptors Pointer to application-allocated * buffer that contains either total number of loaded memory segment descriptors * or zero. * * * @retval HSA_STATUS_SUCCESS Function is executed successfully. * * @retval HSA_STATUS_ERROR_NOT_INITIALIZED Runtime is not initialized. * * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p segment_descriptors is null * while @p num_segment_descriptors points to non-zero number, @p * segment_descriptors is not null while @p num_segment_descriptors points to * zero, or @p num_segment_descriptors is null. * * @retval HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS @p num_segment_descriptors * does not point to number that exactly matches total number of loaded memory * segment descriptors. 
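The contract above implies a count-then-fill pattern: first ask for the total, then allocate, then fill. If the set of loaded segments changes between the two calls, the fill call reports `HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS`, and the caller can retry. The sketch below shows that loop against a hypothetical stub, `stub_query`, which imitates the documented behavior so it runs without the ROCm runtime. `stub_status_t` and `seg_t` are likewise illustrative stand-ins for `hsa_status_t` and `hsa_ven_amd_loader_segment_descriptor_t`, and the retry cap of 8 is an arbitrary choice.

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-ins so the sketch runs without the ROCm runtime.
typedef enum { ST_SUCCESS, ST_INVALID_ARGUMENT, ST_INCOMPATIBLE_ARGUMENTS } stub_status_t;
typedef struct { size_t segment_size; } seg_t;

static size_t g_loaded = 3;  // segments currently held by the fake runtime

// Imitates the documented contract of the segment descriptor query.
static stub_status_t stub_query(seg_t* out, size_t* num) {
  if (num == NULL) return ST_INVALID_ARGUMENT;
  if (out == NULL && *num == 0) {  // count query
    *num = g_loaded;
    return ST_SUCCESS;
  }
  if (out == NULL || *num == 0) return ST_INVALID_ARGUMENT;
  if (*num != g_loaded) return ST_INCOMPATIBLE_ARGUMENTS;  // must match exactly
  for (size_t i = 0; i < *num; ++i) out[i].segment_size = 4096 * (i + 1);
  return ST_SUCCESS;
}

// Count-then-fill with retry; returns a malloc'd array (caller frees) or NULL.
seg_t* collect_segments(size_t* count) {
  *count = 0;
  for (int attempt = 0; attempt < 8; ++attempt) {
    size_t n = 0;
    if (stub_query(NULL, &n) != ST_SUCCESS) return NULL;
    if (n == 0) return NULL;  // nothing loaded
    seg_t* buf = malloc(n * sizeof *buf);
    if (buf == NULL) return NULL;
    stub_status_t st = stub_query(buf, &n);
    if (st == ST_SUCCESS) {
      *count = n;
      return buf;
    }
    free(buf);
    if (st != ST_INCOMPATIBLE_ARGUMENTS) return NULL;  // genuine error
    // Otherwise the segment set changed between calls: query again.
  }
  return NULL;  // segment set kept changing; give up
}
```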
*/ hsa_status_t hsa_ven_amd_loader_query_segment_descriptors( hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors, size_t *num_segment_descriptors); /** * @brief Obtains the handle of the executable to which the device address belongs. * * @details This method should not be used to obtain an executable handle from * a host address. The executable returned is expected to be alive until it is * destroyed by the user. * * @retval HSA_STATUS_SUCCESS Function is executed successfully. * * @retval HSA_STATUS_ERROR_NOT_INITIALIZED Runtime is not initialized. * * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT The input is invalid or there * is no executable found for this kernel code object. */ hsa_status_t hsa_ven_amd_loader_query_executable( const void *device_address, hsa_executable_t *executable); //===----------------------------------------------------------------------===// /** * @brief Iterate over the loaded code objects in an executable, and invoke * an application-defined callback on every iteration. * * @param[in] executable Executable. * * @param[in] callback Callback to be invoked once per loaded code object. The * HSA runtime passes three arguments to the callback: the executable, a * loaded code object, and the application data. If @p callback returns a * status other than ::HSA_STATUS_SUCCESS for a particular iteration, the * traversal stops and * ::hsa_ven_amd_loader_executable_iterate_loaded_code_objects returns that * status value. * * @param[in] data Application data that is passed to @p callback on every * iteration. May be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_EXECUTABLE The executable is invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL.
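The early-stop rule above supports searching: the callback keeps its state in the opaque `data` pointer and returns a non-success status once it finds what it wants. The caller then treats that status as an expected outcome rather than an error. The sketch below uses a hypothetical stub iterator, `stub_iterate`, that follows the documented contract. `stub_status_t`, `ST_BREAK` and `stub_lco_t` are illustrative stand-ins for `hsa_status_t`, a non-success break status and `hsa_loaded_code_object_t`.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for the HSA types and the runtime's iterator.
typedef enum { ST_SUCCESS = 0, ST_BREAK = 1 } stub_status_t;
typedef struct { uint64_t handle; } stub_lco_t;

static const stub_lco_t g_objects[] = {{0x10}, {0x20}, {0x30}};

// Runs cb once per object; the first non-success status stops the walk
// and is returned to the caller, as documented above.
stub_status_t stub_iterate(stub_status_t (*cb)(stub_lco_t, void*), void* data) {
  for (size_t i = 0; i < sizeof g_objects / sizeof g_objects[0]; ++i) {
    stub_status_t st = cb(g_objects[i], data);
    if (st != ST_SUCCESS) return st;
  }
  return ST_SUCCESS;
}

// Search state passed through the opaque data pointer.
typedef struct { uint64_t wanted; size_t visited; int found; } search_t;

static stub_status_t find_cb(stub_lco_t lco, void* data) {
  search_t* s = data;
  s->visited++;
  if (lco.handle == s->wanted) {
    s->found = 1;
    return ST_BREAK;  // stop the traversal early
  }
  return ST_SUCCESS;
}

// Returns 1 if the handle was found; ST_BREAK is expected here, not an error.
int find_loaded_object(uint64_t wanted, size_t* visited) {
  search_t s = {wanted, 0, 0};
  stub_status_t st = stub_iterate(find_cb, &s);
  *visited = s.visited;
  return st == ST_BREAK && s.found;
}
```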
*/ hsa_status_t hsa_ven_amd_loader_executable_iterate_loaded_code_objects( hsa_executable_t executable, hsa_status_t (*callback)( hsa_executable_t executable, hsa_loaded_code_object_t loaded_code_object, void *data), void *data); /** * @brief Loaded code object kind. */ typedef enum { /** * Program code object. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_KIND_PROGRAM = 1, /** * Agent code object. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_KIND_AGENT = 2 } hsa_ven_amd_loader_loaded_code_object_kind_t; /** * @brief Loaded code object attributes. */ typedef enum hsa_ven_amd_loader_loaded_code_object_info_e { /** * The executable in which this loaded code object is loaded. The * type of this attribute is ::hsa_executable_t. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_EXECUTABLE = 1, /** * The kind of this loaded code object. The type of this attribute is * ::uint32_t interpreted as ::hsa_ven_amd_loader_loaded_code_object_kind_t. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_KIND = 2, /** * The agent on which this loaded code object is loaded. The * value of this attribute is only defined if * ::HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_KIND is * ::HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_KIND_AGENT. The type of this * attribute is ::hsa_agent_t. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_AGENT = 3, /** * The storage type of the code object reader used to load the loaded code object. * The type of this attribute is ::uint32_t interpreted as a * ::hsa_ven_amd_loader_code_object_storage_type_t. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE = 4, /** * The memory address of the first byte of the code object that was loaded. * The value of this attribute is only defined if * ::HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE is * ::HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY. The type of this * attribute is ::uint64_t.
*/ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE = 5, /** * The memory size in bytes of the code object that was loaded. * The value of this attribute is only defined if * ::HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE is * ::HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY. The type of this * attribute is ::uint64_t. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE = 6, /** * The file descriptor of the code object that was loaded. * The value of this attribute is only defined if * ::HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE is * ::HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_FILE. The type of this * attribute is ::int. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE = 7, /** * The signed byte address difference of the memory address at which the code * object is loaded minus the virtual address specified in the code object * that is loaded. The value of this attribute is only defined if the * executable in which the code object is loaded is frozen. The type of this * attribute is ::int64_t. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA = 8, /** * The base memory address at which the code object is loaded. This is the * base address of the allocation for the lowest addressed segment of the code * object that is loaded. Note that any non-loaded segments before the first * loaded segment are ignored. The value of this attribute is only defined if * the executable in which the code object is loaded is frozen. The type of * this attribute is ::uint64_t. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE = 9, /** * The byte size of the loaded code object's contiguous memory allocation. The * value of this attribute is only defined if the executable in which the code * object is loaded is frozen. The type of this attribute is ::uint64_t.
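The LOAD_DELTA definition above, delta = load address minus code object virtual address, gives a direct way to map a code object address to its runtime address. LOAD_BASE and LOAD_SIZE together describe the loaded range. The sketch below shows both calculations. The function names are hypothetical helpers, and the sketch assumes the attribute values have already been read from a frozen executable.

```c
#include <stdint.h>

// Maps an address taken from the code object (for example, a symbol value
// read from its ELF headers) to the address where it was loaded, using the
// documented definition: LOAD_DELTA = load address - code object address.
uint64_t loaded_address(uint64_t code_object_vaddr, int64_t load_delta) {
  // Unsigned arithmetic wraps modulo 2^64, so negative deltas work too.
  return code_object_vaddr + (uint64_t)load_delta;
}

// Checks whether a runtime address falls inside the loaded allocation
// described by LOAD_BASE and LOAD_SIZE, as the half-open range [base, base + size).
int in_load_range(uint64_t addr, uint64_t load_base, uint64_t load_size) {
  return addr >= load_base && addr - load_base < load_size;
}
```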
*/ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE = 10, /** * The length of the URI in bytes, not including the NUL terminator. The type * of this attribute is uint32_t. */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH = 11, /** * The URI name from which the code object was loaded. The type of this * attribute is a NUL terminated \p char* with the length equal to the value * of ::HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH attribute. * The URI name syntax is defined by the following BNF syntax: * * code_object_uri ::== file_uri | memory_uri * file_uri ::== "file://" file_path [ range_specifier ] * memory_uri ::== "memory://" process_id range_specifier * range_specifier ::== [ "#" | "?" ] "offset=" number "&" "size=" number * file_path ::== URI_ENCODED_OS_FILE_PATH * process_id ::== DECIMAL_NUMBER * number ::== HEX_NUMBER | DECIMAL_NUMBER | OCTAL_NUMBER * * ``number`` is a C integral literal where hexadecimal values are prefixed by * "0x" or "0X", and octal values by "0". * * ``file_path`` is the file's path specified as a URI encoded UTF-8 string. * In URI encoding, every character that is not in the regular expression * ``[a-zA-Z0-9/_.~-]`` is encoded as two uppercase hexadecimal digits * preceded by "%". Directories in the path are separated by "/". * * ``offset`` is a 0-based byte offset to the start of the code object. For a * file URI, it is from the start of the file specified by the ``file_path``, * and if omitted defaults to 0. For a memory URI, it is the memory address * and is required. * * ``size`` is the number of bytes in the code object. For a file URI, if * omitted it defaults to the size of the file. It is required for a memory * URI. * * ``process_id`` is the identity of the process owning the memory. For Linux * it is the C unsigned integral decimal literal for the process ID (PID).
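The grammar above can be parsed with standard C string functions. Passing base 0 to `strtoull` accepts exactly the C literal forms that `number` allows: a `0x`/`0X` prefix for hex, a leading `0` for octal, and decimal otherwise. The sketch below is a hypothetical helper, `parse_file_uri`, and it makes several simplifying assumptions. It handles only `file://` URIs. It requires the range specifier, when present, to start with `#` or `?`. It leaves the path URI-encoded rather than decoding `%XX` escapes. It reports an omitted size as 0, meaning "whole file", which the caller must resolve.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: parses "file://" URIs of the form shown above.
// Returns 0 on success and -1 on malformed input or a too-small path buffer.
int parse_file_uri(const char* uri, char* path, size_t path_cap,
                   unsigned long long* offset, unsigned long long* size) {
  const char* prefix = "file://";
  size_t plen = strlen(prefix);
  if (strncmp(uri, prefix, plen) != 0) return -1;  // not a file URI
  const char* p = uri + plen;
  size_t n = strcspn(p, "#?");  // path ends at the range specifier, if any
  if (n == 0 || n + 1 > path_cap) return -1;
  memcpy(path, p, n);
  path[n] = '\0';  // path stays URI-encoded in this sketch
  *offset = 0;     // documented default for file URIs
  *size = 0;       // assumption: 0 means "whole file"
  if (p[n] == '\0') return 0;  // no range specifier
  const char* r = p + n + 1;
  if (strncmp(r, "offset=", 7) != 0) return -1;
  char* end;
  *offset = strtoull(r + 7, &end, 0);  // base 0: hex, octal or decimal
  if (end == r + 7 || strncmp(end, "&size=", 6) != 0) return -1;
  const char* s = end + 6;
  *size = strtoull(s, &end, 0);
  if (end == s || *end != '\0') return -1;
  return 0;
}
```

For the second example URI in this comment, the helper yields the path `/dir3/dir4/file2`, an offset of 0x2000 and a size of 3000.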
* * For example: * * file:///dir1/dir2/file1 * file:///dir3/dir4/file2#offset=0x2000&size=3000 * memory://1234#offset=0x20000&size=3000 */ HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI = 12, } hsa_ven_amd_loader_loaded_code_object_info_t; /** * @brief Get the current value of an attribute for a given loaded code * object. * * @param[in] loaded_code_object Loaded code object. * * @param[in] attribute Attribute to query. * * @param[out] value Pointer to an application-allocated buffer in which to store * the value of the attribute. If the buffer passed by the application is not * large enough to hold the value of @p attribute, the behavior is undefined. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT The loaded code object is * invalid. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p attribute is an invalid * loaded code object attribute, or @p value is NULL. */ hsa_status_t hsa_ven_amd_loader_loaded_code_object_get_info( hsa_loaded_code_object_t loaded_code_object, hsa_ven_amd_loader_loaded_code_object_info_t attribute, void *value); //===----------------------------------------------------------------------===// /** * @brief Create a code object reader to operate on a file with size and offset. * * @param[in] file File descriptor. The file must have been opened by the * application with at least read permissions prior to calling this function. The * file must contain a vendor-specific code object. * * The file is owned and managed by the application; the lifetime of the file * descriptor must exceed that of any associated code object reader. * * @param[in] size Size of the code object embedded in @p file. * * @param[in] offset 0-based offset relative to the beginning of the @p file * that denotes the beginning of the code object embedded within the @p file.
* * @param[out] code_object_reader Memory location to store the newly created * code object reader handle. Must not be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_FILE @p file is not opened with at least * read permissions. This condition may also be reported as * ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT_READER by the * ::hsa_executable_load_agent_code_object function. * * @retval ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT The bytes starting at @p offset * do not form a valid code object, the file size is 0, or @p offset is greater * than the file size. This condition may also be reported as * ::HSA_STATUS_ERROR_INVALID_CODE_OBJECT by the * ::hsa_executable_load_agent_code_object function. * * @retval ::HSA_STATUS_ERROR_OUT_OF_RESOURCES The HSA runtime failed to * allocate the required resources. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p code_object_reader is NULL. */ hsa_status_t hsa_ven_amd_loader_code_object_reader_create_from_file_with_offset_size( hsa_file_t file, size_t offset, size_t size, hsa_code_object_reader_t *code_object_reader); //===----------------------------------------------------------------------===// /** * @brief Iterate over the available executables, and invoke an * application-defined callback on every iteration. While * ::hsa_ven_amd_loader_iterate_executables is executing, any calls to * ::hsa_executable_create, ::hsa_executable_create_alt, or * ::hsa_executable_destroy will be blocked. * * @param[in] callback Callback to be invoked once per executable. The HSA * runtime passes two arguments to the callback: the executable and the * application data. If @p callback returns a status other than * ::HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and * ::hsa_ven_amd_loader_iterate_executables returns that status value. If
If * @p callback invokes ::hsa_executable_create, ::hsa_executable_create_alt, or * ::hsa_executable_destroy then the behavior is undefined. * * @param[in] data Application data that is passed to @p callback on every * iteration. May be NULL. * * @retval ::HSA_STATUS_SUCCESS The function has been executed successfully. * * @retval ::HSA_STATUS_ERROR_NOT_INITIALIZED The HSA runtime has not been * initialized. * * @retval ::HSA_STATUS_ERROR_INVALID_ARGUMENT @p callback is NULL. */ hsa_status_t hsa_ven_amd_loader_iterate_executables( hsa_status_t (*callback)( hsa_executable_t executable, void *data), void *data); //===----------------------------------------------------------------------===// /** * @brief Extension version. */ #define hsa_ven_amd_loader 001003 /** * @brief Extension function table version 1.00. */ typedef struct hsa_ven_amd_loader_1_00_pfn_s { hsa_status_t (*hsa_ven_amd_loader_query_host_address)( const void *device_address, const void **host_address); hsa_status_t (*hsa_ven_amd_loader_query_segment_descriptors)( hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors, size_t *num_segment_descriptors); hsa_status_t (*hsa_ven_amd_loader_query_executable)( const void *device_address, hsa_executable_t *executable); } hsa_ven_amd_loader_1_00_pfn_t; /** * @brief Extension function table version 1.01. 
*/ typedef struct hsa_ven_amd_loader_1_01_pfn_s { hsa_status_t (*hsa_ven_amd_loader_query_host_address)( const void *device_address, const void **host_address); hsa_status_t (*hsa_ven_amd_loader_query_segment_descriptors)( hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors, size_t *num_segment_descriptors); hsa_status_t (*hsa_ven_amd_loader_query_executable)( const void *device_address, hsa_executable_t *executable); hsa_status_t (*hsa_ven_amd_loader_executable_iterate_loaded_code_objects)( hsa_executable_t executable, hsa_status_t (*callback)( hsa_executable_t executable, hsa_loaded_code_object_t loaded_code_object, void *data), void *data); hsa_status_t (*hsa_ven_amd_loader_loaded_code_object_get_info)( hsa_loaded_code_object_t loaded_code_object, hsa_ven_amd_loader_loaded_code_object_info_t attribute, void *value); } hsa_ven_amd_loader_1_01_pfn_t; /** * @brief Extension function table version 1.02. */ typedef struct hsa_ven_amd_loader_1_02_pfn_s { hsa_status_t (*hsa_ven_amd_loader_query_host_address)( const void *device_address, const void **host_address); hsa_status_t (*hsa_ven_amd_loader_query_segment_descriptors)( hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors, size_t *num_segment_descriptors); hsa_status_t (*hsa_ven_amd_loader_query_executable)( const void *device_address, hsa_executable_t *executable); hsa_status_t (*hsa_ven_amd_loader_executable_iterate_loaded_code_objects)( hsa_executable_t executable, hsa_status_t (*callback)( hsa_executable_t executable, hsa_loaded_code_object_t loaded_code_object, void *data), void *data); hsa_status_t (*hsa_ven_amd_loader_loaded_code_object_get_info)( hsa_loaded_code_object_t loaded_code_object, hsa_ven_amd_loader_loaded_code_object_info_t attribute, void *value); hsa_status_t (*hsa_ven_amd_loader_code_object_reader_create_from_file_with_offset_size)( hsa_file_t file, size_t offset, size_t size, hsa_code_object_reader_t *code_object_reader); } hsa_ven_amd_loader_1_02_pfn_t; /** * @brief 
Extension function table version 1.03. */ typedef struct hsa_ven_amd_loader_1_03_pfn_s { hsa_status_t (*hsa_ven_amd_loader_query_host_address)( const void *device_address, const void **host_address); hsa_status_t (*hsa_ven_amd_loader_query_segment_descriptors)( hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors, size_t *num_segment_descriptors); hsa_status_t (*hsa_ven_amd_loader_query_executable)( const void *device_address, hsa_executable_t *executable); hsa_status_t (*hsa_ven_amd_loader_executable_iterate_loaded_code_objects)( hsa_executable_t executable, hsa_status_t (*callback)( hsa_executable_t executable, hsa_loaded_code_object_t loaded_code_object, void *data), void *data); hsa_status_t (*hsa_ven_amd_loader_loaded_code_object_get_info)( hsa_loaded_code_object_t loaded_code_object, hsa_ven_amd_loader_loaded_code_object_info_t attribute, void *value); hsa_status_t (*hsa_ven_amd_loader_code_object_reader_create_from_file_with_offset_size)( hsa_file_t file, size_t offset, size_t size, hsa_code_object_reader_t *code_object_reader); hsa_status_t (*hsa_ven_amd_loader_iterate_executables)( hsa_status_t (*callback)( hsa_executable_t executable, void *data), void *data); } hsa_ven_amd_loader_1_03_pfn_t; #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* HSA_VEN_AMD_LOADER_H */ ROCR-Runtime-rocm-5.0.0/src/libamdhsacode/000077500000000000000000000000001420110115200201715ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/libamdhsacode/amd_elf_image.cpp000066400000000000000000001651151420110115200234370ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. 
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #include "core/inc/amd_elf_image.hpp" #include "amd_hsa_code_util.hpp" #include #include #include #include #include #include #include #include #include #ifdef _WIN32 #include #define alignof __alignof #endif // _WIN32 #include #ifndef _WIN32 #define _open open #define _close close #define _tempnam tempnam #include #include #endif #if defined(USE_MEMFILE) #include "memfile.h" #define OpenTemp(f) mem_open(NULL, 0, 0) #define CloseTemp(f) mem_close(f) #define _read(f, b, l) mem_read((f), (b), (l)) #define _write(f, b, l) mem_write((f), (b), (l)) #define _lseek(f, l, w) mem_lseek((f), (l), (w)) #define _ftruncate(f, l) mem_ftruncate((f), (size_t)(l)) #define sendfile(o, i, p, s) mem_sendfile((o), (i), (p), (s)) #else // USE_MEMFILE #define OpenTemp(f) amd::hsa::OpenTempFile(f); #define CloseTemp(f) amd::hsa::CloseTempFile(f); #ifndef _WIN32 #define _read read #define _write write #define _lseek lseek #define _ftruncate ftruncate #include #else #define _ftruncate _chsize #endif // !_WIN32 #endif // !USE_MEMFILE #if !defined(BSD_LIBELF) #define elf_setshstrndx elfx_update_shstrndx #endif #define NOTE_RECORD_ALIGNMENT 4 using rocr::amd::hsa::alignUp; namespace rocr { namespace amd { namespace elf { class FileImage { public: FileImage(); ~FileImage(); bool create(); bool readFrom(const std::string& filename); bool copyFrom(const void* data, size_t size); bool writeTo(const std::string& filename); bool copyTo(void** buffer, size_t* size = 0); bool copyTo(void* buffer, size_t size); size_t getSize(); std::string output() { return out.str(); } int fd() { return d; } private: int d; std::ostringstream out; bool error(const char* msg); bool perror(const char *msg); std::string werror(); }; FileImage::FileImage() : d(-1) { } FileImage::~FileImage() { if (d != -1) { CloseTemp(d); } } bool FileImage::error(const char* msg) { out << "Error: " << msg << std::endl; return false; } bool 
FileImage::perror(const char* msg) { out << "Error: " << msg << ": " << strerror(errno) << std::endl; return false; } #ifdef _WIN32 std::string FileImage::werror() { LPVOID lpMsgBuf; DWORD dw = GetLastError(); FormatMessage( FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS, NULL, dw, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), (LPTSTR)&lpMsgBuf, 0, NULL); std::string result((LPTSTR)lpMsgBuf); LocalFree(lpMsgBuf); return result; } #endif // _WIN32 bool FileImage::create() { d = OpenTemp("amdelf"); if (d == -1) { return error("Failed to open temporary file for elf image"); } return true; } bool FileImage::readFrom(const std::string& filename) { #ifdef _WIN32 const DWORD chunk_size = 32 * 1024 * 1024; std::unique_ptr<char[]> buffer(new char[chunk_size]); HANDLE in = CreateFile(filename.c_str(), GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); if (in == INVALID_HANDLE_VALUE) { out << "Failed to open " << filename << ": " << werror() << std::endl; return false; } DWORD read; unsigned write; int written; do { if (!ReadFile(in, buffer.get(), chunk_size, &read, NULL)) { out << "Failed to read " << filename << ": " << werror() << std::endl; CloseHandle(in); return false; } if (read > 0) { write = read; do { written = _write(d, buffer.get() + (read - write), write); if (written < 0) { CloseHandle(in); return perror("Failed to write image file"); } write -= written; } while (write > 0); } } while (read > 0); CloseHandle(in); if (_lseek(d, 0L, SEEK_SET) < 0) { return perror("lseek(0) failed"); } return true; #else // _WIN32 int in = _open(filename.c_str(), O_RDONLY); if (in < 0) { return perror("open failed"); } if (_lseek(in, 0L, SEEK_END) < 0) { _close(in); return perror("lseek failed"); } off_t size; if ((size = _lseek(in, 0L, SEEK_CUR)) < 0) { _close(in); return perror("lseek(2) failed"); } if (_lseek(in, 0L, SEEK_SET) < 0) { _close(in); return perror("lseek(3) failed"); } if (_lseek(d, 0L, SEEK_SET) < 0) { _close(in); return perror("lseek(4) failed"); } ssize_t written; do { written = sendfile(d,
in, NULL, size); if (written < 0) { _close(in); return perror("sendfile failed"); } size -= written; } while (size > 0); _close(in); if (_lseek(d, 0L, SEEK_SET) < 0) { return perror("lseek(0) failed"); } return true; #endif // _WIN32 } bool FileImage::copyFrom(const void* data, size_t size) { assert(d != -1); if (_lseek(d, 0L, SEEK_SET) < 0) { return perror("lseek failed"); } if (_ftruncate(d, 0) < 0) { return perror("ftruncate failed"); } int written, offset = 0; while (size > 0) { written = _write(d, (const char*) data + offset, size); if (written < 0) { return perror("write failed"); } size -= written; offset += written; } if (_lseek(d, 0L, SEEK_SET) < 0) { return perror("lseek failed"); } return true; } size_t FileImage::getSize() { assert(d != -1); if (_lseek(d, 0L, SEEK_END) < 0) { return perror("lseek failed"); } long seek = 0; if ((seek = _lseek(d, 0L, SEEK_CUR)) < 0) { return perror("lseek(2) failed"); } if (_lseek(d, 0L, SEEK_SET) < 0) { return perror("lseek(3) failed"); } return seek; } bool FileImage::copyTo(void** buffer, size_t* size) { size_t size1 = getSize(); void* buffer1 = malloc(size1); if (!buffer1) { return error("Failed to allocate buffer"); } if (_read(d, buffer1, size1) < 0) { free(buffer1); return perror("read failed"); } *buffer = buffer1; if (size) { *size = size1; } return true; } bool FileImage::copyTo(void* buffer, size_t size) { size_t size1 = getSize(); if (size < size1) { return error("Buffer size is not enough"); } if (_read(d, buffer, size1) < 0) { return perror("read failed"); } return true; } bool FileImage::writeTo(const std::string& filename) { bool res = false; size_t size = 0; void *buffer = nullptr; if (copyTo(&buffer, &size)) { std::ofstream out(filename.c_str(), std::ios::binary); out.write((char*)buffer, size); res = out.good(); } free(buffer); return res; } class Buffer { public: typedef unsigned char byte_type; typedef size_t size_type; Buffer(); Buffer(const byte_type *src, size_type size, size_type align = 0); virtual ~Buffer(); const byte_type* raw() const { return this->isConst()
? ptr_ : data_.data(); } size_type align() const { return align_; } size_type size() const { return this->isConst() ? size_ : data_.size(); } bool isConst() const { return 0 != size_; } bool isEmpty() { return size() == 0; } bool hasRaw(const byte_type *src) const { return (src >= this->raw()) && (src < this->raw() + this->size()); } template bool has(const T *src) const { return this->hasRaw((const byte_type*)src); } bool has(size_type offset) const { return offset < this->size(); } template size_type getOffset(const T *src) const { return this->getRawOffset((const byte_type*)src); } template T get(size_type offset) const { return (T)this->getRaw(offset); } size_type addString(const std::string &str, size_type align = 0); size_type addStringLength(const std::string &str, size_type align = 0); size_type nextOffset(size_type align) const { return alignUp(this->size(), align); } template size_type add(const T *src, size_type size, size_type align) { return this->addRaw((const byte_type*)src, size, align); } template size_type add(const T &src, size_type align = 0) { return this->addRaw((const byte_type*)&src, sizeof(T), align == 0 ? 
alignof(T) : align); } size_type align(size_type align); template size_type reserve() { Buffer::size_type offset = this->align(alignof(T)); data_.insert(data_.end(), sizeof(T), 0x0); return offset; } private: size_type getRawOffset(const byte_type *src) const; const byte_type* getRaw(size_type offset) const; size_type addRaw(const byte_type *src, size_type size, size_type align); std::vector data_; const byte_type *ptr_; size_type size_; size_type align_; }; Buffer::Buffer() : ptr_(nullptr) , size_(0) , align_(0) { } Buffer::Buffer(const Buffer::byte_type *src, Buffer::size_type size, Buffer::size_type align) : ptr_(src) , size_(size) , align_(align) { } Buffer::~Buffer() { } Buffer::size_type Buffer::getRawOffset(const Buffer::byte_type *src) const { assert(this->has(src)); return src - this->raw(); } const Buffer::byte_type* Buffer::getRaw(Buffer::size_type offset) const { assert(this->has(offset)); return this->raw() + offset; } Buffer::size_type Buffer::addRaw(const Buffer::byte_type *src, Buffer::size_type size, Buffer::size_type align) { assert(!this->isConst()); assert(nullptr != src); assert(0 != size); assert(0 != align); Buffer::size_type offset = this->align(align); data_.insert(data_.end(), src, src + size); return offset; } Buffer::size_type Buffer::addString(const std::string &str, size_type align) { return this->add(str.c_str(), str.length() + 1, align == 0 ? alignof(char) : align); } Buffer::size_type Buffer::addStringLength(const std::string &str, size_type align) { return this->add((uint32_t)(str.length() + 1), align == 0 ? 
alignof(uint32_t) : align); } Buffer::size_type Buffer::align(Buffer::size_type align) { assert(!this->isConst()); assert(0 != align); Buffer::size_type offset = alignUp(this->size(), align); align_ = (std::max)(align_, align); data_.insert(data_.end(), offset - this->size(), 0x0); return offset; } class GElfImage; class GElfSegment; class GElfSection : public virtual Section { public: GElfSection(GElfImage* elf); bool push(const char* name, uint32_t shtype, uint64_t shflags, uint16_t shlink, uint32_t info, uint32_t align, uint64_t entsize = 0); bool pull0(); bool pull(uint16_t ndx); virtual bool pullData() { return true; } bool push(); uint16_t getSectionIndex() const override; uint32_t type() const override { return hdr.sh_type; } std::string Name() const override; uint64_t offset() const override { return hdr.sh_offset; } uint64_t addr() const override { return hdr.sh_addr; } bool updateAddr(uint64_t addr) override; uint64_t addralign() const override { return data0.size() == 0 ? data.align() : data0.align(); } uint64_t flags() const override { return hdr.sh_flags; } uint64_t size() const override { return data0.size() == 0 ? data.size() : data0.size(); } uint64_t nextDataOffset(uint64_t align) const override; uint64_t addData(const void *src, uint64_t size, uint64_t align) override; bool getData(uint64_t offset, void* dest, uint64_t size) override; bool hasRelocationSection() const override { return reloc_sec != 0; } RelocationSection* relocationSection(SymbolTable* symtab = 0) override; Segment* segment() override { return seg; } RelocationSection* asRelocationSection() override { return 0; } bool setMemSize(uint64_t s) override { memsize_ = s; return true; } uint64_t memSize() const override { return memsize_ ? memsize_ : size(); } bool setAlign(uint64_t a) override { align_ = a; return true; } uint64_t memAlign() const override { return align_ ? 
align_ : addralign(); } protected: GElfImage* elf; Segment* seg; GElf_Shdr hdr; Buffer data0, data; uint64_t memsize_; uint64_t align_; RelocationSection *reloc_sec; size_t ndxscn; friend class GElfSymbol; friend class GElfSegment; friend class GElfImage; }; class GElfSegment : public Segment { public: GElfSegment(GElfImage* elf, uint16_t index); GElfSegment(GElfImage* elf, uint16_t index, uint32_t type, uint32_t flags, uint64_t paddr = 0); bool push(uint64_t vaddr); bool pull(); uint64_t type() const override { return phdr.p_type; } uint64_t memSize() const override { return phdr.p_memsz; } uint64_t align() const override { return phdr.p_align; } uint64_t imageSize() const override { return phdr.p_filesz; } uint64_t vaddr() const override { return phdr.p_vaddr; } uint64_t flags() const override { return phdr.p_flags; } uint64_t offset() const override { return phdr.p_offset; } const char* data() const override; uint16_t getSegmentIndex() override; bool updateAddSection(Section *section) override; private: GElfImage* elf; uint16_t index; GElf_Phdr phdr; std::vector sections; }; class GElfStringTable : public GElfSection, public StringTable { public: GElfStringTable(GElfImage* elf); bool push(const char* name, uint32_t shtype, uint64_t shflags); bool pullData() override; const char* addString(const std::string& s) override; size_t addString1(const std::string& s) override; const char* getString(size_t ndx) override; size_t getStringIndex(const char* name) override; uint16_t getSectionIndex() const override { return GElfSection::getSectionIndex(); } uint32_t type() const override { return GElfSection::type(); } std::string Name() const override { return GElfSection::Name(); } uint64_t addr() const override { return GElfSection::addr(); } uint64_t offset() const override { return GElfSection::offset(); } bool updateAddr(uint64_t addr) override { return GElfSection::updateAddr(addr); } uint64_t addralign() const override { return GElfSection::addralign(); } uint64_t 
flags() const override { return GElfSection::flags(); } uint64_t size() const override { return GElfSection::size(); } Segment* segment() override { return GElfSection::segment(); } uint64_t nextDataOffset(uint64_t align) const override { return GElfSection::nextDataOffset(align); } uint64_t addData(const void *src, uint64_t size, uint64_t align) override { return GElfSection::addData(src, size, align); } bool getData(uint64_t offset, void* dest, uint64_t size) override { return GElfSection::getData(offset, dest, size); } bool hasRelocationSection() const override { return GElfSection::hasRelocationSection(); } RelocationSection* relocationSection(SymbolTable* symtab) override { return GElfSection::relocationSection(); } RelocationSection* asRelocationSection() override { return 0; } uint64_t memSize() const override { return GElfSection::memSize(); } bool setMemSize(uint64_t s) override { return GElfSection::setMemSize(s); } uint64_t memAlign() const override { return GElfSection::memAlign(); } bool setAlign(uint64_t a) override { return GElfSection::setAlign(a); } }; class GElfSymbolTable; class GElfSymbol : public Symbol { public: GElfSymbol(GElfSymbolTable* symtab, Buffer &data, size_t index); bool push(const std::string& name, uint64_t value, uint64_t size, unsigned char type, unsigned char binding, uint16_t shndx, unsigned char other); uint32_t index() override { return eindex / sizeof(GElf_Sym); } uint32_t type() override { return GELF_ST_TYPE(Sym()->st_info); } uint32_t binding() override { return GELF_ST_BIND(Sym()->st_info); } uint64_t size() override { return Sym()->st_size; } uint64_t value() override { return Sym()->st_value; } unsigned char other() override { return Sym()->st_other; } std::string name() override; Section* section() override; void setValue(uint64_t value) override { Sym()->st_value = value; } void setSize(uint64_t size) override { Sym()->st_size = size; } private: GElf_Sym* Sym() { return edata.get<GElf_Sym*>(eindex); } GElfSymbolTable* symtab;
Buffer &edata; size_t eindex; friend class GElfSymbolTable; }; class GElfSymbolTable : public GElfSection, public SymbolTable { private: Symbol* addSymbolInternal(Section* section, const std::string& name, uint64_t value, uint64_t size, unsigned char type, unsigned char binding, unsigned char other = 0); GElfStringTable* strtab; std::vector> symbols; friend class GElfSymbol; public: GElfSymbolTable(GElfImage* elf); bool push(const char* name, GElfStringTable* strtab); bool pullData() override; uint16_t getSectionIndex() const override { return GElfSection::getSectionIndex(); } uint32_t type() const override { return GElfSection::type(); } std::string Name() const override { return GElfSection::Name(); } uint64_t offset() const override { return GElfSection::offset(); } uint64_t addr() const override { return GElfSection::addr(); } bool updateAddr(uint64_t addr) override { return GElfSection::updateAddr(addr); } uint64_t addralign() const override { return GElfSection::addralign(); } uint64_t flags() const override { return GElfSection::flags(); } uint64_t size() const override { return GElfSection::size(); } Segment* segment() override { return GElfSection::segment(); } uint64_t nextDataOffset(uint64_t align) const override { return GElfSection::nextDataOffset(align); } uint64_t addData(const void *src, uint64_t size, uint64_t align) override { return GElfSection::addData(src, size, align); } bool getData(uint64_t offset, void* dest, uint64_t size) override { return GElfSection::getData(offset, dest, size); } bool hasRelocationSection() const override { return GElfSection::hasRelocationSection(); } RelocationSection* relocationSection(SymbolTable* symtab) override { return GElfSection::relocationSection(); } Symbol* addSymbol(Section* section, const std::string& name, uint64_t value, uint64_t size, unsigned char type, unsigned char binding, unsigned char other = 0) override; size_t symbolCount() override; Symbol* symbol(size_t i) override; RelocationSection* 
asRelocationSection() override { return 0; } uint64_t memSize() const override { return GElfSection::memSize(); } bool setMemSize(uint64_t s) override { return GElfSection::setMemSize(s); } uint64_t memAlign() const override { return GElfSection::memAlign(); } bool setAlign(uint64_t a) override { return GElfSection::setAlign(a); } }; class GElfNoteSection : public GElfSection, public NoteSection { public: GElfNoteSection(GElfImage* elf); bool push(const std::string& name); uint16_t getSectionIndex() const override { return GElfSection::getSectionIndex(); } uint32_t type() const override { return GElfSection::type(); } std::string Name() const override { return GElfSection::Name(); } uint64_t addr() const override { return GElfSection::addr(); } bool updateAddr(uint64_t addr) override { return GElfSection::updateAddr(addr); } uint64_t offset() const override { return GElfSection::offset(); } uint64_t addralign() const override { return GElfSection::addralign(); } uint64_t flags() const override { return GElfSection::flags(); } uint64_t size() const override { return GElfSection::size(); } Segment* segment() override { return GElfSection::segment(); } uint64_t nextDataOffset(uint64_t align) const override { return GElfSection::nextDataOffset(align); } uint64_t addData(const void *src, uint64_t size, uint64_t align) override { return GElfSection::addData(src, size, align); } bool getData(uint64_t offset, void* dest, uint64_t size) override { return GElfSection::getData(offset, dest, size); } bool hasRelocationSection() const override { return GElfSection::hasRelocationSection(); } RelocationSection* relocationSection(SymbolTable* symtab) override { return GElfSection::relocationSection(); } bool addNote(const std::string& name, uint32_t type, const void* desc, uint32_t desc_size) override; bool getNote(const std::string& name, uint32_t type, void** desc, uint32_t* desc_size) override; RelocationSection* asRelocationSection() override { return 0; } uint64_t memSize() 
const override { return GElfSection::memSize(); } bool setMemSize(uint64_t s) override { return GElfSection::setMemSize(s); } uint64_t memAlign() const override { return GElfSection::memAlign(); } bool setAlign(uint64_t a) override { return GElfSection::setAlign(a); } }; class GElfRelocationSection; class GElfRelocation : public Relocation { private: GElf_Rela *Rela() { return edata.get(eindex); } GElfRelocationSection* rsection; Buffer &edata; size_t eindex; public: GElfRelocation(GElfRelocationSection* rsection_, Buffer &edata_, size_t eindex_) : rsection(rsection_), edata(edata_), eindex(eindex_) { } bool push(uint32_t type, Symbol* symbol, uint64_t offset, int64_t addend); RelocationSection* section() override; uint32_t type() override { return GELF_R_TYPE(Rela()->r_info); } uint32_t symbolIndex() override { return GELF_R_SYM(Rela()->r_info); } Symbol* symbol() override; uint64_t offset() override { return Rela()->r_offset; } int64_t addend() override { return Rela()->r_addend; } }; class GElfRelocationSection : public GElfSection, public RelocationSection { private: Section* section; GElfSymbolTable* symtab; std::vector> relocations; public: GElfRelocationSection(GElfImage* elf, Section* targetSection = 0, GElfSymbolTable* symtab_ = 0); bool push(const std::string& name); bool pullData() override; uint16_t getSectionIndex() const override { return GElfSection::getSectionIndex(); } uint32_t type() const override { return GElfSection::type(); } std::string Name() const override { return GElfSection::Name(); } uint64_t addr() const override { return GElfSection::addr(); } uint64_t offset() const override { return GElfSection::offset(); } bool updateAddr(uint64_t addr) override { return GElfSection::updateAddr(addr); } uint64_t addralign() const override { return GElfSection::addralign(); } uint64_t flags() const override { return GElfSection::flags(); } uint64_t size() const override { return GElfSection::size(); } Segment* segment() override { return 
GElfSection::segment(); } uint64_t nextDataOffset(uint64_t align) const override { return GElfSection::nextDataOffset(align); } uint64_t addData(const void *src, uint64_t size, uint64_t align) override { return GElfSection::addData(src, size, align); } bool getData(uint64_t offset, void* dest, uint64_t size) override { return GElfSection::getData(offset, dest, size); } bool hasRelocationSection() const override { return GElfSection::hasRelocationSection(); } RelocationSection* relocationSection(SymbolTable* symtab) override { return GElfSection::relocationSection(); } RelocationSection* asRelocationSection() override { return this; } size_t relocationCount() const override { return relocations.size(); } Relocation* relocation(size_t i) override { return relocations[i].get(); } Relocation* addRelocation(uint32_t type, Symbol* symbol, uint64_t offset, int64_t addend) override; Section* targetSection() override { return section; } uint64_t memSize() const override { return GElfSection::memSize(); } bool setMemSize(uint64_t s) override { return GElfSection::setMemSize(s); } uint64_t memAlign() const override { return GElfSection::memAlign(); } bool setAlign(uint64_t a) override { return GElfSection::setAlign(a); } friend class GElfRelocation; }; class GElfImage : public Image { public: GElfImage(int elfclass); ~GElfImage(); bool initNew(uint16_t machine, uint16_t type, uint8_t os_abi = 0, uint8_t abi_version = 0, uint32_t e_flags = 0) override; bool loadFromFile(const std::string& filename) override; bool saveToFile(const std::string& filename) override; bool initFromBuffer(const void* buffer, size_t size) override; bool initAsBuffer(const void* buffer, size_t size) override; bool close(); bool writeTo(const std::string& filename) override; bool copyToBuffer(void** buf, size_t* size = 0) override; bool copyToBuffer(void* buf, size_t size) override; const char* data() override { assert(buffer); return buffer; } uint64_t size() override; bool push(); bool Freeze() 
override; bool Validate() override; uint16_t Machine() override { return ehdr.e_machine; } uint16_t Type() override { return ehdr.e_type; } uint32_t EFlags() override { return ehdr.e_flags; } uint32_t ABIVersion() override { return (uint32_t)(ehdr.e_ident[EI_ABIVERSION]); } uint32_t EClass() override { return (uint32_t)(ehdr.e_ident[EI_CLASS]); } uint32_t OsAbi() override { return (uint32_t)(ehdr.e_ident[EI_OSABI]); } GElfStringTable* shstrtab() override; GElfStringTable* strtab() override; GElfSymbolTable* getSymtab(uint16_t index) override { return static_cast(section(index)); } GElfStringTable* addStringTable(const std::string& name) override; GElfStringTable* getStringTable(uint16_t index) override; GElfSymbolTable* addSymbolTable(const std::string& name, StringTable* stab = 0) override; GElfSymbolTable* symtab() override; GElfSegment* segment(size_t i) override { return segments[i].get(); } Segment* segmentByVAddr(uint64_t vaddr) override; size_t sectionCount() override { return sections.size(); } GElfSection* section(size_t i) override { return sections[i].get(); } Section* sectionByVAddr(uint64_t vaddr) override; uint16_t machine() const; uint16_t etype() const; int eclass() const { return elfclass; } bool elfError(const char* msg); GElfNoteSection* note() override; GElfNoteSection* addNoteSection(const std::string& name) override; size_t segmentCount() override { return segments.size(); } Segment* initSegment(uint32_t type, uint32_t flags, uint64_t paddr = 0) override; bool addSegments() override; Section* addSection(const std::string &name, uint32_t type, uint64_t flags = 0, uint64_t entsize = 0, Segment* segment = 0) override; RelocationSection* addRelocationSection(Section* sec, SymbolTable* symtab); RelocationSection* relocationSection(Section* sec, SymbolTable* symtab = 0) override; private: bool frozen; int elfclass; FileImage img; const char* buffer; size_t bufferSize; Elf* e; GElf_Ehdr ehdr; GElfStringTable* shstrtabSection; GElfStringTable* 
strtabSection; GElfSymbolTable* symtabSection; GElfNoteSection* noteSection; std::vector> segments; std::vector> sections; bool imgError(); const char *elfError(); bool elfBegin(Elf_Cmd cmd); bool elfEnd(); bool push0(); bool pullElf(); friend class GElfSection; friend class GElfSymbolTable; friend class GElfNoteSection; friend class GElfRelocationSection; friend class GElfSegment; friend class GElfSymbol; }; GElfSegment::GElfSegment(GElfImage* elf_, uint16_t index_) : elf(elf_), index(index_) { memset(&phdr, 0, sizeof(phdr)); } GElfSegment::GElfSegment(GElfImage* elf_, uint16_t index_, uint32_t type, uint32_t flags, uint64_t paddr) : elf(elf_), index(index_) { memset(&phdr, 0, sizeof(phdr)); phdr.p_type = type; phdr.p_flags = flags; phdr.p_paddr = paddr; } const char* GElfSegment::data() const { return (const char*) elf->data() + phdr.p_offset; } bool GElfImage::Freeze() { assert(!frozen); if (!push()) { return false; } frozen = true; return true; } bool GElfImage::Validate() { if (ELFMAG0 != ehdr.e_ident[EI_MAG0] || ELFMAG1 != ehdr.e_ident[EI_MAG1] || ELFMAG2 != ehdr.e_ident[EI_MAG2] || ELFMAG3 != ehdr.e_ident[EI_MAG3]) { out << "Invalid ELF magic" << std::endl; return false; } if (EV_CURRENT != ehdr.e_version) { out << "Invalid ELF version" << std::endl; return false; } return true; } bool GElfSegment::push(uint64_t vaddr) { phdr.p_align = 0; phdr.p_offset = 0; if (!sections.empty()) { phdr.p_offset = sections[0]->offset(); } for (Section* section : sections) { phdr.p_align = (std::max)(phdr.p_align, section->memAlign()); } phdr.p_vaddr = alignUp(vaddr, (std::max)(phdr.p_align, (uint64_t) 1)); phdr.p_filesz = 0; phdr.p_memsz = 0; for (Section* section : sections) { phdr.p_memsz = alignUp(phdr.p_memsz, (std::max)(section->memAlign(), (uint64_t) 1)); phdr.p_filesz = alignUp(phdr.p_filesz, (std::max)(section->memAlign(), (uint64_t) 1)); if (!section->updateAddr(phdr.p_vaddr + phdr.p_memsz)) { return false; } phdr.p_filesz += (section->type() == SHT_NOBITS) ? 
0 : section->size(); phdr.p_memsz += section->memSize(); } if (!gelf_update_phdr(elf->e, index, &phdr)) { return elf->elfError("gelf_update_phdr failed"); } return true; } bool GElfSegment::pull() { if (!gelf_getphdr(elf->e, index, &phdr)) { return elf->elfError("gelf_getphdr failed"); } return true; } uint16_t GElfSegment::getSegmentIndex() { return index; } bool GElfSegment::updateAddSection(Section *section) { sections.push_back(section); return true; } GElfSection::GElfSection(GElfImage* elf_) : elf(elf_), memsize_(0), align_(0), reloc_sec(nullptr), ndxscn(0) { } uint16_t GElfSection::getSectionIndex() const { return (uint16_t)ndxscn; } std::string GElfSection::Name() const { return std::string(elf->shstrtab()->getString(hdr.sh_name)); } bool GElfSection::updateAddr(uint64_t addr) { Elf_Scn *scn = elf_getscn(elf->e, ndxscn); assert(scn); if (!gelf_getshdr(scn, &hdr)) { return elf->elfError("gelf_get_shdr failed"); } hdr.sh_addr = addr; if (!gelf_update_shdr(scn, &hdr)) { return elf->elfError("gelf_update_shdr failed"); } return true; } bool GElfSection::push(const char* name, uint32_t shtype, uint64_t shflags, uint16_t shlink, uint32_t info, uint32_t align, uint64_t entsize) { Elf_Scn *scn = elf_newscn(elf->e); if (!scn) { return false; } ndxscn = elf_ndxscn(scn); if (!gelf_getshdr(scn, &hdr)) { return elf->elfError("gelf_get_shdr failed"); } align = (std::max)(align, (uint32_t) 8); hdr.sh_name = elf->shstrtab()->addString1(name); hdr.sh_type = shtype; hdr.sh_flags = shflags; hdr.sh_link = shlink; hdr.sh_addr = 0; hdr.sh_info = info; hdr.sh_addralign = align; hdr.sh_entsize = entsize; if (!gelf_update_shdr(scn, &hdr)) { return elf->elfError("gelf_update_shdr failed"); } return true; } bool GElfSection::pull0() { Elf_Scn *scn = elf_getscn(elf->e, ndxscn); if (!scn) { return false; } if (!gelf_getshdr(scn, &hdr)) { return elf->elfError("gelf_get_shdr failed"); } return true; } bool GElfSection::pull(uint16_t ndx) { ndxscn = (size_t) ndx; if (!pull0()) { return 
false; } Elf_Scn *scn = elf_getscn(elf->e, ndx); if (!scn) { return false; } Elf_Data *edata0 = elf_getdata(scn, NULL); if (edata0) { data0 = Buffer((const Buffer::byte_type*)edata0->d_buf, edata0->d_size, edata0->d_align); } seg = elf->segmentByVAddr(hdr.sh_addr); return true; } bool GElfSection::push() { Elf_Scn *scn = elf_getscn(elf->e, ndxscn); assert(scn); Elf_Data *edata = nullptr; edata = elf_newdata(scn); if (!edata) { return elf->elfError("elf_newdata failed"); } if (hdr.sh_type == SHT_NOBITS) { edata->d_buf = 0; edata->d_size = memsize_; if (align_ != 0) { edata->d_align = align_; } } else { edata->d_buf = (void*)data.raw(); edata->d_size = data.size(); if (data.align() != 0) { edata->d_align = data.align(); } } edata->d_align = (std::max)(edata->d_align, (uint64_t) 8); switch (hdr.sh_type) { case SHT_RELA: edata->d_type = ELF_T_RELA; break; case SHT_SYMTAB: edata->d_type = ELF_T_SYM; break; default: edata->d_type = ELF_T_BYTE; break; } edata->d_version = EV_CURRENT; if (!gelf_getshdr(scn, &hdr)) { return elf->elfError("gelf_get_shdr failed"); } hdr.sh_size = edata->d_size; hdr.sh_addralign = edata->d_align; if (!gelf_update_shdr(scn, &hdr)) { return elf->elfError("gelf_update_shdr failed"); } return true; } uint64_t GElfSection::nextDataOffset(uint64_t align) const { return data.nextOffset(align); } uint64_t GElfSection::addData(const void *src, uint64_t size, uint64_t align) { return data.add(src, size, align); } bool GElfSection::getData(uint64_t offset, void* dest, uint64_t size) { Elf_Data* edata = 0; uint64_t coffset = 0; uint64_t csize = 0; Elf_Scn *scn = elf_getscn(elf->e, ndxscn); assert(scn); if ((edata = elf_getdata(scn, edata)) != 0) { if (coffset <= offset && offset <= coffset + edata->d_size) { csize = (std::min)(size, edata->d_size - offset); memcpy(dest, (const char*) edata->d_buf + offset - coffset, csize); coffset += csize; dest = (char*) dest + csize; size -= csize; if (!size) { return true; } } } return false; } RelocationSection* 
GElfSection::relocationSection(SymbolTable* symtab) { if (!reloc_sec) { reloc_sec = elf->addRelocationSection(this, symtab); } return reloc_sec; } GElfStringTable::GElfStringTable(GElfImage* elf) : GElfSection(elf) { } bool GElfStringTable::push(const char* name, uint32_t shtype, uint64_t shflags) { if (!GElfSection::push(name, shtype, shflags, SHN_UNDEF, 0, 0)) { return false; } return true; } bool GElfStringTable::pullData() { return true; } const char* GElfStringTable::addString(const std::string& s) { if (data0.size() == 0 && data.size() == 0) { data.add('\0'); } return data.get(data.addString(s)); } size_t GElfStringTable::addString1(const std::string& s) { if (data0.size() == 0 && data.size() == 0) { data.add('\0'); } return data.addString(s); } const char* GElfStringTable::getString(size_t ndx) { if (data0.has(ndx)) { return data0.get(ndx); } else if (data.has(ndx)) { return data.get(ndx); } return nullptr; } size_t GElfStringTable::getStringIndex(const char* s) { if (data0.has(s)) { return data0.getOffset(s); } else if (data.has(s)) { return data.getOffset(s); } else { assert(false); return 0; } } GElfSymbol::GElfSymbol(GElfSymbolTable* symtab_, Buffer &data_, size_t index_) : symtab(symtab_), edata(data_), eindex(index_) { } Section* GElfSymbol::section() { if (Sym()->st_shndx != SHN_UNDEF) { return symtab->elf->section(Sym()->st_shndx); } return 0; } bool GElfSymbol::push(const std::string& name, uint64_t value, uint64_t size, unsigned char type, unsigned char binding, uint16_t shndx, unsigned char other) { Sym()->st_name = symtab->strtab->addString1(name.c_str()); Sym()->st_value = value; Sym()->st_size = size; Sym()->st_info = GELF_ST_INFO(binding, type); Sym()->st_shndx = shndx; Sym()->st_other = other; return true; } std::string GElfSymbol::name() { return symtab->strtab->getString(Sym()->st_name); } GElfSymbolTable::GElfSymbolTable(GElfImage* elf) : GElfSection(elf), strtab(0) { } bool GElfSymbolTable::push(const char* name, GElfStringTable* strtab) 
{ if (!strtab) { strtab = elf->strtab(); } this->strtab = strtab; if (!GElfSection::push(name, SHT_SYMTAB, 0, strtab->getSectionIndex(), 0, 0, sizeof(Elf64_Sym))) { return false; } return true; } bool GElfSymbolTable::pullData() { strtab = elf->getStringTable(hdr.sh_link); for (size_t i = 0; i < data0.size() / sizeof(GElf_Sym); ++i) { symbols.push_back(std::unique_ptr(new GElfSymbol(this, data0, i * sizeof(GElf_Sym)))); } return true; } Symbol* GElfSymbolTable::addSymbolInternal(Section* section, const std::string& name, uint64_t value, uint64_t size, unsigned char type, unsigned char binding, unsigned char other) { GElfSymbol *sym = new (std::nothrow) GElfSymbol(this, data, data.reserve()); uint16_t shndx = section ? section->getSectionIndex() : (uint16_t) SHN_UNDEF; if (!sym->push(name, value, size, type, binding, shndx, other)) { delete sym; return nullptr; } symbols.push_back(std::unique_ptr(sym)); return sym; } Symbol* GElfSymbolTable::addSymbol(Section* section, const std::string& name, uint64_t value, uint64_t size, unsigned char type, unsigned char binding, unsigned char other) { if (symbols.size() == 0) { this->addSymbolInternal(nullptr, "", 0, 0, 0, 0, 0); } return this->addSymbolInternal(section, name, value, size, type, binding, other); } size_t GElfSymbolTable::symbolCount() { return symbols.size(); } Symbol* GElfSymbolTable::symbol(size_t i) { return symbols[i].get(); } GElfNoteSection::GElfNoteSection(GElfImage* elf) : GElfSection(elf) { } bool GElfNoteSection::push(const std::string& name) { return GElfSection::push(name.c_str(), SHT_NOTE, 0, 0, 0, 8); } bool GElfNoteSection::addNote(const std::string& name, uint32_t type, const void* desc, uint32_t desc_size) { data.addStringLength(name, NOTE_RECORD_ALIGNMENT); data.add(desc_size, NOTE_RECORD_ALIGNMENT); data.add(type, NOTE_RECORD_ALIGNMENT); data.addString(name, NOTE_RECORD_ALIGNMENT); data.align(NOTE_RECORD_ALIGNMENT); if (desc_size > 0) { assert(desc); data.add(desc, desc_size, 
NOTE_RECORD_ALIGNMENT); data.align(NOTE_RECORD_ALIGNMENT); } return true; } bool GElfNoteSection::getNote(const std::string& name, uint32_t type, void** desc, uint32_t* desc_size) { Elf_Data* data = 0; Elf_Scn *scn = elf_getscn(elf->e, ndxscn); assert(scn); while ((data = elf_getdata(scn, data)) != 0) { uint32_t note_offset = 0; while (note_offset < data->d_size) { char* notec = (char *) data->d_buf + note_offset; Elf64_Nhdr* note = (Elf64_Nhdr*) notec; if (type == note->n_type) { std::string note_name = GetNoteString(note->n_namesz, notec + sizeof(Elf64_Nhdr)); if (name == note_name) { *desc = notec + sizeof(Elf64_Nhdr) + alignUp(note->n_namesz, 4); *desc_size = note->n_descsz; return true; } } note_offset += sizeof(Elf64_Nhdr) + alignUp(note->n_namesz, 4) + alignUp(note->n_descsz, 4); } } return false; } bool GElfRelocation::push(uint32_t type, Symbol* symbol, uint64_t offset, int64_t addend) { Rela()->r_info = GELF_R_INFO((uint64_t) symbol->index(), type); Rela()->r_offset = offset; Rela()->r_addend = addend; return true; } RelocationSection* GElfRelocation::section() { return rsection; } Symbol* GElfRelocation::symbol() { return rsection->symtab->symbol(symbolIndex()); } GElfRelocationSection::GElfRelocationSection(GElfImage* elf, Section* section_, GElfSymbolTable* symtab_) : GElfSection(elf), section(section_), symtab(symtab_) { } bool GElfRelocationSection::push(const std::string& name) { return GElfSection::push(name.c_str(), SHT_RELA, 0, symtab->getSectionIndex(), section->getSectionIndex(), 0, sizeof(Elf64_Rela)); } Relocation* GElfRelocationSection::addRelocation(uint32_t type, Symbol* symbol, uint64_t offset, int64_t addend) { GElfRelocation *rela = new (std::nothrow) GElfRelocation(this, data, data.reserve()); if (!rela || !rela->push(type, symbol, offset, addend)) { delete rela; return nullptr; } relocations.push_back(std::unique_ptr(rela)); return rela; } bool GElfRelocationSection::pullData() { section = elf->section(hdr.sh_info); symtab = 
elf->getSymtab(hdr.sh_link); Elf_Scn *lScn = elf_getscn(elf->e, ndxscn); assert(lScn); Elf_Data *lData = elf_getdata(lScn, nullptr); assert(lData); data0 = Buffer((const Buffer::byte_type*)lData->d_buf, lData->d_size, lData->d_align); for (size_t i = 0; i < data0.size() / sizeof(GElf_Rela); ++i) { relocations.push_back(std::unique_ptr(new GElfRelocation(this, data0, i * sizeof(GElf_Rela)))); } return true; } GElfImage::GElfImage(int elfclass_) : frozen(true), elfclass(elfclass_), buffer(0), bufferSize(0), e(0), shstrtabSection(0), strtabSection(0), symtabSection(0), noteSection(0) { if (EV_NONE == elf_version(EV_CURRENT)) { assert(false); } } GElfImage::~GElfImage() { elf_end(e); } bool GElfImage::imgError() { out << img.output(); return false; } const char *GElfImage::elfError() { return elf_errmsg(-1); } bool GElfImage::elfBegin(Elf_Cmd cmd) { if ((e = elf_begin(img.fd(), cmd, NULL #ifdef AMD_LIBELF , NULL #endif )) == NULL) { out << "elf_begin failed: " << elfError() << std::endl; return false; } return true; } bool GElfImage::initNew(uint16_t machine, uint16_t type, uint8_t os_abi, uint8_t abi_version, uint32_t e_flags) { if (!img.create()) { return imgError(); } if (!elfBegin(ELF_C_WRITE)) { return false; } if (!gelf_newehdr(e, elfclass)) { return elfError("gelf_newehdr failed"); } if (!gelf_getehdr(e, &ehdr)) { return elfError("gelf_getehdr failed"); } ehdr.e_ident[EI_DATA] = ELFDATA2LSB; ehdr.e_ident[EI_VERSION] = EV_CURRENT; ehdr.e_ident[EI_OSABI] = os_abi; ehdr.e_ident[EI_ABIVERSION] = abi_version; ehdr.e_machine = machine; ehdr.e_type = type; ehdr.e_version = EV_CURRENT; ehdr.e_flags = e_flags; if (!gelf_update_ehdr(e, &ehdr)) { return elfError("gelf_updateehdr failed"); } sections.push_back(std::unique_ptr()); if (!shstrtab()->push(".shstrtab", SHT_STRTAB, SHF_STRINGS)) { return elfError("Failed to create shstrtab"); } ehdr.e_shstrndx = shstrtab()->getSectionIndex(); if (!gelf_update_ehdr(e, &ehdr)) { return elfError("gelf_updateehdr failed"); } if 
(!strtab()->push(".strtab", SHT_STRTAB, SHF_STRINGS)) { return elfError("Failed to create strtab"); } frozen = false; return true; } bool GElfImage::loadFromFile(const std::string& filename) { if (!img.create()) { return imgError(); } if (!img.readFrom(filename)) { return imgError(); } if (!elfBegin(ELF_C_RDWR)) { return false; } return pullElf(); } bool GElfImage::saveToFile(const std::string& filename) { if (buffer) { std::ofstream out(filename.c_str(), std::ios::binary); if (out.fail()) { return false; } out.write(buffer, bufferSize); return !out.fail(); } else { if (!push()) { return false; } return img.writeTo(filename); } } bool GElfImage::initFromBuffer(const void* buffer, size_t size) { if (size == 0) { size = ElfSize(buffer); } if (!img.create()) { return imgError(); } if (!img.copyFrom(buffer, size)) { return imgError(); } if (!elfBegin(ELF_C_RDWR)) { return false; } return pullElf(); } bool GElfImage::initAsBuffer(const void* buffer, size_t size) { if (size == 0) { size = ElfSize(buffer); } if ((e = elf_memory(reinterpret_cast(const_cast(buffer)), size #ifdef AMD_LIBELF , NULL #endif )) == NULL) { out << "elf_begin(buffer) failed: " << elfError() << std::endl; return false; } this->buffer = reinterpret_cast(buffer); this->bufferSize = size; return pullElf(); } bool GElfImage::pullElf() { if (!gelf_getehdr(e, &ehdr)) { return elfError("gelf_getehdr failed"); } segments.reserve(ehdr.e_phnum); for (size_t i = 0; i < ehdr.e_phnum; ++i) { GElfSegment* segment = new GElfSegment(this, i); segment->pull(); segments.push_back(std::unique_ptr(segment)); } shstrtabSection = new GElfStringTable(this); if (!shstrtabSection->pull(ehdr.e_shstrndx)) { return false; } Elf_Scn* scn = 0; for (unsigned n = 0; n < ehdr.e_shnum; ++n) { scn = elf_getscn(e, n); if (n == ehdr.e_shstrndx) { sections.push_back(std::unique_ptr(shstrtabSection)); continue; } GElf_Shdr shdr; if (!gelf_getshdr(scn, &shdr)) { return elfError("Failed to get shdr"); } GElfSection* section = 0; if 
(shdr.sh_type == SHT_NOTE) { section = new GElfNoteSection(this); } else if (shdr.sh_type == SHT_RELA) { section = new GElfRelocationSection(this); } else if (shdr.sh_type == SHT_STRTAB) { section = new GElfStringTable(this); } else if (shdr.sh_type == SHT_SYMTAB || shdr.sh_type == SHT_DYNSYM) { section = new GElfSymbolTable(this); } else if (shdr.sh_type == SHT_NULL) { section = 0; sections.push_back(std::unique_ptr()); } else { section = new GElfSection(this); } if (section) { sections.push_back(std::unique_ptr(section)); if (!section->pull(n)) { return false; } } } for (size_t n = 1; n < sections.size(); ++n) { GElfSection* section = sections[n].get(); if (section->type() == SHT_STRTAB) { if (!section->pullData()) { return false; } } } for (size_t n = 1; n < sections.size(); ++n) { GElfSection* section = sections[n].get(); if (section->type() == SHT_SYMTAB || section->type() == SHT_DYNSYM) { if (!section->pullData()) { return false; } } } for (size_t n = 1; n < sections.size(); ++n) { GElfSection* section = sections[n].get(); if (section->type() != SHT_STRTAB && section->type() != SHT_SYMTAB && section->type() != SHT_DYNSYM) { if (!section->pullData()) { return false; } } } for (size_t i = 1; i < sections.size(); ++i) { if (i == ehdr.e_shstrndx) { continue; } std::unique_ptr& section = sections[i]; if (section->Name() == ".strtab") { strtabSection = static_cast(section.get()); } if (section->Name() == ".symtab") { symtabSection = static_cast(section.get()); } if (section->Name() == ".note") { noteSection = static_cast(section.get()); } } size_t phnum; if (elf_getphdrnum(e, &phnum) < 0) { return elfError("elf_getphdrnum failed"); } for (size_t i = 0; i < phnum; ++i) { segments.push_back(std::unique_ptr(new GElfSegment(this, i))); if (!segments[i]->pull()) { return false; } } return true; } bool GElfImage::elfError(const char* msg) { out << "Error: " << msg << ": " << elfError() << std::endl; return false; } uint64_t GElfImage::size() { if
(buffer) { return ElfSize(buffer); } else { return img.getSize(); } } bool GElfImage::push0() { assert(e); for (std::unique_ptr& section : sections) { if (section && !section->push()) { return false; } } for (std::unique_ptr& section : sections) { if (section && !section->pull0()) { return false; } } if (!segments.empty()) { if (!gelf_newphdr(e, segments.size())) { return elfError("gelf_newphdr failed"); } } if (elf_update(e, ELF_C_NULL) < 0) { return elfError("elf_update (1.1) failed"); } if (!segments.empty()) { for (std::unique_ptr& section : sections) { /* Update section offsets. */ if (section && !section->pull0()) { return false; } } uint64_t vaddr = 0; for (std::unique_ptr& segment : segments) { if (!segment->push(vaddr)) { return false; } vaddr = segment->vaddr() + segment->memSize(); } } return true; } bool GElfImage::push() { if (!push0()) { return false; } if (elf_update(e, ELF_C_WRITE) < 0) { return elfError("elf_update (2) failed"); } return true; } Segment* GElfImage::segmentByVAddr(uint64_t vaddr) { for (std::unique_ptr& seg : segments) { if (seg->vaddr() <= vaddr && vaddr < seg->vaddr() + seg->memSize()) { return seg.get(); } } return 0; } Section* GElfImage::sectionByVAddr(uint64_t vaddr) { for (size_t n = 1; n < sections.size(); ++n) { if (sections[n]->addr() <= vaddr && vaddr < sections[n]->addr() + sections[n]->size()) { return sections[n].get(); } } return nullptr; } bool GElfImage::elfEnd() { return false; } bool GElfImage::writeTo(const std::string& filename) { if (!img.writeTo(filename)) { return imgError(); } return true; } bool GElfImage::copyToBuffer(void** buf, size_t* size) { if (buffer) { *buf = malloc(bufferSize); if (!*buf) { return false; } memcpy(*buf, buffer, bufferSize); if (size) { *size = bufferSize; } return true; } else { return img.copyTo(buf, size); } } bool GElfImage::copyToBuffer(void* buf, size_t size) { if (buffer) { if (size < bufferSize) { return false; } memcpy(buf, buffer, bufferSize); return true; } else { return img.copyTo(buf, size); } } 
GElfStringTable* GElfImage::addStringTable(const std::string& name) { GElfStringTable* stab = new GElfStringTable(this); sections.push_back(std::unique_ptr(stab)); return stab; } GElfStringTable* GElfImage::getStringTable(uint16_t index) { return static_cast(sections[index].get()); } GElfSymbolTable* GElfImage::addSymbolTable(const std::string& name, StringTable* stab) { if (!stab) { stab = strtab(); } const char* name0 = shstrtab()->addString(name); GElfSymbolTable* symtab = new GElfSymbolTable(this); symtab->push(name0, static_cast(stab)); sections.push_back(std::unique_ptr(symtab)); return symtab; } GElfStringTable* GElfImage::shstrtab() { if (!shstrtabSection) { shstrtabSection = addStringTable(".shstrtab"); } return shstrtabSection; } GElfStringTable* GElfImage::strtab() { if (!strtabSection) { strtabSection = addStringTable(".strtab"); } return strtabSection; } GElfSymbolTable* GElfImage::symtab() { if (!symtabSection) { symtabSection = addSymbolTable(".symtab", strtab()); } return symtabSection; } GElfNoteSection* GElfImage::note() { if (!noteSection) { noteSection = addNoteSection(".note"); } return noteSection; } GElfNoteSection* GElfImage::addNoteSection(const std::string& name) { GElfNoteSection* note = new GElfNoteSection(this); note->push(name); sections.push_back(std::unique_ptr(note)); return note; } Segment* GElfImage::initSegment(uint32_t type, uint32_t flags, uint64_t paddr) { GElfSegment *seg = new (std::nothrow) GElfSegment(this, segments.size(), type, flags, paddr); segments.push_back(std::unique_ptr(seg)); return seg; } bool GElfImage::addSegments() { return true; } Section* GElfImage::addSection(const std::string &name, uint32_t type, uint64_t flags, uint64_t entsize, Segment* segment) { GElfSection *section = new (std::nothrow) GElfSection(this); if (!section || !section->push(name.c_str(), type, flags, 0, 0, 0, entsize)) { delete section; return nullptr; } if (segment) { if (!segment->updateAddSection(section)) { delete section; return
nullptr; } } sections.push_back(std::unique_ptr(section)); return section; } RelocationSection* GElfImage::addRelocationSection(Section* sec, SymbolTable* symtab) { std::string section_name = ".rela" + sec->Name(); if (!symtab) { symtab = this->symtab(); } GElfRelocationSection *rsec = new GElfRelocationSection(this, sec, (GElfSymbolTable*) symtab); if (!rsec || !rsec->push(section_name)) { delete rsec; return nullptr; } sections.push_back(std::unique_ptr(rsec)); return rsec; } RelocationSection* GElfImage::relocationSection(Section* sec, SymbolTable* symtab) { return sec->relocationSection(symtab); } uint16_t GElfImage::machine() const { return ehdr.e_machine; } uint16_t GElfImage::etype() const { return ehdr.e_type; } Image* NewElf32Image() { return new GElfImage(ELFCLASS32); } Image* NewElf64Image() { return new GElfImage(ELFCLASS64); } uint64_t ElfSize(const void* emi) { const Elf64_Ehdr *ehdr = (const Elf64_Ehdr*) emi; if (NULL == ehdr || EV_CURRENT != ehdr->e_version) { return 0; } const Elf64_Shdr *shdr = (const Elf64_Shdr*)((const char*)emi + ehdr->e_shoff); if (NULL == shdr) { return 0; } uint64_t max_offset = ehdr->e_shoff; uint64_t total_size = max_offset + ehdr->e_shentsize * ehdr->e_shnum; for (uint16_t i = 0; i < ehdr->e_shnum; ++i) { uint64_t cur_offset = static_cast(shdr[i].sh_offset); if (max_offset < cur_offset) { max_offset = cur_offset; total_size = max_offset; if (SHT_NOBITS != shdr[i].sh_type) { total_size += static_cast(shdr[i].sh_size); } } } return total_size; } std::string GetNoteString(uint32_t s_size, const char* s) { if (!s_size) { return ""; } if (s[s_size-1] == '\0') { return std::string(s, s_size-1); } else { return std::string(s, s_size); } } } /* namespace elf */ } /* namespace amd */ } /* namespace rocr */ ROCR-Runtime-rocm-5.0.0/src/libamdhsacode/amd_hsa_code.cpp000066400000000000000000002046101420110115200232660ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The
University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. 
// //////////////////////////////////////////////////////////////////////////////// #include #include #include #include #include "core/inc/amd_hsa_code.hpp" #include "amd_hsa_code_util.hpp" #include #include "inc/amd_hsa_elf.h" #include #include #include #include #ifdef SP3_STATIC_LIB #include "sp3.h" #endif // SP3_STATIC_LIB #ifndef _WIN32 #define _alloca alloca #endif namespace rocr { namespace amd { namespace hsa { namespace code { using amd::elf::GetNoteString; bool Symbol::IsDeclaration() const { return elfsym->type() == STT_COMMON; } bool Symbol::IsDefinition() const { return !IsDeclaration(); } bool Symbol::IsAgent() const { return elfsym->section()->flags() & SHF_AMDGPU_HSA_AGENT ? true : false; } hsa_symbol_linkage_t Symbol::Linkage() const { return elfsym->binding() == STB_GLOBAL ? HSA_SYMBOL_LINKAGE_PROGRAM : HSA_SYMBOL_LINKAGE_MODULE; } hsa_variable_allocation_t Symbol::Allocation() const { return IsAgent() ? HSA_VARIABLE_ALLOCATION_AGENT : HSA_VARIABLE_ALLOCATION_PROGRAM; } hsa_variable_segment_t Symbol::Segment() const { return elfsym->section()->flags() & SHF_AMDGPU_HSA_READONLY ? HSA_VARIABLE_SEGMENT_READONLY : HSA_VARIABLE_SEGMENT_GLOBAL; } uint64_t Symbol::Size() const { return elfsym->size(); } uint32_t Symbol::Size32() const { assert(elfsym->size() < UINT32_MAX); return (uint32_t) Size(); } uint32_t Symbol::Alignment() const { assert(elfsym->section()->addralign() < UINT32_MAX); return uint32_t(elfsym->section()->addralign()); } bool Symbol::IsConst() const { return elfsym->section()->flags() & SHF_WRITE ? 
true : false; } hsa_status_t Symbol::GetInfo(hsa_code_symbol_info_t attribute, void *value) { assert(value); switch (attribute) { case HSA_CODE_SYMBOL_INFO_TYPE: { *((hsa_symbol_kind_t*)value) = Kind(); break; } case HSA_CODE_SYMBOL_INFO_NAME_LENGTH: { *((uint32_t*)value) = GetSymbolName().size(); break; } case HSA_CODE_SYMBOL_INFO_NAME: { std::string SymbolName = GetSymbolName(); memset(value, 0x0, SymbolName.size()); memcpy(value, SymbolName.c_str(), SymbolName.size()); break; } case HSA_CODE_SYMBOL_INFO_MODULE_NAME_LENGTH: { *((uint32_t*)value) = GetModuleName().size(); break; } case HSA_CODE_SYMBOL_INFO_MODULE_NAME: { std::string ModuleName = GetModuleName(); memset(value, 0x0, ModuleName.size()); memcpy(value, ModuleName.c_str(), ModuleName.size()); break; } case HSA_CODE_SYMBOL_INFO_LINKAGE: { *((hsa_symbol_linkage_t*)value) = Linkage(); break; } case HSA_CODE_SYMBOL_INFO_IS_DEFINITION: { *((bool*)value) = IsDefinition(); break; } default: { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } } return HSA_STATUS_SUCCESS; } std::string Symbol::GetModuleName() const { std::string FullName = Name(); return FullName.rfind(":") != std::string::npos ? FullName.substr(0, FullName.find(":")) : ""; } std::string Symbol::GetSymbolName() const { std::string FullName = Name(); return FullName.rfind(":") != std::string::npos ? 
FullName.substr(FullName.rfind(":") + 1) : FullName; } hsa_code_symbol_t Symbol::ToHandle(Symbol* sym) { hsa_code_symbol_t s; s.handle = reinterpret_cast(sym); return s; } Symbol* Symbol::FromHandle(hsa_code_symbol_t s) { return reinterpret_cast(s.handle); } KernelSymbol::KernelSymbol(amd::elf::Symbol* elfsym_, const amd_kernel_code_t* akc) : Symbol(elfsym_) , kernarg_segment_size(0) , kernarg_segment_alignment(0) , group_segment_size(0) , private_segment_size(0) , is_dynamic_callstack(0) { if (akc) { kernarg_segment_size = (uint32_t) akc->kernarg_segment_byte_size; kernarg_segment_alignment = (uint32_t) (1 << akc->kernarg_segment_alignment); group_segment_size = uint32_t(akc->workgroup_group_segment_byte_size); private_segment_size = uint32_t(akc->workitem_private_segment_byte_size); is_dynamic_callstack = AMD_HSA_BITS_GET(akc->kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_IS_DYNAMIC_CALLSTACK) ? true : false; } } hsa_status_t KernelSymbol::GetInfo(hsa_code_symbol_info_t attribute, void *value) { assert(value); switch (attribute) { case HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE: { *((uint32_t*)value) = kernarg_segment_size; break; } case HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT: { *((uint32_t*)value) = kernarg_segment_alignment; break; } case HSA_CODE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE: { *((uint32_t*)value) = group_segment_size; break; } case HSA_CODE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE: { *((uint32_t*)value) = private_segment_size; break; } case HSA_CODE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK: { *((bool*)value) = is_dynamic_callstack; break; } default: { return Symbol::GetInfo(attribute, value); } } return HSA_STATUS_SUCCESS; } hsa_status_t VariableSymbol::GetInfo(hsa_code_symbol_info_t attribute, void *value) { assert(value); switch (attribute) { case HSA_CODE_SYMBOL_INFO_VARIABLE_ALLOCATION: { *((hsa_variable_allocation_t*)value) = Allocation(); break; } case HSA_CODE_SYMBOL_INFO_VARIABLE_SEGMENT: { 
*((hsa_variable_segment_t*)value) = Segment(); break; } case HSA_CODE_SYMBOL_INFO_VARIABLE_ALIGNMENT: { *((uint32_t*)value) = Alignment(); break; } case HSA_CODE_SYMBOL_INFO_VARIABLE_SIZE: { *((uint32_t*)value) = Size(); break; } case HSA_CODE_SYMBOL_INFO_VARIABLE_IS_CONST: { *((bool*)value) = IsConst(); break; } default: { return Symbol::GetInfo(attribute, value); } } return HSA_STATUS_SUCCESS; } AmdHsaCode::AmdHsaCode(bool combineDataSegments_) : img(nullptr), combineDataSegments(combineDataSegments_), hsatext(0), imageInit(0), samplerInit(0), debugInfo(0), debugLine(0), debugAbbrev(0) { for (unsigned i = 0; i < AMDGPU_HSA_SEGMENT_LAST; ++i) { for (unsigned j = 0; j < 2; ++j) { hsaSegments[i][j] = 0; } } for (unsigned i = 0; i < AMDGPU_HSA_SECTION_LAST; ++i) { hsaSections[i] = 0; } } AmdHsaCode::~AmdHsaCode() { for (Symbol* sym : symbols) { delete sym; } } bool AmdHsaCode::PullElf() { uint32_t majorVersion, minorVersion; if (!GetCodeObjectVersion(&majorVersion, &minorVersion)) { return false; } if (majorVersion >= 2) { return PullElfV2(); } else { return PullElfV1(); } } bool AmdHsaCode::PullElfV1() { for (size_t i = 0; i < img->segmentCount(); ++i) { Segment* s = img->segment(i); if (s->type() == PT_AMDGPU_HSA_LOAD_GLOBAL_PROGRAM || s->type() == PT_AMDGPU_HSA_LOAD_GLOBAL_AGENT || s->type() == PT_AMDGPU_HSA_LOAD_READONLY_AGENT || s->type() == PT_AMDGPU_HSA_LOAD_CODE_AGENT) { dataSegments.push_back(s); } } for (size_t i = 0; i < img->sectionCount(); ++i) { Section* sec = img->section(i); if (!sec) { continue; } if ((sec->type() == SHT_PROGBITS || sec->type() == SHT_NOBITS) && (sec->flags() & (SHF_AMDGPU_HSA_AGENT | SHF_AMDGPU_HSA_GLOBAL | SHF_AMDGPU_HSA_READONLY | SHF_AMDGPU_HSA_CODE))) { dataSections.push_back(sec); } else if (sec->type() == SHT_RELA) { relocationSections.push_back(sec->asRelocationSection()); } if (sec->Name() == ".hsatext") { hsatext = sec; } } for (size_t i = 0; i < img->symtab()->symbolCount(); ++i) { amd::elf::Symbol* elfsym = 
img->symtab()->symbol(i); Symbol* sym = 0; switch (elfsym->type()) { case STT_AMDGPU_HSA_KERNEL: { amd::elf::Section* sec = elfsym->section(); amd_kernel_code_t akc; if (!sec) { out << "Failed to find section for symbol " << elfsym->name() << std::endl; return false; } if (!(sec->flags() & (SHF_AMDGPU_HSA_AGENT | SHF_AMDGPU_HSA_CODE | SHF_EXECINSTR))) { out << "Invalid code section for symbol " << elfsym->name() << std::endl; return false; } if (!sec->getData(elfsym->value(), &akc, sizeof(amd_kernel_code_t))) { out << "Failed to get AMD Kernel Code for symbol " << elfsym->name() << std::endl; return false; } sym = new KernelSymbol(elfsym, &akc); break; } case STT_OBJECT: case STT_COMMON: sym = new VariableSymbol(elfsym); break; default: break; // Skip unknown symbols. } if (sym) { symbols.push_back(sym); } } return true; } bool AmdHsaCode::LoadFromFile(const std::string& filename) { if (!img) { img.reset(amd::elf::NewElf64Image()); } if (!img->loadFromFile(filename)) { return ElfImageError(); } if (!PullElf()) { return ElfImageError(); } return true; } bool AmdHsaCode::SaveToFile(const std::string& filename) { return img->saveToFile(filename) || ElfImageError(); } bool AmdHsaCode::WriteToBuffer(void* buffer) { return img->copyToBuffer(buffer, ElfSize()) || ElfImageError(); } bool AmdHsaCode::InitFromBuffer(const void* buffer, size_t size) { if (!img) { img.reset(amd::elf::NewElf64Image()); } if (!img->initFromBuffer(buffer, size)) { return ElfImageError(); } if (!PullElf()) { return ElfImageError(); } return true; } bool AmdHsaCode::InitAsBuffer(const void* buffer, size_t size) { if (!img) { img.reset(amd::elf::NewElf64Image()); } if (!img->initAsBuffer(buffer, size)) { return ElfImageError(); } if (!PullElf()) { return ElfImageError(); } return true; } bool AmdHsaCode::InitAsHandle(hsa_code_object_t code_object) { void *elfmemrd = reinterpret_cast(code_object.handle); if (!elfmemrd) { return false; } return InitAsBuffer(elfmemrd, 0); } bool 
AmdHsaCode::InitNew(bool xnack) { if (!img) { img.reset(amd::elf::NewElf64Image()); uint32_t flags = 0; if (xnack) { flags |= ELF::EF_AMDGPU_FEATURE_XNACK_V2; } return img->initNew(ELF::EM_AMDGPU, ET_EXEC, ELF::ELFOSABI_AMDGPU_HSA, ELF::ELFABIVERSION_AMDGPU_HSA_V2, flags) || ElfImageError(); // FIXME: elfutils libelf does not allow program headers in ET_REL file type, so change it later in finalizer. } return false; } bool AmdHsaCode::Freeze() { return img->Freeze() || ElfImageError(); } hsa_code_object_t AmdHsaCode::GetHandle() { hsa_code_object_t code_object; code_object.handle = reinterpret_cast(img->data()); return code_object; } const char* AmdHsaCode::ElfData() { return img->data(); } uint64_t AmdHsaCode::ElfSize() { return img->size(); } bool AmdHsaCode::Validate() { if (!img->Validate()) { return ElfImageError(); } if (img->Machine() != ELF::EM_AMDGPU) { out << "ELF error: Invalid machine" << std::endl; return false; } return true; } void AmdHsaCode::AddAmdNote(uint32_t type, const void* desc, uint32_t desc_size) { img->note()->addNote("AMD", type, desc, desc_size); } void AmdHsaCode::AddNoteCodeObjectVersion(uint32_t major, uint32_t minor) { amdgpu_hsa_note_code_object_version_t desc; desc.major_version = major; desc.minor_version = minor; AddAmdNote(NT_AMD_HSA_CODE_OBJECT_VERSION, &desc, sizeof(desc)); } bool AmdHsaCode::GetCodeObjectVersion(uint32_t* major, uint32_t* minor) { switch (img->ABIVersion()) { case ELF::ELFABIVERSION_AMDGPU_HSA_V2: amdgpu_hsa_note_code_object_version_t* desc; if (GetAmdNote(NT_AMD_HSA_CODE_OBJECT_VERSION, &desc)) { *major = desc->major_version; *minor = desc->minor_version; return *major <= 2; } return false; case ELF::ELFABIVERSION_AMDGPU_HSA_V3: *major = 3; *minor = 0; return true; case ELF::ELFABIVERSION_AMDGPU_HSA_V4: *major = 4; *minor = 0; return true; } return false; } bool AmdHsaCode::GetNoteCodeObjectVersion(std::string& version) { amdgpu_hsa_note_code_object_version_t* desc; if 
(!GetAmdNote(NT_AMD_HSA_CODE_OBJECT_VERSION, &desc)) { return false; } version.clear(); version += std::to_string(desc->major_version); version += "."; version += std::to_string(desc->minor_version); return true; } void AmdHsaCode::AddNoteHsail(uint32_t hsail_major, uint32_t hsail_minor, hsa_profile_t profile, hsa_machine_model_t machine_model, hsa_default_float_rounding_mode_t rounding_mode) { amdgpu_hsa_note_hsail_t desc; memset(&desc, 0, sizeof(desc)); desc.hsail_major_version = hsail_major; desc.hsail_minor_version = hsail_minor; desc.profile = uint8_t(profile); desc.machine_model = uint8_t(machine_model); desc.default_float_round = uint8_t(rounding_mode); AddAmdNote(NT_AMD_HSA_HSAIL, &desc, sizeof(desc)); } bool AmdHsaCode::GetNoteHsail(uint32_t* hsail_major, uint32_t* hsail_minor, hsa_profile_t* profile, hsa_machine_model_t* machine_model, hsa_default_float_rounding_mode_t* default_float_round) { amdgpu_hsa_note_hsail_t *desc; if (!GetAmdNote(NT_AMD_HSA_HSAIL, &desc)) { return false; } *hsail_major = desc->hsail_major_version; *hsail_minor = desc->hsail_minor_version; *profile = (hsa_profile_t) desc->profile; *machine_model = (hsa_machine_model_t) desc->machine_model; *default_float_round = (hsa_default_float_rounding_mode_t) desc->default_float_round; return true; } void AmdHsaCode::AddNoteIsa(const std::string& vendor_name, const std::string& architecture_name, uint32_t major, uint32_t minor, uint32_t stepping) { size_t size = sizeof(amdgpu_hsa_note_isa_t) + vendor_name.length() + architecture_name.length() + 1; amdgpu_hsa_note_isa_t* desc = (amdgpu_hsa_note_isa_t*) _alloca(size); memset(desc, 0, size); desc->vendor_name_size = vendor_name.length()+1; desc->architecture_name_size = architecture_name.length()+1; desc->major = major; desc->minor = minor; desc->stepping = stepping; memcpy(desc->vendor_and_architecture_name, vendor_name.c_str(), vendor_name.length() + 1); memcpy(desc->vendor_and_architecture_name + desc->vendor_name_size,
architecture_name.c_str(), architecture_name.length() + 1); AddAmdNote(NT_AMD_HSA_ISA_VERSION, desc, size); } bool AmdHsaCode::GetNoteIsa(std::string& vendor_name, std::string& architecture_name, uint32_t* major_version, uint32_t* minor_version, uint32_t* stepping) { amdgpu_hsa_note_isa_t *desc; if (!GetAmdNote(NT_AMD_HSA_ISA_VERSION, &desc)) { return false; } vendor_name = GetNoteString(desc->vendor_name_size, desc->vendor_and_architecture_name); architecture_name = GetNoteString(desc->architecture_name_size, desc->vendor_and_architecture_name + vendor_name.length() + 1); *major_version = desc->major; *minor_version = desc->minor; *stepping = desc->stepping; return true; } // TODO: Move isa registry into the loader. static bool GetMachInfo(unsigned mach, std::string &name, bool &sramecc_supported, bool &xnack_supported) { switch (mach) { case ELF::EF_AMDGPU_MACH_AMDGCN_GFX600: name = "gfx600"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX601: name = "gfx601"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX602: name = "gfx602"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX700: name = "gfx700"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX701: name = "gfx701"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX702: name = "gfx702"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX703: name = "gfx703"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX704: name = "gfx704"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX705: name = "gfx705"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX801: name = "gfx801"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX802: name = "gfx802"; xnack_supported = false; sramecc_supported = 
false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX803: name = "gfx803"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX805: name = "gfx805"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX810: name = "gfx810"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX900: name = "gfx900"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX902: name = "gfx902"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX904: name = "gfx904"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX906: name = "gfx906"; xnack_supported = true; sramecc_supported = true; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX908: name = "gfx908"; xnack_supported = true; sramecc_supported = true; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX909: name = "gfx909"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX90A: name = "gfx90a"; xnack_supported = true; sramecc_supported = true; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX90C: name = "gfx90c"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX1010: name = "gfx1010"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX1011: name = "gfx1011"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX1012: name = "gfx1012"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX1013: name = "gfx1013"; xnack_supported = true; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX1030: name = "gfx1030"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX1031: name = "gfx1031"; xnack_supported = false; sramecc_supported = false; break; case 
ELF::EF_AMDGPU_MACH_AMDGCN_GFX1032: name = "gfx1032"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX1033: name = "gfx1033"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX1034: name = "gfx1034"; xnack_supported = false; sramecc_supported = false; break; case ELF::EF_AMDGPU_MACH_AMDGCN_GFX1035: name = "gfx1035"; xnack_supported = false; sramecc_supported = false; break; default: return false; } return true; } // This function is also copied to the Code Object Manager library. static std::string ConvertOldTargetNameToNew(const std::string &old_name, bool is_finalizer, uint32_t e_flags) { assert(!old_name.empty() && "Expecting non-empty old name"); unsigned mach = 0; if (old_name == "AMD:AMDGPU:6:0:0") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX600; else if (old_name == "AMD:AMDGPU:6:0:1") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX601; else if (old_name == "AMD:AMDGPU:6:0:2") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX602; else if (old_name == "AMD:AMDGPU:7:0:0") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX700; else if (old_name == "AMD:AMDGPU:7:0:1") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX701; else if (old_name == "AMD:AMDGPU:7:0:2") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX702; else if (old_name == "AMD:AMDGPU:7:0:3") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX703; else if (old_name == "AMD:AMDGPU:7:0:4") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX704; else if (old_name == "AMD:AMDGPU:7:0:5") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX705; else if (old_name == "AMD:AMDGPU:8:0:1") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX801; else if (old_name == "AMD:AMDGPU:8:0:0" || old_name == "AMD:AMDGPU:8:0:2") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX802; else if (old_name == "AMD:AMDGPU:8:0:3" || old_name == "AMD:AMDGPU:8:0:4") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX803; else if (old_name == "AMD:AMDGPU:8:0:5") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX805; else if (old_name == "AMD:AMDGPU:8:1:0") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX810; else if 
(old_name == "AMD:AMDGPU:9:0:0" || old_name == "AMD:AMDGPU:9:0:1") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX900; else if (old_name == "AMD:AMDGPU:9:0:2" || old_name == "AMD:AMDGPU:9:0:3") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX902; else if (old_name == "AMD:AMDGPU:9:0:4" || old_name == "AMD:AMDGPU:9:0:5") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX904; else if (old_name == "AMD:AMDGPU:9:0:6" || old_name == "AMD:AMDGPU:9:0:7") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX906; else if (old_name == "AMD:AMDGPU:9:0:12") mach = ELF::EF_AMDGPU_MACH_AMDGCN_GFX90C; else { // Code object v2 only supports ASICs up to gfx906 plus gfx90c. Do NOT // add handling of new ASICs into this if-else-if* block. return ""; } std::string name; bool sramecc_supported = false; bool xnack_supported = false; if (!GetMachInfo(mach, name, sramecc_supported, xnack_supported)) return ""; // Only "AMD:AMDGPU:9:0:6" and "AMD:AMDGPU:9:0:7" support SRAMECC for // code object V2, and it must be OFF. if (sramecc_supported) name += ":sramecc-"; if (is_finalizer) { if (e_flags & ELF::EF_AMDGPU_FEATURE_XNACK_V2) name += ":xnack+"; else if (xnack_supported) name += ":xnack-"; } else { if (old_name == "AMD:AMDGPU:8:0:1") name += ":xnack+"; else if (old_name == "AMD:AMDGPU:8:1:0") name += ":xnack+"; else if (old_name == "AMD:AMDGPU:9:0:1") name += ":xnack+"; else if (old_name == "AMD:AMDGPU:9:0:3") name += ":xnack+"; else if (old_name == "AMD:AMDGPU:9:0:5") name += ":xnack+"; else if (old_name == "AMD:AMDGPU:9:0:7") name += ":xnack+"; else if (xnack_supported) name += ":xnack-"; } return name; } bool AmdHsaCode::GetIsa(std::string& isa_name) { isa_name.clear(); uint32_t code_object_major_version = 0; uint32_t code_object_minor_version = 0; switch (img->EClass()) { case ELFCLASS64: // There is no e_machine and/or OS ABI for R600 so rely on checking // the ELFCLASS to determine if AMDGCN versus R600. AMDHSA always uses // ELFCLASS64 and R600 always uses ELFCLASS32. 
isa_name += "amdgcn"; break; default: return false; } if (img->Machine() != ELF::EM_AMDGPU) return false; isa_name += "-amd-"; if (!GetCodeObjectVersion(&code_object_major_version, &code_object_minor_version)) { return false; } if (code_object_major_version >= 3) { switch (img->OsAbi()) { case ELF::ELFOSABI_AMDGPU_HSA: isa_name += "amdhsa"; break; default: // Only support AMDHSA in the ROCm runtime. return false; } isa_name += "--"; unsigned mach = img->EFlags() & ELF::EF_AMDGPU_MACH; std::string target_name; bool xnack_supported = false; bool sramecc_supported = false; if (!GetMachInfo(mach, target_name, sramecc_supported, xnack_supported)) return false; if (code_object_major_version == 3) { if (img->EFlags() & ELF::EF_AMDGPU_FEATURE_SRAMECC_V3) target_name += ":sramecc+"; else if (sramecc_supported) target_name += ":sramecc-"; if (img->EFlags() & ELF::EF_AMDGPU_FEATURE_XNACK_V3) target_name += ":xnack+"; else if (xnack_supported) target_name += ":xnack-"; } else if (code_object_major_version == 4) { switch (img->EFlags() & ELF::EF_AMDGPU_FEATURE_SRAMECC_V4) { case ELF::EF_AMDGPU_FEATURE_SRAMECC_OFF_V4: target_name += ":sramecc-"; break; case ELF::EF_AMDGPU_FEATURE_SRAMECC_ON_V4: target_name += ":sramecc+"; break; } switch (img->EFlags() & ELF::EF_AMDGPU_FEATURE_XNACK_V4) { case ELF::EF_AMDGPU_FEATURE_XNACK_OFF_V4: target_name += ":xnack-"; break; case ELF::EF_AMDGPU_FEATURE_XNACK_ON_V4: target_name += ":xnack+"; break; } } else { return false; } isa_name += target_name; return true; } else { std::string vendor_name, architecture_name; uint32_t major_version, minor_version, stepping; if (!GetNoteIsa(vendor_name, architecture_name, &major_version, &minor_version, &stepping)) return false; isa_name += "amdhsa--"; std::string target_name = vendor_name + ':' + architecture_name + ':' + std::to_string(major_version) + ':' + std::to_string(minor_version) + ':' + std::to_string(stepping); amdgpu_hsa_note_hsail_t *hsail_note; bool is_finalizer = 
GetAmdNote(NT_AMD_HSA_HSAIL, &hsail_note); target_name = ConvertOldTargetNameToNew(target_name, is_finalizer, img->EFlags()); if (target_name.empty()) return false; isa_name += target_name; return true; } } void AmdHsaCode::AddNoteProducer(uint32_t major, uint32_t minor, const std::string& producer) { size_t size = sizeof(amdgpu_hsa_note_producer_t) + producer.length(); amdgpu_hsa_note_producer_t* desc = (amdgpu_hsa_note_producer_t*) _alloca(size); memset(desc, 0, size); desc->producer_name_size = producer.length(); desc->producer_major_version = major; desc->producer_minor_version = minor; memcpy(desc->producer_name, producer.c_str(), producer.length() + 1); AddAmdNote(NT_AMD_HSA_PRODUCER, desc, size); } bool AmdHsaCode::GetNoteProducer(uint32_t* major, uint32_t* minor, std::string& producer_name) { amdgpu_hsa_note_producer_t* desc; if (!GetAmdNote(NT_AMD_HSA_PRODUCER, &desc)) { return false; } *major = desc->producer_major_version; *minor = desc->producer_minor_version; producer_name = GetNoteString(desc->producer_name_size, desc->producer_name); return true; } void AmdHsaCode::AddNoteProducerOptions(const std::string& options) { size_t size = sizeof(amdgpu_hsa_note_producer_options_t) + options.length(); amdgpu_hsa_note_producer_options_t *desc = (amdgpu_hsa_note_producer_options_t*) _alloca(size); desc->producer_options_size = options.length(); memcpy(desc->producer_options, options.c_str(), options.length() + 1); AddAmdNote(NT_AMD_HSA_PRODUCER_OPTIONS, desc, size); } void AmdHsaCode::AddNoteProducerOptions(int32_t call_convention, const hsa_ext_control_directives_t& user_directives, const std::string& user_options) { using namespace code_options; std::ostringstream ss; ss << space << "-hsa_call_convention=" << call_convention << control_directives(user_directives); if (!user_options.empty()) { ss << space << user_options; } AddNoteProducerOptions(ss.str()); } bool AmdHsaCode::GetNoteProducerOptions(std::string& options) { amdgpu_hsa_note_producer_options_t* 
desc; if (!GetAmdNote(NT_AMD_HSA_PRODUCER_OPTIONS, &desc)) { return false; } options = GetNoteString(desc->producer_options_size, desc->producer_options); return true; } hsa_status_t AmdHsaCode::GetInfo(hsa_code_object_info_t attribute, void *value) { assert(value); switch (attribute) { case HSA_CODE_OBJECT_INFO_VERSION: { std::string version; if (!GetNoteCodeObjectVersion(version)) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } char *svalue = (char*)value; memset(svalue, 0x0, 64); memcpy(svalue, version.c_str(), (std::min)(size_t(63), version.length())); break; } case HSA_CODE_OBJECT_INFO_ISA: { // TODO: Currently returns string representation instead of hsa_isa_t // which is unavailable here. std::string isa; if (!GetIsa(isa)) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } char *svalue = (char*)value; memset(svalue, 0x0, 64); memcpy(svalue, isa.c_str(), (std::min)(size_t(63), isa.length())); break; } case HSA_CODE_OBJECT_INFO_MACHINE_MODEL: case HSA_CODE_OBJECT_INFO_PROFILE: case HSA_CODE_OBJECT_INFO_DEFAULT_FLOAT_ROUNDING_MODE: { uint32_t hsail_major, hsail_minor; hsa_profile_t profile; hsa_machine_model_t machine_model; hsa_default_float_rounding_mode_t default_float_round; if (!GetNoteHsail(&hsail_major, &hsail_minor, &profile, &machine_model, &default_float_round)) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; } switch (attribute) { case HSA_CODE_OBJECT_INFO_MACHINE_MODEL: *((hsa_machine_model_t*)value) = machine_model; break; case HSA_CODE_OBJECT_INFO_PROFILE: *((hsa_profile_t*)value) = profile; break; case HSA_CODE_OBJECT_INFO_DEFAULT_FLOAT_ROUNDING_MODE: *((hsa_default_float_rounding_mode_t*)value) = default_float_round; break; default: break; } break; } default: assert(false); return HSA_STATUS_ERROR_INVALID_ARGUMENT; } return HSA_STATUS_SUCCESS; } hsa_status_t AmdHsaCode::GetSymbol(const char *module_name, const char *symbol_name, hsa_code_symbol_t *s) { std::string mname = MangleSymbolName(module_name ? 
module_name : "", symbol_name); for (Symbol* sym : symbols) { if (sym->Name() == mname) { *s = Symbol::ToHandle(sym); return HSA_STATUS_SUCCESS; } } return HSA_STATUS_ERROR_INVALID_SYMBOL_NAME; } hsa_status_t AmdHsaCode::IterateSymbols(hsa_code_object_t code_object, hsa_status_t (*callback)( hsa_code_object_t code_object, hsa_code_symbol_t symbol, void* data), void* data) { for (Symbol* sym : symbols) { hsa_code_symbol_t s = Symbol::ToHandle(sym); hsa_status_t status = callback(code_object, s, data); if (status != HSA_STATUS_SUCCESS) { return status; } } return HSA_STATUS_SUCCESS; } Section* AmdHsaCode::ImageInitSection() { if (!imageInit) { imageInit = img->addSection( ".hsaimage_imageinit", SHT_PROGBITS, SHF_MERGE, sizeof(amdgpu_hsa_image_descriptor_t)); } return imageInit; } void AmdHsaCode::AddImageInitializer(Symbol* image, uint64_t destOffset, const amdgpu_hsa_image_descriptor_t& desc) { uint64_t offset = ImageInitSection()->addData(&desc, sizeof(desc), 8); amd::elf::Symbol* imageInit = img->symtab()->addSymbol(ImageInitSection(), "", offset, 0, STT_AMDGPU_HSA_METADATA, STB_LOCAL); image->elfSym()->section()->relocationSection()->addRelocation(R_AMDGPU_INIT_IMAGE, imageInit, image->elfSym()->value() + destOffset, 0); } void AmdHsaCode::AddImageInitializer( Symbol* image, uint64_t destOffset, amdgpu_hsa_metadata_kind16_t kind, amdgpu_hsa_image_geometry8_t geometry, amdgpu_hsa_image_channel_order8_t channel_order, amdgpu_hsa_image_channel_type8_t channel_type, uint64_t width, uint64_t height, uint64_t depth, uint64_t array) { amdgpu_hsa_image_descriptor_t desc; desc.size = (uint16_t) sizeof(amdgpu_hsa_image_descriptor_t); desc.kind = kind; desc.geometry = geometry; desc.channel_order = channel_order; desc.channel_type = channel_type; desc.width = width; desc.height = height; desc.depth = depth; desc.array = array; AddImageInitializer(image, destOffset, desc); } Section* AmdHsaCode::SamplerInitSection() { if (!samplerInit) { samplerInit = img->addSection( 
".hsaimage_samplerinit", SHT_PROGBITS, SHF_MERGE, sizeof(amdgpu_hsa_sampler_descriptor_t)); } return samplerInit; } void AmdHsaCode::AddSamplerInitializer(Symbol* sampler, uint64_t destOffset, const amdgpu_hsa_sampler_descriptor_t& desc) { uint64_t offset = SamplerInitSection()->addData(&desc, sizeof(desc), 8); amd::elf::Symbol* samplerInit = img->symtab()->addSymbol(SamplerInitSection(), "", offset, 0, STT_AMDGPU_HSA_METADATA, STB_LOCAL); sampler->elfSym()->section()->relocationSection()->addRelocation(R_AMDGPU_INIT_SAMPLER, samplerInit, sampler->elfSym()->value() + destOffset, 0); } void AmdHsaCode::AddSamplerInitializer(Symbol* sampler, uint64_t destOffset, amdgpu_hsa_sampler_coord8_t coord, amdgpu_hsa_sampler_filter8_t filter, amdgpu_hsa_sampler_addressing8_t addressing) { amdgpu_hsa_sampler_descriptor_t desc; desc.size = (uint16_t) sizeof(amdgpu_hsa_sampler_descriptor_t); desc.kind = AMDGPU_HSA_METADATA_KIND_INIT_SAMP; desc.coord = coord; desc.filter = filter; desc.addressing = addressing; AddSamplerInitializer(sampler, destOffset, desc); } void AmdHsaCode::AddInitVarWithAddress(bool large, Symbol* dest, uint64_t destOffset, Symbol* addrOf, uint64_t addrAddend) { uint32_t rtype = large ? 
R_AMDGPU_64 : R_AMDGPU_32_LOW; dest->elfSym()->section()->relocationSection()->addRelocation(rtype, addrOf->elfSym(), dest->elfSym()->value() + destOffset, addrAddend); } uint64_t AmdHsaCode::NextKernelCodeOffset() const { return HsaText()->nextDataOffset(256); } bool AmdHsaCode::AddKernelCode(KernelSymbol* sym, const void* code, size_t size) { assert(nullptr != sym); uint64_t offset = HsaText()->addData(code, size, 256); sym->setValue(offset); sym->setSize(size); return true; } Section* AmdHsaCode::AddEmptySection() { dataSections.push_back(nullptr); return nullptr; } Section* AmdHsaCode::AddCodeSection(Segment* segment) { if (nullptr == img) { return nullptr; } Section *sec = img->addSection( ".hsatext", SHT_PROGBITS, SHF_ALLOC | SHF_EXECINSTR | SHF_WRITE | SHF_AMDGPU_HSA_CODE | SHF_AMDGPU_HSA_AGENT, 0, segment); dataSections.push_back(sec); hsatext = sec; return sec; } Section* AmdHsaCode::AddDataSection(const std::string &name, uint32_t type, uint64_t flags, Segment* segment) { if (nullptr == img) { return nullptr; } Section *sec = img->addSection(name, type, flags, 0, segment); dataSections.push_back(sec); return sec; } void AmdHsaCode::InitHsaSectionSegment(amdgpu_hsa_elf_section_t section, bool combineSegments) { InitHsaSegment(AmdHsaElfSectionSegment(section), combineSegments || !IsAmdHsaElfSectionROData(section)); } Section* AmdHsaCode::HsaDataSection(amdgpu_hsa_elf_section_t sec, bool combineSegments) { if (!hsaSections[sec]) { bool writable = combineSegments || !IsAmdHsaElfSectionROData(sec); Segment* segment = HsaSegment(AmdHsaElfSectionSegment(sec), writable); assert(segment); // The segment is expected to have been initialized via InitHsaSegment. 
Section* section; switch (sec) { case AMDGPU_HSA_RODATA_GLOBAL_PROGRAM: section = AddDataSection(".hsarodata_global_program", SHT_PROGBITS, SHF_ALLOC | SHF_AMDGPU_HSA_GLOBAL, segment); break; case AMDGPU_HSA_RODATA_GLOBAL_AGENT: section = AddDataSection(".hsarodata_global_agent", SHT_PROGBITS, SHF_ALLOC | SHF_AMDGPU_HSA_GLOBAL | SHF_AMDGPU_HSA_AGENT, segment); break; case AMDGPU_HSA_RODATA_READONLY_AGENT: section = AddDataSection(".hsarodata_readonly_agent", SHT_PROGBITS, SHF_ALLOC | SHF_AMDGPU_HSA_READONLY | SHF_AMDGPU_HSA_AGENT, segment); break; case AMDGPU_HSA_DATA_GLOBAL_PROGRAM: section = AddDataSection(".hsadata_global_program", SHT_PROGBITS, SHF_ALLOC | SHF_WRITE | SHF_AMDGPU_HSA_GLOBAL, segment); break; case AMDGPU_HSA_DATA_GLOBAL_AGENT: section = AddDataSection(".hsadata_global_agent", SHT_PROGBITS, SHF_ALLOC | SHF_WRITE | SHF_AMDGPU_HSA_GLOBAL | SHF_AMDGPU_HSA_AGENT, segment); break; case AMDGPU_HSA_DATA_READONLY_AGENT: section = AddDataSection(".hsadata_readonly_agent", SHT_PROGBITS, SHF_ALLOC | SHF_WRITE | SHF_AMDGPU_HSA_READONLY | SHF_AMDGPU_HSA_AGENT, segment); break; case AMDGPU_HSA_BSS_GLOBAL_PROGRAM: section = AddDataSection(".hsabss_global_program", SHT_NOBITS, SHF_ALLOC | SHF_WRITE | SHF_AMDGPU_HSA_GLOBAL, segment); break; case AMDGPU_HSA_BSS_GLOBAL_AGENT: section = AddDataSection(".hsabss_global_agent", SHT_NOBITS, SHF_ALLOC | SHF_WRITE | SHF_AMDGPU_HSA_GLOBAL | SHF_AMDGPU_HSA_AGENT, segment); break; case AMDGPU_HSA_BSS_READONLY_AGENT: section = AddDataSection(".hsabss_readonly_agent", SHT_NOBITS, SHF_ALLOC | SHF_WRITE | SHF_AMDGPU_HSA_READONLY | SHF_AMDGPU_HSA_AGENT, segment); break; default: assert(false); return 0; } hsaSections[sec] = section; } return hsaSections[sec]; } void AmdHsaCode::InitHsaSegment(amdgpu_hsa_elf_segment_t segment, bool writable) { if (!hsaSegments[segment][writable]) { uint32_t flags = PF_R; if (writable) { flags |= PF_W; } if (segment == AMDGPU_HSA_SEGMENT_CODE_AGENT) { flags |= PF_X; } uint32_t type = PT_LOOS + 
segment; assert(segment < AMDGPU_HSA_SEGMENT_LAST); hsaSegments[segment][writable] = img->initSegment(type, flags); } } bool AmdHsaCode::AddHsaSegments() { if (!img->addSegments()) { return ElfImageError(); } return true; } Segment* AmdHsaCode::HsaSegment(amdgpu_hsa_elf_segment_t segment, bool writable) { return hsaSegments[segment][writable]; } Symbol* AmdHsaCode::AddExecutableSymbol(const std::string &name, unsigned char type, unsigned char binding, unsigned char other, Section *section) { if (nullptr == img) { return nullptr; } if (!section) { section = HsaText(); } symbols.push_back(new KernelSymbol(img->symtab()->addSymbol(section, name, 0, 0, type, binding, other), nullptr)); return symbols.back(); } Symbol* AmdHsaCode::AddVariableSymbol(const std::string &name, unsigned char type, unsigned char binding, unsigned char other, Section *section, uint64_t value, uint64_t size) { if (nullptr == img) { return nullptr; } symbols.push_back(new VariableSymbol(img->symtab()->addSymbol(section, name, value, size, type, binding, other))); return symbols.back(); } void AmdHsaCode::AddSectionSymbols() { if (nullptr == img) { return; } for (size_t i = 0; i < dataSections.size(); ++i) { if (dataSections[i] && dataSections[i]->flags() & SHF_ALLOC) { symbols.push_back(new VariableSymbol(img->symtab()->addSymbol(dataSections[i], "__hsa_section" + dataSections[i]->Name(), 0, 0, STT_SECTION, STB_LOCAL))); } } } Symbol* AmdHsaCode::GetSymbolByElfIndex(size_t index) { for (auto &s : symbols) { if (s && index == s->Index()) { return s; } } return nullptr; } Symbol* AmdHsaCode::FindSymbol(const std::string &n) { for (auto &s : symbols) { if (s && n == s->Name()) { return s; } } return nullptr; } void AmdHsaCode::AddData(amdgpu_hsa_elf_section_t s, const void* data, size_t size) { // getDataSection(s)->addData(data, size); } Section* AmdHsaCode::DebugInfo() { if (!debugInfo) { debugInfo = img->addSection(".debug_info", SHT_PROGBITS); } return debugInfo; } Section* 
AmdHsaCode::DebugLine() { if (!debugLine) { debugLine = img->addSection(".debug_line", SHT_PROGBITS); } return debugLine; } Section* AmdHsaCode::DebugAbbrev() { if (!debugAbbrev) { debugAbbrev = img->addSection(".debug_abbrev", SHT_PROGBITS); } return debugAbbrev; } Section* AmdHsaCode::AddHsaHlDebug(const std::string& name, const void* data, size_t size) { Section* section = img->addSection(name, SHT_PROGBITS, SHF_OS_NONCONFORMING); section->addData(data, size, 1); return section; } bool AmdHsaCode::PrintToFile(const std::string& filename) { std::ofstream out(filename); if (out.fail()) { return false; } Print(out); return !out.fail(); } void AmdHsaCode::Print(std::ostream& out) { PrintNotes(out); out << std::endl; PrintSegments(out); out << std::endl; PrintSections(out); out << std::endl; PrintSymbols(out); out << std::endl; PrintMachineCode(out); out << std::endl; out << "AMD HSA Code Object End" << std::endl; } void AmdHsaCode::PrintNotes(std::ostream& out) { { uint32_t major_version, minor_version; if (GetCodeObjectVersion(&major_version, &minor_version)) { out << "AMD HSA Code Object" << std::endl << " Version " << major_version << "." << minor_version << std::endl; } } { uint32_t hsail_major, hsail_minor; hsa_profile_t profile; hsa_machine_model_t machine_model; hsa_default_float_rounding_mode_t rounding_mode; if (GetNoteHsail(&hsail_major, &hsail_minor, &profile, &machine_model, &rounding_mode)) { out << "HSAIL " << std::endl << " Version: " << hsail_major << "." 
<< hsail_minor << std::endl << " Profile: " << HsaProfileToString(profile) << " Machine model: " << HsaMachineModelToString(machine_model) << " Default float rounding: " << HsaFloatRoundingModeToString(rounding_mode) << std::endl; } } { std::string vendor_name, architecture_name; uint32_t major_version, minor_version, stepping; if (GetNoteIsa(vendor_name, architecture_name, &major_version, &minor_version, &stepping)) { out << "ISA" << std::endl << " Vendor " << vendor_name << " Arch " << architecture_name << " Version " << major_version << ":" << minor_version << ":" << stepping << std::endl; } } { std::string producer_name, producer_options; uint32_t major, minor; if (GetNoteProducer(&major, &minor, producer_name)) { out << "Producer '" << producer_name << "' " << "Version " << major << ":" << minor << std::endl; } } { std::string producer_options; if (GetNoteProducerOptions(producer_options)) { out << "Producer options" << std::endl << " '" << producer_options << "'" << std::endl; } } } void AmdHsaCode::PrintSegments(std::ostream& out) { out << "Segments (total " << DataSegmentCount() << "):" << std::endl; for (size_t i = 0; i < DataSegmentCount(); ++i) { PrintSegment(out, DataSegment(i)); } } void AmdHsaCode::PrintSections(std::ostream& out) { out << "Data Sections (total " << DataSectionCount() << "):" << std::endl; for (size_t i = 0; i < DataSectionCount(); ++i) { PrintSection(out, DataSection(i)); } out << std::endl; out << "Relocation Sections (total " << RelocationSectionCount() << "):" << std::endl; for (size_t i = 0; i < RelocationSectionCount(); ++i) { PrintSection(out, GetRelocationSection(i)); } } void AmdHsaCode::PrintSymbols(std::ostream& out) { out << "Symbols (total " << SymbolCount() << "):" << std::endl; for (size_t i = 0; i < SymbolCount(); ++i) { PrintSymbol(out, GetSymbol(i)); } } void AmdHsaCode::PrintMachineCode(std::ostream& out) { if (HasHsaText()) { out << std::dec; for (size_t i = 0; i < SymbolCount(); ++i) { Symbol* sym = GetSymbol(i); 
if (sym->IsKernelSymbol() && sym->IsDefinition()) { amd_kernel_code_t kernel_code; HsaText()->getData(sym->SectionOffset(), &kernel_code, sizeof(amd_kernel_code_t)); out << "AMD Kernel Code for " << sym->Name() << ": " << std::endl << std::dec; PrintAmdKernelCode(out, &kernel_code); out << std::endl; } } std::vector<uint8_t> isa(HsaText()->size(), 0); HsaText()->getData(0, isa.data(), HsaText()->size()); out << "Disassembly:" << std::endl; PrintDisassembly(out, isa.data(), HsaText()->size(), 0); out << std::endl << std::dec; } else { out << "Machine code section is not present" << std::endl << std::endl; } } void AmdHsaCode::PrintSegment(std::ostream& out, Segment* segment) { out << " Segment (" << segment->getSegmentIndex() << ")" << std::endl; out << " Type: " << AmdPTLoadToString(segment->type()) << " " << " Flags: " << "0x" << std::hex << std::setw(8) << std::setfill('0') << segment->flags() << std::dec << std::endl << " Image Size: " << segment->imageSize() << " " << " Memory Size: " << segment->memSize() << " " << " Align: " << segment->align() << " " << " VAddr: " << segment->vaddr() << std::endl; out << std::dec; } void AmdHsaCode::PrintSection(std::ostream& out, Section* section) { out << " Section " << section->Name() << " (Index " << section->getSectionIndex() << ")" << std::endl; out << " Type: " << section->type() << " " << " Flags: " << "0x" << std::hex << std::setw(8) << std::setfill('0') << section->flags() << std::dec << std::endl << " Size: " << section->size() << " " << " Address: " << section->addr() << " " << " Align: " << section->addralign() << std::endl; out << std::dec; if (section->flags() & SHF_AMDGPU_HSA_CODE) { // Printed separately. 
return; } switch (section->type()) { case SHT_NOBITS: return; case SHT_RELA: PrintRelocationData(out, section->asRelocationSection()); return; default: PrintRawData(out, section); } } void AmdHsaCode::PrintRawData(std::ostream& out, Section* section) { out << " Data:" << std::endl; std::vector<unsigned char> sdata(section->size()); section->getData(0, sdata.data(), section->size()); PrintRawData(out, sdata.data(), section->size()); } void AmdHsaCode::PrintRawData(std::ostream& out, const unsigned char *data, size_t size) { out << std::hex << std::setfill('0'); for (size_t i = 0; i < size; i += 16) { out << " " << std::setw(7) << i << ":"; for (size_t j = 0; j < 16; j += 1) { uint32_t value = i + j < size ? (uint32_t)data[i + j] : 0; if (j % 2 == 0) { out << ' '; } out << std::setw(2) << value; } out << " "; for (size_t j = 0; i + j < size && j < 16; j += 1) { char value = (char)data[i + j] >= 32 && (char)data[i + j] <= 126 ? (char)data[i + j] : '.'; out << value; } out << std::endl; } out << std::dec; } void AmdHsaCode::PrintRelocationData(std::ostream& out, RelocationSection* section) { if (section->targetSection()) { out << " Relocation Entries for " << section->targetSection()->Name() << " Section (total " << section->relocationCount() << "):" << std::endl; } else { // Dynamic relocations do not have a target section; they work with // virtual addresses. 
out << " Dynamic Relocation Entries (total " << section->relocationCount() << "):" << std::endl; } for (size_t i = 0; i < section->relocationCount(); ++i) { out << " Relocation (Index " << i << "):" << std::endl; out << " Type: " << section->relocation(i)->type() << std::endl; out << " Symbol: " << section->relocation(i)->symbol()->name() << std::endl; out << " Offset: " << section->relocation(i)->offset() << " Addend: " << section->relocation(i)->addend() << std::endl; } out << std::dec; } void AmdHsaCode::PrintSymbol(std::ostream& out, Symbol* sym) { out << " Symbol " << sym->Name() << " (Index " << sym->Index() << "):" << std::endl; if (sym->IsKernelSymbol() || sym->IsVariableSymbol()) { out << " Section: " << sym->GetSection()->Name() << " "; out << " Section Offset: " << sym->SectionOffset() << std::endl; out << " VAddr: " << sym->VAddr() << " "; out << " Size: " << sym->Size() << " "; out << " Alignment: " << sym->Alignment() << std::endl; out << " Kind: " << HsaSymbolKindToString(sym->Kind()) << " "; out << " Linkage: " << HsaSymbolLinkageToString(sym->Linkage()) << " "; out << " Definition: " << (sym->IsDefinition() ? "TRUE" : "FALSE") << std::endl; } if (sym->IsVariableSymbol()) { out << " Allocation: " << HsaVariableAllocationToString(sym->Allocation()) << " "; out << " Segment: " << HsaVariableSegmentToString(sym->Segment()) << " "; out << " Constant: " << (sym->IsConst() ? 
"TRUE" : "FALSE") << std::endl; } out << std::dec; } void AmdHsaCode::PrintMachineCode(std::ostream& out, KernelSymbol* sym) { assert(HsaText()); amd_kernel_code_t kernel_code; HsaText()->getData(sym->SectionOffset(), &kernel_code, sizeof(amd_kernel_code_t)); out << "AMD Kernel Code for " << sym->Name() << ": " << std::endl << std::dec; PrintAmdKernelCode(out, &kernel_code); out << std::endl; std::vector<uint8_t> isa(HsaText()->size(), 0); HsaText()->getData(0, isa.data(), HsaText()->size()); uint64_t isa_offset = sym->SectionOffset() + kernel_code.kernel_code_entry_byte_offset; out << "Disassembly for " << sym->Name() << ": " << std::endl; PrintDisassembly(out, isa.data(), HsaText()->size(), isa_offset); out << std::endl << std::dec; } void AmdHsaCode::PrintDisassembly(std::ostream& out, const unsigned char *isa, size_t size, uint32_t isa_offset) { #ifdef SP3_STATIC_LIB // Default ASIC is CI. std::string asic = "CI"; std::string vendor_name, architecture_name; uint32_t major_version, minor_version, stepping; if (GetNoteIsa(vendor_name, architecture_name, &major_version, &minor_version, &stepping)) { if (major_version == 7) { asic = "CI"; } else if (major_version == 8) { asic = "VI"; } else if (major_version == 9) { asic = "GFX9"; } else if (major_version == 10) { asic = "GFX10"; } else { assert(!"unknown compute capability"); } } struct sp3_context *dis_state = sp3_new(); sp3_setasic(dis_state, asic.c_str()); sp3_vma *dis_vma = sp3_vm_new_ptr(0, size / 4, (const uint32_t*)isa); std::vector<uint32_t> comments(HsaText()->size() / 4, 0); for (size_t i = 0; i < SymbolCount(); ++i) { Symbol* sym = GetSymbol(i); if (sym->IsKernelSymbol() && sym->IsDefinition()) { comments[sym->SectionOffset() / 4] = COMMENT_AMD_KERNEL_CODE_T_BEGIN; comments[(sym->SectionOffset() + 252) / 4] = COMMENT_AMD_KERNEL_CODE_T_END; amd_kernel_code_t kernel_code; HsaText()->getData(sym->SectionOffset(), &kernel_code, sizeof(amd_kernel_code_t)); comments[(kernel_code.kernel_code_entry_byte_offset + 
sym->SectionOffset()) / 4] = COMMENT_KERNEL_ISA_BEGIN; } } sp3_vma *comment_vma = sp3_vm_new_ptr(0, comments.size(), (const uint32_t*)comments.data()); sp3_setcomments(dis_state, comment_vma, CommentTopCallBack, CommentRightCallBack, this); // When isa_offset == 0 disassembly full hsatext section. // Otherwise disassembly only from this offset till endpgm instruction. char *text = sp3_disasm( dis_state, dis_vma, isa_offset / 4, nullptr, SP3_SHTYPE_CS, nullptr, (unsigned)(size / 4), isa_offset == 0 ? SP3DIS_FORCEVALID | SP3DIS_COMMENTS : SP3DIS_COMMENTS); enum class IsaState { UNKNOWN, AMD_KERNEL_CODE_T_BEGIN, AMD_KERNEL_CODE_T, AMD_KERNEL_CODE_T_END, ISA_BEGIN, ISA, PADDING, }; std::string line; char *text_ptr = text; IsaState state = IsaState::UNKNOWN; uint32_t offset = 0; uint32_t padding_end = 0; std::string padding; while (text_ptr && text_ptr[0] != '\0') { line.clear(); while (text_ptr[0] != '\0' && text_ptr[0] != '\n') { line.push_back(text_ptr[0]); ++text_ptr; } ltrim(line); if (text_ptr[0] == '\n') { ++text_ptr; } switch (state) { case IsaState::UNKNOWN: assert(line != "// amd_kernel_code_t end"); padding.clear(); if (line == "// amd_kernel_code_t begin") { state = IsaState::AMD_KERNEL_CODE_T_BEGIN; } else if (line == "// isa begin") { state = IsaState::ISA_BEGIN; } else if (line == "end") { out << line << std::endl; } else if (line.find("v_cndmask_b32 v0, s0, v0, vcc") != std::string::npos) { padding += " " + line + "\n"; offset = ParseInstructionOffset(line); padding_end = ParseInstructionOffset(line); state = IsaState::PADDING; } else if (line != "shader (null)") { out << " " << line << std::endl; } break; case IsaState::AMD_KERNEL_CODE_T_BEGIN: assert(line != "// amd_kernel_code_t begin"); assert(line != "// amd_kernel_code_t end"); assert(line != "// isa begin"); assert(line != "end"); padding.clear(); offset = ParseInstructionOffset(line); state = IsaState::AMD_KERNEL_CODE_T; break; case IsaState::AMD_KERNEL_CODE_T: assert(line != "// 
amd_kernel_code_t begin"); assert(line != "// isa begin"); assert(line != "end"); assert(padding.empty()); if (line == "// amd_kernel_code_t end") { state = IsaState::AMD_KERNEL_CODE_T_END; } break; case IsaState::AMD_KERNEL_CODE_T_END: assert(line != "// amd_kernel_code_t begin"); assert(line != "// amd_kernel_code_t end"); assert(line != "// isa begin"); assert(line != "end"); assert(padding.empty()); for (size_t i = 0; i < SymbolCount(); ++i) { Symbol* sym = GetSymbol(i); if (sym->IsKernelSymbol() && sym->IsDefinition() && sym->SectionOffset() == offset) { std::ostream::fmtflags flags = out.flags(); char fill = out.fill(); out << " //" << std::endl; out << " // amd_kernel_code_t for " << sym->Name() << " (" << std::hex << std::setw(12) << std::setfill('0') << std::right << offset << " - " << std::setw(12) << (offset + 256) << ')' << std::endl; out << " //" << std::endl; out << std::setfill(fill); out.flags(flags); break; } } state = IsaState::UNKNOWN; break; case IsaState::ISA_BEGIN: assert(line != "// amd_kernel_code_t begin"); assert(line != "// amd_kernel_code_t end"); assert(line != "// isa begin"); padding.clear(); offset = ParseInstructionOffset(line); for (size_t i = 0; i < SymbolCount(); ++i) { Symbol* sym = GetSymbol(i); if (sym->IsKernelSymbol() && sym->IsDefinition()) { amd_kernel_code_t kernel_code; HsaText()->getData(sym->SectionOffset(), &kernel_code, sizeof(amd_kernel_code_t)); if ((sym->SectionOffset() + kernel_code.kernel_code_entry_byte_offset) == offset) { out << " //" << std::endl; out << " // " << sym->Name() << ':' << std::endl; out << " //" << std::endl; break; } } } if (line == "end") { out << line << std::endl; state = IsaState::UNKNOWN; } else { out << " " << line << std::endl; state = IsaState::ISA; } break; case IsaState::ISA: assert(line != "// amd_kernel_code_t end"); if (!padding.empty()) { out << padding; out.flush(); padding.clear(); } if (line == "// amd_kernel_code_t begin") { state = IsaState::AMD_KERNEL_CODE_T_BEGIN; } else 
if (line == "// isa begin") { state = IsaState::ISA_BEGIN; } else if (line == "end") { out << line << std::endl; state = IsaState::UNKNOWN; } else if (line.find("v_cndmask_b32 v0, s0, v0, vcc") != std::string::npos) { padding += " " + line + "\n"; offset = ParseInstructionOffset(line); padding_end = offset; state = IsaState::PADDING; } else { out << " " << line << std::endl; } break; case IsaState::PADDING: assert(line != "// amd_kernel_code_t end"); if (line.find("v_cndmask_b32 v0, s0, v0, vcc") != std::string::npos) { padding += " " + line + "\n"; padding_end = ParseInstructionOffset(line); } else if (line == "// amd_kernel_code_t begin" || line == "// isa begin" || line == "end") { padding.clear(); std::ostream::fmtflags flags = out.flags(); char fill = out.fill(); out << " //" << std::endl; out << " // padding (" << std::hex << std::setw(12) << std::setfill('0') << std::right << offset << " - " << std::setw(12) << (padding_end + 4) << ')' << std::endl; out << " //" << std::endl; out << std::setfill(fill); out.flags(flags); if (line == "// amd_kernel_code_t begin") { state = IsaState::AMD_KERNEL_CODE_T_BEGIN; } else if (line == "// isa begin") { state = IsaState::ISA_BEGIN; } else if (line == "end") { out << line << std::endl; state = IsaState::UNKNOWN; } } else { padding += " " + line + "\n"; state = IsaState::ISA; } break; default: assert(false); break; } } sp3_free(text); sp3_close(dis_state); sp3_vm_free(dis_vma); sp3_vm_free(comment_vma); #else PrintRawData(out, isa, size); #endif // SP3_STATIC_LIB out << std::dec; } std::string AmdHsaCode::MangleSymbolName(const std::string& module_name, const std::string symbol_name) { if (module_name.empty()) { return symbol_name; } else { return module_name + "::" + symbol_name; } } bool AmdHsaCode::ElfImageError() { out << img->output(); return false; } AmdHsaCode* AmdHsaCodeManager::FromHandle(hsa_code_object_t c) { CodeMap::iterator i = codeMap.find(c.handle); if (i == codeMap.end()) { AmdHsaCode* code = new 
AmdHsaCode(); const void* buffer = reinterpret_cast<const void*>(c.handle); if (!code->InitAsBuffer(buffer, 0)) { delete code; return 0; } codeMap[c.handle] = code; return code; } return i->second; } bool AmdHsaCodeManager::Destroy(hsa_code_object_t c) { CodeMap::iterator i = codeMap.find(c.handle); if (i == codeMap.end()) { // Currently, we do not always create map entry for every code object buffer. return true; } delete i->second; codeMap.erase(i); return true; } bool AmdHsaCode::PullElfV2() { for (size_t i = 0; i < img->segmentCount(); ++i) { Segment* s = img->segment(i); if (s->type() == PT_LOAD) { dataSegments.push_back(s); } } for (size_t i = 0; i < img->sectionCount(); ++i) { Section* sec = img->section(i); if (!sec) { continue; } if ((sec->type() == SHT_PROGBITS || sec->type() == SHT_NOBITS) && !(sec->flags() & SHF_EXECINSTR)) { dataSections.push_back(sec); } else if (sec->type() == SHT_RELA) { relocationSections.push_back(sec->asRelocationSection()); } if (sec->Name() == ".text") { hsatext = sec; } } for (size_t i = 0; i < img->symtab()->symbolCount(); ++i) { amd::elf::Symbol* elfsym = img->symtab()->symbol(i); Symbol* sym = 0; switch (elfsym->type()) { case STT_AMDGPU_HSA_KERNEL: { amd::elf::Section* sec = elfsym->section(); amd_kernel_code_t akc; if (!sec) { out << "Failed to find section for symbol " << elfsym->name() << std::endl; return false; } if (!(sec->flags() & (SHF_ALLOC | SHF_EXECINSTR))) { out << "Invalid code section for symbol " << elfsym->name() << std::endl; return false; } if (!sec->getData(elfsym->value() - sec->addr(), &akc, sizeof(amd_kernel_code_t))) { out << "Failed to get AMD Kernel Code for symbol " << elfsym->name() << std::endl; return false; } sym = new KernelSymbolV2(elfsym, &akc); break; } case STT_OBJECT: case STT_COMMON: sym = new VariableSymbolV2(elfsym); break; default: break; // Skip unknown symbols.
} if (sym) { symbols.push_back(sym); } } return true; } KernelSymbolV2::KernelSymbolV2(amd::elf::Symbol* elfsym_, const amd_kernel_code_t* akc) : KernelSymbol(elfsym_, akc) { } } // namespace code } // namespace hsa } // namespace amd } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/libamdhsacode/amd_hsa_code_util.cpp000066400000000000000000001134401420110115200243230ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "amd_hsa_code_util.hpp" #include #include #include #include #include #include #include #ifdef _WIN32 #include #include #include #else // _WIN32 #include #include #include #include #include #endif // _WIN32 #include "inc/Brig.h" namespace { auto eq = " = "; std::ostream& attr1(std::ostream& out) { out << " " << std::left << std::setw(60) << std::setfill(' '); return out; } std::ostream& attr2(std::ostream& out) { out << " " << std::left << std::setw(58) << std::setfill(' '); return out; } } // namespace anonymous namespace rocr { namespace amd { namespace hsa { namespace common { bool IsAccessibleMemoryAddress(uint64_t address) { if (0 == address) { return false; } #if defined(_WIN32) || defined(_WIN64) MEMORY_BASIC_INFORMATION memory_info; if (!VirtualQuery(reinterpret_cast(address), &memory_info, sizeof(memory_info))) { return false; } int32_t is_accessible = ((memory_info.Protect & PAGE_READONLY) || (memory_info.Protect & PAGE_READWRITE) || (memory_info.Protect & PAGE_WRITECOPY) || (memory_info.Protect & PAGE_EXECUTE_READ) || (memory_info.Protect & PAGE_EXECUTE_READWRITE) || (memory_info.Protect & PAGE_EXECUTE_WRITECOPY)); if (memory_info.Protect & PAGE_GUARD) { is_accessible = 0; } if (memory_info.Protect & PAGE_NOACCESS) { is_accessible = 0; } return is_accessible > 0; #else int32_t random_fd = 0; ssize_t bytes_written = 0; if (-1 == (random_fd = open("/dev/random", O_WRONLY))) { return true; // Skip check if /dev/random is not available. 
} bytes_written = write(random_fd, (void*)address, 1); if (-1 == close(random_fd)) { return false; } return bytes_written == 1; #endif // _WIN32 || _WIN64 } } // namespace common std::string HsaSymbolKindToString(hsa_symbol_kind_t kind) { switch (kind) { case HSA_SYMBOL_KIND_VARIABLE: return "VARIABLE"; case HSA_SYMBOL_KIND_INDIRECT_FUNCTION: return "INDIRECT_FUNCTION"; case HSA_SYMBOL_KIND_KERNEL: return "KERNEL"; default: return "UNKNOWN"; } } std::string HsaSymbolLinkageToString(hsa_symbol_linkage_t linkage) { switch (linkage) { case HSA_SYMBOL_LINKAGE_MODULE: return "MODULE"; case HSA_SYMBOL_LINKAGE_PROGRAM: return "PROGRAM"; default: return "UNKNOWN"; } } std::string HsaVariableAllocationToString(hsa_variable_allocation_t allocation) { switch (allocation) { case HSA_VARIABLE_ALLOCATION_AGENT: return "AGENT"; case HSA_VARIABLE_ALLOCATION_PROGRAM: return "PROGRAM"; default: return "UNKNOWN"; } } std::string HsaVariableSegmentToString(hsa_variable_segment_t segment) { switch (segment) { case HSA_VARIABLE_SEGMENT_GLOBAL: return "GLOBAL"; case HSA_VARIABLE_SEGMENT_READONLY: return "READONLY"; default: return "UNKNOWN"; } } std::string HsaProfileToString(hsa_profile_t profile) { switch (profile) { case HSA_PROFILE_BASE: return "BASE"; case HSA_PROFILE_FULL: return "FULL"; default: return "UNKNOWN"; } } std::string HsaMachineModelToString(hsa_machine_model_t model) { switch (model) { case HSA_MACHINE_MODEL_SMALL: return "SMALL"; case HSA_MACHINE_MODEL_LARGE: return "LARGE"; default: return "UNKNOWN"; } } std::string HsaFloatRoundingModeToString(hsa_default_float_rounding_mode_t mode) { switch (mode) { case HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT: return "DEFAULT"; case HSA_DEFAULT_FLOAT_ROUNDING_MODE_ZERO: return "ZERO"; case HSA_DEFAULT_FLOAT_ROUNDING_MODE_NEAR: return "NEAR"; default: return "UNKNOWN"; } } std::string AmdMachineKindToString(amd_machine_kind16_t machine) { switch (machine) { case AMD_MACHINE_KIND_UNDEFINED: return "UNDEFINED"; case 
AMD_MACHINE_KIND_AMDGPU: return "AMDGPU"; default: return "UNKNOWN"; } } std::string AmdFloatRoundModeToString(amd_float_round_mode_t round_mode) { switch (round_mode) { case AMD_FLOAT_ROUND_MODE_NEAREST_EVEN: return "NEAREST_EVEN"; case AMD_FLOAT_ROUND_MODE_PLUS_INFINITY: return "PLUS_INFINITY"; case AMD_FLOAT_ROUND_MODE_MINUS_INFINITY: return "MINUS_INFINITY"; case AMD_FLOAT_ROUND_MODE_ZERO: return "ZERO"; default: return "UNKNOWN"; } } std::string AmdFloatDenormModeToString(amd_float_denorm_mode_t denorm_mode) { switch (denorm_mode) { case AMD_FLOAT_DENORM_MODE_FLUSH_SOURCE_OUTPUT: return "FLUSH_SOURCE_OUTPUT"; case AMD_FLOAT_DENORM_MODE_FLUSH_OUTPUT: return "FLUSH_OUTPUT"; case AMD_FLOAT_DENORM_MODE_FLUSH_SOURCE: return "FLUSH_SOURCE"; case AMD_FLOAT_DENORM_MODE_NO_FLUSH: return "FLUSH_NONE"; default: return "UNKNOWN"; } } std::string AmdSystemVgprWorkitemIdToString(amd_system_vgpr_workitem_id_t system_vgpr_workitem_id) { switch (system_vgpr_workitem_id) { case AMD_SYSTEM_VGPR_WORKITEM_ID_X: return "X"; case AMD_SYSTEM_VGPR_WORKITEM_ID_X_Y: return "X, Y"; case AMD_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z: return "X, Y, Z"; default: return "UNKNOWN"; } } std::string AmdElementByteSizeToString(amd_element_byte_size_t element_byte_size) { switch (element_byte_size) { case AMD_ELEMENT_BYTE_SIZE_2: return "WORD (2 bytes)"; case AMD_ELEMENT_BYTE_SIZE_4: return "DWORD (4 bytes)"; case AMD_ELEMENT_BYTE_SIZE_8: return "QWORD (8 bytes)"; case AMD_ELEMENT_BYTE_SIZE_16: return "16 bytes"; default: return "UNKNOWN"; } } std::string AmdExceptionKindToString(amd_exception_kind16_t exceptions) { std::string e; if (exceptions & AMD_EXCEPTION_KIND_INVALID_OPERATION) { e += ", INVALID_OPERATION"; exceptions &= ~AMD_EXCEPTION_KIND_INVALID_OPERATION; } if (exceptions & AMD_EXCEPTION_KIND_DIVISION_BY_ZERO) { e += ", DIVISION_BY_ZERO"; exceptions &= ~AMD_EXCEPTION_KIND_DIVISION_BY_ZERO; } if (exceptions & AMD_EXCEPTION_KIND_OVERFLOW) { e += ", OVERFLOW"; exceptions &=
~AMD_EXCEPTION_KIND_OVERFLOW; } if (exceptions & AMD_EXCEPTION_KIND_UNDERFLOW) { e += ", UNDERFLOW"; exceptions &= ~AMD_EXCEPTION_KIND_UNDERFLOW; } if (exceptions & AMD_EXCEPTION_KIND_INEXACT) { e += ", INEXACT"; exceptions &= ~AMD_EXCEPTION_KIND_INEXACT; } if (exceptions) { e += ", UNKNOWN"; } if (!e.empty()) { e = "[" + e.erase(0, 2) + "]"; } return e; } std::string AmdPowerTwoToString(amd_powertwo8_t p) { return std::to_string(1 << (unsigned) p); } amdgpu_hsa_elf_segment_t AmdHsaElfSectionSegment(amdgpu_hsa_elf_section_t sec) { switch (sec) { case AMDGPU_HSA_RODATA_GLOBAL_PROGRAM: case AMDGPU_HSA_DATA_GLOBAL_PROGRAM: case AMDGPU_HSA_BSS_GLOBAL_PROGRAM: return AMDGPU_HSA_SEGMENT_GLOBAL_PROGRAM; case AMDGPU_HSA_RODATA_GLOBAL_AGENT: case AMDGPU_HSA_DATA_GLOBAL_AGENT: case AMDGPU_HSA_BSS_GLOBAL_AGENT: return AMDGPU_HSA_SEGMENT_GLOBAL_AGENT; case AMDGPU_HSA_RODATA_READONLY_AGENT: case AMDGPU_HSA_DATA_READONLY_AGENT: case AMDGPU_HSA_BSS_READONLY_AGENT: return AMDGPU_HSA_SEGMENT_READONLY_AGENT; default: assert(false); return AMDGPU_HSA_SEGMENT_LAST; } } bool IsAmdHsaElfSectionROData(amdgpu_hsa_elf_section_t sec) { switch (sec) { case AMDGPU_HSA_RODATA_GLOBAL_PROGRAM: case AMDGPU_HSA_RODATA_GLOBAL_AGENT: case AMDGPU_HSA_RODATA_READONLY_AGENT: return true; default: return false; } } std::string AmdHsaElfSegmentToString(amdgpu_hsa_elf_segment_t seg) { switch (seg) { case AMDGPU_HSA_SEGMENT_GLOBAL_PROGRAM: return "GLOBAL_PROGRAM"; case AMDGPU_HSA_SEGMENT_GLOBAL_AGENT: return "GLOBAL_AGENT"; case AMDGPU_HSA_SEGMENT_READONLY_AGENT: return "READONLY_AGENT"; case AMDGPU_HSA_SEGMENT_CODE_AGENT: return "CODE_AGENT"; default: return "UNKNOWN"; } } std::string AmdPTLoadToString(uint64_t type) { if (PT_LOOS <= type && type < PT_LOOS + AMDGPU_HSA_SEGMENT_LAST) { return AmdHsaElfSegmentToString((amdgpu_hsa_elf_segment_t) (type - PT_LOOS)); } else { return "UNKNOWN (" + std::to_string(type) + ")"; } } void PrintAmdKernelCode(std::ostream& out, const amd_kernel_code_t *akc) { uint32_t
is_debug_enabled = AMD_HSA_BITS_GET(akc->kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_IS_DEBUG_ENABLED); out << attr1 << "amd_kernel_code_version_major" << eq << akc->amd_kernel_code_version_major << std::endl; out << attr1 << "amd_kernel_code_version_minor" << eq << akc->amd_kernel_code_version_minor << std::endl; out << attr1 << "amd_machine_kind" << eq << AmdMachineKindToString(akc->amd_machine_kind) << std::endl; out << attr1 << "amd_machine_version_major" << eq << (uint32_t)akc->amd_machine_version_major << std::endl; out << attr1 << "amd_machine_version_minor" << eq << (uint32_t)akc->amd_machine_version_minor << std::endl; out << attr1 << "amd_machine_version_stepping" << eq << (uint32_t)akc->amd_machine_version_stepping << std::endl; out << attr1 << "kernel_code_entry_byte_offset" << eq << akc->kernel_code_entry_byte_offset << std::endl; if (akc->kernel_code_prefetch_byte_offset) { out << attr1 << "kernel_code_prefetch_byte_offset" << eq << akc->kernel_code_prefetch_byte_offset << std::endl; } if (akc->kernel_code_prefetch_byte_size) { out << attr1 << "kernel_code_prefetch_byte_size" << eq << akc->kernel_code_prefetch_byte_size << std::endl; } out << attr1 << "max_scratch_backing_memory_byte_size" << eq << akc->max_scratch_backing_memory_byte_size << std::endl; PrintAmdComputePgmRsrcOne(out, akc->compute_pgm_rsrc1); PrintAmdComputePgmRsrcTwo(out, akc->compute_pgm_rsrc2); PrintAmdKernelCodeProperties(out, akc->kernel_code_properties); if (akc->workitem_private_segment_byte_size) { out << attr1 << "workitem_private_segment_byte_size" << eq << akc->workitem_private_segment_byte_size << std::endl; } if (akc->workgroup_group_segment_byte_size) { out << attr1 << "workgroup_group_segment_byte_size" << eq << akc->workgroup_group_segment_byte_size << std::endl; } if (akc->gds_segment_byte_size) { out << attr1 << "gds_segment_byte_size" << eq << akc->gds_segment_byte_size << std::endl; } if (akc->kernarg_segment_byte_size) { out << attr1 << 
"kernarg_segment_byte_size" << eq << akc->kernarg_segment_byte_size << std::endl; } if (akc->workgroup_fbarrier_count) { out << attr1 << "workgroup_fbarrier_count" << eq << akc->workgroup_fbarrier_count << std::endl; } out << attr1 << "wavefront_sgpr_count" << eq << (uint32_t)akc->wavefront_sgpr_count << std::endl; out << attr1 << "workitem_vgpr_count" << eq << (uint32_t)akc->workitem_vgpr_count << std::endl; if (akc->reserved_vgpr_count > 0) { out << attr1 << "reserved_vgpr_first" << eq << (uint32_t)akc->reserved_vgpr_first << std::endl; out << attr1 << "reserved_vgpr_count" << eq << (uint32_t)akc->reserved_vgpr_count << std::endl; } if (akc->reserved_sgpr_count > 0) { out << attr1 << "reserved_sgpr_first" << eq << (uint32_t)akc->reserved_sgpr_first << std::endl; out << attr1 << "reserved_sgpr_count" << eq << (uint32_t)akc->reserved_sgpr_count << std::endl; } if (is_debug_enabled && (akc->debug_wavefront_private_segment_offset_sgpr != uint16_t(-1))) { out << attr1 << "debug_wavefront_private_segment_offset_sgpr" << eq << (uint32_t)akc->debug_wavefront_private_segment_offset_sgpr << std::endl; } if (is_debug_enabled && (akc->debug_private_segment_buffer_sgpr != uint16_t(-1))) { out << attr1 << "debug_private_segment_buffer_sgpr" << eq << (uint32_t)akc->debug_private_segment_buffer_sgpr << ":" << (uint32_t)(akc->debug_private_segment_buffer_sgpr + 3) << std::endl; } if (akc->kernarg_segment_alignment) { out << attr1 << "kernarg_segment_alignment" << eq << AmdPowerTwoToString(akc->kernarg_segment_alignment) << " (" << (uint32_t) akc->kernarg_segment_alignment << ")" << std::endl; } if (akc->group_segment_alignment) { out << attr1 << "group_segment_alignment" << eq << AmdPowerTwoToString(akc->group_segment_alignment) << " (" << (uint32_t) akc->group_segment_alignment << ")" << std::endl; } if (akc->private_segment_alignment) { out << attr1 << "private_segment_alignment" << eq << AmdPowerTwoToString(akc->private_segment_alignment) << " (" << (uint32_t) 
akc->private_segment_alignment << ")" << std::endl; } out << attr1 << "wavefront_size" << eq << AmdPowerTwoToString(akc->wavefront_size) << " (" << (uint32_t) akc->wavefront_size << ")" << std::endl; PrintAmdControlDirectives(out, akc->control_directives); } void PrintAmdComputePgmRsrcOne(std::ostream& out, amd_compute_pgm_rsrc_one32_t compute_pgm_rsrc1) { out << " COMPUTE_PGM_RSRC1 (0x" << std::hex << std::setw(8) << std::setfill('0') << compute_pgm_rsrc1 << "):" << std::endl; out << std::dec; uint32_t granulated_workitem_vgpr_count = AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_GRANULATED_WORKITEM_VGPR_COUNT); out << attr2 << "granulated_workitem_vgpr_count" << eq << granulated_workitem_vgpr_count << std::endl; uint32_t granulated_wavefront_sgpr_count = AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_GRANULATED_WAVEFRONT_SGPR_COUNT); out << attr2 << "granulated_wavefront_sgpr_count" << eq << granulated_wavefront_sgpr_count << std::endl; uint32_t priority = AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_PRIORITY); out << attr2 << "priority" << eq << priority << std::endl; uint32_t float_round_mode_32 = AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_FLOAT_ROUND_MODE_32); out << attr2 << "float_round_mode_32" << eq << AmdFloatRoundModeToString((amd_float_round_mode_t)float_round_mode_32) << std::endl; uint32_t float_round_mode_16_64 = AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_FLOAT_ROUND_MODE_16_64); out << attr2 << "float_round_mode_16_64" << eq << AmdFloatRoundModeToString((amd_float_round_mode_t)float_round_mode_16_64) << std::endl; uint32_t float_denorm_mode_32 = AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_FLOAT_DENORM_MODE_32); out << attr2 << "float_denorm_mode_32" << eq << AmdFloatDenormModeToString((amd_float_denorm_mode_t)float_denorm_mode_32) << std::endl; uint32_t float_denorm_mode_16_64 = AMD_HSA_BITS_GET(compute_pgm_rsrc1, 
AMD_COMPUTE_PGM_RSRC_ONE_FLOAT_DENORM_MODE_16_64); out << attr2 << "float_denorm_mode_16_64" << eq << AmdFloatDenormModeToString((amd_float_denorm_mode_t)float_denorm_mode_16_64) << std::endl; if (AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_PRIV)) { out << attr2 << "priv" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_ENABLE_DX10_CLAMP)) { out << attr2 << "enable_dx10_clamp" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_DEBUG_MODE)) { out << attr2 << "debug_mode" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_ENABLE_IEEE_MODE)) { out << attr2 << "enable_ieee_mode" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_BULKY)) { out << attr2 << "bulky" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc1, AMD_COMPUTE_PGM_RSRC_ONE_CDBG_USER)) { out << attr2 << "cdbg_user" << eq << "TRUE" << std::endl; } } void PrintAmdComputePgmRsrcTwo(std::ostream& out, amd_compute_pgm_rsrc_two32_t compute_pgm_rsrc2) { out << " COMPUTE_PGM_RSRC2 (0x" << std::hex << std::setw(8) << std::setfill('0') << compute_pgm_rsrc2 << "):" << std::endl; out << std::dec; if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_PRIVATE_SEGMENT_WAVE_BYTE_OFFSET)) { out << attr2 << "enable_sgpr_private_segment_wave_byte_offset" << eq << "TRUE" << std::endl; } uint32_t user_sgpr_count = AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_USER_SGPR_COUNT); out << attr2 << "user_sgpr_count" << eq << user_sgpr_count << std::endl; if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_TRAP_HANDLER)) { out << attr2 << "enable_trap_handler" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_WORKGROUP_ID_X)) { out << attr2 << "enable_sgpr_workgroup_id_x" << eq << "TRUE" << 
std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_WORKGROUP_ID_Y)) { out << attr2 << "enable_sgpr_workgroup_id_y" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_WORKGROUP_ID_Z)) { out << attr2 << "enable_sgpr_workgroup_id_z" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_SGPR_WORKGROUP_INFO)) { out << attr2 << "enable_sgpr_workgroup_info" << eq << "TRUE" << std::endl; } uint32_t enable_vgpr_workitem_id = AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_VGPR_WORKITEM_ID); out << attr2 << "enable_vgpr_workitem_id" << eq << AmdSystemVgprWorkitemIdToString((amd_system_vgpr_workitem_id_t)enable_vgpr_workitem_id) << std::endl; if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_ADDRESS_WATCH)) { out << attr2 << "enable_exception_address_watch" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_MEMORY_VIOLATION)) { out << attr2 << "enable_exception_memory_violation" << eq << "TRUE" << std::endl; } uint32_t granulated_lds_size = AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_GRANULATED_LDS_SIZE); out << attr2 << "granulated_lds_size" << eq << granulated_lds_size << std::endl; if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION)) { out << attr2 << "enable_exception_ieee_754_fp_invalid_operation" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_FP_DENORMAL_SOURCE)) { out << attr2 << "enable_exception_fp_denormal_source" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO)) { out << attr2 << "enable_exception_ieee_754_fp_division_by_zero" << eq << "TRUE" << std::endl; } if 
(AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW)) { out << attr2 << "enable_exception_ieee_754_fp_overflow" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW)) { out << attr2 << "enable_exception_ieee_754_fp_underflow" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_IEEE_754_FP_INEXACT)) { out << attr2 << "enable_exception_ieee_754_fp_inexact" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(compute_pgm_rsrc2, AMD_COMPUTE_PGM_RSRC_TWO_ENABLE_EXCEPTION_INT_DIVISION_BY_ZERO)) { out << attr2 << "enable_exception_int_division_by_zero" << eq << "TRUE" << std::endl; } } void PrintAmdKernelCodeProperties(std::ostream& out, amd_kernel_code_properties32_t kernel_code_properties) { out << " KERNEL_CODE_PROPERTIES (0x" << std::hex << std::setw(8) << std::setfill('0') << kernel_code_properties << "):" << std::endl; out << std::dec; if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER)) { out << attr2 << "enable_sgpr_private_segment_buffer" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_DISPATCH_PTR)) { out << attr2 << "enable_sgpr_dispatch_ptr" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_QUEUE_PTR)) { out << attr2 << "enable_sgpr_queue_ptr" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_KERNARG_SEGMENT_PTR)) { out << attr2 << "enable_sgpr_kernarg_segment_ptr" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_DISPATCH_ID)) { out << attr2 << "enable_sgpr_dispatch_id" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, 
AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_FLAT_SCRATCH_INIT)) { out << attr2 << "enable_sgpr_flat_scratch_init" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE)) { out << attr2 << "enable_sgpr_private_segment_size" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X)) { out << attr2 << "enable_sgpr_grid_workgroup_count_x" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y)) { out << attr2 << "enable_sgpr_grid_workgroup_count_y" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z)) { out << attr2 << "enable_sgpr_grid_workgroup_count_z" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_ENABLE_ORDERED_APPEND_GDS)) { out << attr2 << "enable_ordered_append_gds" << eq << "TRUE" << std::endl; } uint32_t private_element_size = AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_PRIVATE_ELEMENT_SIZE); out << attr2 << "private_element_size" << eq << AmdElementByteSizeToString((amd_element_byte_size_t)private_element_size) << std::endl; if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_IS_PTR64)) { out << attr2 << "is_ptr64" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_IS_DYNAMIC_CALLSTACK)) { out << attr2 << "is_dynamic_callstack" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_IS_DEBUG_ENABLED)) { out << attr2 << "is_debug_enabled" << eq << "TRUE" << std::endl; } if (AMD_HSA_BITS_GET(kernel_code_properties, AMD_KERNEL_CODE_PROPERTIES_IS_XNACK_ENABLED)) { out << attr2 << "is_xnack_enabled" << eq << "TRUE" << std::endl; } } void 
PrintAmdControlDirectives(std::ostream& out, const amd_control_directives_t &control_directives) { if (!control_directives.enabled_control_directives) { return; } out << " CONTROL_DIRECTIVES:" << std::endl; if (control_directives.enabled_control_directives & AMD_ENABLED_CONTROL_DIRECTIVE_ENABLE_BREAK_EXCEPTIONS) { out << attr2 << "enable_break_exceptions" << eq << AmdExceptionKindToString(control_directives.enable_break_exceptions).c_str() << std::endl; } if (control_directives.enabled_control_directives & AMD_ENABLED_CONTROL_DIRECTIVE_ENABLE_DETECT_EXCEPTIONS) { out << attr2 << "enable_detect_exceptions" << eq << AmdExceptionKindToString(control_directives.enable_detect_exceptions).c_str() << std::endl; } if (control_directives.enabled_control_directives & AMD_ENABLED_CONTROL_DIRECTIVE_MAX_DYNAMIC_GROUP_SIZE) { out << attr2 << "max_dynamic_group_size" << eq << control_directives.max_dynamic_group_size << std::endl; } if (control_directives.enabled_control_directives & AMD_ENABLED_CONTROL_DIRECTIVE_MAX_FLAT_GRID_SIZE) { out << attr2 << "max_flat_grid_size" << eq << control_directives.max_flat_grid_size << std::endl; } if (control_directives.enabled_control_directives & AMD_ENABLED_CONTROL_DIRECTIVE_MAX_FLAT_WORKGROUP_SIZE) { out << attr2 << "max_flat_workgroup_size" << eq << control_directives.max_flat_workgroup_size << std::endl; } if (control_directives.enabled_control_directives & AMD_ENABLED_CONTROL_DIRECTIVE_REQUIRED_DIM) { out << attr2 << "required_dim" << eq << (uint32_t)control_directives.required_dim << std::endl; } if (control_directives.enabled_control_directives & AMD_ENABLED_CONTROL_DIRECTIVE_REQUIRED_GRID_SIZE) { out << attr2 << "required_grid_size" << eq << "(" << control_directives.required_grid_size[0] << ", " << control_directives.required_grid_size[1] << ", " << control_directives.required_grid_size[2] << ")" << std::endl; } if (control_directives.enabled_control_directives & AMD_ENABLED_CONTROL_DIRECTIVE_REQUIRED_WORKGROUP_SIZE) { out << attr2 
        << "required_workgroup_size" << eq << "("
        << control_directives.required_workgroup_size[0] << ", "
        << control_directives.required_workgroup_size[1] << ", "
        << control_directives.required_workgroup_size[2] << ")" << std::endl;
  }
  if (control_directives.enabled_control_directives &
      AMD_ENABLED_CONTROL_DIRECTIVE_REQUIRE_NO_PARTIAL_WORKGROUPS) {
    out << attr2 << "require_no_partial_workgroups" << eq << "TRUE" << std::endl;
  }
}

namespace code_options {

std::ostream& space(std::ostream& out) {
  if (out.tellp()) {
    out << " ";
  }
  return out;
}

std::ostream& operator<<(std::ostream& out, const control_directive& d) {
  out << space << "-hsa_control_directive:" << d.name << "=";
  return out;
}

const char *BrigExceptionString(BrigExceptions32_t e) {
  switch (e) {
  case BRIG_EXCEPTIONS_INVALID_OPERATION:
    return "INVALID_OPERATION";
  case BRIG_EXCEPTIONS_DIVIDE_BY_ZERO:
    return "DIVIDE_BY_ZERO";
  case BRIG_EXCEPTIONS_OVERFLOW:
    return "OVERFLOW";
  case BRIG_EXCEPTIONS_UNDERFLOW:
    return "UNDERFLOW";
  case BRIG_EXCEPTIONS_INEXACT:
    return "INEXACT";
  default:
    assert(false);
    return "";
  }
}

std::ostream& operator<<(std::ostream& out, const exceptions_mask& e) {
  bool first = true;
  // Exception kinds are single-bit flags, so step through them by shifting
  // rather than incrementing.
  for (BrigExceptions32_t be = BRIG_EXCEPTIONS_INVALID_OPERATION;
       be < BRIG_EXCEPTIONS_FIRST_USER_DEFINED; be <<= 1) {
    if (e.mask & be) {
      if (first) {
        first = false;
      } else {
        out << ",";
      }
      out << BrigExceptionString(be);
    }
  }
  return out;
}

std::ostream& operator<<(std::ostream& out, const control_directives& cd) {
  const hsa_ext_control_directives_t& d = cd.d;
  uint64_t mask = d.control_directives_mask;
  if (!mask) {
    return out;
  }
  if (mask & BRIG_CONTROL_ENABLEBREAKEXCEPTIONS) {
    out << control_directive("ENABLEBREAKEXCEPTIONS") << exceptions_mask(d.break_exceptions_mask);
  }
  if (mask & BRIG_CONTROL_ENABLEDETECTEXCEPTIONS) {
    out << control_directive("ENABLEDETECTEXCEPTIONS") << exceptions_mask(d.detect_exceptions_mask);
  }
  if (mask & BRIG_CONTROL_MAXDYNAMICGROUPSIZE) {
    out << control_directive("MAXDYNAMICGROUPSIZE") << d.max_dynamic_group_size;
  }
  if (mask & BRIG_CONTROL_MAXFLATGRIDSIZE)
  {
    out << control_directive("MAXFLATGRIDSIZE") << d.max_flat_grid_size;
  }
  if (mask & BRIG_CONTROL_MAXFLATWORKGROUPSIZE) {
    out << control_directive("MAXFLATWORKGROUPSIZE") << d.max_flat_workgroup_size;
  }
  if (mask & BRIG_CONTROL_REQUIREDDIM) {
    // required_dim is a uint8_t; widen it so it prints as a number rather than a character.
    out << control_directive("REQUIREDDIM") << static_cast<uint32_t>(d.required_dim);
  }
  if (mask & BRIG_CONTROL_REQUIREDGRIDSIZE) {
    out << control_directive("REQUIREDGRIDSIZE")
        << d.required_grid_size[0] << ","
        << d.required_grid_size[1] << ","
        << d.required_grid_size[2];
  }
  if (mask & BRIG_CONTROL_REQUIREDWORKGROUPSIZE) {
    out << control_directive("REQUIREDWORKGROUPSIZE")
        << d.required_workgroup_size.x << ","
        << d.required_workgroup_size.y << ","
        << d.required_workgroup_size.z;
  }
  return out;
}

} // namespace code_options

const char* hsaerr2str(hsa_status_t status) {
  switch ((unsigned) status) {
  case HSA_STATUS_SUCCESS:
    return "HSA_STATUS_SUCCESS: The function has been executed successfully.";
  case HSA_STATUS_INFO_BREAK:
    return "HSA_STATUS_INFO_BREAK: A traversal over a list of "
           "elements has been interrupted by the application before "
           "completing.";
  case HSA_STATUS_ERROR:
    return "HSA_STATUS_ERROR: A generic error has occurred.";
  case HSA_STATUS_ERROR_INVALID_ARGUMENT:
    return "HSA_STATUS_ERROR_INVALID_ARGUMENT: One of the actual "
           "arguments does not meet a precondition stated in the "
           "documentation of the corresponding formal argument.";
  case HSA_STATUS_ERROR_INVALID_QUEUE_CREATION:
    return "HSA_STATUS_ERROR_INVALID_QUEUE_CREATION: The requested "
           "queue creation is not valid.";
  case HSA_STATUS_ERROR_INVALID_ALLOCATION:
    return "HSA_STATUS_ERROR_INVALID_ALLOCATION: The requested "
           "allocation is not valid.";
  case HSA_STATUS_ERROR_INVALID_AGENT:
    return "HSA_STATUS_ERROR_INVALID_AGENT: The agent is invalid.";
  case HSA_STATUS_ERROR_INVALID_REGION:
    return "HSA_STATUS_ERROR_INVALID_REGION: The memory region is invalid.";
  case HSA_STATUS_ERROR_INVALID_SIGNAL:
    return "HSA_STATUS_ERROR_INVALID_SIGNAL: The signal is invalid.";
  case HSA_STATUS_ERROR_INVALID_QUEUE:
    return
        "HSA_STATUS_ERROR_INVALID_QUEUE: The queue is invalid.";
  case HSA_STATUS_ERROR_OUT_OF_RESOURCES:
    return "HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to "
           "allocate the necessary resources. This error may also "
           "occur when the core runtime library needs to spawn "
           "threads or create internal OS-specific events.";
  case HSA_STATUS_ERROR_INVALID_PACKET_FORMAT:
    return "HSA_STATUS_ERROR_INVALID_PACKET_FORMAT: The AQL packet "
           "is malformed.";
  case HSA_STATUS_ERROR_RESOURCE_FREE:
    return "HSA_STATUS_ERROR_RESOURCE_FREE: An error has been "
           "detected while releasing a resource.";
  case HSA_STATUS_ERROR_NOT_INITIALIZED:
    return "HSA_STATUS_ERROR_NOT_INITIALIZED: An API other than "
           "hsa_init has been invoked while the reference count of "
           "the HSA runtime is zero.";
  case HSA_STATUS_ERROR_REFCOUNT_OVERFLOW:
    return "HSA_STATUS_ERROR_REFCOUNT_OVERFLOW: The maximum "
           "reference count for the object has been reached.";
  case HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS:
    return "HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS: The arguments passed to "
           "a function are not compatible.";
  case HSA_STATUS_ERROR_INVALID_INDEX:
    return "HSA_STATUS_ERROR_INVALID_INDEX: The index is invalid.";
  case HSA_STATUS_ERROR_INVALID_ISA:
    return "HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid.";
  case HSA_STATUS_ERROR_INVALID_CODE_OBJECT:
    return "HSA_STATUS_ERROR_INVALID_CODE_OBJECT: The code object is invalid.";
  case HSA_STATUS_ERROR_INVALID_EXECUTABLE:
    return "HSA_STATUS_ERROR_INVALID_EXECUTABLE: The executable is invalid.";
  case HSA_STATUS_ERROR_FROZEN_EXECUTABLE:
    return "HSA_STATUS_ERROR_FROZEN_EXECUTABLE: The executable is frozen.";
  case HSA_STATUS_ERROR_INVALID_SYMBOL_NAME:
    return "HSA_STATUS_ERROR_INVALID_SYMBOL_NAME: There is no symbol with the given name.";
  case HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED:
    return "HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED: The variable is already defined.";
  case HSA_STATUS_ERROR_VARIABLE_UNDEFINED:
    return "HSA_STATUS_ERROR_VARIABLE_UNDEFINED: The variable is undefined.";
  case HSA_EXT_STATUS_ERROR_INVALID_PROGRAM:
    return "HSA_EXT_STATUS_ERROR_INVALID_PROGRAM: Invalid program";
  case HSA_EXT_STATUS_ERROR_INVALID_MODULE:
    return "HSA_EXT_STATUS_ERROR_INVALID_MODULE: Invalid module";
  case HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE:
    return "HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE: Incompatible module";
  case HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED:
    return "HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED: Module already "
           "included";
  case HSA_EXT_STATUS_ERROR_SYMBOL_MISMATCH:
    return "HSA_EXT_STATUS_ERROR_SYMBOL_MISMATCH: Symbol mismatch";
  case HSA_EXT_STATUS_ERROR_FINALIZATION_FAILED:
    return "HSA_EXT_STATUS_ERROR_FINALIZATION_FAILED: Finalization failed";
  case HSA_EXT_STATUS_ERROR_DIRECTIVE_MISMATCH:
    return "HSA_EXT_STATUS_ERROR_DIRECTIVE_MISMATCH: Directive mismatch";
  default:
    return "Unknown HSA status";
  }
}

bool ReadFileIntoBuffer(const std::string& filename, std::vector<char>& buffer) {
  std::ifstream file(filename, std::ios::binary);
  if (!file) {
    return false;
  }
  file.seekg(0, std::ios::end);
  std::streamsize size = file.tellg();
  if (size < 0) {
    // tellg() reports failure as -1.
    return false;
  }
  file.seekg(0, std::ios::beg);
  buffer.resize((size_t) size);
  if (!file.read(buffer.data(), size)) {
    return false;
  }
  return true;
}

#ifndef _WIN32
#define _tempnam tempnam
#define _close close
#define _getpid getpid
#define _open open
#endif // _WIN32

int OpenTempFile(const char* prefix) {
  unsigned c = 0;
  std::string tname = prefix;
  tname += "_";
  tname += std::to_string(_getpid());
  tname += "_";
  while (c++ < 20) { // Loop because several threads can generate same filename.
#ifdef _WIN32
    char dir[MAX_PATH+1];
    if (!GetTempPath(sizeof(dir), dir)) {
      return -1;
    }
#else // _WIN32
    char *dir = NULL;
#endif // _WIN32
    char *name = _tempnam(dir, tname.c_str());
    if (!name) {
      return -1;
    }
#ifdef _WIN32
    HANDLE h = CreateFile(
      name,
      GENERIC_READ | GENERIC_WRITE,
      0, // No sharing
      NULL,
      CREATE_NEW,
      FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
      NULL);
    free(name);
    if (h == INVALID_HANDLE_VALUE) {
      continue;
    }
    return _open_osfhandle((intptr_t)h, 0);
#else // _WIN32
    int d = _open(name, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (d < 0) {
      free(name);
      continue;
    }
    if (unlink(name) < 0) {
      free(name);
      _close(d);
      return -1;
    }
    free(name);
    return d;
#endif // _WIN32
  }
  return -1;
}

void CloseTempFile(int fd) {
  _close(fd);
}

const char * CommentTopCallBack(void *ctx, int type) {
  static const char* amd_kernel_code_t_begin = "amd_kernel_code_t begin";
  static const char* amd_kernel_code_t_end = "amd_kernel_code_t end";
  static const char* isa_begin = "isa begin";
  switch(type) {
  case COMMENT_AMD_KERNEL_CODE_T_BEGIN:
    return amd_kernel_code_t_begin;
  case COMMENT_AMD_KERNEL_CODE_T_END:
    return amd_kernel_code_t_end;
  case COMMENT_KERNEL_ISA_BEGIN:
    return isa_begin;
  default:
    assert(false);
    return "";
  }
}

const char * CommentRightCallBack(void *ctx, int type) {
  return nullptr;
}

uint32_t ParseInstructionOffset(const std::string& instruction) {
  // instruction format: opcode op1, op2 ...
  // offset: binopcode
  std::string::size_type n = instruction.find("//");
  assert(n != std::string::npos);
  std::string comment = instruction.substr(n);
  n = comment.find(':');
  assert(n != std::string::npos);
  comment.erase(n);
  assert(comment.size() > 3);
  comment.erase(0, 3);
  return strtoul(comment.c_str(), nullptr, 16);
}

bool IsNotSpace(char c) {
  return !isspace(static_cast<unsigned char>(c));
}

void ltrim(std::string &str) {
  str.erase(str.begin(), std::find_if(str.begin(), str.end(), IsNotSpace));
}

std::string DumpFileName(const std::string& dir, const char* prefix, const char* ext, unsigned n, unsigned i) {
  std::ostringstream ss;
  if (!dir.empty()) {
    ss << dir << "/";
  }
  ss << prefix << std::setfill('0') << std::setw(3) << n;
  if (i) {
    ss << "_" << i;
  }
  if (ext) {
    ss << "." << ext;
  }
  return ss.str();
}

} // namespace hsa
} // namespace amd
} // namespace rocr
ROCR-Runtime-rocm-5.0.0/src/libamdhsacode/amd_hsa_code_util.hpp000066400000000000000000000164441420110115200243360ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in
// the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
// nor the names of its contributors may be used to endorse or promote
// products derived from this Software without specific prior written
// permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef AMD_HSA_CODE_UTIL_HPP_
#define AMD_HSA_CODE_UTIL_HPP_

#include
#include
#include
#include
#ifdef _WIN32
#include
#else // _WIN32
#include
#endif // _WIN32
#include "inc/amd_hsa_kernel_code.h"
#include "inc/amd_hsa_elf.h"
#include "inc/hsa.h"
#include "inc/hsa_ext_finalize.h"

#define hsa_error(e) static_cast<hsa_status_t>(e)

#define release_assert(e)                                                \
  if (!(e)) {                                                            \
    std::cerr << __FILE__ << ":";                                        \
    std::cerr << __LINE__ << ":";                                        \
    std::cerr << " Assertion `" << #e << "' failed." << std::endl;       \
    std::abort();                                                        \
  }                                                                      \

namespace rocr {
namespace amd {
namespace hsa {

std::string HsaSymbolKindToString(hsa_symbol_kind_t kind);
std::string HsaSymbolLinkageToString(hsa_symbol_linkage_t linkage);
std::string HsaVariableAllocationToString(hsa_variable_allocation_t allocation);
std::string HsaVariableSegmentToString(hsa_variable_segment_t segment);
std::string HsaProfileToString(hsa_profile_t profile);
std::string HsaMachineModelToString(hsa_machine_model_t model);
std::string HsaFloatRoundingModeToString(hsa_default_float_rounding_mode_t mode);
std::string AmdMachineKindToString(amd_machine_kind16_t machine);
std::string AmdFloatRoundModeToString(amd_float_round_mode_t round_mode);
std::string AmdFloatDenormModeToString(amd_float_denorm_mode_t denorm_mode);
std::string AmdSystemVgprWorkitemIdToString(amd_system_vgpr_workitem_id_t system_vgpr_workitem_id);
std::string AmdElementByteSizeToString(amd_element_byte_size_t element_byte_size);
std::string AmdExceptionKindToString(amd_exception_kind16_t exceptions);
std::string AmdPowerTwoToString(amd_powertwo8_t p);
amdgpu_hsa_elf_segment_t AmdHsaElfSectionSegment(amdgpu_hsa_elf_section_t sec);
bool IsAmdHsaElfSectionROData(amdgpu_hsa_elf_section_t sec);
std::string AmdHsaElfSegmentToString(amdgpu_hsa_elf_segment_t seg);
std::string AmdPTLoadToString(uint64_t type);

void PrintAmdKernelCode(std::ostream& out, const amd_kernel_code_t *akc);
void PrintAmdComputePgmRsrcOne(std::ostream& out, amd_compute_pgm_rsrc_one32_t compute_pgm_rsrc1);
void PrintAmdComputePgmRsrcTwo(std::ostream& out, amd_compute_pgm_rsrc_two32_t compute_pgm_rsrc2);
void PrintAmdKernelCodeProperties(std::ostream& out, amd_kernel_code_properties32_t kernel_code_properties);
void PrintAmdControlDirectives(std::ostream& out, const amd_control_directives_t &control_directives);

namespace code_options {
  // Space between options (not at the beginning).
  std::ostream& space(std::ostream& out);

  // Control directive option without value.
  struct control_directive {
    const char *name;
    control_directive(const char* name_) : name(name_) { }
  };
  std::ostream& operator<<(std::ostream& out, const control_directive& d);

  // Exceptions mask string.
  struct exceptions_mask {
    uint16_t mask;
    exceptions_mask(uint16_t mask_) : mask(mask_) { }
  };
  std::ostream& operator<<(std::ostream& out, const exceptions_mask& e);

  // Control directives options.
  struct control_directives {
    const hsa_ext_control_directives_t& d;
    control_directives(const hsa_ext_control_directives_t& d_) : d(d_) { }
  };
  std::ostream& operator<<(std::ostream& out, const control_directives& cd);
}

const char* hsaerr2str(hsa_status_t status);
bool ReadFileIntoBuffer(const std::string& filename, std::vector<char>& buffer);

// Create new empty temporary file that will be deleted when closed.
int OpenTempFile(const char* prefix);
void CloseTempFile(int fd);

// Helper comment types for isa disassembler
enum DumpIsaCommentType {
  COMMENT_AMD_KERNEL_CODE_T_BEGIN = 1,
  COMMENT_AMD_KERNEL_CODE_T_END,
  COMMENT_KERNEL_ISA_BEGIN,
};

// Callbacks to create helper comments for isa disassembler
const char * CommentTopCallBack(void *ctx, int type);
const char * CommentRightCallBack(void *ctx, int type);

// Parse disassembler instruction line to find offset
uint32_t ParseInstructionOffset(const std::string& instruction);

// Trim whitespace from the start of a string
void ltrim(std::string &str);

// Helper function that allocates aligned memory.
inline void* alignedMalloc(size_t size, size_t alignment) {
#if defined(_WIN32)
  return ::_aligned_malloc(size, alignment);
#else
  void * ptr = NULL;
  alignment = (std::max)(alignment, sizeof(void*));
  if (0 == ::posix_memalign(&ptr, alignment, size)) {
    return ptr;
  }
  return NULL;
#endif
}

// Helper function that frees aligned memory.
inline void alignedFree(void *ptr) {
#if defined(_WIN32)
  ::_aligned_free(ptr);
#else
  free(ptr);
#endif
}

inline uint64_t alignUp(uint64_t num, uint64_t align) {
  assert(align);
  assert((align & (align - 1)) == 0);
  return (num + align - 1) & ~(align - 1);
}

inline uint32_t alignUp(uint32_t num, uint32_t align) {
  assert(align);
  assert((align & (align - 1)) == 0);
  return (num + align - 1) & ~(align - 1);
}

std::string DumpFileName(const std::string& dir, const char* prefix, const char* ext, unsigned n, unsigned i = 0);

} // namespace hsa
} // namespace amd
} // namespace rocr

#endif // AMD_HSA_CODE_UTIL_HPP_
ROCR-Runtime-rocm-5.0.0/src/libamdhsacode/amd_hsa_locks.cpp000066400000000000000000000060331420110115200234660ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in
// the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
// nor the names of its contributors may be used to endorse or promote
// products derived from this Software without specific prior written
// permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#include "amd_hsa_locks.hpp"

namespace rocr {
namespace amd {
namespace hsa {
namespace common {

void ReaderWriterLock::ReaderLock() {
  internal_lock_.lock();
  while (0 < writers_count_) {
    readers_condition_.wait(internal_lock_);
  }
  readers_count_ += 1;
  internal_lock_.unlock();
}

void ReaderWriterLock::ReaderUnlock() {
  internal_lock_.lock();
  readers_count_ -= 1;
  if (0 == readers_count_ && 0 < writers_waiting_) {
    writers_condition_.notify_one();
  }
  internal_lock_.unlock();
}

void ReaderWriterLock::WriterLock() {
  internal_lock_.lock();
  writers_waiting_ += 1;
  while (0 < readers_count_ || 0 < writers_count_) {
    writers_condition_.wait(internal_lock_);
  }
  writers_count_ += 1;
  writers_waiting_ -= 1;
  internal_lock_.unlock();
}

void ReaderWriterLock::WriterUnlock() {
  internal_lock_.lock();
  writers_count_ -= 1;
  if (0 < writers_waiting_) {
    writers_condition_.notify_one();
  }
  readers_condition_.notify_all();
  internal_lock_.unlock();
}

} // namespace common
} // namespace hsa
} // namespace amd
} // namespace rocr
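The four `ReaderWriterLock` methods above coordinate through three counters (`readers_count_`, `writers_count_`, `writers_waiting_`) guarded by a single mutex. The sketch below is a self-contained approximation of that scheme, not the ROCR code itself: the names `SketchRWLock`, `SketchReaderGuard`, `SketchWriterGuard` and `RunRWLockDemo` are hypothetical, and it swaps the explicit `lock()`/`unlock()` calls and `std::condition_variable_any` for `std::unique_lock` with predicate waits. It pairs the lock with RAII guards in the style of `ReaderLockGuard`/`WriterLockGuard` and runs a small multi-threaded check.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-alone lock mirroring the counting scheme of
// ReaderWriterLock: readers wait only while a writer holds the lock;
// writers wait while any reader or writer holds it.
class SketchRWLock {
 public:
  void ReaderLock() {
    std::unique_lock<std::mutex> lk(m_);
    readers_cv_.wait(lk, [this] { return writers_count_ == 0; });
    ++readers_count_;
  }
  void ReaderUnlock() {
    std::lock_guard<std::mutex> lk(m_);
    --readers_count_;
    // The last reader out wakes one waiting writer, if any.
    if (readers_count_ == 0 && writers_waiting_ > 0) writers_cv_.notify_one();
  }
  void WriterLock() {
    std::unique_lock<std::mutex> lk(m_);
    ++writers_waiting_;
    writers_cv_.wait(lk, [this] {
      return readers_count_ == 0 && writers_count_ == 0;
    });
    ++writers_count_;
    --writers_waiting_;
  }
  void WriterUnlock() {
    std::lock_guard<std::mutex> lk(m_);
    --writers_count_;
    if (writers_waiting_ > 0) writers_cv_.notify_one();
    readers_cv_.notify_all();  // wake every reader blocked on this writer
  }

 private:
  std::mutex m_;
  std::condition_variable readers_cv_;
  std::condition_variable writers_cv_;
  std::size_t readers_count_ = 0;
  std::size_t writers_count_ = 0;
  std::size_t writers_waiting_ = 0;
};

// RAII guards in the style of ReaderLockGuard / WriterLockGuard.
template <typename LockType>
class SketchReaderGuard {
 public:
  explicit SketchReaderGuard(LockType& lock) : lock_(lock) { lock_.ReaderLock(); }
  ~SketchReaderGuard() { lock_.ReaderUnlock(); }
  SketchReaderGuard(const SketchReaderGuard&) = delete;
  SketchReaderGuard& operator=(const SketchReaderGuard&) = delete;

 private:
  LockType& lock_;
};

template <typename LockType>
class SketchWriterGuard {
 public:
  explicit SketchWriterGuard(LockType& lock) : lock_(lock) { lock_.WriterLock(); }
  ~SketchWriterGuard() { lock_.WriterUnlock(); }
  SketchWriterGuard(const SketchWriterGuard&) = delete;
  SketchWriterGuard& operator=(const SketchWriterGuard&) = delete;

 private:
  LockType& lock_;
};

// Four writer threads each increment a shared counter 1000 times while four
// reader threads read it under the shared lock. Returns the final count.
int RunRWLockDemo() {
  SketchRWLock lock;
  int counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < 4; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < 1000; ++i) {
        SketchWriterGuard<SketchRWLock> guard(lock);
        ++counter;
      }
    });
    threads.emplace_back([&] {
      for (int i = 0; i < 1000; ++i) {
        SketchReaderGuard<SketchRWLock> guard(lock);
        int snapshot = counter;  // read while no writer holds the lock
        assert(snapshot >= 0 && snapshot <= 4000);
        (void)snapshot;
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

As in the original, a waiting writer does not stop newly arriving readers, so a steady stream of readers can delay writers. Adding `writers_waiting_ == 0` to the reader's wait predicate would make the sketch writer-preferring instead.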
ROCR-Runtime-rocm-5.0.0/src/libamdhsacode/amd_hsa_locks.hpp000066400000000000000000000067661420110115200235100ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in
// the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
// nor the names of its contributors may be used to endorse or promote
// products derived from this Software without specific prior written
// permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef AMD_HSA_LOCKS_HPP
#define AMD_HSA_LOCKS_HPP

#include
#include
#include

namespace rocr {
namespace amd {
namespace hsa {
namespace common {

template <typename LockType>
class ReaderLockGuard final {
public:
  explicit ReaderLockGuard(LockType &lock): lock_(lock) {
    lock_.ReaderLock();
  }

  ~ReaderLockGuard() {
    lock_.ReaderUnlock();
  }

private:
  ReaderLockGuard(const ReaderLockGuard&);
  ReaderLockGuard& operator=(const ReaderLockGuard&);

  LockType &lock_;
};

template <typename LockType>
class WriterLockGuard final {
public:
  explicit WriterLockGuard(LockType &lock): lock_(lock) {
    lock_.WriterLock();
  }

  ~WriterLockGuard() {
    lock_.WriterUnlock();
  }

private:
  WriterLockGuard(const WriterLockGuard&);
  WriterLockGuard& operator=(const WriterLockGuard&);

  LockType &lock_;
};

class ReaderWriterLock final {
public:
  ReaderWriterLock(): readers_count_(0), writers_count_(0), writers_waiting_(0) {}

  ~ReaderWriterLock() {}

  void ReaderLock();
  void ReaderUnlock();
  void WriterLock();
  void WriterUnlock();

private:
  ReaderWriterLock(const ReaderWriterLock&);
  ReaderWriterLock& operator=(const ReaderWriterLock&);

  size_t readers_count_;
  size_t writers_count_;
  size_t writers_waiting_;
  std::mutex internal_lock_;
  std::condition_variable_any readers_condition_;
  std::condition_variable_any writers_condition_;
};

} // namespace common
} // namespace hsa
} // namespace amd
} // namespace rocr

#endif // AMD_HSA_LOCKS_HPP
ROCR-Runtime-rocm-5.0.0/src/libamdhsacode/amd_options.cpp000066400000000000000000000267551420110115200232300ustar00rootroot00000000000000
////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
// AMD Research and AMD HSA Software Development
//
// Advanced Micro Devices, Inc.
//
// www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// - Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimers.
// - Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimers in
// the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc,
// nor the names of its contributors may be used to endorse or promote
// products derived from this Software without specific prior written
// permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
// //////////////////////////////////////////////////////////////////////////////// #include "amd_options.hpp" #include #include #include #include #include #include #include #include #include #include namespace rocr { namespace amd { namespace options { //===----------------------------------------------------------------------===// // StringFactory. // //===----------------------------------------------------------------------===// std::string StringFactory::Flatten(const char **cstrs, const uint32_t &cstrs_count, const char &spacer) { if (NULL == cstrs || 0 == cstrs_count) { return std::string(); } std::string flattened; for (uint32_t i = 0; i < cstrs_count; ++i) { if (NULL == cstrs[i]) { return std::string(); } flattened += cstrs[i]; if (i != (cstrs_count - 1)) { flattened += spacer; } } return flattened; } std::list StringFactory::Tokenize(const char *cstr, const char &delim) { if (NULL == cstr) { return std::list(); } const std::string str = cstr; size_t start = 0; size_t end = 0; std::list tokens; while ((end = str.find(delim, start)) != std::string::npos) { if (start != end) { tokens.push_back(str.substr(start, end - start)); } start = end + 1; } if (str.size() > start) { tokens.push_back(str.substr(start)); } return tokens; } std::string StringFactory::ToLower(const std::string& str) { std::string lower(str.length(), ' '); std::transform(str.begin(), str.end(), lower.begin(), ::tolower); return lower; } std::string StringFactory::ToUpper(const std::string& str) { std::string upper(str.length(), ' '); std::transform(str.begin(), str.end(), upper.begin(), ::toupper); return upper; } //===----------------------------------------------------------------------===// // HelpPrinter, HelpStreambuf. 
// //===----------------------------------------------------------------------===// HelpStreambuf::HelpStreambuf(std::ostream& stream) : basicStream_(&stream), basicBuf_(stream.rdbuf()), wrapWidth_(0), indentSize_(0), atLineStart_(true), lineWidth_(0) { basicStream_->rdbuf(this); } HelpStreambuf::int_type HelpStreambuf::overflow(HelpStreambuf::int_type ch) { if (atLineStart_ && ch != '\n') { std::string indent(indentSize_, ' '); basicBuf_->sputn(indent.data(), indent.size()); lineWidth_ = indentSize_; atLineStart_ = false; } else if (ch == '\n') { atLineStart_ = true; lineWidth_ = 0; } if (wrapWidth_ > 0 && lineWidth_ == wrapWidth_) { basicBuf_->sputc('\n'); std::string indent(indentSize_, ' '); basicBuf_->sputn(indent.data(), indent.size()); lineWidth_ = indentSize_; atLineStart_ = false; } lineWidth_++; return basicBuf_->sputc(ch); } HelpPrinter& HelpPrinter::PrintUsage(const std::string& usage) { sbuf_.IndentSize(0); sbuf_.WrapWidth(0); Stream() << usage; if (usage.length() < USAGE_WIDTH) { Stream() << std::string(USAGE_WIDTH - usage.length(), ' '); } Stream() << std::string(PADDING_WIDTH, ' '); return *this; } HelpPrinter& HelpPrinter::PrintDescription(const std::string& description) { sbuf_.WrapWidth(USAGE_WIDTH + PADDING_WIDTH + DESCRIPTION_WIDTH); sbuf_.IndentSize(USAGE_WIDTH + PADDING_WIDTH); Stream() << description << std::endl; sbuf_.IndentSize(0); sbuf_.WrapWidth(0); return *this; } //===----------------------------------------------------------------------===// // ChoiceOptioin. 
// //===----------------------------------------------------------------------===// ChoiceOption::ChoiceOption(const std::string& name, const std::vector& choices, const std::string& help, std::ostream& error) : OptionBase(name, help, error) { for (const auto& choice: choices) { choices_.insert(choice); } } bool ChoiceOption::ProcessTokens(std::list &tokens) { assert(0 == name_.compare(tokens.front()) && "option name is mismatched"); if (2 != tokens.size()) { error() << "error: invalid option: \'" << name_ << '\'' << std::endl; return false; } tokens.pop_front(); if (0 == choices_.count(tokens.front())) { error() << "error: invalid option: \'" << name_ << '\'' << std::endl; return false; } is_set_ = true; value_ = tokens.front(); tokens.pop_front(); return true; } void ChoiceOption::PrintHelp(HelpPrinter& printer) const { std::string usage = "-" + name_ + "=["; bool first = true; for (const auto& choice: choices_) { if (!first) { usage += '|'; } else { first = false; } usage += choice; } usage += "]"; printer.PrintUsage(usage).PrintDescription(help_); } //===----------------------------------------------------------------------===// // PrefixOption. 
// //===----------------------------------------------------------------------===// bool PrefixOption::IsValid() const { return (0 < name_.size()) && (name_.find(':') == std::string::npos); } std::string::size_type PrefixOption::FindPrefix(const std::string& token) const { auto prefix = name_ + ':'; return token.find(prefix); } bool PrefixOption::Accept(const std::string& token) const { return (token.compare(0, name_.length(), name_) == 0) && token.length() > name_.length() && token[name_.length()] == ':'; } bool PrefixOption::ProcessTokens(std::list &tokens) { assert(1 <= tokens.size()); assert(Accept(tokens.front()) && "option name is mismatched"); std::string value = tokens.front(); tokens.pop_front(); value = value.substr(name_.length() + 1); for (const auto& token: tokens) { value += '='; value += token; } tokens.clear(); values_.push_back(value); is_set_ = true; return true; } void PrefixOption::PrintHelp(HelpPrinter& printer) const { printer.PrintUsage("-" + name_ + ":[value]").PrintDescription(help_); } //===----------------------------------------------------------------------===// // OptionParser. 
// //===----------------------------------------------------------------------===// std::vector::iterator OptionParser::FindOption(const std::string& name) { std::vector::iterator it = options_.begin(); std::vector::iterator end = options_.end(); for (; it != end; ++it) { if ((*it)->Accept(name)) { return it; } } return end; } bool OptionParser::AddOption(OptionBase *option) { if (NULL == option || !option->IsValid()) { return false; } if (FindOption(option->name()) != options_.end()) { return false; } options_.push_back(option); return true; } const std::string& OptionParser::Unknown() const { assert(collectUnknown_); return unknownOptions_; } bool OptionParser::ParseOptions(const char *options) { std::list tokens_l1 = StringFactory::Tokenize(options, ' '); if (0 == tokens_l1.size()) { return true; } std::list::iterator tokens_l1i = tokens_l1.begin(); while (tokens_l1i != tokens_l1.end()) { if ('-' == tokens_l1i->at(0)) { std::list::iterator option_begin = tokens_l1i; std::list tokens_l2; do { tokens_l2.push_back(*tokens_l1i); tokens_l1i++; } while (tokens_l1i != tokens_l1.end() && '-' != tokens_l1i->at(0)); std::list::iterator option_end = tokens_l1i; tokens_l2.front().erase(0, 1); if (1 == tokens_l2.size()) { tokens_l2 = StringFactory::Tokenize(tokens_l2.front().c_str(), '='); if (2 < tokens_l2.size()) { if (collectUnknown_) { unknownOptions_ += *tokens_l1i + " "; continue; } else { error() << "error: invalid option format: \'" << tokens_l2.front() << '\'' << std::endl; Reset(); return false; } } } auto find_status = FindOption(tokens_l2.front()); if (find_status == options_.end()) { if (collectUnknown_) { for (; option_begin != option_end; ++option_begin) { unknownOptions_ += *option_begin + " "; } continue; } else { error() << "error: unknown option: \'" << tokens_l2.front() << '\'' << std::endl; Reset(); return false; } } if (!(*find_status)->ProcessTokens(tokens_l2)) { Reset(); return false; } assert(0 == tokens_l2.size()); } else { if (collectUnknown_) { 
unknownOptions_ += *tokens_l1i + " "; ++tokens_l1i; } else { error() << "error: unknown option: \'" << *tokens_l1i << '\'' << std::endl; Reset(); return false; } } } return true; } void OptionParser::PrintHelp(std::ostream& out, const std::string& addition) const { HelpPrinter printer(out); for (const auto& option: options_) { option->PrintHelp(printer); } out << addition << std::endl; } void OptionParser::Reset() { unknownOptions_.clear(); for (auto &option : options_) { option->Reset(); } } } // namespace options } // namespace amd } // namespace rocr ROCR-Runtime-rocm-5.0.0/src/libamdhsacode/amd_options.hpp000066400000000000000000000331151420110115200232210ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution.
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef AMD_OPTIONS_HPP #define AMD_OPTIONS_HPP #include #include #include #include #include #include #include #include #include #include namespace rocr { namespace amd { namespace options { //===----------------------------------------------------------------------===// // StringFactory. // //===----------------------------------------------------------------------===// class StringFactory final { public: static std::string Flatten(const char **cstrs, const uint32_t &cstrs_count, const char &spacer = '\0'); static std::list Tokenize(const char *cstr, const char &delim); static std::string ToLower(const std::string& str); static std::string ToUpper(const std::string& str); }; //===----------------------------------------------------------------------===// // HelpPrinter, HelpStreambuf. 
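// Sketch (hypothetical; the real StringFactory::Tokenize body is not in this
// header): OptionParser::ParseOptions calls Tokenize twice, first to split the
// whole option string on ' ' and then to split a lone "-name=value" token on '='.
// The stand-in TokenizeSketch below assumes empty tokens are dropped, which
// ParseOptions appears to depend on because it calls at(0) on every token.

```cpp
#include <cassert>
#include <list>
#include <string>

// Splits cstr on delim; leading, trailing, and repeated delimiters produce no
// empty tokens (an assumption about the real Tokenize).
std::list<std::string> TokenizeSketch(const char* cstr, char delim) {
  std::list<std::string> tokens;
  if (cstr == nullptr) return tokens;
  std::string current;
  for (const char* p = cstr; *p != '\0'; ++p) {
    if (*p == delim) {
      if (!current.empty()) tokens.push_back(current);
      current.clear();
    } else {
      current += *p;
    }
  }
  if (!current.empty()) tokens.push_back(current);
  return tokens;
}
```

// Under that assumption, "  -dump-isa -dump-dir=/tmp " splits on ' ' into
// {"-dump-isa", "-dump-dir=/tmp"}, and "dump-dir=/tmp" splits on '=' into
// {"dump-dir", "/tmp"}.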
// //===----------------------------------------------------------------------===// class HelpStreambuf : public std::streambuf { public: explicit HelpStreambuf(std::ostream& stream); virtual ~HelpStreambuf() { basicStream_->rdbuf(basicBuf_); } void IndentSize(unsigned indent) { assert(wrapWidth_ == 0 || indent < wrapWidth_); indentSize_ = indent; } void WrapWidth(unsigned wrap) { assert(wrap == 0 || indentSize_ < wrap); wrapWidth_ = wrap; } protected: virtual int_type overflow(int_type ch) override; private: std::ostream* basicStream_; std::streambuf* basicBuf_; unsigned wrapWidth_; unsigned indentSize_; bool atLineStart_; unsigned lineWidth_; }; class HelpPrinter { private: static const unsigned USAGE_WIDTH = 30; static const unsigned PADDING_WIDTH = 2; static const unsigned DESCRIPTION_WIDTH = 50; public: HelpPrinter& PrintUsage(const std::string& usage); HelpPrinter& PrintDescription(const std::string& description); std::ostream& Stream() { return *out_; } private: explicit HelpPrinter(std::ostream& out = std::cout) : out_(&out), sbuf_(*out_) {} /// @brief Not copy-constructible. HelpPrinter(const HelpPrinter&); /// @brief Not copy-assignable. HelpPrinter& operator =(const HelpPrinter&); friend class OptionParser; std::ostream *out_; HelpStreambuf sbuf_; }; //===----------------------------------------------------------------------===// // OptionBase.
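// Sketch (hypothetical and simplified): HelpStreambuf follows the "filtering
// streambuf" pattern. It installs itself as the stream's buffer, sees every
// character through overflow(), and restores the original buffer in its
// destructor. Its overflow() body is not in this header, so IndentingStreambuf
// below only illustrates the pattern by indenting each line; the wrapping that
// WrapWidth() controls is deliberately omitted.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

class IndentingStreambuf : public std::streambuf {
 public:
  IndentingStreambuf(std::ostream& stream, unsigned indent)
      : stream_(&stream), dest_(stream.rdbuf()), indent_(indent) {
    stream_->rdbuf(this);  // route all output through overflow() below
  }
  ~IndentingStreambuf() override { stream_->rdbuf(dest_); }  // restore
  IndentingStreambuf(const IndentingStreambuf&) = delete;
  IndentingStreambuf& operator=(const IndentingStreambuf&) = delete;

 protected:
  // No put area is set, so every character written arrives here.
  int_type overflow(int_type ch) override {
    if (traits_type::eq_int_type(ch, traits_type::eof())) {
      return traits_type::not_eof(ch);
    }
    const char c = traits_type::to_char_type(ch);
    if (at_line_start_ && c != '\n') {
      for (unsigned i = 0; i < indent_; ++i) {
        if (traits_type::eq_int_type(dest_->sputc(' '), traits_type::eof())) {
          return traits_type::eof();
        }
      }
    }
    at_line_start_ = (c == '\n');
    return dest_->sputc(c);
  }

 private:
  std::ostream* stream_;
  std::streambuf* dest_;
  unsigned indent_;
  bool at_line_start_ = true;
};
```

// While an IndentingStreambuf(out, 2) is alive, out << "usage\n-help\n" writes
// "  usage\n  -help\n"; after it is destroyed, out writes unindented again.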
// //===----------------------------------------------------------------------===// class OptionBase { public: virtual ~OptionBase() {} const std::string& name() const { return name_; } const bool& is_set() const { return is_set_; } virtual bool IsValid() const { return 0 < name_.size(); } protected: explicit OptionBase(const std::string& name, const std::string& help = "", std::ostream &error = std::cerr) : name_(name), help_(help), is_set_(false), error_(&error) {} virtual void PrintHelp(HelpPrinter& printer) const = 0; virtual bool Accept(const std::string& name) const { return name_ == name; } const std::string name_; const std::string help_; bool is_set_; std::ostream &error() const { return *error_; } private: /// @brief Not copy-constructible. OptionBase(const OptionBase &ob); /// @brief Not copy-assignable. OptionBase& operator=(const OptionBase &ob); void Reset() { is_set_ = false; } virtual bool ProcessTokens(std::list<std::string> &tokens) = 0; friend class OptionParser; mutable std::ostream *error_; }; //===----------------------------------------------------------------------===// // Option. // //===----------------------------------------------------------------------===// template <typename T> class Option final: public OptionBase { public: explicit Option(const std::string& name, const std::string& help = "", std::ostream& error = std::cerr): OptionBase(name, help, error) {} ~Option() {} const std::list<T>& values() const { return values_; } protected: virtual void PrintHelp(HelpPrinter& printer) const override; private: /// @brief Not copy-constructible. Option(const Option &o); /// @brief Not copy-assignable.
Option& operator=(const Option &o); bool ProcessTokens(std::list<std::string> &tokens); std::list<T> values_; }; template <typename T> bool Option<T>::ProcessTokens(std::list<std::string> &tokens) { assert(0 == name_.compare(tokens.front()) && "option name is mismatched"); if (2 > tokens.size()) { error() << "error: invalid option: \'" << name_ << '\'' << std::endl; return false; } is_set_ = true; tokens.pop_front(); while (!tokens.empty()) { std::istringstream token_stream(tokens.front()); if (!token_stream.good()) { error() << "error: invalid option: \'" << name_ << '\'' << std::endl; return false; } T value; token_stream >> value; values_.push_back(value); tokens.pop_front(); } return true; } template <typename T> void Option<T>::PrintHelp(HelpPrinter& printer) const { printer.PrintUsage("-" + name_ + " [" + StringFactory::ToUpper(name_) + "s]") .PrintDescription(help_); } //===----------------------------------------------------------------------===// // ValueOption. // //===----------------------------------------------------------------------===// template <typename T> class ValueOption final: public OptionBase { public: explicit ValueOption(const std::string& name, const std::string& help = "", std::ostream& error = std::cerr): OptionBase(name, help, error) {} ~ValueOption() {} const T& value() const { return value_; } protected: void PrintHelp(HelpPrinter& printer) const override; private: /// @brief Not copy-constructible. ValueOption(const ValueOption &o); /// @brief Not copy-assignable.
ValueOption& operator=(const ValueOption &o); bool ProcessTokens(std::list<std::string> &tokens) override; T value_; }; template <typename T> bool ValueOption<T>::ProcessTokens(std::list<std::string> &tokens) { assert(0 == name_.compare(tokens.front()) && "option name is mismatched"); if (2 != tokens.size()) { error() << "error: invalid option: \'" << name_ << '\'' << std::endl; return false; } is_set_ = true; tokens.pop_front(); std::istringstream token_stream(tokens.front()); if (!token_stream.good()) { error() << "error: invalid option: \'" << name_ << '\'' << std::endl; return false; } token_stream >> value_; tokens.pop_front(); return true; } template <typename T> void ValueOption<T>::PrintHelp(HelpPrinter& printer) const { printer.PrintUsage("-" + name_ + "=[VAL]") .PrintDescription(help_); } //===----------------------------------------------------------------------===// // ChoiceOption. // //===----------------------------------------------------------------------===// class ChoiceOption final: public OptionBase { public: ChoiceOption(const std::string& name, const std::vector<std::string>& choices, const std::string& help = "", std::ostream& error = std::cerr); ~ChoiceOption() {} const std::string& value() const { return value_; } protected: void PrintHelp(HelpPrinter& printer) const override; private: /// @brief Not copy-constructible. ChoiceOption(const ChoiceOption&); /// @brief Not copy-assignable. ChoiceOption& operator =(const ChoiceOption&); bool ProcessTokens(std::list<std::string> &tokens) override; std::unordered_set<std::string> choices_; std::string value_; }; //===----------------------------------------------------------------------===// // NoArgOption.
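// Sketch (hypothetical helper, not part of amd_options.hpp): Option<T> and
// ValueOption<T>::ProcessTokens test token_stream.good() before extracting, so a
// token that fails to convert (for example "abc" for an int option) is not
// reported as an error. ParseValue below shows one stricter way to convert a
// token: it succeeds only when the extraction works and nothing but whitespace
// follows the value.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Converts token to T; on failure returns false and leaves *out unchanged.
template <typename T>
bool ParseValue(const std::string& token, T* out) {
  std::istringstream in(token);
  T value{};
  if (!(in >> value)) return false;  // extraction itself failed
  in >> std::ws;                     // tolerate trailing whitespace
  if (!in.eof()) return false;       // reject leftovers such as "12abc"
  *out = value;
  return true;
}
```

// Checking the result of operator>> (rather than the stream state beforehand)
// is what catches conversion errors; the eof() test additionally rejects tokens
// with trailing garbage.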
// //===----------------------------------------------------------------------===// class NoArgOption final: public OptionBase { public: explicit NoArgOption(const std::string& name, const std::string& help = "", std::ostream& error = std::cerr): OptionBase(name, help, error) {} ~NoArgOption() {} protected: void PrintHelp(HelpPrinter& printer) const override { printer.PrintUsage("-" + name_).PrintDescription(help_); } private: /// @brief Not copy-constructible. NoArgOption(const NoArgOption &o); /// @brief Not copy-assignable. NoArgOption& operator=(const NoArgOption &o); bool ProcessTokens(std::list<std::string> &tokens) override { assert(0 == name_.compare(tokens.front()) && "option name is mismatched"); if (1 == tokens.size()) { tokens.pop_front(); is_set_ = true; return true; } else if (2 == tokens.size()) { tokens.pop_front(); if (tokens.front() == "1") { is_set_ = true; tokens.pop_front(); return true; } else if (tokens.front() == "0") { is_set_ = false; tokens.pop_front(); return true; } } error() << "error: invalid option: '" << name_ << "'" << std::endl; return false; } }; //===----------------------------------------------------------------------===// // PrefixOption. // //===----------------------------------------------------------------------===// class PrefixOption final: public OptionBase { public: PrefixOption(const std::string& prefix, const std::string& help = "", std::ostream& error = std::cerr) : OptionBase(prefix, help, error) {} ~PrefixOption() {} const std::vector<std::string>& values() const { return values_; } bool IsValid() const override; protected: void PrintHelp(HelpPrinter& printer) const override; bool Accept(const std::string& token) const override; private: /// @brief Not copy-constructible. PrefixOption(const PrefixOption&); /// @brief Not copy-assignable.
PrefixOption& operator =(const PrefixOption&); bool ProcessTokens(std::list<std::string>& tokens) override; std::string::size_type FindPrefix(const std::string& token) const; std::vector<std::string> values_; }; //===----------------------------------------------------------------------===// // OptionParser. // //===----------------------------------------------------------------------===// class OptionParser final { public: explicit OptionParser(bool collectUnknown = false, std::ostream& error = std::cerr) : collectUnknown_(collectUnknown), error_(&error) {} ~OptionParser() {} bool AddOption(OptionBase *option); bool ParseOptions(const char *options); const std::string& Unknown() const; void CollectUnknown(bool b) { collectUnknown_ = b; } void PrintHelp(std::ostream& out, const std::string& addition = "") const; void Reset(); private: /// @brief Not copy-constructible. OptionParser(const OptionParser &op); /// @brief Not copy-assignable. OptionParser& operator=(const OptionParser &op); std::ostream& error() { return *error_; } std::vector<OptionBase*>::iterator FindOption(const std::string& name); std::vector<OptionBase*> options_; std::string unknownOptions_; bool collectUnknown_; std::ostream *error_; }; } // namespace options } // namespace amd } // namespace rocr #endif // AMD_OPTIONS_HPP ROCR-Runtime-rocm-5.0.0/src/loader/000077500000000000000000000000001420110115200166605ustar00rootroot00000000000000ROCR-Runtime-rocm-5.0.0/src/loader/AMDHSAKernelDescriptor.h000066400000000000000000000207321420110115200231720ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc.
// // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. // - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #ifndef LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H #define LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H #include #include // Gets offset of specified member in specified type. #ifndef offsetof #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE*)0)->MEMBER) #endif // offsetof // Creates enumeration entries used for packing bits into integers. 
Enumeration // entries include bit shift amount, bit width, and bit mask. #ifndef AMDHSA_BITS_ENUM_ENTRY #define AMDHSA_BITS_ENUM_ENTRY(NAME, SHIFT, WIDTH) \ NAME ## _SHIFT = (SHIFT), \ NAME ## _WIDTH = (WIDTH), \ NAME = (((1 << (WIDTH)) - 1) << (SHIFT)) #endif // AMDHSA_BITS_ENUM_ENTRY // Gets bits for specified bit mask from specified source. #ifndef AMDHSA_BITS_GET #define AMDHSA_BITS_GET(SRC, MSK) ((SRC & MSK) >> MSK ## _SHIFT) #endif // AMDHSA_BITS_GET // Sets bits for specified bit mask in specified destination. #ifndef AMDHSA_BITS_SET #define AMDHSA_BITS_SET(DST, MSK, VAL) \ DST &= ~MSK; \ DST |= ((VAL << MSK ## _SHIFT) & MSK) #endif // AMDHSA_BITS_SET namespace rocr { namespace llvm { namespace amdhsa { // Floating point rounding modes. Must match hardware definition. enum : uint8_t { FLOAT_ROUND_MODE_NEAR_EVEN = 0, FLOAT_ROUND_MODE_PLUS_INFINITY = 1, FLOAT_ROUND_MODE_MINUS_INFINITY = 2, FLOAT_ROUND_MODE_ZERO = 3, }; // Floating point denorm modes. Must match hardware definition. enum : uint8_t { FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0, FLOAT_DENORM_MODE_FLUSH_DST = 1, FLOAT_DENORM_MODE_FLUSH_SRC = 2, FLOAT_DENORM_MODE_FLUSH_NONE = 3, }; // System VGPR workitem IDs. Must match hardware definition. enum : uint8_t { SYSTEM_VGPR_WORKITEM_ID_X = 0, SYSTEM_VGPR_WORKITEM_ID_X_Y = 1, SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2, SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3, }; // Compute program resource register 1. Must match hardware definition. 
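// Sketch (illustration only): AMDHSA_BITS_ENUM_ENTRY generates NAME_SHIFT,
// NAME_WIDTH and the mask NAME for each field, and AMDHSA_BITS_GET /
// AMDHSA_BITS_SET use them to read and write a packed register such as
// compute_pgm_rsrc1. The copies below use hypothetical SKETCH_ names, add
// parentheses around the arguments, and wrap the setter in do { } while (0) so
// it behaves as one statement. The field positions mirror
// GRANULATED_WORKITEM_VGPR_COUNT (0, 6), GRANULATED_WAVEFRONT_SGPR_COUNT (6, 4)
// and FLOAT_DENORM_MODE_32 (16, 2).

```cpp
#include <cassert>
#include <cstdint>

#define SKETCH_BITS_ENUM_ENTRY(NAME, SHIFT, WIDTH) \
  NAME##_SHIFT = (SHIFT), NAME##_WIDTH = (WIDTH),  \
  NAME = (((1 << (WIDTH)) - 1) << (SHIFT))

// Read a field: mask it out, then shift it down to bit 0.
#define SKETCH_BITS_GET(SRC, MSK) (((SRC) & (MSK)) >> MSK##_SHIFT)

// Write a field: clear it, then OR in VAL shifted into place (extra bits of VAL
// that do not fit in the field are masked off silently).
#define SKETCH_BITS_SET(DST, MSK, VAL)              \
  do {                                              \
    (DST) &= ~(MSK);                                \
    (DST) |= (((VAL) << MSK##_SHIFT) & (MSK));      \
  } while (0)

enum : int32_t {
  SKETCH_BITS_ENUM_ENTRY(RSRC1_VGPR_COUNT, 0, 6),
  SKETCH_BITS_ENUM_ENTRY(RSRC1_SGPR_COUNT, 6, 4),
  SKETCH_BITS_ENUM_ENTRY(RSRC1_FLOAT_DENORM_MODE_32, 16, 2),
};
```

// Because the setter masks VAL, an out-of-range value (for example 64 in a
// 6-bit field) is truncated rather than reported, so callers should range-check
// before packing.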
#define COMPUTE_PGM_RSRC1(NAME, SHIFT, WIDTH) \ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_ ## NAME, SHIFT, WIDTH) enum : int32_t { COMPUTE_PGM_RSRC1(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6), COMPUTE_PGM_RSRC1(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4), COMPUTE_PGM_RSRC1(PRIORITY, 10, 2), COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_32, 12, 2), COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_16_64, 14, 2), COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_32, 16, 2), COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_16_64, 18, 2), COMPUTE_PGM_RSRC1(PRIV, 20, 1), COMPUTE_PGM_RSRC1(ENABLE_DX10_CLAMP, 21, 1), COMPUTE_PGM_RSRC1(DEBUG_MODE, 22, 1), COMPUTE_PGM_RSRC1(ENABLE_IEEE_MODE, 23, 1), COMPUTE_PGM_RSRC1(BULKY, 24, 1), COMPUTE_PGM_RSRC1(CDBG_USER, 25, 1), COMPUTE_PGM_RSRC1(FP16_OVFL, 26, 1), // GFX9+ COMPUTE_PGM_RSRC1(RESERVED0, 27, 5), }; #undef COMPUTE_PGM_RSRC1 // Compute program resource register 2. Must match hardware definition. #define COMPUTE_PGM_RSRC2(NAME, SHIFT, WIDTH) \ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_ ## NAME, SHIFT, WIDTH) enum : int32_t { COMPUTE_PGM_RSRC2(ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET, 0, 1), COMPUTE_PGM_RSRC2(USER_SGPR_COUNT, 1, 5), COMPUTE_PGM_RSRC2(ENABLE_TRAP_HANDLER, 6, 1), COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1), COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1), COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1), COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_INFO, 10, 1), COMPUTE_PGM_RSRC2(ENABLE_VGPR_WORKITEM_ID, 11, 2), COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1), COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_MEMORY, 14, 1), COMPUTE_PGM_RSRC2(GRANULATED_LDS_SIZE, 15, 9), COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1), COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1), COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1), COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1), COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1), COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 
1), COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1), COMPUTE_PGM_RSRC2(RESERVED0, 31, 1), }; #undef COMPUTE_PGM_RSRC2 // Kernel code properties. Must be kept backwards compatible. #define KERNEL_CODE_PROPERTY(NAME, SHIFT, WIDTH) \ AMDHSA_BITS_ENUM_ENTRY(KERNEL_CODE_PROPERTY_ ## NAME, SHIFT, WIDTH) enum : int32_t { KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER, 0, 1), KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_PTR, 1, 1), KERNEL_CODE_PROPERTY(ENABLE_SGPR_QUEUE_PTR, 2, 1), KERNEL_CODE_PROPERTY(ENABLE_SGPR_KERNARG_SEGMENT_PTR, 3, 1), KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_ID, 4, 1), KERNEL_CODE_PROPERTY(ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1), KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, 6, 1), KERNEL_CODE_PROPERTY(RESERVED0, 7, 9), }; #undef KERNEL_CODE_PROPERTY // Kernel descriptor. Must be kept backwards compatible. struct kernel_descriptor_t { uint32_t group_segment_fixed_size; uint32_t private_segment_fixed_size; uint32_t kernarg_size; uint8_t reserved0[4]; int64_t kernel_code_entry_byte_offset; uint8_t reserved1[24]; uint32_t compute_pgm_rsrc1; uint32_t compute_pgm_rsrc2; uint16_t kernel_code_properties; uint8_t reserved2[6]; }; static_assert( sizeof(kernel_descriptor_t) == 64, "invalid size for kernel_descriptor_t"); static_assert( offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0, "invalid offset for group_segment_fixed_size"); static_assert( offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4, "invalid offset for private_segment_fixed_size"); static_assert( offsetof(kernel_descriptor_t, kernarg_size) == 8, "invalid offset for kernarg_size"); static_assert( offsetof(kernel_descriptor_t, reserved0) == 12, "invalid offset for reserved0"); static_assert( offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) == 16, "invalid offset for kernel_code_entry_byte_offset"); static_assert( offsetof(kernel_descriptor_t, reserved1) == 24, "invalid offset for reserved1"); static_assert( 
offsetof(kernel_descriptor_t, compute_pgm_rsrc1) == 48, "invalid offset for compute_pgm_rsrc1"); static_assert( offsetof(kernel_descriptor_t, compute_pgm_rsrc2) == 52, "invalid offset for compute_pgm_rsrc2"); static_assert( offsetof(kernel_descriptor_t, kernel_code_properties) == 56, "invalid offset for kernel_code_properties"); static_assert( offsetof(kernel_descriptor_t, reserved2) == 58, "invalid offset for reserved2"); } // end namespace amdhsa } // end namespace llvm } // end namespace rocr #endif // LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H ROCR-Runtime-rocm-5.0.0/src/loader/executable.cpp000066400000000000000000001744071420110115200215220ustar00rootroot00000000000000//////////////////////////////////////////////////////////////////////////////// // // The University of Illinois/NCSA // Open Source License (NCSA) // // Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved. // // Developed by: // // AMD Research and AMD HSA Software Development // // Advanced Micro Devices, Inc. // // www.amd.com // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to // deal with the Software without restriction, including without limitation // the rights to use, copy, modify, merge, publish, distribute, sublicense, // and/or sell copies of the Software, and to permit persons to whom the // Software is furnished to do so, subject to the following conditions: // // - Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // - Redistributions in binary form must reproduce the above copyright // notice, this list of conditions and the following disclaimers in // the documentation and/or other materials provided with the distribution. 
// - Neither the names of Advanced Micro Devices, Inc, // nor the names of its contributors may be used to endorse or promote // products derived from this Software without specific prior written // permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL // THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR // OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, // ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER // DEALINGS WITH THE SOFTWARE. // //////////////////////////////////////////////////////////////////////////////// #include "executable.hpp" #include #include #include #include #include #include #include #include #include #include #include #include "inc/amd_hsa_elf.h" #include "inc/amd_hsa_kernel_code.h" #include "core/inc/amd_hsa_code.hpp" #include "amd_hsa_code_util.hpp" #include "amd_options.hpp" #include "core/util/utils.h" #include "AMDHSAKernelDescriptor.h" using namespace rocr::amd::hsa; using namespace rocr::amd::hsa::common; // Having a side effect prevents call site optimization that allows removal of a noinline function call // with no side effect. __attribute__((noinline)) static void _loader_debug_state() { static volatile int function_needs_a_side_effect = 0; function_needs_a_side_effect ^= 1; } // r_version history: // 1: Initial debug protocol // 2: New trap handler ABI. The reason for halting a wave is recorded in ttmp11[8:7]. // 3: New trap handler ABI. A wave halted at S_ENDPGM rewinds its PC by 8 bytes, and sets ttmp11[9]=1. // 4: New trap handler ABI. Save the trap id in ttmp11[16:9] // 5: New trap handler ABI. Save the PC in ttmp11[22:7] ttmp6[31:0], and park the wave if stopped // 6: New trap handler ABI. 
ttmp6[25:0] contains dispatch index modulo queue size // 7: New trap handler ABI. Send interrupts as a bitmask, coalescing concurrent exceptions. HSA_API r_debug _amdgpu_r_debug = {7, nullptr, reinterpret_cast(&_loader_debug_state), r_debug::RT_CONSISTENT, 0}; static link_map* r_debug_tail = nullptr; namespace rocr { namespace amd { namespace hsa { namespace loader { class LoaderOptions { public: explicit LoaderOptions(std::ostream &error = std::cerr); const amd::options::NoArgOption* Help() const { return &help; } const amd::options::NoArgOption* DumpCode() const { return &dump_code; } const amd::options::NoArgOption* DumpIsa() const { return &dump_isa; } const amd::options::NoArgOption* DumpExec() const { return &dump_exec; } const amd::options::NoArgOption* DumpAll() const { return &dump_all; } const amd::options::ValueOption<std::string>* DumpDir() const { return &dump_dir; } const amd::options::PrefixOption* Substitute() const { return &substitute; } bool ParseOptions(const std::string& options); void Reset(); void PrintHelp(std::ostream& out) const; private: /// @brief Copy constructor - not available. LoaderOptions(const LoaderOptions&); /// @brief Assignment operator - not available.
LoaderOptions& operator=(const LoaderOptions&); amd::options::NoArgOption help; amd::options::NoArgOption dump_code; amd::options::NoArgOption dump_isa; amd::options::NoArgOption dump_exec; amd::options::NoArgOption dump_all; amd::options::ValueOption<std::string> dump_dir; amd::options::PrefixOption substitute; amd::options::OptionParser option_parser; }; LoaderOptions::LoaderOptions(std::ostream& error) : help("help", "print help"), dump_code("dump-code", "Dump finalizer output code object"), dump_isa("dump-isa", "Dump finalizer output to ISA text file"), dump_exec("dump-exec", "Dump executable to text file"), dump_all("dump-all", "Dump all finalizer input and output (as above)"), dump_dir("dump-dir", "Dump directory"), substitute("substitute", "Substitute code object with given index or index range on loading from file"), option_parser(false, error) { option_parser.AddOption(&help); option_parser.AddOption(&dump_code); option_parser.AddOption(&dump_isa); option_parser.AddOption(&dump_exec); option_parser.AddOption(&dump_all); option_parser.AddOption(&dump_dir); option_parser.AddOption(&substitute); } bool LoaderOptions::ParseOptions(const std::string& options) { return option_parser.ParseOptions(options.c_str()); } void LoaderOptions::Reset() { option_parser.Reset(); } void LoaderOptions::PrintHelp(std::ostream& out) const { option_parser.PrintHelp(out); } static const char *LOADER_DUMP_PREFIX = "amdcode"; Loader* Loader::Create(Context* context) { return new AmdHsaCodeLoader(context); } void Loader::Destroy(Loader *loader) { // Loader resets the link_map, but the executables and loaded code objects are not deleted.
_amdgpu_r_debug.r_map = nullptr; _amdgpu_r_debug.r_state = r_debug::RT_CONSISTENT; r_debug_tail = nullptr; delete loader; } Executable* AmdHsaCodeLoader::CreateExecutable( hsa_profile_t profile, const char *options, hsa_default_float_rounding_mode_t default_float_rounding_mode) { WriterLockGuard writer_lock(rw_lock_); executables.push_back(new ExecutableImpl(profile, context, executables.size(), default_float_rounding_mode)); return executables.back(); } static void AddCodeObjectInfoIntoDebugMap(link_map* map) { if (r_debug_tail) { r_debug_tail->l_next = map; map->l_prev = r_debug_tail; map->l_next = nullptr; } else { _amdgpu_r_debug.r_map = map; map->l_prev = nullptr; map->l_next = nullptr; } r_debug_tail = map; } static void RemoveCodeObjectInfoFromDebugMap(link_map* map) { if (r_debug_tail == map) { r_debug_tail = map->l_prev; } if (_amdgpu_r_debug.r_map == map) { _amdgpu_r_debug.r_map = map->l_next; } if (map->l_prev) { map->l_prev->l_next = map->l_next; } if (map->l_next) { map->l_next->l_prev = map->l_prev; } free(map->l_name); memset(map, 0, sizeof(link_map)); } hsa_status_t AmdHsaCodeLoader::FreezeExecutable(Executable *executable, const char *options) { hsa_status_t status = executable->Freeze(options); if (status != HSA_STATUS_SUCCESS) { return status; } // Assuming runtime atomic implements C++ std::memory_order WriterLockGuard writer_lock(rw_lock_); atomic::Store(&_amdgpu_r_debug.r_state, r_debug::RT_ADD, std::memory_order_relaxed); atomic::Fence(std::memory_order_acq_rel); _loader_debug_state(); atomic::Fence(std::memory_order_acq_rel); for (auto &lco : reinterpret_cast<ExecutableImpl*>(executable)->loaded_code_objects) { AddCodeObjectInfoIntoDebugMap(&(lco->r_debug_info)); } atomic::Store(&_amdgpu_r_debug.r_state, r_debug::RT_CONSISTENT, std::memory_order_release); _loader_debug_state(); return HSA_STATUS_SUCCESS; } void AmdHsaCodeLoader::DestroyExecutable(Executable *executable) { // Assuming runtime atomic implements C++ std::memory_order WriterLockGuard
writer_lock(rw_lock_); atomic::Store(&_amdgpu_r_debug.r_state, r_debug::RT_DELETE, std::memory_order_relaxed); atomic::Fence(std::memory_order_acq_rel); _loader_debug_state(); atomic::Fence(std::memory_order_acq_rel); for (auto &lco : reinterpret_cast<ExecutableImpl*>(executable)->loaded_code_objects) { RemoveCodeObjectInfoFromDebugMap(&(lco->r_debug_info)); } atomic::Store(&_amdgpu_r_debug.r_state, r_debug::RT_CONSISTENT, std::memory_order_release); _loader_debug_state(); executables[((ExecutableImpl*)executable)->id()] = nullptr; delete executable; } hsa_status_t AmdHsaCodeLoader::IterateExecutables( hsa_status_t (*callback)( hsa_executable_t executable, void *data), void *data) { WriterLockGuard writer_lock(rw_lock_); assert(callback); for (auto &exec : executables) { if (exec == nullptr) { continue; } hsa_status_t status = callback(Executable::Handle(exec), data); if (status != HSA_STATUS_SUCCESS) { return status; } } return HSA_STATUS_SUCCESS; } hsa_status_t AmdHsaCodeLoader::QuerySegmentDescriptors( hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors, size_t *num_segment_descriptors) { if (!num_segment_descriptors) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } if (*num_segment_descriptors == 0 && segment_descriptors) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } if (*num_segment_descriptors != 0 && !segment_descriptors) { return HSA_STATUS_ERROR_INVALID_ARGUMENT; } this->EnableReadOnlyMode(); size_t actual_num_segment_descriptors = 0; for (auto &executable : executables) { if (executable) { actual_num_segment_descriptors += executable->GetNumSegmentDescriptors(); } } if (*num_segment_descriptors == 0) { *num_segment_descriptors = actual_num_segment_descriptors; this->DisableReadOnlyMode(); return HSA_STATUS_SUCCESS; } if (*num_segment_descriptors != actual_num_segment_descriptors) { this->DisableReadOnlyMode(); return HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS; } size_t i = 0; for (auto &executable : executables) { if (executable) { i += executable->QuerySegmentDescriptors(segment_descriptors,
actual_num_segment_descriptors, i); } } this->DisableReadOnlyMode(); return HSA_STATUS_SUCCESS; } uint64_t AmdHsaCodeLoader::FindHostAddress(uint64_t device_address) { ReaderLockGuard reader_lock(rw_lock_); if (device_address == 0) { return 0; } for (auto &exec : executables) { if (exec != nullptr) { uint64_t host_address = exec->FindHostAddress(device_address); if (host_address != 0) { return host_address; } } } return 0; } void AmdHsaCodeLoader::PrintHelp(std::ostream& out) { LoaderOptions().PrintHelp(out); } void AmdHsaCodeLoader::EnableReadOnlyMode() { rw_lock_.ReaderLock(); for (auto &executable : executables) { if (executable) { ((ExecutableImpl*)executable)->EnableReadOnlyMode(); } } } void AmdHsaCodeLoader::DisableReadOnlyMode() { for (auto &executable : executables) { if (executable) { ((ExecutableImpl*)executable)->DisableReadOnlyMode(); } } rw_lock_.ReaderUnlock(); } //===----------------------------------------------------------------------===// // SymbolImpl. // //===----------------------------------------------------------------------===// bool SymbolImpl::GetInfo(hsa_symbol_info32_t symbol_info, void *value) { static_assert( (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_TYPE) == symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_TYPE)), "attributes are not compatible" ); static_assert( (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_NAME_LENGTH) == symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH)), "attributes are not compatible" ); static_assert( (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_NAME) == symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_NAME)), "attributes are not compatible" ); static_assert( (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_MODULE_NAME_LENGTH) == symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_MODULE_NAME_LENGTH)), "attributes are not compatible" ); static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_MODULE_NAME) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_MODULE_NAME)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_LINKAGE) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_LINKAGE)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_IS_DEFINITION) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_IS_DEFINITION)),
    "attributes are not compatible");

  assert(value);

  switch (symbol_info) {
    case HSA_CODE_SYMBOL_INFO_TYPE: {
      *((hsa_symbol_kind_t*)value) = kind;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_NAME_LENGTH: {
      *((uint32_t*)value) = symbol_name.size();
      break;
    }
    case HSA_CODE_SYMBOL_INFO_NAME: {
      memset(value, 0x0, symbol_name.size());
      memcpy(value, symbol_name.c_str(), symbol_name.size());
      break;
    }
    case HSA_CODE_SYMBOL_INFO_MODULE_NAME_LENGTH: {
      *((uint32_t*)value) = module_name.size();
      break;
    }
    case HSA_CODE_SYMBOL_INFO_MODULE_NAME: {
      memset(value, 0x0, module_name.size());
      memcpy(value, module_name.c_str(), module_name.size());
      break;
    }
    case HSA_CODE_SYMBOL_INFO_LINKAGE: {
      *((hsa_symbol_linkage_t*)value) = linkage;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_IS_DEFINITION: {
      *((bool*)value) = is_definition;
      break;
    }
    case HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_CALL_CONVENTION: {
      *((uint32_t*)value) = 0;
      break;
    }
    case HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT:
    case HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS: {
      if (!is_loaded) {
        return false;
      }
      *((uint64_t*)value) = address;
      break;
    }
    case HSA_EXECUTABLE_SYMBOL_INFO_AGENT: {
      if (!is_loaded) {
        return false;
      }
      *((hsa_agent_t*)value) = agent;
      break;
    }
    default: {
      return false;
    }
  }

  return true;
}

//===----------------------------------------------------------------------===//
// KernelSymbol.
//
//===----------------------------------------------------------------------===//

bool KernelSymbol::GetInfo(hsa_symbol_info32_t symbol_info, void *value) {
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK)),
    "attributes are not compatible");

  assert(value);

  switch (symbol_info) {
    case HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE: {
      *((uint32_t*)value) = kernarg_segment_size;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT: {
      *((uint32_t*)value) = kernarg_segment_alignment;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE: {
      *((uint32_t*)value) = group_segment_size;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE: {
      *((uint32_t*)value) = private_segment_size;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK: {
      *((bool*)value) = is_dynamic_callstack;
      break;
    }
    case HSA_EXT_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT_SIZE: {
      *((uint32_t*)value) = size;
      break;
    }
    case HSA_EXT_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT_ALIGN: {
      *((uint32_t*)value) = alignment;
      break;
    }
    default: {
      return
          SymbolImpl::GetInfo(symbol_info, value);
    }
  }

  return true;
}

//===----------------------------------------------------------------------===//
// VariableSymbol.
//
//===----------------------------------------------------------------------===//

bool VariableSymbol::GetInfo(hsa_symbol_info32_t symbol_info, void *value) {
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_VARIABLE_ALLOCATION) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ALLOCATION)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_VARIABLE_SEGMENT) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SEGMENT)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_VARIABLE_ALIGNMENT) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ALIGNMENT)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_VARIABLE_SIZE) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE)),
    "attributes are not compatible");
  static_assert(
    (symbol_attribute32_t(HSA_CODE_SYMBOL_INFO_VARIABLE_IS_CONST) ==
     symbol_attribute32_t(HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_IS_CONST)),
    "attributes are not compatible");

  switch (symbol_info) {
    case HSA_CODE_SYMBOL_INFO_VARIABLE_ALLOCATION: {
      *((hsa_variable_allocation_t*)value) = allocation;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_VARIABLE_SEGMENT: {
      *((hsa_variable_segment_t*)value) = segment;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_VARIABLE_ALIGNMENT: {
      *((uint32_t*)value) = alignment;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_VARIABLE_SIZE: {
      *((uint32_t*)value) = size;
      break;
    }
    case HSA_CODE_SYMBOL_INFO_VARIABLE_IS_CONST: {
      *((bool*)value) = is_constant;
      break;
    }
    default: {
      return SymbolImpl::GetInfo(symbol_info, value);
    }
  }

  return true;
}

bool LoadedCodeObjectImpl::GetInfo(amd_loaded_code_object_info_t attribute, void *value) {
  assert(value);

  switch (attribute) {
    case AMD_LOADED_CODE_OBJECT_INFO_ELF_IMAGE:
      ((hsa_code_object_t*)value)->handle = reinterpret_cast<uint64_t>(elf_data);
      break;
    case AMD_LOADED_CODE_OBJECT_INFO_ELF_IMAGE_SIZE:
      *((size_t*)value) = elf_size;
      break;
    default: {
      return false;
    }
  }

  return true;
}

hsa_status_t LoadedCodeObjectImpl::IterateLoadedSegments(
    hsa_status_t (*callback)(amd_loaded_segment_t loaded_segment, void *data),
    void *data) {
  assert(callback);

  for (auto &loaded_segment : loaded_segments) {
    hsa_status_t status = callback(LoadedSegment::Handle(loaded_segment), data);
    if (status != HSA_STATUS_SUCCESS) {
      return status;
    }
  }

  return HSA_STATUS_SUCCESS;
}

void LoadedCodeObjectImpl::Print(std::ostream& out) {
  out << "Code Object" << std::endl;
}

bool Segment::GetInfo(amd_loaded_segment_info_t attribute, void *value) {
  assert(value);

  switch (attribute) {
    case AMD_LOADED_SEGMENT_INFO_TYPE: {
      *((amdgpu_hsa_elf_segment_t*)value) = segment;
      break;
    }
    case AMD_LOADED_SEGMENT_INFO_ELF_BASE_ADDRESS: {
      *((uint64_t*)value) = vaddr;
      break;
    }
    case AMD_LOADED_SEGMENT_INFO_LOAD_BASE_ADDRESS: {
      *((uint64_t*)value) = reinterpret_cast<uint64_t>(this->Address(this->VAddr()));
      break;
    }
    case AMD_LOADED_SEGMENT_INFO_SIZE: {
      *((size_t*)value) = size;
      break;
    }
    default: {
      return false;
    }
  }

  return true;
}

uint64_t Segment::Offset(uint64_t addr) {
  assert(IsAddressInSegment(addr));
  return addr - vaddr;
}

void* Segment::Address(uint64_t addr) {
  return owner->context()->SegmentAddress(segment, agent, ptr, Offset(addr));
}

bool Segment::Freeze() {
  if (!frozen) {
    frozen = owner->context()->SegmentFreeze(segment, agent, ptr, size);
  }
  return frozen;
}

bool Segment::IsAddressInSegment(uint64_t addr) {
  return vaddr <= addr && addr < vaddr + size;
}

void Segment::Copy(uint64_t addr, const void* src, size_t size) {
  // The loader must perform all copies before the segment is frozen.
  assert(!frozen);
  if (size > 0) {
    owner->context()->SegmentCopy(segment, agent, ptr, Offset(addr), src, size);
  }
}

void Segment::Print(std::ostream& out) {
  out << "Segment" << std::endl
      << " Type: " << AmdHsaElfSegmentToString(segment)
      << " Size: " << size
      << " VAddr: " << vaddr << std::endl
      << " Ptr: " << std::hex << ptr << std::dec
      << std::endl;
}

void Segment::Destroy() {
  owner->context()->SegmentFree(segment, agent, ptr, size);
}

//===----------------------------------------------------------------------===//
// ExecutableImpl.
//
//===----------------------------------------------------------------------===//

ExecutableImpl::ExecutableImpl(
    const hsa_profile_t &_profile,
    Context *context,
    size_t id,
    hsa_default_float_rounding_mode_t default_float_rounding_mode)
  : Executable()
  , profile_(_profile)
  , context_(context)
  , id_(id)
  , default_float_rounding_mode_(default_float_rounding_mode)
  , state_(HSA_EXECUTABLE_STATE_UNFROZEN)
  , program_allocation_segment(nullptr) {
}

ExecutableImpl::~ExecutableImpl() {
  for (ExecutableObject* o : objects) {
    o->Destroy();
    delete o;
  }
  objects.clear();

  for (auto &symbol_entry : program_symbols_) {
    delete symbol_entry.second;
  }
  for (auto &symbol_entry : agent_symbols_) {
    delete symbol_entry.second;
  }
}

hsa_status_t ExecutableImpl::DefineProgramExternalVariable(
    const char *name, void *address) {
  WriterLockGuard<ReaderWriterLock> writer_lock(rw_lock_);
  assert(name);

  if (HSA_EXECUTABLE_STATE_FROZEN == state_) {
    return HSA_STATUS_ERROR_FROZEN_EXECUTABLE;
  }

  auto symbol_entry = program_symbols_.find(std::string(name));
  if (symbol_entry != program_symbols_.end()) {
    return HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED;
  }

  program_symbols_.insert(
    std::make_pair(std::string(name),
                   new VariableSymbol(true,
                                      "", // Only program linkage symbols can be
                                          // defined.
                                      std::string(name),
                                      HSA_SYMBOL_LINKAGE_PROGRAM,
                                      true,
                                      HSA_VARIABLE_ALLOCATION_PROGRAM,
                                      HSA_VARIABLE_SEGMENT_GLOBAL,
                                      0,     // TODO: size.
                                      0,     // TODO: align.
                                      false, // TODO: const.
                                      true,
                                      reinterpret_cast<uint64_t>(address))));
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::DefineAgentExternalVariable(
    const char *name,
    hsa_agent_t agent,
    hsa_variable_segment_t segment,
    void *address) {
  WriterLockGuard<ReaderWriterLock> writer_lock(rw_lock_);
  assert(name);

  if (HSA_EXECUTABLE_STATE_FROZEN == state_) {
    return HSA_STATUS_ERROR_FROZEN_EXECUTABLE;
  }

  auto symbol_entry = agent_symbols_.find(std::make_pair(std::string(name), agent));
  if (symbol_entry != agent_symbols_.end()) {
    return HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED;
  }

  auto insert_status = agent_symbols_.insert(
    std::make_pair(std::make_pair(std::string(name), agent),
                   new VariableSymbol(true,
                                      "", // Only program linkage symbols can be
                                          // defined.
                                      std::string(name),
                                      HSA_SYMBOL_LINKAGE_PROGRAM,
                                      true,
                                      HSA_VARIABLE_ALLOCATION_AGENT,
                                      segment,
                                      0,     // TODO: size.
                                      0,     // TODO: align.
                                      false, // TODO: const.
                                      true,
                                      reinterpret_cast<uint64_t>(address))));
  assert(insert_status.second);
  insert_status.first->second->agent = agent;

  return HSA_STATUS_SUCCESS;
}

bool ExecutableImpl::IsProgramSymbol(const char *symbol_name) {
  assert(symbol_name);

  ReaderLockGuard<ReaderWriterLock> reader_lock(rw_lock_);
  return program_symbols_.find(std::string(symbol_name)) != program_symbols_.end();
}

Symbol* ExecutableImpl::GetSymbol(
    const char *symbol_name,
    const hsa_agent_t *agent) {
  ReaderLockGuard<ReaderWriterLock> reader_lock(rw_lock_);
  return this->GetSymbolInternal(symbol_name, agent);
}

Symbol* ExecutableImpl::GetSymbolInternal(
    const char *symbol_name,
    const hsa_agent_t *agent) {
  assert(symbol_name);

  std::string mangled_name = std::string(symbol_name);
  if (mangled_name.empty()) {
    return nullptr;
  }

  if (!agent) {
    auto program_symbol = program_symbols_.find(mangled_name);
    if (program_symbol != program_symbols_.end()) {
      return program_symbol->second;
    }
    return nullptr;
  }

  auto agent_symbol = agent_symbols_.find(std::make_pair(mangled_name, *agent));
  if (agent_symbol != agent_symbols_.end()) {
    return agent_symbol->second;
  }
  return nullptr;
}

hsa_status_t ExecutableImpl::IterateSymbols(
    iterate_symbols_f callback, void *data) {
  ReaderLockGuard<ReaderWriterLock> reader_lock(rw_lock_);
  assert(callback);

  for (auto &symbol_entry : program_symbols_) {
    hsa_status_t hsc = callback(Executable::Handle(this),
                                Symbol::Handle(symbol_entry.second), data);
    if (HSA_STATUS_SUCCESS != hsc) {
      return hsc;
    }
  }
  for (auto &symbol_entry : agent_symbols_) {
    hsa_status_t hsc = callback(Executable::Handle(this),
                                Symbol::Handle(symbol_entry.second), data);
    if (HSA_STATUS_SUCCESS != hsc) {
      return hsc;
    }
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::IterateAgentSymbols(
    hsa_agent_t agent,
    hsa_status_t (*callback)(hsa_executable_t exec,
                             hsa_agent_t agent,
                             hsa_executable_symbol_t symbol,
                             void *data),
    void *data) {
  ReaderLockGuard<ReaderWriterLock> reader_lock(rw_lock_);
  assert(callback);

  for (auto &symbol_entry : agent_symbols_) {
    if (symbol_entry.second->GetAgent().handle != agent.handle) {
      continue;
    }

    hsa_status_t status = callback(
        Executable::Handle(this), agent, Symbol::Handle(symbol_entry.second),
        data);
    if (status != HSA_STATUS_SUCCESS) {
      return status;
    }
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::IterateProgramSymbols(
    hsa_status_t (*callback)(hsa_executable_t exec,
                             hsa_executable_symbol_t symbol,
                             void *data),
    void *data) {
  ReaderLockGuard<ReaderWriterLock> reader_lock(rw_lock_);
  assert(callback);

  for (auto &symbol_entry : program_symbols_) {
    hsa_status_t status = callback(
        Executable::Handle(this), Symbol::Handle(symbol_entry.second), data);
    if (status != HSA_STATUS_SUCCESS) {
      return status;
    }
  }

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::IterateLoadedCodeObjects(
    hsa_status_t (*callback)(
      hsa_executable_t executable,
      hsa_loaded_code_object_t loaded_code_object,
      void *data),
    void *data) {
  ReaderLockGuard<ReaderWriterLock> reader_lock(rw_lock_);
  assert(callback);

  for (auto &loaded_code_object : loaded_code_objects) {
    hsa_status_t status = callback(
        Executable::Handle(this),
        LoadedCodeObject::Handle(loaded_code_object),
        data);
    if (status != HSA_STATUS_SUCCESS) {
      return status;
    }
  }

  return
      HSA_STATUS_SUCCESS;
}

size_t ExecutableImpl::GetNumSegmentDescriptors() {
  // Assumes the caller has already enabled read-only mode.
  size_t actual_num_segment_descriptors = 0;
  for (auto &obj : loaded_code_objects) {
    actual_num_segment_descriptors += obj->LoadedSegments().size();
  }
  return actual_num_segment_descriptors;
}

size_t ExecutableImpl::QuerySegmentDescriptors(
    hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors,
    size_t total_num_segment_descriptors,
    size_t first_empty_segment_descriptor) {
  // Assumes the caller has already enabled read-only mode.
  assert(segment_descriptors);
  // An executable that contributes no segments may be queried when the
  // output array is already full, so equality is allowed here.
  assert(first_empty_segment_descriptor <= total_num_segment_descriptors);

  size_t i = first_empty_segment_descriptor;
  for (auto &obj : loaded_code_objects) {
    for (auto &seg : obj->LoadedSegments()) {
      assert(i < total_num_segment_descriptors);
      segment_descriptors[i].agent = seg->Agent();
      segment_descriptors[i].executable = Executable::Handle(seg->Owner());
      segment_descriptors[i].code_object_storage_type =
          HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY;
      segment_descriptors[i].code_object_storage_base = obj->ElfData();
      segment_descriptors[i].code_object_storage_size = obj->ElfSize();
      segment_descriptors[i].code_object_storage_offset = seg->StorageOffset();
      segment_descriptors[i].segment_base = seg->Address(seg->VAddr());
      segment_descriptors[i].segment_size = seg->Size();
      ++i;
    }
  }

  return i - first_empty_segment_descriptor;
}

hsa_agent_t LoadedCodeObjectImpl::getAgent() const {
  assert(loaded_segments.size() == 1 && "Only supports code objects v2+");
  return loaded_segments.front()->Agent();
}

hsa_executable_t LoadedCodeObjectImpl::getExecutable() const {
  assert(loaded_segments.size() == 1 && "Only supports code objects v2+");
  return Executable::Handle(loaded_segments.front()->Owner());
}

uint64_t LoadedCodeObjectImpl::getElfData() const {
  return reinterpret_cast<uint64_t>(elf_data);
}

uint64_t LoadedCodeObjectImpl::getElfSize() const {
  return (uint64_t)elf_size;
}

uint64_t LoadedCodeObjectImpl::getStorageOffset() const {
  assert(loaded_segments.size() == 1 && "Only supports code objects v2+");
  return (uint64_t)loaded_segments.front()->StorageOffset();
}

uint64_t LoadedCodeObjectImpl::getLoadBase() const {
  // TODO Add support for code objects with 0 segments.
  assert(loaded_segments.size() == 1 && "Only supports code objects v2+");
  return reinterpret_cast<uint64_t>(loaded_segments.front()->Address(0));
}

uint64_t LoadedCodeObjectImpl::getLoadSize() const {
  // TODO Add support for code objects with 0 or >1 segments.
  assert(loaded_segments.size() == 1 && "Only supports code objects v2+");
  return (uint64_t)loaded_segments.front()->Size();
}

int64_t LoadedCodeObjectImpl::getDelta() const {
  // TODO Add support for code objects with 0 segments.
  assert(loaded_segments.size() == 1 && "Only supports code objects v2+");
  return getLoadBase() - loaded_segments.front()->VAddr();
}

std::string LoadedCodeObjectImpl::getUri() const {
  return std::string(r_debug_info.l_name);
}

hsa_executable_t AmdHsaCodeLoader::FindExecutable(uint64_t device_address) {
  hsa_executable_t execHandle = {0};
  ReaderLockGuard<ReaderWriterLock> reader_lock(rw_lock_);
  if (device_address == 0) {
    return execHandle;
  }

  for (auto &exec : executables) {
    if (exec != nullptr) {
      uint64_t host_address = exec->FindHostAddress(device_address);
      if (host_address != 0) {
        return Executable::Handle(exec);
      }
    }
  }
  return execHandle;
}

uint64_t ExecutableImpl::FindHostAddress(uint64_t device_address) {
  for (auto &obj : loaded_code_objects) {
    assert(obj);
    for (auto &seg : obj->LoadedSegments()) {
      assert(seg);
      uint64_t paddr = (uint64_t)(uintptr_t)seg->Address(seg->VAddr());
      if (paddr <= device_address && device_address < paddr + seg->Size()) {
        void *haddr = context_->SegmentHostAddress(
          seg->ElfSegment(), seg->Agent(), seg->Ptr(), device_address - paddr);
        return nullptr == haddr ?
            0 : (uint64_t)(uintptr_t)haddr;
      }
    }
  }

  return 0;
}

void ExecutableImpl::EnableReadOnlyMode() {
  rw_lock_.ReaderLock();
}

void ExecutableImpl::DisableReadOnlyMode() {
  rw_lock_.ReaderUnlock();
}

#define HSAERRCHECK(hsc)                                                       \
  if (hsc != HSA_STATUS_SUCCESS) {                                             \
    assert(false);                                                             \
    return hsc;                                                                \
  }

hsa_status_t ExecutableImpl::GetInfo(
    hsa_executable_info_t executable_info, void *value) {
  ReaderLockGuard<ReaderWriterLock> reader_lock(rw_lock_);

  assert(value);

  switch (executable_info) {
    case HSA_EXECUTABLE_INFO_PROFILE: {
      *((hsa_profile_t*)value) = profile_;
      break;
    }
    case HSA_EXECUTABLE_INFO_STATE: {
      *((hsa_executable_state_t*)value) = state_;
      break;
    }
    case HSA_EXECUTABLE_INFO_DEFAULT_FLOAT_ROUNDING_MODE: {
      *((hsa_default_float_rounding_mode_t*)value) =
        default_float_rounding_mode_;
      break;
    }
    default: {
      return HSA_STATUS_ERROR_INVALID_ARGUMENT;
    }
  }

  return HSA_STATUS_SUCCESS;
}

static uint32_t NextCodeObjectNum() {
  static std::atomic_uint_fast32_t dumpN(1);
  return dumpN++;
}

hsa_status_t ExecutableImpl::LoadCodeObject(
    hsa_agent_t agent,
    hsa_code_object_t code_object,
    const char *options,
    const std::string &uri,
    hsa_loaded_code_object_t *loaded_code_object) {
  return LoadCodeObject(agent, code_object, 0, options, uri, loaded_code_object);
}

hsa_status_t ExecutableImpl::LoadCodeObject(
    hsa_agent_t agent,
    hsa_code_object_t code_object,
    size_t code_object_size,
    const char *options,
    const std::string &uri,
    hsa_loaded_code_object_t *loaded_code_object) {
  WriterLockGuard<ReaderWriterLock> writer_lock(rw_lock_);
  if (HSA_EXECUTABLE_STATE_FROZEN == state_) {
    logger_ << "LoaderError: executable is already frozen\n";
    return HSA_STATUS_ERROR_FROZEN_EXECUTABLE;
  }

  LoaderOptions loaderOptions;
  if (options && !loaderOptions.ParseOptions(options)) {
    return HSA_STATUS_ERROR;
  }
  const char *options_append = getenv("LOADER_OPTIONS_APPEND");
  if (options_append && !loaderOptions.ParseOptions(options_append)) {
    return HSA_STATUS_ERROR;
  }

  // (first code object number, last code object number, replacement file).
  typedef std::tuple<uint32_t, uint32_t, std::string> Substitute;
  std::vector<Substitute> substitutes;
  for (const std::string& s :
       loaderOptions.Substitute()->values()) {
    // Each value has the form "N=file" or "N1-N2=file".
    std::string::size_type vi = s.find('=');
    if (vi == std::string::npos) {
      return HSA_STATUS_ERROR;
    }
    std::string value = s.substr(vi + 1);
    std::string range = s.substr(0, vi);
    std::string::size_type mi = range.find('-');
    uint32_t n1 = UINT32_MAX, n2 = UINT32_MAX;
    if (mi != std::string::npos) {
      std::string s1 = range.substr(0, mi);
      std::string s2 = range.substr(mi + 1);
      std::istringstream is1(s1);
      is1 >> n1;
      std::istringstream is2(s2);
      is2 >> n2;
    } else {
      std::istringstream is(range);
      is >> n1;
      n2 = n1;
    }
    substitutes.push_back(std::make_tuple(n1, n2, value));
  }

  uint32_t codeNum = NextCodeObjectNum();

  code.reset(new code::AmdHsaCode());

  std::string substituteFileName;
  for (const Substitute& ss : substitutes) {
    if (codeNum >= std::get<0>(ss) && codeNum <= std::get<1>(ss)) {
      substituteFileName = std::get<2>(ss);
      break;
    }
  }

  std::vector<char> buffer;
  if (substituteFileName.empty()) {
    if (!code->InitAsHandle(code_object)) {
      return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
    }
  } else {
    if (!ReadFileIntoBuffer(substituteFileName, buffer)) {
      return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
    }
    if (!code->InitAsBuffer(buffer.data(), buffer.size())) {
      return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
    }
  }

  if (loaderOptions.DumpAll()->is_set() || loaderOptions.DumpCode()->is_set()) {
    if (!code->SaveToFile(amd::hsa::DumpFileName(loaderOptions.DumpDir()->value(),
                                                 LOADER_DUMP_PREFIX, "hsaco", codeNum))) {
      // Ignore error.
    }
  }
  if (loaderOptions.DumpAll()->is_set() || loaderOptions.DumpIsa()->is_set()) {
    if (!code->PrintToFile(amd::hsa::DumpFileName(loaderOptions.DumpDir()->value(),
                                                  LOADER_DUMP_PREFIX, "isa", codeNum))) {
      // Ignore error.
    }
  }

  std::string codeIsa;
  if (!code->GetIsa(codeIsa)) {
    logger_ << "LoaderError: failed to determine code object's ISA\n";
    return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
  }

  uint32_t majorVersion, minorVersion;
  if (!code->GetCodeObjectVersion(&majorVersion, &minorVersion)) {
    logger_ << "LoaderError: failed to determine code object's version\n";
    return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
  }

  if (majorVersion < 1 || majorVersion > 4) {
    logger_ << "LoaderError: unsupported code object version: " << majorVersion << "\n";
    return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
  }
  if (agent.handle == 0 && majorVersion == 1) {
    logger_ << "LoaderError: code object v1 requires non-null agent\n";
    return HSA_STATUS_ERROR_INVALID_AGENT;
  }

  uint32_t codeHsailMajor;
  uint32_t codeHsailMinor;
  hsa_profile_t codeProfile;
  hsa_machine_model_t codeMachineModel;
  hsa_default_float_rounding_mode_t codeRoundingMode;
  if (!code->GetNoteHsail(&codeHsailMajor, &codeHsailMinor,
                          &codeProfile, &codeMachineModel, &codeRoundingMode)) {
    codeProfile = profile_;
  }
  if (profile_ != codeProfile) {
    logger_ << "LoaderError: mismatched profiles\n";
    return HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS;
  }

  hsa_isa_t objectsIsa = context_->IsaFromName(codeIsa.c_str());
  if (!objectsIsa.handle) {
    logger_ << "LoaderError: code object's ISA (" << codeIsa.c_str() << ") is invalid\n";
    return HSA_STATUS_ERROR_INVALID_ISA_NAME;
  }

  if (agent.handle != 0 && !context_->IsaSupportedByAgent(agent, objectsIsa)) {
    logger_ << "LoaderError: code object's ISA (" << codeIsa.c_str()
            << ") is not supported by the agent\n";
    return HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS;
  }

  hsa_status_t status;

  objects.push_back(new LoadedCodeObjectImpl(this, agent, code->ElfData(), code->ElfSize()));
  loaded_code_objects.push_back((LoadedCodeObjectImpl*)objects.back());

  status = LoadSegments(agent, code.get(), majorVersion);
  if (status != HSA_STATUS_SUCCESS) return status;

  for (size_t i = 0; i < code->SymbolCount(); ++i) {
    if (majorVersion >= 2 &&
        code->GetSymbol(i)->elfSym()->type() != STT_AMDGPU_HSA_KERNEL &&
        code->GetSymbol(i)->elfSym()->binding() == STB_LOCAL)
      continue;

    status = LoadSymbol(agent, code->GetSymbol(i), majorVersion);
    if (status != HSA_STATUS_SUCCESS) {
      return status;
    }
  }

  status = ApplyRelocations(agent, code.get());
  if (status != HSA_STATUS_SUCCESS) {
    return status;
  }

  code.reset();

  if (loaderOptions.DumpAll()->is_set() || loaderOptions.DumpExec()->is_set()) {
    if (!PrintToFile(amd::hsa::DumpFileName(loaderOptions.DumpDir()->value(),
                                            LOADER_DUMP_PREFIX, "exec", codeNum))) {
      // Ignore error.
    }
  }

  loaded_code_objects.back()->r_debug_info.l_addr = loaded_code_objects.back()->getDelta();
  loaded_code_objects.back()->r_debug_info.l_name = strdup(uri.c_str());
  loaded_code_objects.back()->r_debug_info.l_prev = nullptr;
  loaded_code_objects.back()->r_debug_info.l_next = nullptr;

  if (nullptr != loaded_code_object) {
    *loaded_code_object = LoadedCodeObject::Handle(loaded_code_objects.back());
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::LoadSegments(hsa_agent_t agent,
                                          const code::AmdHsaCode *c,
                                          uint32_t majorVersion) {
  if (majorVersion < 2)
    return LoadSegmentsV1(agent, c);
  else
    return LoadSegmentsV2(agent, c);
}

hsa_status_t ExecutableImpl::LoadSegmentsV1(hsa_agent_t agent,
                                            const code::AmdHsaCode *c) {
  hsa_status_t status = HSA_STATUS_SUCCESS;
  for (size_t i = 0; i < c->DataSegmentCount(); ++i) {
    status = LoadSegmentV1(agent, c->DataSegment(i));
    if (status != HSA_STATUS_SUCCESS) return status;
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::LoadSegmentsV2(hsa_agent_t agent,
                                            const code::AmdHsaCode *c) {
  assert(c->Machine() == ELF::EM_AMDGPU && "Program code objects are not supported");

  if (!c->DataSegmentCount()) return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;

  uint64_t vaddr = c->DataSegment(0)->vaddr();
  uint64_t size = c->DataSegment(c->DataSegmentCount() - 1)->vaddr() +
                  c->DataSegment(c->DataSegmentCount() - 1)->memSize();

  void *ptr =
      context_->SegmentAlloc(AMDGPU_HSA_SEGMENT_CODE_AGENT, agent, size,
                             AMD_ISA_ALIGN_BYTES, true);
  if (!ptr) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;

  Segment *load_segment = new Segment(this, agent, AMDGPU_HSA_SEGMENT_CODE_AGENT,
                                      ptr, size, vaddr, c->DataSegment(0)->offset());
  if (!load_segment) return HSA_STATUS_ERROR_OUT_OF_RESOURCES;

  hsa_status_t status = HSA_STATUS_SUCCESS;
  for (size_t i = 0; i < c->DataSegmentCount(); ++i) {
    status = LoadSegmentV2(c->DataSegment(i), load_segment);
    if (status != HSA_STATUS_SUCCESS) return status;
  }

  objects.push_back(load_segment);
  loaded_code_objects.back()->LoadedSegments().push_back(load_segment);

  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::LoadSegmentV1(hsa_agent_t agent,
                                           const code::Segment *s) {
  assert(s->type() < PT_LOOS + AMDGPU_HSA_SEGMENT_LAST);
  if (s->memSize() == 0)
    return HSA_STATUS_SUCCESS;
  amdgpu_hsa_elf_segment_t segment = (amdgpu_hsa_elf_segment_t)(s->type() - PT_LOOS);
  Segment *new_seg = nullptr;
  bool need_alloc = true;
  if (segment == AMDGPU_HSA_SEGMENT_GLOBAL_PROGRAM && nullptr != program_allocation_segment) {
    new_seg = program_allocation_segment;
    need_alloc = false;
  }
  if (need_alloc) {
    void* ptr = context_->SegmentAlloc(segment, agent, s->memSize(), s->align(), true);
    if (!ptr) { return HSA_STATUS_ERROR_OUT_OF_RESOURCES; }

    new_seg = new Segment(this, agent, segment, ptr, s->memSize(), s->vaddr(), s->offset());
    new_seg->Copy(s->vaddr(), s->data(), s->imageSize());
    objects.push_back(new_seg);
    if (segment == AMDGPU_HSA_SEGMENT_GLOBAL_PROGRAM) {
      program_allocation_segment = new_seg;
    }
  }
  assert(new_seg);
  loaded_code_objects.back()->LoadedSegments().push_back(new_seg);
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::LoadSegmentV2(const code::Segment *data_segment,
                                           loader::Segment *load_segment) {
  assert(data_segment && load_segment);
  load_segment->Copy(data_segment->vaddr(), data_segment->data(),
                     data_segment->imageSize());

  return HSA_STATUS_SUCCESS;
}

hsa_status_t
ExecutableImpl::LoadSymbol(hsa_agent_t agent,
                           code::Symbol* sym,
                           uint32_t majorVersion) {
  if (sym->IsDeclaration()) {
    return LoadDeclarationSymbol(agent, sym, majorVersion);
  } else {
    return LoadDefinitionSymbol(agent, sym, majorVersion);
  }
}

namespace {

bool string_ends_with(const std::string &str, const std::string &suf) {
  return str.size() >= suf.size() &&
         str.compare(str.size() - suf.size(), suf.size(), suf) == 0;
}

}  // namespace

hsa_status_t ExecutableImpl::LoadDefinitionSymbol(hsa_agent_t agent,
                                                   code::Symbol* sym,
                                                   uint32_t majorVersion) {
  bool isAgent = sym->IsAgent();
  if (majorVersion >= 2) {
    isAgent = agent.handle != 0;
  }
  if (isAgent) {
    auto agent_symbol = agent_symbols_.find(std::make_pair(sym->Name(), agent));
    if (agent_symbol != agent_symbols_.end()) {
      // TODO(spec): this is not spec compliant.
      return HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED;
    }
  } else {
    auto program_symbol = program_symbols_.find(sym->Name());
    if (program_symbol != program_symbols_.end()) {
      // TODO(spec): this is not spec compliant.
      return HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED;
    }
  }

  uint64_t address = SymbolAddress(agent, sym);

  SymbolImpl *symbol = nullptr;
  if (string_ends_with(sym->GetSymbolName(), ".kd")) {
    // Code object V3 and later: the symbol names a kernel descriptor.
    llvm::amdhsa::kernel_descriptor_t kd;
    sym->GetSection()->getData(sym->SectionOffset(), &kd, sizeof(kd));

    // FIXME: If 0, the compiler is not specifying the size.
    uint32_t kernarg_segment_size = kd.kernarg_size;
    // FIXME: Use the minimum HSA required alignment.
    uint32_t kernarg_segment_alignment = 16;
    uint32_t group_segment_size = kd.group_segment_fixed_size;
    uint32_t private_segment_size = kd.private_segment_fixed_size;
    bool is_dynamic_callstack = false;

    uint64_t size = sym->Size();

    KernelSymbol *kernel_symbol = new KernelSymbol(true,
                                                   sym->GetModuleName(),
                                                   sym->GetSymbolName(),
                                                   sym->Linkage(),
                                                   true, // sym->IsDefinition()
                                                   kernarg_segment_size,
                                                   kernarg_segment_alignment,
                                                   group_segment_size,
                                                   private_segment_size,
                                                   is_dynamic_callstack,
                                                   size,
                                                   64,
                                                   address);
    symbol = kernel_symbol;
  } else if (sym->IsVariableSymbol()) {
    symbol = new VariableSymbol(true,
                                sym->GetModuleName(),
                                sym->GetSymbolName(),
                                sym->Linkage(),
                                true, // sym->IsDefinition()
                                sym->Allocation(),
                                sym->Segment(),
                                sym->Size(),
                                sym->Alignment(),
                                sym->IsConst(),
                                false,
                                address);
  } else if (sym->IsKernelSymbol()) {
    amd_kernel_code_t akc;
    sym->GetSection()->getData(sym->SectionOffset(), &akc, sizeof(akc));

    uint32_t kernarg_segment_size =
      uint32_t(akc.kernarg_segment_byte_size);
    uint32_t kernarg_segment_alignment =
      uint32_t(1 << akc.kernarg_segment_alignment);
    uint32_t group_segment_size =
      uint32_t(akc.workgroup_group_segment_byte_size);
    uint32_t private_segment_size =
      uint32_t(akc.workitem_private_segment_byte_size);
    bool is_dynamic_callstack =
      AMD_HSA_BITS_GET(akc.kernel_code_properties,
                       AMD_KERNEL_CODE_PROPERTIES_IS_DYNAMIC_CALLSTACK) ? true : false;

    uint64_t size = sym->Size();

    if (!size && sym->SectionOffset() < sym->GetSection()->size()) {
      // The ORCA runtime relies on the symbol size being equal to the size of
      // the kernel ISA. If the ELF symbol size is 0, use the distance from the
      // symbol to the end of its section instead.
      size = sym->GetSection()->size() - sym->SectionOffset();
    }

    KernelSymbol *kernel_symbol = new KernelSymbol(true,
                                                   sym->GetModuleName(),
                                                   sym->GetSymbolName(),
                                                   sym->Linkage(),
                                                   true, // sym->IsDefinition()
                                                   kernarg_segment_size,
                                                   kernarg_segment_alignment,
                                                   group_segment_size,
                                                   private_segment_size,
                                                   is_dynamic_callstack,
                                                   size,
                                                   256,
                                                   address);
    kernel_symbol->debug_info.elf_raw = code->ElfData();
    kernel_symbol->debug_info.elf_size = code->ElfSize();
    kernel_symbol->debug_info.kernel_name = kernel_symbol->full_name.c_str();
    kernel_symbol->debug_info.owning_segment =
      (void*)SymbolSegment(agent, sym)->Address(sym->GetSection()->addr());
    symbol = kernel_symbol;

    // \todo kzhuravl 10/15/15 This is a debugger backdoor: needs to be
    // removed.
    uint64_t target_address =
      sym->GetSection()->addr() + sym->SectionOffset() +
      ((size_t)(&((amd_kernel_code_t*)0)->runtime_loader_kernel_symbol));
    uint64_t source_value = (uint64_t) (uintptr_t) &kernel_symbol->debug_info;
    SymbolSegment(agent, sym)->Copy(target_address, &source_value, sizeof(source_value));
  } else {
    assert(!"Unexpected symbol type in LoadDefinitionSymbol");
    return HSA_STATUS_ERROR;
  }
  assert(symbol);
  if (isAgent) {
    symbol->agent = agent;
    agent_symbols_.insert(std::make_pair(std::make_pair(sym->Name(), agent), symbol));
  } else {
    program_symbols_.insert(std::make_pair(sym->Name(), symbol));
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::LoadDeclarationSymbol(hsa_agent_t agent,
                                                    code::Symbol* sym,
                                                    uint32_t majorVersion) {
  auto program_symbol = program_symbols_.find(sym->Name());
  if (program_symbol == program_symbols_.end()) {
    auto agent_symbol = agent_symbols_.find(std::make_pair(sym->Name(), agent));
    if (agent_symbol == agent_symbols_.end()) {
      logger_ << "LoaderError: symbol \"" << sym->Name() << "\" is undefined\n";

      // TODO(spec): this is not spec compliant.
      return HSA_STATUS_ERROR_VARIABLE_UNDEFINED;
    }
  }
  return HSA_STATUS_SUCCESS;
}

Segment* ExecutableImpl::VirtualAddressSegment(uint64_t vaddr) {
  for (auto &seg : loaded_code_objects.back()->LoadedSegments()) {
    if (seg->IsAddressInSegment(vaddr)) {
      return seg;
    }
  }
  return nullptr;
}

uint64_t ExecutableImpl::SymbolAddress(hsa_agent_t agent, code::Symbol* sym) {
  code::Section* sec = sym->GetSection();
  Segment* seg = SectionSegment(agent, sec);
  return nullptr == seg ? 0 : (uint64_t) (uintptr_t) seg->Address(sym->VAddr());
}

uint64_t ExecutableImpl::SymbolAddress(hsa_agent_t agent, elf::Symbol* sym) {
  elf::Section* sec = sym->section();
  Segment* seg = SectionSegment(agent, sec);
  uint64_t vaddr = sec->addr() + sym->value();
  return nullptr == seg ? 0 : (uint64_t) (uintptr_t) seg->Address(vaddr);
}

Segment* ExecutableImpl::SymbolSegment(hsa_agent_t agent, code::Symbol* sym) {
  return SectionSegment(agent, sym->GetSection());
}

Segment* ExecutableImpl::SectionSegment(hsa_agent_t agent, code::Section* sec) {
  for (Segment* seg : loaded_code_objects.back()->LoadedSegments()) {
    if (seg->IsAddressInSegment(sec->addr())) {
      return seg;
    }
  }
  return nullptr;
}

hsa_status_t ExecutableImpl::ApplyRelocations(hsa_agent_t agent,
                                              amd::hsa::code::AmdHsaCode *c) {
  hsa_status_t status = HSA_STATUS_SUCCESS;
  for (size_t i = 0; i < c->RelocationSectionCount(); ++i) {
    if (c->GetRelocationSection(i)->targetSection()) {
      status = ApplyStaticRelocationSection(agent, c->GetRelocationSection(i));
    } else {
      // Dynamic relocations are supported starting with code object v2.1.
      uint32_t majorVersion, minorVersion;
      if (!c->GetCodeObjectVersion(&majorVersion, &minorVersion)) {
        return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }
      if (majorVersion < 2) {
        return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }
      if (majorVersion == 2 && minorVersion < 1) {
        return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }
      status = ApplyDynamicRelocationSection(agent, c->GetRelocationSection(i));
    }
    if (status != HSA_STATUS_SUCCESS) { return status; }
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::ApplyStaticRelocationSection(hsa_agent_t agent, amd::hsa::code::RelocationSection* sec)
{
  // Skip link-time relocations (if any).
  if (!(sec->targetSection()->flags() & SHF_ALLOC)) { return HSA_STATUS_SUCCESS; }
  hsa_status_t status = HSA_STATUS_SUCCESS;
  for (size_t i = 0; i < sec->relocationCount(); ++i) {
    status = ApplyStaticRelocation(agent, sec->relocation(i));
    if (status != HSA_STATUS_SUCCESS) { return status; }
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::ApplyStaticRelocation(hsa_agent_t agent, amd::hsa::code::Relocation *rel)
{
  hsa_status_t status = HSA_STATUS_SUCCESS;
  amd::elf::Symbol* sym = rel->symbol();
  code::RelocationSection* rsec = rel->section();
  code::Section* sec = rsec->targetSection();
  Segment* rseg = SectionSegment(agent, sec);
  size_t reladdr = sec->addr() + rel->offset();
  switch (rel->type()) {
    case R_AMDGPU_32_LOW:
    case R_AMDGPU_32_HIGH:
    case R_AMDGPU_64:
    {
      uint64_t addr;
      switch (sym->type()) {
        case STT_OBJECT:
        case STT_SECTION:
        case STT_AMDGPU_HSA_KERNEL:
        case STT_AMDGPU_HSA_INDIRECT_FUNCTION:
          addr = SymbolAddress(agent, sym);
          if (!addr) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; }
          break;
        case STT_COMMON: {
          hsa_agent_t *sagent = &agent;
          if (STA_AMDGPU_HSA_GLOBAL_PROGRAM == ELF64_ST_AMDGPU_ALLOCATION(sym->other())) {
            sagent = nullptr;
          }
          SymbolImpl* esym = (SymbolImpl*) GetSymbolInternal(sym->name().c_str(), sagent);
          if (!esym) {
            logger_ << "LoaderError: symbol \"" << sym->name() << "\" is undefined\n";
            return
              HSA_STATUS_ERROR_VARIABLE_UNDEFINED;
          }
          addr = esym->address;
          break;
        }
        default:
          return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }
      addr += rel->addend();

      uint32_t addr32 = 0;
      switch (rel->type()) {
        case R_AMDGPU_32_HIGH:
          addr32 = uint32_t((addr >> 32) & 0xFFFFFFFF);
          rseg->Copy(reladdr, &addr32, sizeof(addr32));
          break;
        case R_AMDGPU_32_LOW:
          addr32 = uint32_t(addr & 0xFFFFFFFF);
          rseg->Copy(reladdr, &addr32, sizeof(addr32));
          break;
        case R_AMDGPU_64:
          rseg->Copy(reladdr, &addr, sizeof(addr));
          break;
        default:
          return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }
      break;
    }

    case R_AMDGPU_INIT_SAMPLER:
    {
      if (STT_AMDGPU_HSA_METADATA != sym->type() ||
          SHT_PROGBITS != sym->section()->type() ||
          !(sym->section()->flags() & SHF_MERGE)) {
        return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }
      amdgpu_hsa_sampler_descriptor_t desc;
      if (!sym->section()->getData(sym->value(), &desc, sizeof(desc))) {
        return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }
      if (AMDGPU_HSA_METADATA_KIND_INIT_SAMP != desc.kind) {
        return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }

      hsa_ext_sampler_descriptor_t hsa_sampler_descriptor;
      hsa_sampler_descriptor.coordinate_mode =
        hsa_ext_sampler_coordinate_mode_t(desc.coord);
      hsa_sampler_descriptor.filter_mode =
        hsa_ext_sampler_filter_mode_t(desc.filter);
      hsa_sampler_descriptor.address_mode =
        hsa_ext_sampler_addressing_mode_t(desc.addressing);

      hsa_ext_sampler_t hsa_sampler = {0};
      status = context_->SamplerCreate(agent, &hsa_sampler_descriptor, &hsa_sampler);
      if (status != HSA_STATUS_SUCCESS) { return status; }
      assert(hsa_sampler.handle);

      rseg->Copy(reladdr, &hsa_sampler, sizeof(hsa_sampler));
      break;
    }

    case R_AMDGPU_INIT_IMAGE:
    {
      if (STT_AMDGPU_HSA_METADATA != sym->type() ||
          SHT_PROGBITS != sym->section()->type() ||
          !(sym->section()->flags() & SHF_MERGE)) {
        return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }

      amdgpu_hsa_image_descriptor_t desc;
      if (!sym->section()->getData(sym->value(), &desc, sizeof(desc))) {
        return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }
      if (AMDGPU_HSA_METADATA_KIND_INIT_ROIMG !=
          desc.kind &&
          AMDGPU_HSA_METADATA_KIND_INIT_WOIMG != desc.kind &&
          AMDGPU_HSA_METADATA_KIND_INIT_RWIMG != desc.kind) {
        return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
      }

      hsa_ext_image_format_t hsa_image_format;
      hsa_image_format.channel_order =
        hsa_ext_image_channel_order_t(desc.channel_order);
      hsa_image_format.channel_type =
        hsa_ext_image_channel_type_t(desc.channel_type);

      hsa_ext_image_descriptor_t hsa_image_descriptor;
      hsa_image_descriptor.geometry =
        hsa_ext_image_geometry_t(desc.geometry);
      hsa_image_descriptor.width = size_t(desc.width);
      hsa_image_descriptor.height = size_t(desc.height);
      hsa_image_descriptor.depth = size_t(desc.depth);
      hsa_image_descriptor.array_size = size_t(desc.array);
      hsa_image_descriptor.format = hsa_image_format;

      hsa_access_permission_t hsa_image_permission = HSA_ACCESS_PERMISSION_RO;
      switch (desc.kind) {
        case AMDGPU_HSA_METADATA_KIND_INIT_ROIMG: {
          hsa_image_permission = HSA_ACCESS_PERMISSION_RO;
          break;
        }
        case AMDGPU_HSA_METADATA_KIND_INIT_WOIMG: {
          hsa_image_permission = HSA_ACCESS_PERMISSION_WO;
          break;
        }
        case AMDGPU_HSA_METADATA_KIND_INIT_RWIMG: {
          hsa_image_permission = HSA_ACCESS_PERMISSION_RW;
          break;
        }
        default: {
          assert(false);
          return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
        }
      }

      hsa_ext_image_t hsa_image = {0};
      status = context_->ImageCreate(agent, hsa_image_permission,
                                     &hsa_image_descriptor,
                                     NULL, // TODO: image_data?
                                     &hsa_image);
      if (status != HSA_STATUS_SUCCESS) { return status; }

      rseg->Copy(reladdr, &hsa_image, sizeof(hsa_image));
      break;
    }

    default:
      // Ignore.
      break;
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::ApplyDynamicRelocationSection(hsa_agent_t agent, amd::hsa::code::RelocationSection* sec)
{
  hsa_status_t status = HSA_STATUS_SUCCESS;
  for (size_t i = 0; i < sec->relocationCount(); ++i) {
    status = ApplyDynamicRelocation(agent, sec->relocation(i));
    if (status != HSA_STATUS_SUCCESS) { return status; }
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::ApplyDynamicRelocation(hsa_agent_t agent, amd::hsa::code::Relocation *rel)
{
  Segment* relSeg = VirtualAddressSegment(rel->offset());
  // The patched location must lie inside a loaded segment.
  if (!relSeg) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; }
  uint64_t symAddr = 0;
  switch (rel->symbol()->type()) {
    case STT_OBJECT:
    case STT_AMDGPU_HSA_KERNEL:
    case STT_FUNC:
    {
      Segment* symSeg = VirtualAddressSegment(rel->symbol()->value());
      if (!symSeg) { return HSA_STATUS_ERROR_INVALID_CODE_OBJECT; }
      symAddr = reinterpret_cast<uint64_t>(symSeg->Address(rel->symbol()->value()));
      break;
    }

    // External symbols must be defined prior to loading.
    case STT_NOTYPE:
    {
      // TODO: Only agent allocation variables are supported in v2.1. How will
      // we distinguish between program allocation and agent allocation
      // variables?
      auto agent_symbol = agent_symbols_.find(std::make_pair(rel->symbol()->name(), agent));
      if (agent_symbol != agent_symbols_.end())
        symAddr = agent_symbol->second->address;
      break;
    }

    default:
      // Only objects and kernels are supported in v2.1.
      return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
  }

  // Record whether the symbol resolved before applying the addend; otherwise
  // a non-zero addend would hide an undefined external symbol.
  const bool symDefined = (0 != symAddr);
  symAddr += rel->addend();

  switch (rel->type()) {
    case R_AMDGPU_32_HIGH:
    {
      if (!symDefined) {
        logger_ << "LoaderError: symbol \"" << rel->symbol()->name() << "\" is undefined\n";
        return HSA_STATUS_ERROR_VARIABLE_UNDEFINED;
      }
      uint32_t symAddr32 = uint32_t((symAddr >> 32) & 0xFFFFFFFF);
      relSeg->Copy(rel->offset(), &symAddr32, sizeof(symAddr32));
      break;
    }

    case R_AMDGPU_32_LOW:
    {
      if (!symDefined) {
        logger_ << "LoaderError: symbol \"" << rel->symbol()->name() << "\" is undefined\n";
        return HSA_STATUS_ERROR_VARIABLE_UNDEFINED;
      }
      uint32_t symAddr32 = uint32_t(symAddr & 0xFFFFFFFF);
      relSeg->Copy(rel->offset(), &symAddr32, sizeof(symAddr32));
      break;
    }

    case R_AMDGPU_64:
    {
      if (!symDefined) {
        logger_ << "LoaderError: symbol \"" << rel->symbol()->name() << "\" is undefined\n";
        return HSA_STATUS_ERROR_VARIABLE_UNDEFINED;
      }
      relSeg->Copy(rel->offset(), &symAddr, sizeof(symAddr));
      break;
    }

    case R_AMDGPU_RELATIVE64:
    {
      int64_t baseDelta = reinterpret_cast<uint64_t>(relSeg->Address(0)) - relSeg->VAddr();
      uint64_t relocatedAddr = baseDelta + rel->addend();
      relSeg->Copy(rel->offset(), &relocatedAddr, sizeof(relocatedAddr));
      break;
    }

    default:
      return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
  }
  return HSA_STATUS_SUCCESS;
}

hsa_status_t ExecutableImpl::Freeze(const char *options)
{
  amd::hsa::common::WriterLockGuard<amd::hsa::common::ReaderWriterLock> writer_lock(rw_lock_);
  if (HSA_EXECUTABLE_STATE_FROZEN == state_) {
    return HSA_STATUS_ERROR_FROZEN_EXECUTABLE;
  }

  for (auto &lco : loaded_code_objects) {
    for (auto &ls : lco->LoadedSegments()) {
      ls->Freeze();
    }
  }

  state_ = HSA_EXECUTABLE_STATE_FROZEN;
  return HSA_STATUS_SUCCESS;
}

void ExecutableImpl::Print(std::ostream& out)
{
  out << "AMD Executable" << std::endl;
  out << " Id: " << id()
      << " Profile: " << HsaProfileToString(profile())
      << std::endl << std::endl;
  out << "Loaded Objects (total " << objects.size() << ")" << std::endl;
  size_t i = 0;
  for (ExecutableObject* o : objects) {
    out << "Loaded Object " << i++ << ": ";
    o->Print(out);
    out << std::endl;
  }
  out
      << "End AMD Executable" << std::endl;
}

bool ExecutableImpl::PrintToFile(const std::string& filename)
{
  std::ofstream out(filename);
  if (out.fail()) { return false; }
  Print(out);
  // Report success only if every write to the file succeeded.
  return !out.fail();
}

} // namespace loader
} // namespace hsa
} // namespace amd
} // namespace rocr

////////////////////////////////////////////////////////////////////////////////
//
// The University of Illinois/NCSA
// Open Source License (NCSA)
//
// Copyright (c) 2014-2020, Advanced Micro Devices, Inc. All rights reserved.
//
// Developed by:
//
//                 AMD Research and AMD HSA Software Development
//
//                 Advanced Micro Devices, Inc.
//
//                 www.amd.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to
// deal with the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
//  - Redistributions of source code must retain the above copyright notice,
//    this list of conditions and the following disclaimers.
//  - Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimers in
//    the documentation and/or other materials provided with the distribution.
//  - Neither the names of Advanced Micro Devices, Inc,
//    nor the names of its contributors may be used to endorse or promote
//    products derived from this Software without specific prior written
//    permission.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL
// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS WITH THE SOFTWARE.
//
////////////////////////////////////////////////////////////////////////////////

#ifndef HSA_RUNTIME_CORE_LOADER_EXECUTABLE_HPP_
#define HSA_RUNTIME_CORE_LOADER_EXECUTABLE_HPP_

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include "inc/hsa.h"
#include "inc/hsa_ext_image.h"
#include "core/inc/amd_hsa_loader.hpp"
#include "core/inc/amd_hsa_code.hpp"
#include "inc/amd_hsa_kernel_code.h"
#include "amd_hsa_locks.hpp"

namespace rocr {
namespace amd {
namespace hsa {
namespace loader {

class MemoryAddress;
class SymbolImpl;
class KernelSymbol;
class VariableSymbol;
class ExecutableImpl;

//===----------------------------------------------------------------------===//
// SymbolImpl.
//
//===----------------------------------------------------------------------===//

typedef uint32_t symbol_attribute32_t;

class SymbolImpl: public Symbol {
public:
  virtual ~SymbolImpl() {}

  bool IsKernel() const {
    return HSA_SYMBOL_KIND_KERNEL == kind;
  }
  bool IsVariable() const {
    return HSA_SYMBOL_KIND_VARIABLE == kind;
  }

  bool is_loaded;
  hsa_symbol_kind_t kind;
  std::string module_name;
  std::string symbol_name;
  hsa_symbol_linkage_t linkage;
  bool is_definition;
  uint64_t address;
  hsa_agent_t agent;

  hsa_agent_t GetAgent() override {
    return agent;
  }

protected:
  SymbolImpl(const bool &_is_loaded,
             const hsa_symbol_kind_t &_kind,
             const std::string &_module_name,
             const std::string &_symbol_name,
             const hsa_symbol_linkage_t &_linkage,
             const bool &_is_definition,
             const uint64_t &_address = 0)
    : is_loaded(_is_loaded)
    , kind(_kind)
    , module_name(_module_name)
    , symbol_name(_symbol_name)
    , linkage(_linkage)
    , is_definition(_is_definition)
    , address(_address) {}

  virtual bool GetInfo(hsa_symbol_info32_t symbol_info, void* value) override;

private:
  SymbolImpl(const SymbolImpl &s);
  SymbolImpl& operator=(const SymbolImpl &s);
};

//===----------------------------------------------------------------------===//
// KernelSymbol.
//
//===----------------------------------------------------------------------===//

class KernelSymbol final: public SymbolImpl {
public:
  KernelSymbol(const bool &_is_loaded,
               const std::string &_module_name,
               const std::string &_symbol_name,
               const hsa_symbol_linkage_t &_linkage,
               const bool &_is_definition,
               const uint32_t &_kernarg_segment_size,
               const uint32_t &_kernarg_segment_alignment,
               const uint32_t &_group_segment_size,
               const uint32_t &_private_segment_size,
               const bool &_is_dynamic_callstack,
               const uint32_t &_size,
               const uint32_t &_alignment,
               const uint64_t &_address = 0)
    : SymbolImpl(_is_loaded,
                 HSA_SYMBOL_KIND_KERNEL,
                 _module_name,
                 _symbol_name,
                 _linkage,
                 _is_definition,
                 _address)
    , full_name(_module_name.empty() ?
        _symbol_name : _module_name + "::" + _symbol_name)
    , kernarg_segment_size(_kernarg_segment_size)
    , kernarg_segment_alignment(_kernarg_segment_alignment)
    , group_segment_size(_group_segment_size)
    , private_segment_size(_private_segment_size)
    , is_dynamic_callstack(_is_dynamic_callstack)
    , size(_size)
    , alignment(_alignment) {}

  ~KernelSymbol() {}

  bool GetInfo(hsa_symbol_info32_t symbol_info, void *value);

  std::string full_name;
  uint32_t kernarg_segment_size;
  uint32_t kernarg_segment_alignment;
  uint32_t group_segment_size;
  uint32_t private_segment_size;
  bool is_dynamic_callstack;
  uint32_t size;
  uint32_t alignment;
  amd_runtime_loader_debug_info_t debug_info;

private:
  KernelSymbol(const KernelSymbol &ks);
  KernelSymbol& operator=(const KernelSymbol &ks);
};

//===----------------------------------------------------------------------===//
// VariableSymbol.
//
//===----------------------------------------------------------------------===//

class VariableSymbol final: public SymbolImpl {
public:
  VariableSymbol(const bool &_is_loaded,
                 const std::string &_module_name,
                 const std::string &_symbol_name,
                 const hsa_symbol_linkage_t &_linkage,
                 const bool &_is_definition,
                 const hsa_variable_allocation_t &_allocation,
                 const hsa_variable_segment_t &_segment,
                 const uint32_t &_size,
                 const uint32_t &_alignment,
                 const bool &_is_constant,
                 const bool &_is_external = false,
                 const uint64_t &_address = 0)
    : SymbolImpl(_is_loaded,
                 HSA_SYMBOL_KIND_VARIABLE,
                 _module_name,
                 _symbol_name,
                 _linkage,
                 _is_definition,
                 _address)
    , allocation(_allocation)
    , segment(_segment)
    , size(_size)
    , alignment(_alignment)
    , is_constant(_is_constant)
    , is_external(_is_external) {}

  ~VariableSymbol() {}

  bool GetInfo(hsa_symbol_info32_t symbol_info, void *value);

  hsa_variable_allocation_t allocation;
  hsa_variable_segment_t segment;
  uint32_t size;
  uint32_t alignment;
  bool is_constant;
  bool is_external;

private:
  VariableSymbol(const VariableSymbol &vs);
  VariableSymbol& operator=(const VariableSymbol &vs);
};
//===----------------------------------------------------------------------===//
// Logger.
//
//===----------------------------------------------------------------------===//

class Logger final {
public:
  Logger(std::ostream &Stream = std::cerr): OutStream(Stream) {}

  template <typename T>
  Logger &operator<<(const T &Data) {
    if (!IsLoggingEnabled())
      return *this;
    OutStream << Data;
    return *this;
  }

private:
  Logger(const Logger &L);
  Logger& operator=(const Logger &L);

  // Logging is on when LOADER_ENABLE_LOGGING is set to any value except "0".
  bool IsLoggingEnabled() const {
    const char *enable_logging = getenv("LOADER_ENABLE_LOGGING");
    if (!enable_logging)
      return false;
    if (std::string(enable_logging) == "0")
      return false;
    return true;
  }

  std::ostream &OutStream;
};

//===----------------------------------------------------------------------===//
// Executable.
//
//===----------------------------------------------------------------------===//

class ExecutableImpl;
class LoadedCodeObjectImpl;
class Segment;

class ExecutableObject {
protected:
  ExecutableImpl *owner;
  hsa_agent_t agent;

public:
  ExecutableObject(ExecutableImpl *owner_, hsa_agent_t agent_)
    : owner(owner_), agent(agent_) { }

  ExecutableImpl* Owner() const { return owner; }
  hsa_agent_t Agent() const { return agent; }
  virtual void Print(std::ostream& out) = 0;
  virtual void Destroy() = 0;

  virtual ~ExecutableObject() { }
};

class LoadedCodeObjectImpl : public LoadedCodeObject, public ExecutableObject {
  friend class AmdHsaCodeLoader;
private:
  LoadedCodeObjectImpl(const LoadedCodeObjectImpl&);
  LoadedCodeObjectImpl& operator=(const LoadedCodeObjectImpl&);

  const void *elf_data;
  const size_t elf_size;
  std::vector<Segment*> loaded_segments;

public:
  LoadedCodeObjectImpl(ExecutableImpl *owner_, hsa_agent_t agent_, const void *elf_data_, size_t elf_size_)
    : ExecutableObject(owner_, agent_), elf_data(elf_data_), elf_size(elf_size_) {
      memset(&r_debug_info, 0, sizeof(r_debug_info));
    }

  const void* ElfData() const { return elf_data; }
  size_t ElfSize() const { return elf_size; }
  std::vector<Segment*>& LoadedSegments() { return
    loaded_segments; }

  bool GetInfo(amd_loaded_code_object_info_t attribute, void *value) override;

  hsa_status_t IterateLoadedSegments(
    hsa_status_t (*callback)(
      amd_loaded_segment_t loaded_segment,
      void *data),
    void *data) override;

  void Print(std::ostream& out) override;

  void Destroy() override {}

  hsa_agent_t getAgent() const override;
  hsa_executable_t getExecutable() const override;
  uint64_t getElfData() const override;
  uint64_t getElfSize() const override;
  uint64_t getStorageOffset() const override;
  uint64_t getLoadBase() const override;
  uint64_t getLoadSize() const override;
  int64_t getDelta() const override;
  std::string getUri() const override;

  link_map r_debug_info;
};

class Segment : public LoadedSegment, public ExecutableObject {
private:
  amdgpu_hsa_elf_segment_t segment;
  void *ptr;
  size_t size;
  uint64_t vaddr;
  bool frozen;
  size_t storage_offset;

public:
  Segment(ExecutableImpl *owner_, hsa_agent_t agent_, amdgpu_hsa_elf_segment_t segment_, void* ptr_, size_t size_, uint64_t vaddr_, size_t storage_offset_)
    : ExecutableObject(owner_, agent_), segment(segment_),
      ptr(ptr_), size(size_), vaddr(vaddr_), frozen(false), storage_offset(storage_offset_) { }

  amdgpu_hsa_elf_segment_t ElfSegment() const { return segment; }
  void* Ptr() const { return ptr; }
  size_t Size() const { return size; }
  uint64_t VAddr() const { return vaddr; }
  size_t StorageOffset() const { return storage_offset; }

  bool GetInfo(amd_loaded_segment_info_t attribute, void *value) override;

  uint64_t Offset(uint64_t addr); // Offset within segment. Used together with ptr with loader context functions.

  void* Address(uint64_t addr); // Address in segment. Used for relocations and valid on agent.
  bool Freeze();

  bool IsAddressInSegment(uint64_t addr);
  void Copy(uint64_t addr, const void* src, size_t size);

  void Print(std::ostream& out) override;
  void Destroy() override;
};

class Sampler : public ExecutableObject {
private:
  hsa_ext_sampler_t samp;

public:
  Sampler(ExecutableImpl *owner, hsa_agent_t agent, hsa_ext_sampler_t samp_)
    : ExecutableObject(owner, agent), samp(samp_) { }

  void Print(std::ostream& out) override;
  void Destroy() override;
};

class Image : public ExecutableObject {
private:
  hsa_ext_image_t img;

public:
  Image(ExecutableImpl *owner, hsa_agent_t agent, hsa_ext_image_t img_)
    : ExecutableObject(owner, agent), img(img_) { }

  void Print(std::ostream& out) override;
  void Destroy() override;
};

typedef std::string ProgramSymbol;
typedef std::unordered_map<ProgramSymbol, SymbolImpl*> ProgramSymbolMap;

typedef std::pair<std::string, hsa_agent_t> AgentSymbol;

// Equality and hash functors so that (symbol name, agent) pairs can key an
// unordered_map; hsa_agent_t is compared and hashed by its handle.
struct ASC {
  bool operator()(const AgentSymbol &las, const AgentSymbol &ras) const
  {
    return las.first == ras.first && las.second.handle == ras.second.handle;
  }
};

struct ASH {
  size_t operator()(const AgentSymbol &as) const
  {
    size_t h = std::hash<std::string>()(as.first);
    size_t i = std::hash<uint64_t>()(as.second.handle);
    return h ^ (i << 1);
  }
};

typedef std::unordered_map<AgentSymbol, SymbolImpl*, ASH, ASC> AgentSymbolMap;

class ExecutableImpl final: public Executable {
  friend class AmdHsaCodeLoader;

public:
  const hsa_profile_t& profile() const {
    return profile_;
  }
  const hsa_executable_state_t& state() const {
    return state_;
  }

  ExecutableImpl(
      const hsa_profile_t &_profile,
      Context *context,
      size_t id,
      hsa_default_float_rounding_mode_t default_float_rounding_mode);

  ~ExecutableImpl();

  hsa_status_t GetInfo(hsa_executable_info_t executable_info, void *value) override;

  hsa_status_t DefineProgramExternalVariable(
    const char *name, void *address) override;

  hsa_status_t DefineAgentExternalVariable(
    const char *name,
    hsa_agent_t agent,
    hsa_variable_segment_t segment,
    void *address) override;

  hsa_status_t LoadCodeObject(
    hsa_agent_t agent,
    hsa_code_object_t code_object,
    const char *options,
    const std::string &uri,
    hsa_loaded_code_object_t *loaded_code_object) override;

  hsa_status_t LoadCodeObject(
    hsa_agent_t agent,
    hsa_code_object_t code_object,
    size_t code_object_size,
    const char *options,
    const std::string &uri,
    hsa_loaded_code_object_t *loaded_code_object) override;

  hsa_status_t Freeze(const char *options) override;

  hsa_status_t Validate(uint32_t *result) override {
    amd::hsa::common::ReaderLockGuard<amd::hsa::common::ReaderWriterLock> reader_lock(rw_lock_);
    assert(result);
    *result = 0;
    return HSA_STATUS_SUCCESS;
  }

  /// @note needed for hsa v1.0.
  /// @todo remove during loader refactoring.
  bool IsProgramSymbol(const char *symbol_name) override;

  Symbol* GetSymbol(
    const char *symbol_name,
    const hsa_agent_t *agent) override;

  hsa_status_t IterateSymbols(
    iterate_symbols_f callback, void *data) override;

  /// @since hsa v1.1.
  hsa_status_t IterateAgentSymbols(
      hsa_agent_t agent,
      hsa_status_t (*callback)(hsa_executable_t exec,
                               hsa_agent_t agent,
                               hsa_executable_symbol_t symbol,
                               void *data),
      void *data) override;

  /// @since hsa v1.1.
  hsa_status_t IterateProgramSymbols(
      hsa_status_t (*callback)(hsa_executable_t exec,
                               hsa_executable_symbol_t symbol,
                               void *data),
      void *data) override;

  hsa_status_t IterateLoadedCodeObjects(
    hsa_status_t (*callback)(
      hsa_executable_t executable,
      hsa_loaded_code_object_t loaded_code_object,
      void *data),
    void *data) override;

  size_t GetNumSegmentDescriptors() override;

  size_t QuerySegmentDescriptors(
    hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors,
    size_t total_num_segment_descriptors,
    size_t first_empty_segment_descriptor) override;

  uint64_t FindHostAddress(uint64_t device_address) override;

  void EnableReadOnlyMode();
  void DisableReadOnlyMode();

  void Print(std::ostream& out) override;
  bool PrintToFile(const std::string& filename) override;

  Context* context() { return context_; }
  size_t id() { return id_; }

private:
  ExecutableImpl(const ExecutableImpl &e);
  ExecutableImpl& operator=(const ExecutableImpl &e);

  std::unique_ptr<amd::hsa::code::AmdHsaCode> code;

  Symbol* GetSymbolInternal(
    const char
    *symbol_name,
    const hsa_agent_t *agent);

  hsa_status_t LoadSegments(hsa_agent_t agent, const code::AmdHsaCode *c,
                            uint32_t majorVersion);
  hsa_status_t LoadSegmentsV1(hsa_agent_t agent, const code::AmdHsaCode *c);
  hsa_status_t LoadSegmentsV2(hsa_agent_t agent, const code::AmdHsaCode *c);
  hsa_status_t LoadSegmentV1(hsa_agent_t agent, const code::Segment *s);
  hsa_status_t LoadSegmentV2(const code::Segment *data_segment,
                             loader::Segment *load_segment);

  hsa_status_t LoadSymbol(hsa_agent_t agent,
                          amd::hsa::code::Symbol* sym,
                          uint32_t majorVersion);

  hsa_status_t LoadDefinitionSymbol(hsa_agent_t agent,
                                    amd::hsa::code::Symbol* sym,
                                    uint32_t majorVersion);

  hsa_status_t LoadDeclarationSymbol(hsa_agent_t agent,
                                     amd::hsa::code::Symbol* sym,
                                     uint32_t majorVersion);

  hsa_status_t ApplyRelocations(hsa_agent_t agent, amd::hsa::code::AmdHsaCode *c);

  hsa_status_t ApplyStaticRelocationSection(hsa_agent_t agent, amd::hsa::code::RelocationSection* sec);

  hsa_status_t ApplyStaticRelocation(hsa_agent_t agent, amd::hsa::code::Relocation *rel);

  hsa_status_t ApplyDynamicRelocationSection(hsa_agent_t agent, amd::hsa::code::RelocationSection* sec);

  hsa_status_t ApplyDynamicRelocation(hsa_agent_t agent, amd::hsa::code::Relocation *rel);

  Segment* VirtualAddressSegment(uint64_t vaddr);

  uint64_t SymbolAddress(hsa_agent_t agent, amd::hsa::code::Symbol* sym);

  uint64_t SymbolAddress(hsa_agent_t agent, amd::elf::Symbol* sym);

  Segment* SymbolSegment(hsa_agent_t agent, amd::hsa::code::Symbol* sym);

  Segment* SectionSegment(hsa_agent_t agent, amd::hsa::code::Section* sec);

  amd::hsa::common::ReaderWriterLock rw_lock_;
  hsa_profile_t profile_;
  Context *context_;
  Logger logger_;
  const size_t id_;
  hsa_default_float_rounding_mode_t default_float_rounding_mode_;
  hsa_executable_state_t state_;

  ProgramSymbolMap program_symbols_;
  AgentSymbolMap agent_symbols_;
  std::vector<ExecutableObject*> objects;
  Segment *program_allocation_segment;
  std::vector<LoadedCodeObjectImpl*> loaded_code_objects;
};

class AmdHsaCodeLoader : public Loader {
private:
  Context*
    context;
  std::vector<Executable*> executables;
  amd::hsa::common::ReaderWriterLock rw_lock_;

public:
  AmdHsaCodeLoader(Context* context_)
    : context(context_) { assert(context); }

  Context* GetContext() const override { return context; }

  Executable* CreateExecutable(
    hsa_profile_t profile,
    const char *options,
    hsa_default_float_rounding_mode_t default_float_rounding_mode = HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT) override;

  hsa_status_t FreezeExecutable(Executable *executable, const char *options) override;

  void DestroyExecutable(Executable *executable) override;

  hsa_status_t IterateExecutables(
    hsa_status_t (*callback)(
      hsa_executable_t executable,
      void *data),
    void *data) override;

  hsa_status_t QuerySegmentDescriptors(
    hsa_ven_amd_loader_segment_descriptor_t *segment_descriptors,
    size_t *num_segment_descriptors) override;

  hsa_executable_t FindExecutable(uint64_t device_address) override;

  uint64_t FindHostAddress(uint64_t device_address) override;

  void PrintHelp(std::ostream& out) override;

  void EnableReadOnlyMode();
  void DisableReadOnlyMode();
};

} // namespace loader
} // namespace hsa
} // namespace amd
} // namespace rocr

#endif // HSA_RUNTIME_CORE_LOADER_EXECUTABLE_HPP_
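The `R_AMDGPU_32_LOW`, `R_AMDGPU_32_HIGH` and `R_AMDGPU_64` cases of `ExecutableImpl::ApplyDynamicRelocation` in `executable.cpp` write the resolved symbol address plus the relocation addend into the relocated segment, either as a full 64-bit value or as one 32-bit half. Below is a minimal host-only sketch of that arithmetic. It is not the loader's code: the `Reloc` enum, the `PatchSite` helper and the byte vector standing in for a loaded `Segment` are all hypothetical, and values are written in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for R_AMDGPU_32_LOW, R_AMDGPU_32_HIGH and
// R_AMDGPU_64; the real enumerators come from the AMDGPU ELF headers.
enum class Reloc { Low32, High32, Abs64 };

// Hypothetical helper: patches `segment` (a host byte buffer modelling a
// loaded segment) at `offset` with symbol address + addend, written in host
// byte order. Returns false for an unresolved symbol or an out-of-range site.
bool PatchSite(std::vector<uint8_t>& segment, uint64_t offset, Reloc type,
               uint64_t sym_addr, int64_t addend) {
  // Test for an undefined symbol before applying the addend, so that a
  // non-zero addend cannot make a missing definition look resolved.
  if (sym_addr == 0) return false;
  const uint64_t value = sym_addr + static_cast<uint64_t>(addend);

  // Reject patch sites that would write past the end of the segment.
  const size_t width = (type == Reloc::Abs64) ? sizeof(uint64_t) : sizeof(uint32_t);
  if (offset > segment.size() || segment.size() - offset < width) return false;

  if (type == Reloc::Abs64) {
    std::memcpy(segment.data() + offset, &value, sizeof(value));
  } else {
    // 32-bit relocations store either the upper or the lower half.
    const uint32_t half = (type == Reloc::High32)
                              ? static_cast<uint32_t>(value >> 32)
                              : static_cast<uint32_t>(value & 0xFFFFFFFFu);
    std::memcpy(segment.data() + offset, &half, sizeof(half));
  }
  return true;
}
```

With a symbol at `0x123456780` and an addend of `0x10`, the low and high variants store `0x23456790` and `0x1` respectively; a symbol address of `0` is rejected even when the addend is non-zero.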