2023-11-20 15:14

Graphics Engine in Practice: Native RenderPass Principles and URP

The previous article introduced the concepts behind RenderPass. This one looks at how RenderPass is used in URP.

The main scheduling functions of NativeRenderPass are:

- ScriptableRenderer.Execute
  - SetupNativeRenderPassFrameData (merges RenderPasses)
  - ExecuteBlock
    - ExecuteRenderPass (sets up and begins the RenderPass)
      - ConfigureNativeRenderPass (calls Configure on every SubPass of the RenderPass in one place)
      - SetRenderPassAttachments
        - SetNativeRenderPassMRTAttachmentList (collects every attachment in the RenderPass and builds the AttachmentDescriptor list plus each pass's color attachment list and input attachment list) / SetNativeRenderPassAttachmentList
      - ExecuteNativeRenderPass

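Here is what each of the main functions does. SetupNativeRenderPassFrameData merges all passes that qualify, that is, passes that are adjacent and whose targets share the same width, height, and sample count. It does this by hashing those parameters. To guarantee adjacency, a currentHashIndex is included in the hash: if the current pass is not adjacent to the previous pass with the same width, height, and sample count, currentHashIndex is incremented and the hash is recomputed.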

internal void SetupNativeRenderPassFrameData(CameraData cameraData, bool isRenderPassEnabled)
{
    using (new ProfilingScope(null, Profiling.setupFrameData))
    {
        int lastPassIndex = m_ActiveRenderPassQueue.Count - 1;

        // Make sure the list is already sorted!

        m_MergeableRenderPassesMap.Clear();
        m_RenderPassesAttachmentCount.Clear();

        // Hash index used to make sure merged passes are consecutive
        uint currentHashIndex = 0;
        for (int i = 0; i < m_ActiveRenderPassQueue.Count; ++i)
        {
            var renderPass = m_ActiveRenderPassQueue[i];

            // Collect the RT descriptor of the current ScriptableRenderPass: width, height, samples, and depth target
            var rpDesc = InitializeRenderPassDescriptor(cameraData, renderPass);

            renderPass.isLastPass = false;
            renderPass.renderPassQueueIndex = i;

            bool RPEnabled = renderPass.useNativeRenderPass && isRenderPassEnabled;
            if (!RPEnabled)
                continue;

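            // First compute the hash of the render pass
            Hash128 hash = CreateRenderPassHash(rpDesc, currentHashIndex);

            // Record the mapping from pass index to hash
            m_PassIndexToPassHash[i] = hash;

            // If this hash has not been seen yet, take an int array of length 20 from the pool; all its entries are reset to -1 after rendering
            // Render pass indices are appended to this array one by one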
            if (!m_MergeableRenderPassesMap.ContainsKey(hash))
            {
                m_MergeableRenderPassesMap.Add(hash, m_MergeableRenderPassesMapArrays[m_MergeableRenderPassesMap.Count]);
                m_RenderPassesAttachmentCount.Add(hash, 0);
            }
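            // If the hash exists, take the index of the last render pass in its list and check whether it is adjacent to the current one
            else if (m_MergeableRenderPassesMap[hash][GetValidPassIndexCount(m_MergeableRenderPassesMap[hash]) - 1] != (i - 1))
            {
                // Not adjacent: break the merge, bump the hash index, recompute the hash, then fetch a new list and update the mapping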

                currentHashIndex++;
                hash = CreateRenderPassHash(rpDesc, currentHashIndex);

                m_PassIndexToPassHash[i] = hash;

                m_MergeableRenderPassesMap.Add(hash, m_MergeableRenderPassesMapArrays[m_MergeableRenderPassesMap.Count]);
                m_RenderPassesAttachmentCount.Add(hash, 0);
            }

            // Write this pass's index into the first free slot of the mapped list
            m_MergeableRenderPassesMap[hash][GetValidPassIndexCount(m_MergeableRenderPassesMap[hash])] = i;
        }

        m_ActiveRenderPassQueue[lastPassIndex].isLastPass = true;

        for (int i = 0; i < m_ActiveRenderPassQueue.Count; ++i)
        {
            m_ActiveRenderPassQueue[i].m_ColorAttachmentIndices = new NativeArray<int>(8, Allocator.Temp);
            m_ActiveRenderPassQueue[i].m_InputAttachmentIndices = new NativeArray<int>(8, Allocator.Temp);
        }
    }
}
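
The grouping rule above can be sketched outside of Unity. The following is a minimal Python sketch (Python being the language of the layout generator later in this article), not URP's implementation: `PassDesc` and `group_mergeable_passes` are made-up names, and a plain tuple stands in for the Hash128 that CreateRenderPassHash returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassDesc:
    # Hypothetical stand-in for the render pass descriptor (width, height, samples).
    width: int
    height: int
    samples: int
    use_native_render_pass: bool = True


def group_mergeable_passes(queue):
    """Return (pass_index_to_key, groups); each group holds consecutive pass indices."""
    pass_index_to_key = {}
    groups = {}
    current_hash_index = 0  # bumped whenever a run of compatible passes is interrupted
    for i, desc in enumerate(queue):
        if not desc.use_native_render_pass:
            continue
        key = (desc.width, desc.height, desc.samples, current_hash_index)
        if key in groups and groups[key][-1] != i - 1:
            # Same target parameters but not adjacent: start a new group.
            current_hash_index += 1
            key = (desc.width, desc.height, desc.samples, current_hash_index)
        groups.setdefault(key, []).append(i)
        pass_index_to_key[i] = key
    return pass_index_to_key, groups
```

With a queue of two full-resolution passes, one half-resolution pass, and two more full-resolution passes, the sketch produces three groups: the half-resolution pass interrupts the run, so the last two passes cannot join the first group even though their parameters match.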

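ConfigureNativeRenderPass calls Configure on all subpasses in one place. It is invoked for every subpass (every subpass in the current ExecuteBlock, that is) and looks up the RenderPass the subpass belongs to. If the subpass is the first one in that RenderPass, it calls Configure on every subpass of the RenderPass.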

internal void ConfigureNativeRenderPass(CommandBuffer cmd, ScriptableRenderPass renderPass, CameraData cameraData)
{
    using (new ProfilingScope(null, Profiling.configure))
    {
        // Index of the current render pass (SubPass) in the active render pass queue, assigned in SetupNativeRenderPassFrameData
        int currentPassIndex = renderPass.renderPassQueueIndex;
        // Look up the hash by index
        Hash128 currentPassHash = m_PassIndexToPassHash[currentPassIndex];
        // Look up the list of mergeable passes by hash
        int[] currentMergeablePasses = m_MergeableRenderPassesMap[currentPassHash];

        // If the current SubPass is the first one in the list
        if (currentMergeablePasses.First() == currentPassIndex)
        {
            foreach (var passIdx in currentMergeablePasses)
            {
                if (passIdx == -1)
                    break;
                ScriptableRenderPass pass = m_ActiveRenderPassQueue[passIdx];
                pass.Configure(cmd, cameraData.cameraTargetDescriptor);
            }
        }
    }
}

SetNativeRenderPassMRTAttachmentList collects the color attachments and input attachments of every subpass in the RenderPass, builds the AttachmentDescriptor list, and builds for each subpass the list of indices into that descriptor list:

internal void SetNativeRenderPassMRTAttachmentList(ScriptableRenderPass renderPass, ref CameraData cameraData, bool needCustomCameraColorClear, ClearFlag clearFlag)
{
    using (new ProfilingScope(null, Profiling.setMRTAttachmentsList))
    {
        // Get all subpasses of the current render pass
        int currentPassIndex = renderPass.renderPassQueueIndex;
        Hash128 currentPassHash = m_PassIndexToPassHash[currentPassIndex];
        int[] currentMergeablePasses = m_MergeableRenderPassesMap[currentPassHash];

        // Return early if this is not the first subpass
        if (currentMergeablePasses.First() != currentPassIndex)
            return;

        m_RenderPassesAttachmentCount[currentPassHash] = 0;

        // Set the StoreAction of each attachment to Store, unless a pass overrides it, in which case the override is used
        UpdateFinalStoreActions(currentMergeablePasses, cameraData);

        int currentAttachmentIdx = 0; // attachment counter
        foreach (var passIdx in currentMergeablePasses)
        {
            if (passIdx == -1)
                break;
            ScriptableRenderPass pass = m_ActiveRenderPassQueue[passIdx];

            // Initialize the color and input attachment index lists
            for (int i = 0; i < pass.m_ColorAttachmentIndices.Length; ++i)
                pass.m_ColorAttachmentIndices[i] = -1;

            for (int i = 0; i < pass.m_InputAttachmentIndices.Length; ++i)
                pass.m_InputAttachmentIndices[i] = -1;

            // Number of color attachments of the subpass; by default only the CameraTarget, custom targets must be set in ScriptableRenderPass.Configure
            uint validColorBuffersCount = RenderingUtils.GetValidColorBufferCount(pass.colorAttachments);

            for (int i = 0; i < validColorBuffersCount; ++i)
            {
                AttachmentDescriptor currentAttachmentDescriptor =
                    new AttachmentDescriptor(pass.renderTargetFormat[i] != GraphicsFormat.None ? pass.renderTargetFormat[i] : GetDefaultGraphicsFormat(cameraData));

                // Check whether this attachment was already created by an earlier subpass
                var colorTarget = pass.overrideCameraTarget ? pass.colorAttachments[i] : m_CameraColorTarget;
                int existingAttachmentIndex = FindAttachmentDescriptorIndexInList(colorTarget, m_ActiveColorAttachmentDescriptors);

                if (m_UseOptimizedStoreActions)
                    currentAttachmentDescriptor.storeAction = m_FinalColorStoreAction[i];

                if (existingAttachmentIndex == -1)
                {
                    // Not created yet: ConfigureTarget, configure the clear, and store it in the render pass's AttachmentDescriptor list
                    m_ActiveColorAttachmentDescriptors[currentAttachmentIdx] = currentAttachmentDescriptor;
                    m_ActiveColorAttachmentDescriptors[currentAttachmentIdx].ConfigureTarget(colorTarget, (pass.clearFlag & ClearFlag.Color) == 0, true);

                    if (pass.colorAttachments[i] == m_CameraColorTarget && needCustomCameraColorClear && (clearFlag & ClearFlag.Color) != 0)
                        m_ActiveColorAttachmentDescriptors[currentAttachmentIdx].ConfigureClear(CoreUtils.ConvertSRGBToActiveColorSpace(cameraData.camera.backgroundColor), 1.0f, 0);
                    else if ((pass.clearFlag & ClearFlag.Color) != 0)
                        m_ActiveColorAttachmentDescriptors[currentAttachmentIdx].ConfigureClear(CoreUtils.ConvertSRGBToActiveColorSpace(pass.clearColor), 1.0f, 0);

                    // Record the index on the subpass
                    pass.m_ColorAttachmentIndices[i] = currentAttachmentIdx;
                    currentAttachmentIdx++;
                    m_RenderPassesAttachmentCount[currentPassHash]++;
                }
                else
                {
                    // Already created by an earlier subpass: just record the index
                    pass.m_ColorAttachmentIndices[i] = existingAttachmentIndex;
                }
            }

            // Set up the input attachments of the subpass
            if (PassHasInputAttachments(pass))
                SetupInputAttachmentIndices(pass);

            // TODO: this is redundant and is being setup for each attachment. Needs to be done only once per mergeable pass list (we need to make sure mergeable passes use the same depth!)
            m_ActiveDepthAttachmentDescriptor = new AttachmentDescriptor(SystemInfo.GetGraphicsFormat(DefaultFormat.DepthStencil));
            m_ActiveDepthAttachmentDescriptor.ConfigureTarget(pass.overrideCameraTarget ? pass.depthAttachment : m_CameraDepthTarget, (clearFlag & ClearFlag.DepthStencil) == 0, true);

            if ((clearFlag & ClearFlag.DepthStencil) != 0)
                m_ActiveDepthAttachmentDescriptor.ConfigureClear(Color.black, 1.0f, 0);

            if (m_UseOptimizedStoreActions)
                m_ActiveDepthAttachmentDescriptor.storeAction = m_FinalDepthStoreAction;
        }
    }
}
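
The core of this function is deduplication: each render target appears once in the shared descriptor list, and every subpass refers to it by index. Here is a minimal Python sketch of that idea only, using plain strings as stand-ins for RenderTargetIdentifier; `build_attachment_lists` is a hypothetical helper, and clear, load, and store configuration is left out.

```python
def build_attachment_lists(subpasses):
    """subpasses: one list of render-target names per subpass.

    Returns (descriptors, indices): the shared attachment list of the render
    pass and, per subpass, the indices of its color attachments in that list.
    """
    descriptors = []
    indices = []
    for targets in subpasses:
        subpass_indices = []
        for target in targets:
            if target in descriptors:
                # Already created by an earlier subpass: only record the index.
                subpass_indices.append(descriptors.index(target))
            else:
                descriptors.append(target)
                subpass_indices.append(len(descriptors) - 1)
        indices.append(subpass_indices)
    return descriptors, indices
```

For a GBuffer pass writing four targets followed by two passes that only write the lighting target, the descriptor list keeps four entries and the later subpasses both reference index 3.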

SetNativeRenderPassAttachmentList works much like SetNativeRenderPassMRTAttachmentList.

Finally, ExecuteNativeRenderPass is where the native RenderPass actually runs:

internal void ExecuteNativeRenderPass(ScriptableRenderContext context, ScriptableRenderPass renderPass, CameraData cameraData, ref RenderingData renderingData)
{
    using (new ProfilingScope(null, Profiling.execute))
    {
        // Get all subpasses of the current render pass
        int currentPassIndex = renderPass.renderPassQueueIndex;
        Hash128 currentPassHash = m_PassIndexToPassHash[currentPassIndex];
        int[] currentMergeablePasses = m_MergeableRenderPassesMap[currentPassHash];

        int validColorBuffersCount = m_RenderPassesAttachmentCount[currentPassHash];

        bool isLastPass = renderPass.isLastPass;
        // TODO: review the lastPassToBB logic to mak it work with merged passes
        // keep track if this is the current camera's last pass and the RT is the backbuffer (BuiltinRenderTextureType.CameraTarget)
        bool isLastPassToBB = isLastPass && (m_ActiveColorAttachmentDescriptors[0].loadStoreTarget ==
            BuiltinRenderTextureType.CameraTarget);
        var depthOnly = renderPass.depthOnly || (cameraData.targetTexture != null && IsDepthOnlyRenderTexture(cameraData.targetTexture));
        bool useDepth = depthOnly || (!renderPass.overrideCameraTarget || (renderPass.overrideCameraTarget && renderPass.depthAttachment != BuiltinRenderTextureType.CameraTarget)) &&
            (!(isLastPassToBB || (isLastPass && cameraData.camera.targetTexture != null)));

        // Reserve one extra slot for the depth attachment
        var attachments =
            new NativeArray<AttachmentDescriptor>(useDepth && !depthOnly ? validColorBuffersCount + 1 : 1,
                Allocator.Temp);

        for (int i = 0; i < validColorBuffersCount; ++i)
            attachments[i] = m_ActiveColorAttachmentDescriptors[i];

        if (useDepth && !depthOnly)
            attachments[validColorBuffersCount] = m_ActiveDepthAttachmentDescriptor;

        var rpDesc = InitializeRenderPassDescriptor(cameraData, renderPass);

        int validPassCount = GetValidPassIndexCount(currentMergeablePasses);

        // Number of color attachments of the current pass
        var attachmentIndicesCount = GetSubPassAttachmentIndicesCount(renderPass);
        // Fill in the indices
        var attachmentIndices = new NativeArray<int>(!depthOnly ? (int)attachmentIndicesCount : 0, Allocator.Temp);
        if (!depthOnly)
        {
            for (int i = 0; i < attachmentIndicesCount; ++i)
            {
                attachmentIndices[i] = renderPass.m_ColorAttachmentIndices[i];
            }
        }

        // The first subpass is responsible for BeginRenderPass
        if (validPassCount == 1 || currentMergeablePasses[0] == currentPassIndex) // Check if it's the first pass
        {
            if (PassHasInputAttachments(renderPass))
                Debug.LogWarning("First pass in a RenderPass should not have input attachments.");

            context.BeginRenderPass(rpDesc.w, rpDesc.h, Math.Max(rpDesc.samples, 1), attachments,
                useDepth ? (!depthOnly ? validColorBuffersCount : 0) : -1);
            attachments.Dispose();

            context.BeginSubPass(attachmentIndices);

            m_LastBeginSubpassPassIndex = currentPassIndex;
        }
        else
        {
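            // AreAttachmentIndicesCompatible checks whether the current pass's color attachments are a subset of those of the last begun subpass; if so, the two subpasses can be merged
            // It also checks for input attachments: in this version any input attachment forces a new subpass, which could be optimized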
            if (!AreAttachmentIndicesCompatible(m_ActiveRenderPassQueue[m_LastBeginSubpassPassIndex], m_ActiveRenderPassQueue[currentPassIndex]))
            {
                context.EndSubPass();
                if (PassHasInputAttachments(m_ActiveRenderPassQueue[currentPassIndex]))
                    context.BeginSubPass(attachmentIndices, m_ActiveRenderPassQueue[currentPassIndex].m_InputAttachmentIndices);
                else
                    context.BeginSubPass(attachmentIndices);

                m_LastBeginSubpassPassIndex = currentPassIndex;
            }
            else if (PassHasInputAttachments(m_ActiveRenderPassQueue[currentPassIndex]))
            {
                context.EndSubPass();
                context.BeginSubPass(attachmentIndices, m_ActiveRenderPassQueue[currentPassIndex].m_InputAttachmentIndices);

                m_LastBeginSubpassPassIndex = currentPassIndex;
            }
        }

        attachmentIndices.Dispose();
        // Execute the subpass
        renderPass.Execute(context, ref renderingData);

        // The last subpass is responsible for EndRenderPass
        if (validPassCount == 1 || currentMergeablePasses[validPassCount - 1] == currentPassIndex) // Check if it's the last pass
        {
            context.EndSubPass();
            context.EndRenderPass();

            m_LastBeginSubpassPassIndex = 0;
        }

        for (int i = 0; i < m_ActiveColorAttachmentDescriptors.Length; ++i)
        {
            m_ActiveColorAttachmentDescriptors[i] = RenderingUtils.emptyAttachment;
        }

        m_ActiveDepthAttachmentDescriptor = RenderingUtils.emptyAttachment;
    }
}
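
The begin/end bookkeeping is easier to follow as a trace of calls. The sketch below is a simplified Python model, not the URP code: `trace_render_pass` is a hypothetical helper, and AreAttachmentIndicesCompatible is modeled as a plain subset check, which is an assumption based on the comment in the listing above.

```python
def trace_render_pass(subpasses):
    """subpasses: (color_indices, input_indices) tuples of one merged render pass.

    Returns the sequence of native calls, mirroring the control flow above.
    """
    calls = []
    last_begin = 0  # index of the subpass that last called BeginSubPass
    for i, (colors, inputs) in enumerate(subpasses):
        if i == 0:
            # The first subpass opens the render pass.
            calls += ["BeginRenderPass", "BeginSubPass"]
            last_begin = i
        else:
            prev_colors = subpasses[last_begin][0]
            compatible = set(colors) <= set(prev_colors)  # assumed compatibility rule
            if not compatible or inputs:
                # A new native subpass is needed.
                calls += ["EndSubPass", "BeginSubPass"]
                last_begin = i
        calls.append(f"Execute[{i}]")
        if i == len(subpasses) - 1:
            # The last subpass closes the render pass.
            calls += ["EndSubPass", "EndRenderPass"]
    return calls
```

For a GBuffer pass, a deferred lighting pass that reads three input attachments, and a third pass that writes the same lighting target without inputs, only the lighting pass opens a new native subpass; the third pass runs inside it.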

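The code above shows that URP is more restrictive than SRP. SRP at least lets a render pass use depth as an input, whereas URP has no such mechanism. That is why URP's Deferred path adds an extra GBuffer just to store depth (GBuffer4 in stock URP, per the GbufferDepthIndex code shown later in this article). If you have the means, I think this part of the logic is worth changing.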

Source changes needed to write RenderPass-compatible code in URP

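In my view, the Native RenderPass code in the 2021 versions of URP (and even 2023) is not yet well organized, and Unity does not seem to intend to expose the Native RenderPass interfaces to external developers. Many of the relevant interfaces are internal, and the logic itself has gaps.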

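For example, whether GBufferPass should allocate an RT depends on what happens later: whether the resource is used by a subsequent RenderPass determines whether the RT is transient. If SSAO runs later, the normal GBuffer must be available to SSAO, so it cannot be a transient resource and needs an explicitly allocated RT.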

In the 2021 versions, transient resources are set up in SetupInputAttachmentIndices:

internal void SetupInputAttachmentIndices(ScriptableRenderPass pass)
{
    var validInputBufferCount = GetValidInputAttachmentCount(pass);
    pass.m_InputAttachmentIndices = new NativeArray<int>(validInputBufferCount, Allocator.Temp);
    for (int i = 0; i < validInputBufferCount; i++)
    {
        pass.m_InputAttachmentIndices[i] = FindAttachmentDescriptorIndexInList(pass.m_InputAttachments[i], m_ActiveColorAttachmentDescriptors);
        if (pass.m_InputAttachmentIndices[i] == -1)
        {
            Debug.LogWarning("RenderPass Input attachment not found in the current RenderPass");
            continue;
        }

        // Every attachment used as an input attachment is marked as a transient resource
        m_ActiveColorAttachmentDescriptors[pass.m_InputAttachmentIndices[i]].loadAction = RenderBufferLoadAction.DontCare;
        m_ActiveColorAttachmentDescriptors[pass.m_InputAttachmentIndices[i]].storeAction = RenderBufferStoreAction.DontCare;
        m_ActiveColorAttachmentDescriptors[pass.m_InputAttachmentIndices[i]].loadStoreTarget = BuiltinRenderTextureType.None;
    }
}

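The three assignments at the end treat every RT that is used as an input attachment as a transient resource, which is clearly unreasonable. DeferredLightingPass needs Normal as an input; does that mean a later SSAO (a different RenderPass) can no longer read Normal?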

Later URP versions adjusted this slightly and let the pass itself declare whether a given input attachment is transient:

// Everything before this is unchanged; the three assignments become the following
// Only update it as long as it has default value - if it was changed once, we assume it'll be memoryless in the whole RenderPass
if (!m_IsActiveColorAttachmentTransient[pass.m_InputAttachmentIndices[i]])
{
    m_IsActiveColorAttachmentTransient[pass.m_InputAttachmentIndices[i]] = pass.IsInputAttachmentTransient(i);
}

ScriptableRenderPass.ConfigureInputAttachments gains an extra parameter for the transient flags:

public void ConfigureInputAttachments(RenderTargetIdentifier[] inputs, bool[] isTransient)
{
    m_InputAttachments = inputs;
    m_InputAttachmentIsTransient = isTransient;
}

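Before SetupInputAttachmentIndices runs, every RT is assumed to be non-transient. Inside the method, if an RT is not yet marked transient, it takes the transient state declared by the pass.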
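
The update rule is "sticky": once an attachment has been marked transient, later declarations cannot turn it back. A minimal Python sketch of just that rule, with a hypothetical `resolve_transient_flags` helper and (attachment index, declared flag) pairs standing in for the per-pass input declarations:

```python
def resolve_transient_flags(attachment_count, passes):
    """passes: per pass, a list of (attachment_index, is_transient) declarations."""
    is_transient = [False] * attachment_count  # default: nothing is transient
    for inputs in passes:
        for index, declared in inputs:
            if not is_transient[index]:
                # Only update while the flag still has its default value.
                is_transient[index] = declared
    return is_transient
```

If a first pass declares attachments 0 to 2 as transient and a second pass declares attachment 2 as non-transient, attachment 2 stays transient. The result depends on what each pass declares and in which order, not on how the resource is actually used afterwards.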

DeferredLights hard-codes whether each of its GBuffer inputs is transient:

this.DeferredInputIsTransient = new bool[4]
{
    true, true, true, false
};

DeferredPass then calls ConfigureInputAttachments:

public override void Configure(CommandBuffer cmd, RenderTextureDescriptor cameraTextureDescripor)
{
    RenderTargetIdentifier lightingAttachmentId = m_DeferredLights.GbufferAttachmentIdentifiers[m_DeferredLights.GBufferLightingIndex];
    RenderTargetIdentifier depthAttachmentId = m_DeferredLights.DepthAttachmentIdentifier;
    if (m_DeferredLights.UseRenderPass)
        ConfigureInputAttachments(m_DeferredLights.DeferredInputAttachments, m_DeferredLights.DeferredInputIsTransient);

GBufferPass decides whether to create an RT based on the GBuffer index:

public override void Configure(CommandBuffer cmd, RenderTextureDescriptor cameraTextureDescriptor)
{
    RenderTargetHandle[] gbufferAttachments = m_DeferredLights.GbufferAttachments;

    if (cmd != null)
    {
        for (int i = 0; i < gbufferAttachments.Length; ++i)
        {
            // The RT that holds lighting is already allocated, so skip it
            if (i == m_DeferredLights.GBufferLightingIndex)
                continue;

            // Skip if there is a depth-normal prepass
            if (i == m_DeferredLights.GBufferNormalSmoothnessIndex && m_DeferredLights.HasNormalPrepass)
                continue;

            // Any GBuffer other than the two below is memoryless, so no RT is needed
            if (m_DeferredLights.UseRenderPass && i != m_DeferredLights.GBufferShadowMask && i != m_DeferredLights.GBufferRenderingLayers)
                continue;

            RenderTextureDescriptor gbufferSlice = cameraTextureDescriptor;
            gbufferSlice.depthBufferBits = 0; // make sure no depth surface is actually created
            gbufferSlice.stencilFormat = GraphicsFormat.None;
            gbufferSlice.graphicsFormat = m_DeferredLights.GetGBufferFormat(i);
            cmd.GetTemporaryRT(m_DeferredLights.GbufferAttachments[i].id, gbufferSlice);
        }
    }

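This exposes a problem. In the logic above, whether an RT is transient is declared by the pass, but it should really be determined by the relationships between passes in the pipeline, and RT allocation should be tied to the transient state. Neither is reflected in the code.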

Trying to write a compatible Feature

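Let's build a simple fullscreen effect that computes snow cover from the screen-space normal. It runs right after deferred lighting, that is, at RenderPassEvent.AfterRenderingDeferredLights.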

Here is the shader:

Shader "Unlit/FullscreenSnow"
{
    SubShader
    {
        Tags { "RenderPipeline"="UniversalPipeline" "RenderType"="Opaque" "Queue"="Geometry" }
        Pass
        {
            Tags { "LightMode"="UniversalForward" }

            Blend SrcAlpha OneMinusSrcAlpha
            ZWrite Off
            ZTest Always

            HLSLPROGRAM
            #pragma vertex vert
            #pragma fragment frag

            #pragma multi_compile _ _RENDER_PASS_ENABLED

            #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"

            struct Attributes
            {
                float3 positionOS : POSITION;
                float2 uv : TEXCOORD;
            };

            struct Varyings
            {
                float4 positionCS : SV_POSITION;
                float2 uv : TEXCOORD;
            };

            float _SnowNormalPower;
            float3 _SnowColor;

            SamplerState my_point_clamp_sampler;
            TEXTURE2D_X_HALF(_GBuffer2);

            #if _RENDER_PASS_ENABLED
            #define GBUFFER2 0
            FRAMEBUFFER_INPUT_HALF(GBUFFER2);
            #endif

            Varyings vert(Attributes input)
            {
                Varyings output = (Varyings)0;

                VertexPositionInputs vertexInput = GetVertexPositionInputs(input.positionOS.xyz);
                output.positionCS = vertexInput.positionCS;

                output.uv = input.uv;

                return output;
            }

            float4 frag(Varyings input) : SV_Target
            {
                #if _RENDER_PASS_ENABLED
                float4 screenNormal = LOAD_FRAMEBUFFER_INPUT(GBUFFER2, input.positionCS.xy);
                #else
                float4 screenNormal = SAMPLE_TEXTURE2D_X_LOD(_GBuffer2, my_point_clamp_sampler, input.uv, 0);
                #endif

                float3 normalWS = normalize(UnpackNormal(screenNormal));

                float snowFact = dot(float3(0, 1, 0), normalWS);
                snowFact = saturate(snowFact);
                snowFact = pow(snowFact, _SnowNormalPower);

                return float4(_SnowColor, snowFact);
            }

            ENDHLSL
        }
    }
}

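Note that GBUFFER2 is defined as 0 here, because this subpass has only one subpassInput, so the normal goes from being the third input to being the first. (I missed this at first, which caused a crash as soon as native RenderPass was enabled.)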

Create a ScriptableRendererFeature. The Feature has a method that decides whether NativeRenderPass is supported, but note that this method is internal and has to be changed to public in the URP source:

public override bool SupportsNativeRenderPass()
{
    return true;
}

The feature's logic is just to create the pass and check whether the renderer is in deferred mode; if so, it sets up and enqueues the pass:

public override void Create()
{
    m_SnowPass = new SnowPass()
    {
        renderPassEvent = RenderPassEvent.AfterRenderingDeferredLights
    };
}

public override void AddRenderPasses(ScriptableRenderer renderer, ref RenderingData renderingData)
{
    UniversalRenderer universalRenderer = renderer as UniversalRenderer;
    bool isDeferred = universalRenderer.actualRenderingMode == RenderingMode.Deferred;
    if (isDeferred && m_Settings.snowShader != null)
    {
        m_SnowPass.Setup(universalRenderer.deferredLights, m_Settings, renderer.useRenderPassEnabled);
        renderer.EnqueuePass(m_SnowPass);
    }
}

SnowPass needs to configure its input attachment and color attachment, which is done here in OnCameraSetup:

public override void OnCameraSetup(CommandBuffer cmd, ref RenderingData renderingData)
{
    RenderTargetIdentifier normalAttachmentId = m_DeferredLights.GbufferAttachmentIdentifiers[m_DeferredLights.GBufferNormalSmoothnessIndex];

    RenderTargetIdentifier lightingAttachmentId = m_DeferredLights.GbufferAttachmentIdentifiers[m_DeferredLights.GBufferLightingIndex];
    RenderTargetIdentifier depthAttachmentId = m_DeferredLights.DepthAttachmentIdentifier;

    ConfigureInputAttachments(normalAttachmentId, true);
    ConfigureTarget(lightingAttachmentId, depthAttachmentId);
}

In Execute, drawing a fullscreen mesh is enough:

public override void Execute(ScriptableRenderContext context, ref RenderingData renderingData)
{
    CommandBuffer cmd = CommandBufferPool.Get("Snow Pass");

    cmd.SetGlobalColor("_SnowColor", m_Settings.snowColor);
    cmd.SetGlobalFloat("_SnowNormalPower", m_Settings.snowNormalPower);

    cmd.SetViewProjectionMatrices(Matrix4x4.identity, Matrix4x4.identity);
    cmd.DrawMesh(RenderingUtils.fullscreenMesh, Matrix4x4.identity, m_Material, 0, 0);
    cmd.SetViewProjectionMatrices(renderingData.cameraData.camera.worldToCameraMatrix, renderingData.cameraData.camera.projectionMatrix);

    context.ExecuteCommandBuffer(cmd);
    CommandBufferPool.Release(cmd);
}

Once this is done, SnowPass shows up as a subpass, merged with GBufferPass and DeferredPass into a single RenderPass.

5. Project changes to URP RenderPass

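This section covers my own hands-on work with RenderPass. I make no claim that it is best practice. The goal is to put the earlier theory into practice and to offer some ideas to readers who don't know where to start. If you have a better solution, I'd be glad to discuss it.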

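The difference between a project and a demo is that a demo has few features getting in the way, so the number and layout of GBuffers can be fixed up front. In a real project, turning a feature on or off often changes the layout and the resources involved. The hard part is allocating the layout and the resources adaptively.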

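In practice a project's layout is also fixed most of the time, since a constantly changing layout makes performance unstable and bugs hard to track down. But a shared engine team has to adapt the pipeline to different projects, so a general-purpose pipeline needs to be smarter about this.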

Resource management

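In older URP versions, whether a resource such as depth or screen-space color was needed was set with checkboxes on the pipeline asset and the camera; ticking one enabled the corresponding Copy Pass in the pipeline. In newer versions, a pass requests resources through ConfigureInput, which is more automatic.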

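This way of requesting resources is mostly sufficient without RenderPass. It currently covers Depth, Normal, and Color and can be extended if needed, for example with Motion, DepthScaled, or DepthPyramid. For organizing RenderPasses, though, it falls short.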

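On mobile, setting an RT also fixes its load and store actions: whether the texture is loaded from memory, and whether the RT is written back to memory when switching to another RenderPass. The safest fallback is to always Load and Store. URP does a bit more: for example, if a pass calls ConfigureClear on Color, the load action is set to DontCare, meaning the previous image contents are discarded.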

To waste nothing at all: whenever the RT is switched, if no earlier pass has written to the RT, or the current pass clears it, its LoadAction can be DontCare. If no later RenderPass uses the RT, its StoreAction can be DontCare.

In short, setting an RT requires knowing how that RT is used before and after.
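
As a rough illustration of that rule, here is a minimal Python sketch that derives the load and store decisions for one target from the ordered list of render passes. It is a simplified model under stated assumptions, not URP code: `RenderPassUse` and `load_store_actions` are made-up names, each entry represents a whole native render pass, and targets are plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class RenderPassUse:
    writes: set = field(default_factory=set)  # targets bound as color attachments
    reads: set = field(default_factory=set)   # targets read as textures or input attachments
    clears: bool = False                      # whether the pass clears its color targets


def load_store_actions(passes, index, target):
    """Decide (need_load, need_store) for `target` when bound by passes[index]."""
    current = passes[index]
    written_before = any(target in p.writes for p in passes[:index])
    # Load only if earlier content exists and the current pass keeps it.
    need_load = written_before and not current.clears
    need_store = False
    for later in passes[index + 1:]:
        if target in later.reads:
            need_store = True  # read later, so it has to be kept
            break
        if target in later.writes:
            need_store = not later.clears  # kept only if the next writer draws on top
            break
    return need_load, need_store
```

A target for which both results are false never touches main memory, which is exactly the condition for treating it as transient.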

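The best data structure I can think of for this is a RenderGraph. A RenderGraph models resources and passes as nodes and connects them into a graph, which makes resource usage much easier to track.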

(Figure: from the Frostbite GDC FrameGraph talk; orange nodes are passes, blue nodes are resources.)

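We have not switched to RenderGraph for now. Refactoring URP around RenderGraph takes a lot of changes and carries a high risk of instability, and RenderGraph for URP is already on Unity's roadmap (see rendering-visual-effects > In Progress > Render Graph Integration).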

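Currently URP just puts RenderPasses into a list and sorts them by render event. Scanning the passes before and after is still enough to work out how an RT is used, but URP's ScriptableRenderPass class does not record resource references well. At most, ConfigureInput declares the use of Depth, Normal, and Color, and ConfigureTarget declares the output. For smarter tracking, ScriptableRenderPass needs at least two additions: a way to declare RenderTargetIdentifier inputs, and a way to declare InputAttachments.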

// Configures the RTs provided by built-in pipeline features
public void ConfigureInput(ScriptableRenderPassInput passInput)
{
    m_Input = passInput;
}

// Declares a custom RT as an input
public void ConfigurePassInput(RenderTargetIdentifier identify)
{
    m_InpuRenderTextures.Add(identify);
}

public void ConfigureInputAttachments(RenderTargetIdentifier[] inputs)
{
    m_InputAttachments = inputs;
}

URP's original logic is:

// Load/store setup in SetNativeRenderPassMRTAttachmentList
m_ActiveColorAttachmentDescriptors[currentAttachmentIdx] = currentAttachmentDescriptor;
m_ActiveColorAttachmentDescriptors[currentAttachmentIdx].ConfigureTarget(colorTarget, loadExistingContents:(pass.clearFlag & ClearFlag.Color) == 0, storeResults:true);

if (pass.colorAttachments[i] == m_CameraColorTarget && needCustomCameraColorClear && (clearFlag & ClearFlag.Color) != 0)
    m_ActiveColorAttachmentDescriptors[currentAttachmentIdx].ConfigureClear(CoreUtils.ConvertSRGBToActiveColorSpace(cameraData.camera.backgroundColor), 1.0f, 0);
else if ((pass.clearFlag & ClearFlag.Color) != 0)
    m_ActiveColorAttachmentDescriptors[currentAttachmentIdx].ConfigureClear(CoreUtils.ConvertSRGBToActiveColorSpace(pass.clearColor), 1.0f, 0);

// Transient setup in SetupTransientInputAttachments
for (int i = 0; i < attachmentCount; ++i)
{
    if (!m_IsActiveColorAttachmentTransient[i])
        continue;

    m_ActiveColorAttachmentDescriptors[i].loadAction = RenderBufferLoadAction.DontCare;
    m_ActiveColorAttachmentDescriptors[i].storeAction = RenderBufferStoreAction.DontCare;
    m_ActiveColorAttachmentDescriptors[i].loadStoreTarget = BuiltinRenderTextureType.None;
}

With the changes described above, load and store are instead decided by scanning the RenderPasses before and after:

internal void NeedLoadStoreAlloc(int passIndex, RenderTargetIdentifier identifier, out bool needLoad, out bool needStore)
{
    RenderTargetIdentifier GetRealAttachment(ScriptableRenderPass pass, int attachmentIndex)
    {
        return pass.overrideCameraTarget ? pass.colorAttachments[attachmentIndex] : m_CameraColorTarget;
    }

    var hash = m_PassIndexToPassHash[passIndex];
    var renderpassList = m_MergeableRenderPassesMap[hash];
    int subpassCount = GetValidPassIndexCount(renderpassList);
    int firstSubpassIndex = renderpassList[0];
    int lastSubpassIndex = renderpassList[subpassCount - 1];

    // Starting from the next RenderPass, walk all subpasses (including those not compatible with native RenderPass)
    bool store = false;
    bool storeFind = false;
    for (int i = lastSubpassIndex + 1; i < m_ActiveRenderPassQueue.Count; ++i)
    {
        var currentSubpass = m_ActiveRenderPassQueue[i];
        bool clearColor = (currentSubpass.clearFlag & ClearFlag.Color) != 0;

        if (storeFind)
            break;

        // Used as a color attachment
        var attachmentList = currentSubpass.colorAttachments;
        for (int attachmentIndex = 0; attachmentIndex < attachmentList.Length; attachmentIndex++)
        {
            var attachmentIdentify = GetRealAttachment(currentSubpass, attachmentIndex);
            // A match means this is the first subpass in a later render pass that writes to this RT
            if (attachmentIdentify == identifier)
            {
                storeFind = true;
                store = !clearColor; // Store unless it is cleared, so the next render pass can keep drawing on top
                break;
            }
        }

        // Used as an input attachment
        var inputAttachmentList = currentSubpass.m_InputAttachments;
        for (int attachmentIndex = 0; attachmentIndex < inputAttachmentList.Length; attachmentIndex++)
        {
            var attachmentIdentify = inputAttachmentList[attachmentIndex];
            if (attachmentIdentify == identifier)
            {
                storeFind = true;
                store = true; // It is read later, so it must be stored
                break;
            }
        }

        // Used as an input RT
        foreach (var attachmentIdentify in currentSubpass.m_InpuRenderTextures)
        {
            if (attachmentIdentify == identifier)
            {
                storeFind = true;
                store = true; // It is read later, so it must be stored
                break;
            }
        }
    }

Then decide whether the resource is transient and whether to allocate an RT:

bool store = true;
bool load = (pass.clearFlag & ClearFlag.Color) == 0;

if (isStoreOverride)
{
    store = pass.overriddenColorStoreActions[i];
}
else
{
    NeedLoadStoreAlloc(pass.renderPassQueueIndex, colorTarget, out load, out store);
}

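bool isTransient = (!load && !store); // Neither loaded nor stored: can be treated as a transient resource
// Check whether this RT was already allocated; the FindIndex version allocates garbage (GC)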
//bool allocedRT = m_AutoAllocRT.FindIndex(rti => rti == colorTarget) != -1;
bool allocedRT = false;
foreach (var rti in m_AutoAllocRT)
{
    if (rti == colorTarget)
    {
        allocedRT = true;
        break;
    }
}

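bool allocRT = pass.m_ColorAttachmentsAutoAlloc[i] // Pass setting: whether the RT should be allocated automatically
               && !allocedRT // Only allocate if it has not been allocated yet
               && !RenderTargetUsedInForward(pass.renderPassQueueIndex, colorTarget) // No earlier pass may have used it
               && !isTransient; // Must not be a transient resource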

if (allocRT)
{
    var rpDesc = cameraData.cameraTargetDescriptor;
    rpDesc.depthBufferBits = 0;
    rpDesc.graphicsFormat = format;

    cmd.GetTemporaryRT(RenderingUtils.GetRenderTargetIdentifierID(colorTarget), rpDesc);
    m_AutoAllocRT.Add(colorTarget);
}

GBuffer Layout

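Maintaining the GBuffer layout at the pipeline level is not hard: compute the size of the GBuffer array from which features are on, then add the corresponding RTs. URP organizes it like this: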

internal int GBufferSliceCount { get { return 4 + (UseRenderPass ? 1 : 0) + (UseShadowMask ? 1 : 0) + (UseRenderingLayers ? 1 : 0); } }

There are four base targets (albedo, normal, specular, lighting). With RenderPass enabled, one more is needed for depth. Shadowmask and renderingLayer each add one more GBuffer when enabled.

And the related index computation:

internal int GBufferAlbedoIndex { get { return 0; } }
internal int GBufferSpecularMetallicIndex { get { return 1; } }
internal int GBufferNormalSmoothnessIndex { get { return 2; } }
internal int GBufferLightingIndex { get { return 3; } }
internal int GbufferDepthIndex { get { return UseRenderPass ? GBufferLightingIndex + 1 : -1; } }
internal int GBufferShadowMask { get { return UseShadowMask ? GBufferLightingIndex + (UseRenderPass ? 1 : 0) + 1 : -1; } }
internal int GBufferRenderingLayers { get { return UseRenderingLayers ? GBufferLightingIndex + (UseRenderPass ? 1 : 0) + (UseShadowMask ? 1 : 0) + 1 : -1; } }
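
The same index arithmetic can be written as a running counter, which makes the cumulative nature of the layout explicit. This is a minimal Python sketch that mirrors the properties above; `gbuffer_layout` and its key names are made up for illustration.

```python
def gbuffer_layout(use_render_pass, use_shadow_mask, use_rendering_layers):
    """Return a dict of GBuffer slot indices; -1 means the slot is absent."""
    lighting = 3
    layout = {
        "albedo": 0,
        "specular_metallic": 1,
        "normal_smoothness": 2,
        "lighting": lighting,
    }
    next_index = lighting + 1
    # Optional slots are packed in this fixed order, without gaps.
    for name, enabled in (("depth", use_render_pass),
                          ("shadow_mask", use_shadow_mask),
                          ("rendering_layers", use_rendering_layers)):
        if enabled:
            layout[name] = next_index
            next_index += 1
        else:
            layout[name] = -1
    layout["slice_count"] = next_index  # corresponds to GBufferSliceCount
    return layout
```

For example, with only shadowmask enabled the shadowmask lands at index 4, while with all three features enabled it moves to index 5 behind depth.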

One of the difficulties of maintaining the layout lies in the shader. In UnityGBuffer.hlsl you can see the GBuffer fragment output:

struct FragmentOutput
{
    half4 GBuffer0 : SV_Target0;
    half4 GBuffer1 : SV_Target1;
    half4 GBuffer2 : SV_Target2;
    half4 GBuffer3 : SV_Target3; // Camera color attachment

    #ifdef GBUFFER_OPTIONAL_SLOT_1
    GBUFFER_OPTIONAL_SLOT_1_TYPE GBuffer4 : SV_Target4;
    #endif
    #ifdef GBUFFER_OPTIONAL_SLOT_2
    half4 GBuffer5 : SV_Target5;
    #endif
    #ifdef GBUFFER_OPTIONAL_SLOT_3
    half4 GBuffer6 : SV_Target6;
    #endif
};

That is four fixed GBuffers plus three optional ones. Whether each slot exists, what type it has, and what alias it gets is defined earlier in the same file:

#if !defined(LIGHTMAP_ON) && defined(LIGHTMAP_SHADOW_MIXING) && !defined(SHADOWS_SHADOWMASK)
#define OUTPUT_SHADOWMASK 1 // subtractive
#elif defined(SHADOWS_SHADOWMASK)
#define OUTPUT_SHADOWMASK 2 // shadow mask
#elif defined(_DEFERRED_MIXED_LIGHTING)
#define OUTPUT_SHADOWMASK 3 // we don't know if it's subtractive or just shadowMap (from deferred lighting shader, LIGHTMAP_ON does not need to be defined)
#else
#endif

#if _RENDER_PASS_ENABLED
    #define GBUFFER_OPTIONAL_SLOT_1 GBuffer4
    #define GBUFFER_OPTIONAL_SLOT_1_TYPE float
#if OUTPUT_SHADOWMASK && defined(_LIGHT_LAYERS)
    #define GBUFFER_OPTIONAL_SLOT_2 GBuffer5
    #define GBUFFER_OPTIONAL_SLOT_3 GBuffer6
    #define GBUFFER_LIGHT_LAYERS GBuffer5
    #define GBUFFER_SHADOWMASK GBuffer6
#elif OUTPUT_SHADOWMASK
    #define GBUFFER_OPTIONAL_SLOT_2 GBuffer5
    #define GBUFFER_SHADOWMASK GBuffer5
#elif defined(_LIGHT_LAYERS)
    #define GBUFFER_OPTIONAL_SLOT_2 GBuffer5
    #define GBUFFER_LIGHT_LAYERS GBuffer5
#endif //#if OUTPUT_SHADOWMASK && defined(_LIGHT_LAYERS)
#else
    #define GBUFFER_OPTIONAL_SLOT_1_TYPE half4
#if OUTPUT_SHADOWMASK && defined(_LIGHT_LAYERS)
    #define GBUFFER_OPTIONAL_SLOT_1 GBuffer4
    #define GBUFFER_OPTIONAL_SLOT_2 GBuffer5
    #define GBUFFER_LIGHT_LAYERS GBuffer4
    #define GBUFFER_SHADOWMASK GBuffer5
#elif OUTPUT_SHADOWMASK
    #define GBUFFER_OPTIONAL_SLOT_1 GBuffer4
    #define GBUFFER_SHADOWMASK GBuffer4
#elif defined(_LIGHT_LAYERS)
    #define GBUFFER_OPTIONAL_SLOT_1 GBuffer4
    #define GBUFFER_LIGHT_LAYERS GBuffer4
#endif //#if OUTPUT_SHADOWMASK && defined(_LIGHT_LAYERS)
#endif //#if _RENDER_PASS_ENABLED

Unlike the macro definitions of other features, the GBuffer layout is cumulative: without slot1 there can be no slot2. So when RenderPass is off and shadowmask is on, shadowmask has to take slot1, that is, GBuffer4.

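This is very different from a vertex-to-fragment struct, where skipping the semantics between TEXCOORD0 and TEXCOORDn does no harm because a semantic is just a label. The index of a GBuffer output, by contrast, is directly tied to the number and order of the targets bound on the CPU side.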

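I tried binding GBuffer4 as a transient resource and then binding GBuffer5 directly, but that is hard to maintain and also hurts performance (see the pitfalls section below).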

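The GBuffer layout requires every combination of cases to be written out. URP has combinations of three cases, but what if there are more? Each additional case doubles the complexity.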

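So when many cases have to be supported, hand-written layouts are not reliable enough, and it is better to generate them programmatically. I use a Python script: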

class FeatureElement:
    def __init__(self, feature_name, feature_alias, buffer_type, define_mode):
        self.feature_name = feature_name
        self.feature_alias = feature_alias
        self.buffer_type = buffer_type
        self.define_mode = define_mode

if __name__ == "__main__":
    with open("UnityGBufferOutputLayout.hlsl", mode="w", encoding="utf8") as f:
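        # Following URP's design, slot defines start at 1; this is arbitrary and starting at 0 works too
        start_slot = 1
        # In our modified pipeline there are 3 fixed gbuffers (gbuffer0, 1, 2), so generation starts at 3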
        start_gbuffer_num = 3

        gbuffer_output_feature_list = [
            FeatureElement("_PERMAT_WORKFLOW", "GBUFFER_LIGHTING", "half4", DefineMode.IFDEF),
            FeatureElement("_RENDER_PASS_ENABLED", "GBUFFER_DEPTH", "float", DefineMode.IF),
            FeatureElement("OUTPUT_SHADOWMASK", "GBUFFER_SHADOWMASK", "half4", DefineMode.IF),
            FeatureElement("_LIGHT_LAYERS", "GBUFFER_LIGHT_LAYERS", "half4", DefineMode.IFDEF)
        ]

        f.write(f"""// The defines below are generated by UnityGBuffer.py
{generate_feature_define(gbuffer_output_feature_list, start_slot, start_gbuffer_num)}

// The FragmentOutput code below is generated by UnityGBuffer.py
{generate_gbuffer_output(start_gbuffer_num, len(gbuffer_output_feature_list))} """)

That is the rough calling structure. I won't paste the generator function generate_feature_define here; it is not complicated.
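
Since the generator itself is not shown, here is one way it could look. This is a hypothetical sketch, not the author's script: the `DefineMode` values, the function bodies, and the choice to emit one `#if`/`#elif` branch per non-empty feature combination are all assumptions made for illustration.

```python
from enum import Enum
from itertools import combinations


class DefineMode(Enum):
    IF = "if"        # assumed meaning: tested as `NAME`
    IFDEF = "ifdef"  # assumed meaning: tested as `defined(NAME)`


class FeatureElement:
    def __init__(self, feature_name, feature_alias, buffer_type, define_mode):
        self.feature_name = feature_name
        self.feature_alias = feature_alias
        self.buffer_type = buffer_type
        self.define_mode = define_mode


def _condition(feature, enabled):
    name = feature.feature_name
    term = name if feature.define_mode is DefineMode.IF else f"defined({name})"
    return term if enabled else f"!{term}"


def generate_feature_define(features, start_slot, start_gbuffer_num):
    """Emit one #if/#elif branch per non-empty combination of enabled features."""
    if not features:
        return ""
    lines = []
    n = len(features)
    for count in range(n, 0, -1):
        for enabled in combinations(range(n), count):
            cond = " && ".join(_condition(f, i in enabled) for i, f in enumerate(features))
            lines.append(f"#{'elif' if lines else 'if'} {cond}")
            # Enabled features are packed into consecutive slots, in list order.
            for offset, i in enumerate(enabled):
                slot = f"GBUFFER_OPTIONAL_SLOT_{start_slot + offset}"
                gbuffer = f"GBuffer{start_gbuffer_num + offset}"
                lines.append(f"    #define {slot} {gbuffer}")
                lines.append(f"    #define {slot}_TYPE {features[i].buffer_type}")
                lines.append(f"    #define {features[i].feature_alias} {gbuffer}")
    lines.append("#endif")
    return "\n".join(lines)


def generate_gbuffer_output(start_gbuffer_num, slot_count, start_slot=1):
    """Emit the FragmentOutput struct: fixed targets first, then optional slots."""
    lines = ["struct FragmentOutput", "{"]
    for i in range(start_gbuffer_num):
        lines.append(f"    half4 GBuffer{i} : SV_Target{i};")
    for offset in range(slot_count):
        slot = f"GBUFFER_OPTIONAL_SLOT_{start_slot + offset}"
        index = start_gbuffer_num + offset
        lines.append(f"    #ifdef {slot}")
        lines.append(f"    {slot}_TYPE GBuffer{index} : SV_Target{index};")
        lines.append("    #endif")
    lines.append("};")
    return "\n".join(lines)
```

Each branch condition spells out both the enabled and the disabled features, so the branch order does not matter. With n features this emits 2^n - 1 branches, which is precisely the growth that makes hand-written layouts impractical.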

6. Other pitfalls and summary

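Unity's debugging support for RenderPass is not friendly yet. For example, the error raised when a non-transient resource has no RT bound is not intuitive, and when you first run into it there is little to go on. Crashes caused by attachment binding problems are also common. Until you have the time to build some fault-tolerance mechanism, the best approach is simply not to write too much logic at once.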

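According to the theory in the previous article, binding a transient resource should be free: adding an attachment to the shader that costs no computation, without binding an actual RT, should in theory have little performance impact. In actual tests, however, binding one extra target made a noticeable difference even when it was transient, lowering the frame rate by as much as 10%. My guess is that this comes down to the underlying hardware implementation.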

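In terms of development difficulty, the hard part of organizing RenderPasses is still URP's messy resource management. With a fixed layout, organizing RenderPasses is not especially hard.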

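Next is practicality. As the design discussion above shows, RenderPass sounds great on paper but applies to a fairly narrow set of scenarios. It is mostly useful for Single Pass Deferred, and many passes can break a RenderPass apart, for example HBAO or half-resolution screen-space shadows. These passes use screen-space surface information in ways that are incompatible with RenderPass, and they have to run before lighting. So RenderPass is best used on low-end settings where those effects are turned off.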

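Last is compatibility. In theory, as part of the Vulkan standard, RenderPass should not have many compatibility problems, but in testing we still ran into crashes. For example, on early versions of MIUI 13, enabling RenderPass caused the app to crash.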