Posted yesterday at 18:27 by 门头沟学院 (FPGA Engineer), Guangdong


What Do Embodied AI Interviews Ask?

I. Research Methodology and Engineering Practice

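In technical interviews, questions about your papers and projects usually focus on whether the reasoning forms a closed loop and whether the work holds up in engineering practice.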

1. Key Points When Your Paper Is Probed

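Motivation: be clear about why you chose your approach. For example, in action-generation policies, explain why single-step generation has a latency advantage over multi-step autoregressive generation in real-time control.
Ablation study: verify the contribution of each module.
Safety and robustness: e.g., how a VLA (vision-language-action) model defends against attacks such as backdoor attacks, and how the corresponding data is constructed.
Preparing for academic challenges: rehearse your answers on baseline selection, the limits of generalization, and the Sim-to-Real gap.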

2. LLMs and Embodied Systems Engineering

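Compute optimization: when deploying on simulation platforms (e.g., Isaac Lab/Gym), deal with limited GPU memory and the efficiency of parallel sampling.
Feature extraction: analyze model features with tools such as sparse autoencoders (SAEs).
Inference optimization: introduce inference-time search to mitigate instruction drift in long-sequence generation.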

II. Core Theory

1. Transformer Architecture

Self-Attention:

$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
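- Role of the $\sqrt{d_k}$ scaling factor: it keeps the dot products from growing with the dimension $d_k$. Without it, large logits push Softmax into its saturated region, where gradients vanish.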
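A quick numerical check of the scaling argument, assuming the entries of $q$ and $k$ are independent standard normals (an idealization, not a property of trained models). Under that assumption $q \cdot k$ has variance $d_k$, so dividing by $\sqrt{d_k}$ brings it back to roughly 1 regardless of the head size. The helper name `dot_product_variance` is ours, for illustration.

```python
import math
import random
import statistics


def dot_product_variance(d_k: int, trials: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Empirical variance of q·k and of q·k / sqrt(d_k) for i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    raw, scaled = [], []
    for _ in range(trials):
        q = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        k = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        s = sum(a * b for a, b in zip(q, k))   # unscaled attention logit
        raw.append(s)
        scaled.append(s / math.sqrt(d_k))      # logit after the 1/sqrt(d_k) scaling
    return statistics.pvariance(raw), statistics.pvariance(scaled)


for d_k in (16, 64, 256):
    var_raw, var_scaled = dot_product_variance(d_k)
    print(f"d_k={d_k:4d}  Var(q·k)≈{var_raw:7.1f}  Var(q·k/√d_k)≈{var_scaled:.2f}")
```

The unscaled variance grows roughly linearly with $d_k$, while the scaled one stays near 1, which keeps Softmax inputs in a range where its gradient is not negligible.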
Rotary Position Embedding (RoPE):

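RoPE rotates each query and key by an angle proportional to its absolute position, so that their dot product depends only on the relative position. For a 2D query $(q_0, q_1)^T = W_q x_m$ at position $m$, the rotation is: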

$f_q(x_m, m) = \begin{pmatrix} \cos(m\theta) & -\sin(m\theta) \\ \sin(m\theta) & \cos(m\theta) \end{pmatrix} \begin{pmatrix} q_0 \\ q_1 \end{pmatrix}$
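A minimal pure-Python sketch of the 2D case, assuming a single frequency $\theta$ (full RoPE applies the same rotation to each of the $d/2$ coordinate pairs of a head, with $\theta_i = 10000^{-2i/d}$ for pair $i$). It checks the key property: since $R(m\theta)^T R(n\theta) = R((n-m)\theta)$, the logit between the rotated query at $m$ and the rotated key at $n$ depends only on $m - n$. The function names and the sample vectors are illustrative.

```python
import math


def rope_2d(vec: tuple[float, float], pos: int, theta: float) -> tuple[float, float]:
    """Rotate a 2D vector by angle pos * theta (the 2D RoPE rotation above)."""
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    x0, x1 = vec
    return (c * x0 - s * x1, s * x0 + c * x1)


def rotated_dot(q, k, m: int, n: int, theta: float = 0.1) -> float:
    """<f_q(q, m), f_k(k, n)>: attention logit between a query at m and a key at n."""
    qm = rope_2d(q, m, theta)
    kn = rope_2d(k, n, theta)
    return qm[0] * kn[0] + qm[1] * kn[1]


q, k = (0.3, -1.2), (0.8, 0.5)
# Same offset m - n = 3 at very different absolute positions gives the same logit.
print(rotated_dot(q, k, 5, 2), rotated_dot(q, k, 103, 100))
```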

2. Embodied AI and Robot Control

Contact force model (impedance control):

$F_{contact} = K_p (x_{ref} - x) + K_d (\dot{x}_{ref} - \dot{x})$
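- $K_p$ and $K_d$ are the stiffness and damping coefficients, respectively. In multimodal models, tactile signals are often encoded as embeddings and aligned with visual and audio features.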
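To see the two gains at work, here is a minimal 1D simulation, assuming a unit point mass with no gravity or friction, a fixed reference $x_{ref}$ (so $\dot{x}_{ref} = 0$), and semi-implicit Euler integration. $K_p$ pulls the mass toward the reference and $K_d$ damps its velocity; the values $K_p = 100$, $K_d = 20$ are illustrative and make the system critically damped for $m = 1$ (since $K_d = 2\sqrt{m K_p}$). The function name is ours.

```python
def simulate_impedance(x_ref: float, kp: float = 100.0, kd: float = 20.0,
                       mass: float = 1.0, dt: float = 1e-3, steps: int = 3000) -> float:
    """Drive a 1D point mass with F = Kp (x_ref - x) + Kd (xdot_ref - xdot); return final x."""
    x, xdot = 0.0, 0.0          # start at rest at the origin
    xdot_ref = 0.0              # fixed set-point, so the reference velocity is zero
    for _ in range(steps):
        force = kp * (x_ref - x) + kd * (xdot_ref - xdot)
        xdot += (force / mass) * dt   # semi-implicit Euler: update velocity first,
        x += xdot * dt                # then position using the new velocity
    return x


print(simulate_impedance(0.5))  # settles very close to 0.5 after 3 simulated seconds
```

If the motion is instead blocked by a rigid surface at $x_{wall}$ and the mass comes to rest there, the law reduces to $F = K_p (x_{ref} - x_{wall})$: $K_p$ directly sets how hard the end effector presses, which is the intuition behind choosing a "soft" or "stiff" impedance.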
Policy Gradient Theorem:

$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta} \left[ \sum_{t=0}^T \nabla_\theta \log \pi_\theta(a_t|s_t) R(\tau) \right]$
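A minimal REINFORCE sketch of the estimator above, assuming a one-step episode ($T = 0$, so $R(\tau)$ is just the reward of the chosen action) on a hypothetical two-armed bandit with deterministic rewards and a softmax policy over two logits. For a softmax policy, $\nabla_\theta \log \pi_\theta(a) = \text{onehot}(a) - \pi_\theta$, so each sampled episode adds $\text{lr} \cdot R \cdot (\text{onehot}(a) - \pi_\theta)$ to the logits. No baseline is used here.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards: tuple[float, float] = (0.2, 1.0), lr: float = 0.1,
                     episodes: int = 2000, seed: int = 0) -> list[float]:
    """REINFORCE on a one-step bandit: theta += lr * R * grad log pi(a)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]                     # policy logits, one per arm
    for _ in range(episodes):
        probs = softmax(theta)
        a = 0 if rng.random() < probs[0] else 1   # sample a ~ pi_theta
        R = rewards[a]                             # return of the one-step episode
        for i in range(len(theta)):
            grad_log_pi = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += lr * R * grad_log_pi
    return softmax(theta)


print(reinforce_bandit())  # probability mass should shift toward arm 1 (reward 1.0)
```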

3. Generative Models: Flow Matching

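The goal is to regress a vector field that transports noise $p_0$ to data $p_1$. With the linear path below, each conditional path is the straight-line (optimal-transport) interpolation between a noise sample $x_0$ and a data sample $x_1$, and the model $v_\theta$ is trained to match its velocity $u_t$: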

Linear path: $x_t = t x_1 + (1 - t) x_0$
Target vector field: $u_t(x_t | x_1) = x_1 - x_0$
Loss function: $\mathcal{L}(\theta) = \mathbb{E}_{t, x_0, x_1} [ || v_\theta(x_t, t) - u_t(x_t | x_1) ||^2 ]$
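A minimal sketch of how one training target is built, assuming scalar samples, noise $x_0 \sim \mathcal{N}(0, 1)$, and a hypothetical data point $x_1 = 3.0$; nothing is learned here. It also checks why the target can be written as conditioned on $x_1$ alone: since $x_1 - x_t = (1 - t)(x_1 - x_0)$, we have $(x_1 - x_t)/(1 - t) = x_1 - x_0$. As we understand the flow matching formulation, this field $x \mapsto (x_1 - x)/(1 - t)$ is the $\sigma_{min} = 0$ case of the conditional OT path. Following it with Euler steps moves $x_0$ onto $x_1$. All function names are ours.

```python
import random


def fm_pair(x0: float, x1: float, t: float) -> tuple[float, float]:
    """Point on the linear path and its conditional target velocity."""
    x_t = t * x1 + (1.0 - t) * x0
    u_t = x1 - x0
    return x_t, u_t


def fm_loss_term(v_pred: float, x0: float, x1: float, t: float) -> float:
    """One Monte Carlo term of the loss ||v_theta(x_t, t) - u_t||^2 for a given prediction."""
    _, u_t = fm_pair(x0, x1, t)
    return (v_pred - u_t) ** 2


def euler_integrate(v, x0: float, steps: int = 100) -> float:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += v(x, i * dt) * dt
    return x


rng = random.Random(0)
x0, x1 = rng.gauss(0.0, 1.0), 3.0          # noise sample and a hypothetical data point
t = rng.random()                           # t ~ U[0, 1)
x_t, u_t = fm_pair(x0, x1, t)

# The target written with x_1 and x_t only agrees with x_1 - x_0.
print(u_t, (x1 - x_t) / (1.0 - t))

# Following the conditional field x -> (x_1 - x) / (1 - t) from x0 lands on x1.
print(euler_integrate(lambda x, s: (x1 - x) / (1.0 - s), x0))
```

In real training, $v_\theta$ is a network fitted on many random $(x_0, x_1, t)$ triples; averaging over data samples is what turns these per-pair targets into a field that maps the whole noise distribution to the data distribution.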

III. Hand-Coding Questions in Technical Interviews

1. Core Deep Learning Components

Multi-Head Attention (with causal mask and temperature)

import torch
import torch.nn as nn
import math

class CausalTemperatureMHA(nn.Module):
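    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_k = d_model // num_heads
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None, temperature=1.0):
        # x: (batch, seq_len, d_model)
        # mask: optional bool tensor broadcastable to (batch, num_heads, seq_len, seq_len);
        #       True marks positions to block (e.g. padding). It is combined with the causal mask.
        batch_size, seq_len, _ = x.size()

        # Project, then split into heads: (batch, num_heads, seq_len, d_k)
        Q = self.W_q(x).view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)
        K = self.W_k(x).view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)
        V = self.W_v(x).view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)

        # Scaled dot-product attention with temperature (temperature > 1 flattens the distribution)
        scores = torch.matmul(Q, K.transpose(-2, -1)) / (temperature * math.sqrt(self.d_k))

        # Causal mask: position i may only attend to positions j <= i
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        if mask is not None:
            causal_mask = causal_mask | mask
        scores = scores.masked_fill(causal_mask, float('-inf'))

        attn = torch.softmax(scores, dim=-1)
        # Merge heads back: (batch, seq_len, d_model)
        out = torch.matmul(attn, V).transpose(1, 2).contiguous().view(batch_size, seq_len, -1)
        return self.W_o(out)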

LayerNorm Implementation

class LayerNorm(nn.Module):
    def __init__(self, shape, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        self.eps = eps

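    def forward(self, x):
        # Normalize over the last dimension, whose size must equal `shape`
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)  # biased variance, as nn.LayerNorm uses
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta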

2. Common Algorithm and Math Questions

Square Root by Bisection (Real Numbers)

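def mySqrt(x: float, epsilon: float = 1e-7) -> float:
    if x < 0:
        raise ValueError("x must be non-negative")  # no real square root
    # Search [0, max(x, 1)]: for 0 <= x < 1, sqrt(x) > x, so the upper bound must be at least 1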
    left, right = 0.0, max(x, 1.0)
    while right - left > epsilon:
        mid = (left + right) / 2
        if mid * mid > x:
            right = mid
        else:
            left = mid
    return left
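Each iteration halves the bracket $[0, \max(x, 1)]$, so the loop runs about $\lceil \log_2(\max(x, 1)/\varepsilon) \rceil$ times. The instrumented copy below (the name `my_sqrt_count` is ours, for illustration) keeps the same bisection logic and also returns the iteration count, so the bound can be checked directly.

```python
import math


def my_sqrt_count(x: float, epsilon: float = 1e-7) -> tuple[float, int]:
    """Same bisection as mySqrt, but also returns the number of loop iterations."""
    if x < 0:
        raise ValueError("x must be non-negative")
    left, right = 0.0, max(x, 1.0)
    iterations = 0
    while right - left > epsilon:
        mid = (left + right) / 2
        if mid * mid > x:
            right = mid
        else:
            left = mid
        iterations += 1
    return left, iterations


root, n = my_sqrt_count(2.0)
print(root, n, math.ceil(math.log2(2.0 / 1e-7)))  # 25 iterations, matching the bound
```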

Two Pointers: Trapping Rain Water

1. Problem Background and Goal

Problem statement:

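Given an array height of non-negative integers, where each element is the height of a bar of width 1, compute how many units of rainwater the bars can trap after it rains.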

2. Core Intuition: The Shortest-Plank Principle (Cask Effect)

The amount of water position i can hold depends on the tallest bar to its left and the tallest bar to its right.

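Formula: water at position i = $\min(\max_{j \le i} h_j,\ \max_{j \ge i} h_j) - h_i$, i.e., the smaller of the tallest bar on the left and the tallest bar on the right (each including i itself, so the result is never negative), minus the bar's own height.
Intuition: water flows downhill, so how much can be stored is capped by the shorter of the two walls (the cask effect).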
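The formula can be applied literally: for each position, scan both sides for the tallest bar. This $O(n^2)$ brute-force version is our own reference sketch, not the expected interview answer, but it is handy for cross-checking faster solutions.

```python
def trap_brute_force(height: list[int]) -> int:
    """Sum min(left max, right max) - height[i] over all positions; O(n^2) time."""
    total = 0
    for i in range(len(height)):
        left_max = max(height[: i + 1])    # tallest bar at or to the left of i
        right_max = max(height[i:])        # tallest bar at or to the right of i
        total += min(left_max, right_max) - height[i]
    return total


print(trap_brute_force([4, 2, 0, 3, 2, 5]))  # 9
```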

3. Line-by-Line Code Walkthrough

The code below uses the two-pointer method, which is optimal in space complexity:


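def trap(height: list[int]) -> int:
    # 1. Initialization
    left, right = 0, len(height) - 1  # left and right pointers start at the two ends
    l_max, r_max = 0, 0              # tallest bar seen so far from the left / from the right
    res = 0                          # total trapped water
    
    # 2. Main loop
    while left < right:
        # Update the tallest heights seen on each side
        l_max = max(l_max, height[left])
        r_max = max(r_max, height[right])
        
        # 3. Key decision: process whichever side is shorter
        if l_max < r_max:
            # The left maximum is smaller, so the left side is the "short plank":
            # the water above height[left] is determined entirely by l_max
            res += l_max - height[left]
            left += 1
        else:
            # The right maximum is smaller or equal, so the right side is the "short plank"
            res += r_max - height[right]
            right -= 1
            
    return res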

Why is this logic correct?

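When l_max < r_max, you may not yet know whether some bar between left and right is even taller than r_max, but that does not matter. r_max is already the height of some bar to the right of left, so the true tallest bar on the right of position left is at least r_max, which exceeds l_max. The binding wall for position left is therefore l_max, so its water, l_max - height[left], is final, and the pointer can safely move on.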

4. Complexity Analysis

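Time complexity: $O(n)$
- A single pass over the array: left and right move toward each other, and every iteration moves one of them.
Space complexity: $O(1)$
- Only a constant number of extra variables (l_max, r_max, res, etc.); no auxiliary arrays are needed.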

5. Common Follow-Up Questions

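Dynamic programming: without two pointers, can you trade space for time compared with the $O(n^2)$ brute force? (Yes: precompute left_max and right_max arrays; time is $O(n)$ and space becomes $O(n)$.)
Monotonic stack: how would you solve it with a stack? (Maintain a stack of indices whose heights are non-increasing; when a taller bar arrives, pop and add the trapped water layer by layer.)
Practical applications: in computer vision or robotics, the same "find the gaps between local maxima" idea can be applied to terrain analysis, e.g., finding depressions in height profiles built from LiDAR point clouds.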
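Sketches of the two follow-up approaches, for comparison with the two-pointer version (the function names are ours). The DP version stores prefix and suffix maxima so each position's formula is $O(1)$; the stack version computes water in horizontal layers instead of per column.

```python
def trap_dp(height: list[int]) -> int:
    """Precompute prefix/suffix maxima: O(n) time, O(n) extra space."""
    n = len(height)
    if n == 0:
        return 0
    left_max = [0] * n
    right_max = [0] * n
    left_max[0] = height[0]
    for i in range(1, n):
        left_max[i] = max(left_max[i - 1], height[i])
    right_max[n - 1] = height[n - 1]
    for i in range(n - 2, -1, -1):
        right_max[i] = max(right_max[i + 1], height[i])
    return sum(min(left_max[i], right_max[i]) - height[i] for i in range(n))


def trap_stack(height: list[int]) -> int:
    """Monotonic stack of indices with non-increasing heights; fill water layer by layer."""
    stack: list[int] = []
    total = 0
    for i, h in enumerate(height):
        # A bar taller than the stack top closes a basin whose bottom is that top.
        while stack and h > height[stack[-1]]:
            bottom = stack.pop()
            if not stack:
                break                      # no left wall: water would spill out
            left = stack[-1]
            width = i - left - 1           # columns strictly between the two walls
            depth = min(height[left], h) - height[bottom]
            total += width * depth
        stack.append(i)
    return total


print(trap_dp([4, 2, 0, 3, 2, 5]), trap_stack([4, 2, 0, 3, 2, 5]))  # 9 9
```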

Let us trace the array height = [4, 2, 0, 3, 2, 5] step by step.

1. Setup

Initial state:
- left = 0 (height 4), right = 5 (height 5)
- l_max = 0, r_max = 0, res = 0

2. Step-by-Step Execution

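The algorithm moves the two pointers inward and always processes the "short plank" side, which guarantees that every amount of water it adds is exact.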

| Step | Pointers | Current heights (h[l], h[r]) | Updated maxima (l_max, r_max) | Condition and computation | Accumulated water (res) | Next |
|---|---|---|---|---|---|---|
| 1 | `l=0, r=5` | `4, 5` | `l_max=4, r_max=5` | `l_max < r_max` (4 < 5) is true; `res += 4 - 4` adds 0 | 0 | `left += 1` |
| 2 | `l=1, r=5` | `2, 5` | `l_max=4, r_max=5` | `l_max < r_max` (4 < 5) is true; `res += 4 - 2` adds 2 | 2 | `left += 1` |
| 3 | `l=2, r=5` | `0, 5` | `l_max=4, r_max=5` | `l_max < r_max` (4 < 5) is true; `res += 4 - 0` adds 4 | 6 | `left += 1` |
| 4 | `l=3, r=5` | `3, 5` | `l_max=4, r_max=5` | `l_max < r_max` (4 < 5) is true; `res += 4 - 3` adds 1 | 7 | `left += 1` |
| 5 | `l=4, r=5` | `2, 5` | `l_max=4, r_max=5` | `l_max < r_max` (4 < 5) is true; `res += 4 - 2` adds 2 | 9 | `left += 1` |
| End | `l=5, r=5` | - | - | Loop condition `left < right` no longer holds | 9 | - |
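The trace can be reproduced mechanically. The sketch below is the same two-pointer loop with a log added (the name `trap_with_trace` and the tuple layout are ours): each entry records `(left, right, l_max, r_max, res)` after the maxima update and the water addition, before the pointer moves, matching the rows of the trace.

```python
def trap_with_trace(height: list[int]) -> tuple[int, list[tuple[int, int, int, int, int]]]:
    """Two-pointer trapping rain water that also logs (left, right, l_max, r_max, res) per step."""
    left, right = 0, len(height) - 1
    l_max, r_max = 0, 0
    res = 0
    trace = []
    while left < right:
        l_max = max(l_max, height[left])
        r_max = max(r_max, height[right])
        if l_max < r_max:
            res += l_max - height[left]
            trace.append((left, right, l_max, r_max, res))  # state before moving left
            left += 1
        else:
            res += r_max - height[right]
            trace.append((left, right, l_max, r_max, res))  # state before moving right
            right -= 1
    return res, trace


total, steps = trap_with_trace([4, 2, 0, 3, 2, 5])
for step, (l, r, lm, rm, acc) in enumerate(steps, start=1):
    print(f"step {step}: l={l}, r={r}, l_max={lm}, r_max={rm}, res={acc}")
print("total:", total)  # total: 9
```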