CVE-2022-48644 Information

Description

In the Linux kernel, the following vulnerability has been resolved:

net/sched: taprio: avoid disabling offload when it was never enabled

In an incredibly strange API design decision, qdisc->destroy() gets called even if qdisc->init() never succeeded, not exclusively since commit 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"), but apparently also earlier (in the case of qdisc_create_dflt()).

The taprio qdisc does not fully account for this when it attempts full offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in taprio_init(), then replaces q->flags with the TCA_TAPRIO_ATTR_FLAGS value parsed from netlink (in taprio_change(), tail called from taprio_init()).

But in taprio_destroy(), we call taprio_disable_offload(), and this determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).

Looking at the implementation of FULL_OFFLOAD_IS_ENABLED() (a bitwise check of bit 1 in q->flags), it is invalid to call this macro on q->flags when it contains TAPRIO_FLAGS_INVALID, because that value is set to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on an invalid set of flags.
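The following is a minimal userspace sketch of the bitwise check described above, not the kernel source itself. The macro bodies are assumptions reconstructed from the description (bit 1 for full offload, U32_MAX for the invalid marker), and reports_full_offload() is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel definitions described in the text. */
#define U32_MAX                           UINT32_MAX
#define TAPRIO_FLAGS_INVALID              U32_MAX
#define TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD (UINT32_C(1) << 1) /* bit 1 */
#define FULL_OFFLOAD_IS_ENABLED(flags) \
	((flags) & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD)

/* Compile-time check: the "invalid" marker has every bit set, including bit 1. */
static_assert((TAPRIO_FLAGS_INVALID & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD) != 0,
	      "invalid marker looks like full offload to the macro");

/* Hypothetical helper: true when the macro reports full offload for 'flags'. */
static bool reports_full_offload(uint32_t flags)
{
	return FULL_OFFLOAD_IS_ENABLED(flags) != 0;
}
```

With these definitions, reports_full_offload(TAPRIO_FLAGS_INVALID) is true even though no offload was ever requested, while reports_full_offload(0x0) is false.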

As a result, it is possible to crash the kernel if user space forces an error between the setting of q->flags = TAPRIO_FLAGS_INVALID and the call to taprio_enable_offload(). This is because drivers do not expect the offload to be disabled when it was never enabled.

The error that we force here is to attach taprio not as the root qdisc, but as a child of an mqprio root qdisc:

    $ tc qdisc add dev swp0 root handle 1: \
        mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
        queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
    $ tc qdisc replace dev swp0 parent 1:1 \
        taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
        queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
        sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
        flags 0x0 clockid CLOCK_TAI
    Unable to handle kernel paging request at virtual address fffffffffffffff8
    [fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Call trace:
     taprio_dump+0x27c/0x310
     vsc9959_port_setup_tc+0x1f4/0x460
     felix_port_setup_tc+0x24/0x3c
     dsa_slave_setup_tc+0x54/0x27c
     taprio_disable_offload.isra.0+0x58/0xe0
     taprio_destroy+0x80/0x104
     qdisc_create+0x240/0x470
     tc_modify_qdisc+0x1fc/0x6b0
     rtnetlink_rcv_msg+0x12c/0x390
     netlink_rcv_skb+0x5c/0x130
     rtnetlink_rcv+0x1c/0x2c

Fix this by keeping track of the operations we made, and undo the offload only if we actually did it.
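The idea of the fix can be sketched in userspace as follows. This is an illustrative model under stated assumptions, not the actual patch: struct sketch_sched, the sketch_* functions, the two disables_after_* helpers and the driver_disables counter are all hypothetical names, and the "driver" is reduced to a counter of disable requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAPRIO_FLAGS_INVALID           UINT32_MAX
#define FULL_OFFLOAD_IS_ENABLED(flags) ((flags) & (UINT32_C(1) << 1))

/* Hypothetical, trimmed-down stand-in for struct taprio_sched. */
struct sketch_sched {
	uint32_t flags;
	bool offloaded;      /* set only once offload was really enabled */
	int driver_disables; /* disable requests that reached the "driver" */
};

static void sketch_init(struct sketch_sched *q)
{
	q->flags = TAPRIO_FLAGS_INVALID;
	q->offloaded = false;
	q->driver_disables = 0;
}

/* Stand-in for a successful offload setup with the parsed flags. */
static void sketch_enable_offload(struct sketch_sched *q, uint32_t flags)
{
	q->flags = flags;
	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
		q->offloaded = true;
}

/* Behaviour described as buggy: decide from q->flags alone. */
static void sketch_destroy_by_flags(struct sketch_sched *q)
{
	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
		q->driver_disables++;
}

/* Behaviour described as the fix: undo only what was actually done. */
static void sketch_destroy_tracked(struct sketch_sched *q)
{
	if (!q->offloaded)
		return;
	q->driver_disables++;
	q->offloaded = false;
}

/* Creation fails before offload is enabled, then destroy still runs. */
static int disables_after_failed_init(bool tracked)
{
	struct sketch_sched q;

	sketch_init(&q);
	assert(!q.offloaded); /* the error hit before sketch_enable_offload() */
	if (tracked)
		sketch_destroy_tracked(&q);
	else
		sketch_destroy_by_flags(&q);
	return q.driver_disables;
}

/* Normal lifetime: offload enabled, then torn down exactly once. */
static int disables_after_successful_offload(void)
{
	struct sketch_sched q;

	sketch_init(&q);
	sketch_enable_offload(&q, UINT32_C(1) << 1);
	sketch_destroy_tracked(&q);
	return q.driver_disables;
}
```

In this model, the flag-based teardown sends one spurious disable request after a failed init, while the tracked variant sends none and still disables once after a successful enable.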

I’ve added "bool offloaded" inside a 4 byte hole between "int clockid" and "atomic64_t picos_per_byte". Now the first cache line looks like below:

    $ pahole -C taprio_sched net/sched/sch_taprio.o
    struct taprio_sched {
            struct Qdisc * *           qdiscs;               /*     0     8 */
            struct Qdisc *             root;                 /*     8     8 */
            u32                        flags;                /*    16     4 */
            enum tk_offsets            tk_offset;            /*    20     4 */
            int                        clockid;              /*    24     4 */
            bool                       offloaded;            /*    28     1 */

            /* XXX 3 bytes hole, try to pack */

            atomic64_t                 picos_per_byte;       /*    32     0 */

            /* XXX 8 bytes hole, try to pack */

            spinlock_t                 current_entry_lock;   /*    40     0 */

            /* XXX 8 bytes hole, try to pack */

            struct sched_entry *       current_entry;        /*    48     8 */
            struct sched_gate_list *   oper_sched;           /*    56     8 */
            /* --- cacheline 1 boundary (64 bytes) --- */
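The claim that the new flag fits into existing padding can be checked with a small userspace sketch. The two structs below are hypothetical stand-ins for the start of struct taprio_sched before and after the change. They assume a 4-byte int, a 1-byte bool, and 8-byte alignment for the 64-bit counter (made explicit with _Alignas). Plain pointers and an int replace the kernel types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the leading members before the change. */
struct layout_before {
	void **qdiscs;
	void *root;
	uint32_t flags;
	int tk_offset; /* stand-in for enum tk_offsets */
	int clockid;
	/* 4-byte padding hole here */
	_Alignas(8) int64_t picos_per_byte; /* stand-in for atomic64_t */
};

/* Same members with the new flag placed inside the hole. */
struct layout_after {
	void **qdiscs;
	void *root;
	uint32_t flags;
	int tk_offset;
	int clockid;
	bool offloaded;
	_Alignas(8) int64_t picos_per_byte;
};

/* The flag must not grow the struct or move the member after it. */
static_assert(sizeof(struct layout_after) == sizeof(struct layout_before),
	      "flag should fit into existing padding");
static_assert(offsetof(struct layout_after, picos_per_byte) ==
	      offsetof(struct layout_before, picos_per_byte),
	      "following member should keep its offset");

/* Bytes of padding left between the flag and the next member. */
static size_t hole_after_flag(void)
{
	return offsetof(struct layout_after, picos_per_byte) -
	       offsetof(struct layout_after, offloaded) - sizeof(bool);
}
```

Under these assumptions, hole_after_flag() returns 3, which matches the "3 bytes hole" that pahole reports after the new member.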

Reference

https://git.kernel.org/stable/c/d12a1eb07003e597077329767c6aa86a7e972c76
https://git.kernel.org/stable/c/586def6ebed195f3594a4884f7c5334d0e1ad1bb
https://git.kernel.org/stable/c/f58e43184226e5e9662088ccf1389e424a3a4cbd
https://git.kernel.org/stable/c/c7c9c7eb305ab8b4e93e4e4e1b78d8cfcbc26323
https://git.kernel.org/stable/c/db46e3a88a09c5cf7e505664d01da7238cd56c92
