| author | Alexei Starovoitov <ast@kernel.org> | 2021-01-14 18:34:30 -0800 |
|---|---|---|
| committer | Alexei Starovoitov <ast@kernel.org> | 2021-01-14 18:34:30 -0800 |
| commit | 7064a7341a0d2fcfeff56be7e3917421fbb8b024 (patch) | |
| tree | cb057d167ff8292f09cd4a4e78f4bd610a6fa959 /kernel | |
| parent | tools/bpftool: Add -Wall when building BPF programs (diff) | |
| parent | bpf: Document new atomic instructions (diff) | |
Merge branch 'Atomics for eBPF'
Brendan Jackman says:
====================
There's still one unresolved review comment from John[3], which I
will resolve with a follow-up patch.
Differences from v6->v7 [1]:
* Fixed a RISC-V build error detected by the 0day robot.
Differences from v5->v6 [1]:
* Carried Björn Töpel's ack for the RISC-V code, plus a couple more acks
from Yonghong.
* Doc fixups.
* Trivial cleanups.
Differences from v4->v5 [1]:
* Fixed bogus type casts in the interpreter that led to warnings from
the 0day robot.
* Dropped feature-detection for Clang per Andrii's suggestion in [4].
The selftests will now fail to build unless you have llvm-project
commit 286daafd6512. The ENABLE_ATOMICS_TEST macro is still needed
to support the no_alu32 tests.
* Carried some Acks from John and Yonghong.
* Dropped confusing usage of __atomic_exchange from prog_test in
favour of __sync_lock_test_and_set.
* [Really] got rid of all the forest of instruction macros
(BPF_ATOMIC_FETCH_ADD and friends); now there's just BPF_ATOMIC_OP
to define all the instructions as we use them in the verifier
tests. This makes the atomic ops less special in that API, and I
don't think the resulting usage is actually any harder to read.
Differences from v3->v4 [1]:
* Added one Ack from Yonghong. He acked some other patches but those
have now changed non-trivially so I didn't add those acks.
* Fixups to commit messages.
* Fixed disassembly and comments: first arg to atomic_fetch_* is a
pointer.
* Improved prog_test efficiency. BPF progs are now all loaded in a
single call, then the skeleton is re-used for each subtest.
* Dropped use of tools/build/feature in favour of a one-liner in the
Makefile.
* Dropped the commit that created an emit_neg helper in the x86
JIT. It's not used any more (it wasn't used in v3 either).
* Combined all the different filter.h macros (used to be
BPF_ATOMIC_ADD, BPF_ATOMIC_FETCH_ADD, BPF_ATOMIC_AND, etc) into
just BPF_ATOMIC32 and BPF_ATOMIC64.
* Removed some references to BPF_STX_XADD from tools/, samples/ and
lib/ that I missed before.
Differences from v2->v3 [1]:
* More minor fixes and naming/comment changes
* Dropped atomic subtract: compilers can implement this by preceding
an atomic add with a NEG instruction (which is what the x86 JIT did
under the hood anyway).
* Dropped the use of -mcpu=v4 in the Clang BPF command-line; there is
no longer an architecture version bump. Instead a feature test is
added to Kbuild - it builds a source file to check if Clang
supports BPF atomics.
* Fixed the prog_test so it no longer breaks
test_progs-no_alu32. This requires some ifdef acrobatics to avoid
complicating the prog_tests model where the same userspace code
exercises both the normal and no_alu32 BPF test objects, using the
same skeleton header.
Differences from v1->v2 [1]:
* Fixed mistakes in the netronome driver
* Added sub, and, or, xor operations
* The above led to some refactors to keep things readable. (Maybe I
should have just waited until I'd implemented these before starting
the review...)
* Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, whose
encodings already include the BPF_FETCH flag
* Added a bit of documentation. Suggestions welcome for more places
to dump this info...
The prog_test that's added depends on Clang/LLVM features added by
Yonghong in commit 286daafd6512 (was
https://reviews.llvm.org/D72184).
This only includes a JIT implementation for x86_64 - I don't plan to
implement JIT support myself for other architectures.
Operations
==========
This patchset adds atomic operations to the eBPF instruction set. The
use-case that motivated this work was a trivial and efficient way to
generate globally-unique cookies in BPF progs (sketched at the end of
this section), but I think it's obvious that these features are pretty
widely applicable. The
instructions that are added here can be summarised with this list of
kernel operations:
* atomic[64]_[fetch_]add
* atomic[64]_[fetch_]and
* atomic[64]_[fetch_]or
* atomic[64]_xchg
* atomic[64]_cmpxchg
The following are left out of scope for this effort:
* 16 and 8 bit operations
* Explicit memory barriers
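To make the cookie use-case concrete, here is a minimal sketch (not
part of the patchset; the program type, section and symbol names are
illustrative). With a Clang that has llvm-project commit 286daafd6512,
__sync_fetch_and_add() on a 64-bit value compiles to the new
BPF_ATOMIC | BPF_ADD | BPF_FETCH instruction:

/* cookie.bpf.c - hypothetical example, not from this series */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* One global counter, shared by all CPUs via the .bss map. */
static __u64 next_cookie;

SEC("xdp")
int gen_cookie(struct xdp_md *ctx)
{
	/* Atomically fetch the old value and increment: concurrent
	 * callers on any CPU each observe a distinct cookie, with no
	 * locking.
	 */
	__u64 cookie = __sync_fetch_and_add(&next_cookie, 1);

	bpf_printk("cookie %llu", cookie);
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";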
Encoding
========
I originally planned to add new values for bpf_insn.opcode. This was
rather unpleasant: the opcode space has holes in it but no entire
instruction classes[2]. Yonghong Song had a better idea: use the
immediate field of the existing STX XADD instruction to encode the
operation. This works nicely, without breaking existing programs,
because the immediate field is currently reserved-must-be-zero, and
extra-nicely because BPF_ADD happens to be zero.
Note that this of course makes immediate-source atomic operations
impossible. It's hard to imagine a measurable speedup from such
instructions, and if it existed it would certainly not benefit x86,
which has no support for them.
The BPF_OP opcode fields are re-used in the immediate, and an
additional flag BPF_FETCH is used to mark instructions that should
fetch a pre-modification value from memory.
So, BPF_XADD is now called BPF_ATOMIC (the old name is kept to avoid
breaking userspace builds), and where we previously had .imm = 0, we
now have .imm = BPF_ADD (which is 0).
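To make the encoding concrete, here is a sketch along the lines of the
BPF_ATOMIC_OP macro mentioned in the changelog above (see
include/linux/filter.h in the tree for the authoritative definition):

#include <linux/bpf.h>

#define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF)			\
	((struct bpf_insn) {					\
		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC,	\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = OFF,					\
		.imm   = OP })

struct bpf_insn example[] = {
	/* *(u64 *)(r2 + 0) += r1
	 * Byte-for-byte identical to the old BPF_STX_XADD encoding,
	 * since BPF_ATOMIC reuses the XADD opcode and BPF_ADD == 0.
	 */
	BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_2, BPF_REG_1, 0),
};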
Operands
========
Reg-source eBPF instructions only have two operands, while these
atomic operations have up to four. To avoid needing to encode
additional operands:
- One of the input registers is re-used as an output register
(e.g. atomic_fetch_add both reads from and writes to the source
register).
- Where necessary (i.e. for cmpxchg), R0 is "hard-coded" as one of
the operands.
This approach also allows the new eBPF instructions to map directly
to single x86 instructions.
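Concretely, reusing the BPF_ATOMIC_OP sketch from the Encoding
section (the comments show the disassembly format implemented in
kernel/bpf/disasm.c below):

struct bpf_insn operand_examples[] = {
	/* r1 = atomic64_fetch_add((u64 *)(r2 + 0), r1)
	 * src_reg supplies the addend and receives the old value.
	 */
	BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_2, BPF_REG_1, 0),

	/* r0 = atomic64_cmpxchg((u64 *)(r2 + 0), r0, r1)
	 * R0 is hard-coded: it carries the expected value in and the
	 * old value out, so no fourth operand needs encoding.
	 */
	BPF_ATOMIC_OP(BPF_DW, BPF_CMPXCHG, BPF_REG_2, BPF_REG_1, 0),
};

On x86_64 these JIT to single lock xadd and lock cmpxchg instructions
respectively; R0 maps to RAX, which CMPXCHG uses implicitly.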
[1] Previous iterations:
v1: https://lore.kernel.org/bpf/20201123173202.1335708-1-jackmanb@google.com/
v2: https://lore.kernel.org/bpf/20201127175738.1085417-1-jackmanb@google.com/
v3: https://lore.kernel.org/bpf/X8kN7NA7bJC7aLQI@google.com/
v4: https://lore.kernel.org/bpf/20201207160734.2345502-1-jackmanb@google.com/
v5: https://lore.kernel.org/bpf/20201215121816.1048557-1-jackmanb@google.com/
v6: https://lore.kernel.org/bpf/20210112154235.2192781-1-jackmanb@google.com/
[2] Visualisation of eBPF opcode space:
https://gist.github.com/bjackman/00fdad2d5dfff601c1918bc29b16e778
[3] Comment from John about propagating bounds in verifier:
https://lore.kernel.org/bpf/5fcf0fbcc8aa8_9ab320853@john-XPS-13-9370.notmuch/
[4] Mail from Andrii about not supporting old Clang in selftests:
https://lore.kernel.org/bpf/CAEf4BzYBddPaEzRUs=jaWSo5kbf=LZdb7geAUVj85GxLQztuAQ@mail.gmail.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'kernel')
| -rw-r--r-- | kernel/bpf/core.c | 67 |
| -rw-r--r-- | kernel/bpf/disasm.c | 43 |
| -rw-r--r-- | kernel/bpf/verifier.c | 75 |

3 files changed, 154 insertions, 31 deletions
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 69c3c308de5e..5bbd4884ff7a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1309,8 +1309,8 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
 	INSN_3(STX, MEM, H),			\
 	INSN_3(STX, MEM, W),			\
 	INSN_3(STX, MEM, DW),			\
-	INSN_3(STX, XADD, W),			\
-	INSN_3(STX, XADD, DW),			\
+	INSN_3(STX, ATOMIC, W),			\
+	INSN_3(STX, ATOMIC, DW),		\
 	/* Immediate based. */			\
 	INSN_3(ST, MEM, B),			\
 	INSN_3(ST, MEM, H),			\
@@ -1618,13 +1618,59 @@ out:
 	LDX_PROBE(DW, 8)
 #undef LDX_PROBE
 
-	STX_XADD_W: /* lock xadd *(u32 *)(dst_reg + off16) += src_reg */
-		atomic_add((u32) SRC, (atomic_t *)(unsigned long)
-			   (DST + insn->off));
-		CONT;
-	STX_XADD_DW: /* lock xadd *(u64 *)(dst_reg + off16) += src_reg */
-		atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
-			     (DST + insn->off));
+#define ATOMIC_ALU_OP(BOP, KOP)						\
+		case BOP:						\
+			if (BPF_SIZE(insn->code) == BPF_W)		\
+				atomic_##KOP((u32) SRC, (atomic_t *)(unsigned long) \
+					     (DST + insn->off));	\
+			else						\
+				atomic64_##KOP((u64) SRC, (atomic64_t *)(unsigned long) \
+					       (DST + insn->off));	\
+			break;						\
+		case BOP | BPF_FETCH:					\
+			if (BPF_SIZE(insn->code) == BPF_W)		\
+				SRC = (u32) atomic_fetch_##KOP(		\
+					(u32) SRC,			\
+					(atomic_t *)(unsigned long) (DST + insn->off)); \
+			else						\
+				SRC = (u64) atomic64_fetch_##KOP(	\
+					(u64) SRC,			\
+					(atomic64_t *)(unsigned long) (DST + insn->off)); \
+			break;
+
+	STX_ATOMIC_DW:
+	STX_ATOMIC_W:
+		switch (IMM) {
+		ATOMIC_ALU_OP(BPF_ADD, add)
+		ATOMIC_ALU_OP(BPF_AND, and)
+		ATOMIC_ALU_OP(BPF_OR, or)
+		ATOMIC_ALU_OP(BPF_XOR, xor)
+#undef ATOMIC_ALU_OP
+
+		case BPF_XCHG:
+			if (BPF_SIZE(insn->code) == BPF_W)
+				SRC = (u32) atomic_xchg(
+					(atomic_t *)(unsigned long) (DST + insn->off),
+					(u32) SRC);
+			else
+				SRC = (u64) atomic64_xchg(
+					(atomic64_t *)(unsigned long) (DST + insn->off),
+					(u64) SRC);
+			break;
+		case BPF_CMPXCHG:
+			if (BPF_SIZE(insn->code) == BPF_W)
+				BPF_R0 = (u32) atomic_cmpxchg(
+					(atomic_t *)(unsigned long) (DST + insn->off),
+					(u32) BPF_R0, (u32) SRC);
+			else
+				BPF_R0 = (u64) atomic64_cmpxchg(
+					(atomic64_t *)(unsigned long) (DST + insn->off),
+					(u64) BPF_R0, (u64) SRC);
+			break;
+
+		default:
+			goto default_label;
+		}
 		CONT;
 
 	default_label:
@@ -1634,7 +1680,8 @@ out:
 		 *
 		 * Note, verifier whitelists all opcodes in bpf_opcode_in_insntable().
 		 */
-		pr_warn("BPF interpreter: unknown opcode %02x\n", insn->code);
+		pr_warn("BPF interpreter: unknown opcode %02x (imm: 0x%x)\n",
+			insn->code, insn->imm);
 		BUG_ON(1);
 		return 0;
 }
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index b44d8c447afd..19ff8fed7f4b 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -80,6 +80,13 @@ const char *const bpf_alu_string[16] = {
 	[BPF_END >> 4]  = "endian",
 };
 
+static const char *const bpf_atomic_alu_string[16] = {
+	[BPF_ADD >> 4]  = "add",
+	[BPF_AND >> 4]  = "and",
+	[BPF_OR >> 4]   = "or",
+	[BPF_XOR >> 4]  = "xor",
+};
+
 static const char *const bpf_ldst_string[] = {
 	[BPF_W >> 3]  = "u32",
 	[BPF_H >> 3]  = "u16",
@@ -153,14 +160,44 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
 			bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
 			insn->dst_reg, insn->off, insn->src_reg);
-	else if (BPF_MODE(insn->code) == BPF_XADD)
-		verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) += r%d\n",
+	else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+		 (insn->imm == BPF_ADD || insn->imm == BPF_AND ||
+		  insn->imm == BPF_OR || insn->imm == BPF_XOR)) {
+		verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) %s r%d\n",
+			insn->code,
+			bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+			insn->dst_reg, insn->off,
+			bpf_alu_string[BPF_OP(insn->imm) >> 4],
+			insn->src_reg);
+	} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+		   (insn->imm == (BPF_ADD | BPF_FETCH) ||
+		    insn->imm == (BPF_AND | BPF_FETCH) ||
+		    insn->imm == (BPF_OR | BPF_FETCH) ||
+		    insn->imm == (BPF_XOR | BPF_FETCH))) {
+		verbose(cbs->private_data, "(%02x) r%d = atomic%s_fetch_%s((%s *)(r%d %+d), r%d)\n",
+			insn->code, insn->src_reg,
+			BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
+			bpf_atomic_alu_string[BPF_OP(insn->imm) >> 4],
+			bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+			insn->dst_reg, insn->off, insn->src_reg);
+	} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+		   insn->imm == BPF_CMPXCHG) {
+		verbose(cbs->private_data, "(%02x) r0 = atomic%s_cmpxchg((%s *)(r%d %+d), r0, r%d)\n",
 			insn->code,
+			BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
 			bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
 			insn->dst_reg, insn->off, insn->src_reg);
-	else
+	} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+		   insn->imm == BPF_XCHG) {
+		verbose(cbs->private_data, "(%02x) r%d = atomic%s_xchg((%s *)(r%d %+d), r%d)\n",
+			insn->code, insn->src_reg,
+			BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
+			bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+			insn->dst_reg, insn->off, insn->src_reg);
+	} else {
 		verbose(cbs->private_data, "BUG_%02x\n", insn->code);
+	}
 	} else if (class == BPF_ST) {
 		if (BPF_MODE(insn->code) != BPF_MEM) {
 			verbose(cbs->private_data, "BUG_st_%02x\n", insn->code);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ae2aee48cf82..0f82d5d46e2c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3604,13 +3604,30 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 	return err;
 }
 
-static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
+static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
 {
+	int load_reg;
 	int err;
 
-	if ((BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) ||
-	    insn->imm != 0) {
-		verbose(env, "BPF_XADD uses reserved fields\n");
+	switch (insn->imm) {
+	case BPF_ADD:
+	case BPF_ADD | BPF_FETCH:
+	case BPF_AND:
+	case BPF_AND | BPF_FETCH:
+	case BPF_OR:
+	case BPF_OR | BPF_FETCH:
+	case BPF_XOR:
+	case BPF_XOR | BPF_FETCH:
+	case BPF_XCHG:
+	case BPF_CMPXCHG:
+		break;
+	default:
+		verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm);
+		return -EINVAL;
+	}
+
+	if (BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) {
+		verbose(env, "invalid atomic operand size\n");
 		return -EINVAL;
 	}
 
@@ -3624,6 +3641,13 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
 	if (err)
 		return err;
 
+	if (insn->imm == BPF_CMPXCHG) {
+		/* Check comparison of R0 with memory location */
+		err = check_reg_arg(env, BPF_REG_0, SRC_OP);
+		if (err)
+			return err;
+	}
+
 	if (is_pointer_value(env, insn->src_reg)) {
 		verbose(env, "R%d leaks addr into mem\n", insn->src_reg);
 		return -EACCES;
@@ -3633,21 +3657,38 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
 	    is_pkt_reg(env, insn->dst_reg) ||
 	    is_flow_key_reg(env, insn->dst_reg) ||
 	    is_sk_reg(env, insn->dst_reg)) {
-		verbose(env, "BPF_XADD stores into R%d %s is not allowed\n",
+		verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
 			insn->dst_reg,
 			reg_type_str[reg_state(env, insn->dst_reg)->type]);
 		return -EACCES;
 	}
 
-	/* check whether atomic_add can read the memory */
+	/* check whether we can read the memory */
 	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
 			       BPF_SIZE(insn->code), BPF_READ, -1, true);
 	if (err)
 		return err;
 
-	/* check whether atomic_add can write into the same memory */
-	return check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
-				BPF_SIZE(insn->code), BPF_WRITE, -1, true);
+	/* check whether we can write into the same memory */
+	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
+			       BPF_SIZE(insn->code), BPF_WRITE, -1, true);
+	if (err)
+		return err;
+
+	if (!(insn->imm & BPF_FETCH))
+		return 0;
+
+	if (insn->imm == BPF_CMPXCHG)
+		load_reg = BPF_REG_0;
+	else
+		load_reg = insn->src_reg;
+
+	/* check and record load of old value */
+	err = check_reg_arg(env, load_reg, DST_OP);
+	if (err)
+		return err;
+
+	return 0;
 }
 
 static int __check_stack_boundary(struct bpf_verifier_env *env, u32 regno,
@@ -9524,14 +9565,19 @@ static int do_check(struct bpf_verifier_env *env)
 		} else if (class == BPF_STX) {
 			enum bpf_reg_type *prev_dst_type, dst_reg_type;
 
-			if (BPF_MODE(insn->code) == BPF_XADD) {
-				err = check_xadd(env, env->insn_idx, insn);
+			if (BPF_MODE(insn->code) == BPF_ATOMIC) {
+				err = check_atomic(env, env->insn_idx, insn);
 				if (err)
 					return err;
 				env->insn_idx++;
 				continue;
 			}
 
+			if (BPF_MODE(insn->code) != BPF_MEM || insn->imm != 0) {
+				verbose(env, "BPF_STX uses reserved fields\n");
+				return -EINVAL;
+			}
+
 			/* check src1 operand */
 			err = check_reg_arg(env, insn->src_reg, SRC_OP);
 			if (err)
@@ -10008,13 +10054,6 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
 			return -EINVAL;
 		}
 
-		if (BPF_CLASS(insn->code) == BPF_STX &&
-		    ((BPF_MODE(insn->code) != BPF_MEM &&
-		      BPF_MODE(insn->code) != BPF_XADD) || insn->imm != 0)) {
-			verbose(env, "BPF_STX uses reserved fields\n");
-			return -EINVAL;
-		}
-
 		if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
 			struct bpf_insn_aux_data *aux;
 			struct bpf_map *map;
