author		Alexei Starovoitov <ast@kernel.org>	2021-01-14 18:34:30 -0800
committer	Alexei Starovoitov <ast@kernel.org>	2021-01-14 18:34:30 -0800
commit		7064a7341a0d2fcfeff56be7e3917421fbb8b024 (patch)
tree		cb057d167ff8292f09cd4a4e78f4bd610a6fa959 /kernel
parent		tools/bpftool: Add -Wall when building BPF programs (diff)
parent		bpf: Document new atomic instructions (diff)
Merge branch 'Atomics for eBPF'
Brendan Jackman says:

====================

There's still one unresolved review comment from John [3], which I will
resolve with a followup patch.

Differences from v6->v7 [1]:

* Fixed riscv build error detected by the 0-day robot.

Differences from v5->v6 [1]:

* Carried Björn Töpel's ack for the RISC-V code, plus a couple more
  acks from Yonghong.
* Doc fixups.
* Trivial cleanups.

Differences from v4->v5 [1]:

* Fixed bogus type casts in the interpreter that led to warnings from
  the 0-day robot.
* Dropped feature-detection for Clang per Andrii's suggestion in [4].
  The selftests will now fail to build unless you have llvm-project
  commit 286daafd6512. The ENABLE_ATOMICS_TEST macro is still needed
  to support the no_alu32 tests.
* Carried some Acks from John and Yonghong.
* Dropped the confusing usage of __atomic_exchange from prog_test in
  favour of __sync_lock_test_and_set.
* [Really] got rid of the forest of instruction macros
  (BPF_ATOMIC_FETCH_ADD and friends); now there's just BPF_ATOMIC_OP
  to define all the instructions as we use them in the verifier tests.
  This makes the atomic ops less special in that API, and I don't
  think the resulting usage is actually any harder to read.

Differences from v3->v4 [1]:

* Added one Ack from Yonghong. He acked some other patches but those
  have now changed non-trivially so I didn't add those acks.
* Fixups to commit messages.
* Fixed disassembly and comments: the first arg to atomic_fetch_* is
  a pointer.
* Improved prog_test efficiency: BPF progs are now all loaded in a
  single call, then the skeleton is re-used for each subtest.
* Dropped use of tools/build/feature in favour of a one-liner in the
  Makefile.
* Dropped the commit that created an emit_neg helper in the x86 JIT.
  It's not used any more (it wasn't used in v3 either).
* Combined all the different filter.h macros (used to be
  BPF_ATOMIC_ADD, BPF_ATOMIC_FETCH_ADD, BPF_ATOMIC_AND, etc.) into
  just BPF_ATOMIC32 and BPF_ATOMIC64.
* Removed some references to BPF_STX_XADD from tools/, samples/ and
  lib/ that I missed before.

Differences from v2->v3 [1]:

* More minor fixes and naming/comment changes.
* Dropped atomic subtract: compilers can implement it by preceding an
  atomic add with a NEG instruction (which is what the x86 JIT did
  under the hood anyway).
* Dropped the use of -mcpu=v4 in the Clang BPF command line; there is
  no longer an architecture version bump. Instead a feature test is
  added to Kbuild - it builds a source file to check whether Clang
  supports BPF atomics.
* Fixed the prog_test so it no longer breaks test_progs-no_alu32.
  This requires some ifdef acrobatics to avoid complicating the
  prog_tests model, where the same userspace code exercises both the
  normal and no_alu32 BPF test objects using the same skeleton header.

Differences from v1->v2 [1]:

* Fixed mistakes in the netronome driver.
* Added sub, and, or, xor operations.
* The above led to some refactors to keep things readable. (Maybe I
  should have just waited until I'd implemented these before starting
  the review...)
* Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, which
  include the BPF_FETCH flag.
* Added a bit of documentation. Suggestions welcome for more places
  to dump this info...

The prog_test that's added depends on Clang/LLVM features added by
Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184).

This only includes a JIT implementation for x86_64 - I don't plan to
implement JIT support myself for other architectures.

Operations
==========

This patchset adds atomic operations to the eBPF instruction set.
The use-case that motivated this work was a trivial and efficient way
to generate globally-unique cookies in BPF progs, but I think it's
obvious that these features are pretty widely applicable. The
instructions that are added here can be summarised with this list of
kernel operations:

* atomic[64]_[fetch_]add
* atomic[64]_[fetch_]and
* atomic[64]_[fetch_]or
* atomic[64]_xchg
* atomic[64]_cmpxchg

The following are left out of scope for this effort:

* 16- and 8-bit operations
* Explicit memory barriers

Encoding
========

I originally planned to add new values for bpf_insn.opcode. This was
rather unpleasant: the opcode space has holes in it, but no entire
instruction classes [2]. Yonghong Song had a better idea: use the
immediate field of the existing STX XADD instruction to encode the
operation. This works nicely, without breaking existing programs,
because the immediate field is currently reserved-must-be-zero, and
extra-nicely because BPF_ADD happens to be zero.

Note that this of course makes immediate-source atomic operations
impossible. It's hard to imagine a measurable speedup from such
instructions, and if it existed it would certainly not benefit x86,
which has no support for them.

The BPF_OP opcode fields are re-used in the immediate, and an
additional flag, BPF_FETCH, is used to mark instructions that should
fetch a pre-modification value from memory.

So, BPF_XADD is now called BPF_ATOMIC (the old name is kept to avoid
breaking userspace builds), and where we previously had .imm = 0, we
now have .imm = BPF_ADD (which is 0).

Operands
========

Reg-source eBPF instructions only have two operands, while these
atomic operations have up to four. To avoid needing to encode
additional operands:

- One of the input registers is re-used as an output register
  (e.g. atomic_fetch_add both reads from and writes to the source
  register).
- Where necessary (i.e. for cmpxchg), R0 is "hard-coded" as one of
  the operands.

This approach also allows the new eBPF instructions to map directly
to single x86 instructions.

[1] Previous iterations:
    v1: https://lore.kernel.org/bpf/20201123173202.1335708-1-jackmanb@google.com/
    v2: https://lore.kernel.org/bpf/20201127175738.1085417-1-jackmanb@google.com/
    v3: https://lore.kernel.org/bpf/X8kN7NA7bJC7aLQI@google.com/
    v4: https://lore.kernel.org/bpf/20201207160734.2345502-1-jackmanb@google.com/
    v5: https://lore.kernel.org/bpf/20201215121816.1048557-1-jackmanb@google.com/
    v6: https://lore.kernel.org/bpf/20210112154235.2192781-1-jackmanb@google.com/

[2] Visualisation of the eBPF opcode space:
    https://gist.github.com/bjackman/00fdad2d5dfff601c1918bc29b16e778

[3] Comment from John about propagating bounds in the verifier:
    https://lore.kernel.org/bpf/5fcf0fbcc8aa8_9ab320853@john-XPS-13-9370.notmuch/

[4] Mail from Andrii about not supporting old Clang in the selftests:
    https://lore.kernel.org/bpf/CAEf4BzYBddPaEzRUs=jaWSo5kbf=LZdb7geAUVj85GxLQztuAQ@mail.gmail.com/

====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
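[Editor's note: as a concrete illustration of the encoding and operand
conventions described above, here is a minimal sketch of how one of
these instructions can be assembled by hand. It is modelled on the
BPF_ATOMIC_OP helper mentioned in the cover letter; treat the macro as
an approximation, though the field names follow struct bpf_insn from
the UAPI headers.]

	#include <linux/bpf.h>	/* struct bpf_insn, BPF_* field values */

	/* Approximation of the BPF_ATOMIC_OP helper used by the
	 * verifier selftests: the atomic operation (optionally OR'd
	 * with BPF_FETCH) lives in the previously
	 * reserved-must-be-zero immediate field.
	 */
	#define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF)			  \
		((struct bpf_insn) {					  \
			.code    = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
			.dst_reg = DST,					  \
			.src_reg = SRC,					  \
			.off     = OFF,					  \
			.imm     = OP })

	/* lock *(u64 *)(r1 + 0) += r2  (.imm = BPF_ADD = 0, the old XADD) */
	struct bpf_insn add  = BPF_ATOMIC_OP(BPF_DW, BPF_ADD,
					     BPF_REG_1, BPF_REG_2, 0);

	/* r2 = atomic64_fetch_add((u64 *)(r1 + 0), r2): the source
	 * register is re-used as the output register.
	 */
	struct bpf_insn fadd = BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH,
					     BPF_REG_1, BPF_REG_2, 0);

	/* r0 = atomic64_cmpxchg((u64 *)(r1 + 0), r0, r2): R0 is the
	 * hard-coded compare/result operand.
	 */
	struct bpf_insn cmpx = BPF_ATOMIC_OP(BPF_DW, BPF_CMPXCHG,
					     BPF_REG_1, BPF_REG_2, 0);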
Diffstat (limited to 'kernel')
-rw-r--r--	kernel/bpf/core.c	67
-rw-r--r--	kernel/bpf/disasm.c	43
-rw-r--r--	kernel/bpf/verifier.c	75
3 files changed, 154 insertions, 31 deletions
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 69c3c308de5e..5bbd4884ff7a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1309,8 +1309,8 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
INSN_3(STX, MEM, H), \
INSN_3(STX, MEM, W), \
INSN_3(STX, MEM, DW), \
- INSN_3(STX, XADD, W), \
- INSN_3(STX, XADD, DW), \
+ INSN_3(STX, ATOMIC, W), \
+ INSN_3(STX, ATOMIC, DW), \
/* Immediate based. */ \
INSN_3(ST, MEM, B), \
INSN_3(ST, MEM, H), \
@@ -1618,13 +1618,59 @@ out:
LDX_PROBE(DW, 8)
#undef LDX_PROBE
- STX_XADD_W: /* lock xadd *(u32 *)(dst_reg + off16) += src_reg */
- atomic_add((u32) SRC, (atomic_t *)(unsigned long)
- (DST + insn->off));
- CONT;
- STX_XADD_DW: /* lock xadd *(u64 *)(dst_reg + off16) += src_reg */
- atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
- (DST + insn->off));
+#define ATOMIC_ALU_OP(BOP, KOP) \
+ case BOP: \
+ if (BPF_SIZE(insn->code) == BPF_W) \
+ atomic_##KOP((u32) SRC, (atomic_t *)(unsigned long) \
+ (DST + insn->off)); \
+ else \
+ atomic64_##KOP((u64) SRC, (atomic64_t *)(unsigned long) \
+ (DST + insn->off)); \
+ break; \
+ case BOP | BPF_FETCH: \
+ if (BPF_SIZE(insn->code) == BPF_W) \
+ SRC = (u32) atomic_fetch_##KOP( \
+ (u32) SRC, \
+ (atomic_t *)(unsigned long) (DST + insn->off)); \
+ else \
+ SRC = (u64) atomic64_fetch_##KOP( \
+ (u64) SRC, \
+ (atomic64_t *)(unsigned long) (DST + insn->off)); \
+ break;
+
+ STX_ATOMIC_DW:
+ STX_ATOMIC_W:
+ switch (IMM) {
+ ATOMIC_ALU_OP(BPF_ADD, add)
+ ATOMIC_ALU_OP(BPF_AND, and)
+ ATOMIC_ALU_OP(BPF_OR, or)
+ ATOMIC_ALU_OP(BPF_XOR, xor)
+#undef ATOMIC_ALU_OP
+
+ case BPF_XCHG:
+ if (BPF_SIZE(insn->code) == BPF_W)
+ SRC = (u32) atomic_xchg(
+ (atomic_t *)(unsigned long) (DST + insn->off),
+ (u32) SRC);
+ else
+ SRC = (u64) atomic64_xchg(
+ (atomic64_t *)(unsigned long) (DST + insn->off),
+ (u64) SRC);
+ break;
+ case BPF_CMPXCHG:
+ if (BPF_SIZE(insn->code) == BPF_W)
+ BPF_R0 = (u32) atomic_cmpxchg(
+ (atomic_t *)(unsigned long) (DST + insn->off),
+ (u32) BPF_R0, (u32) SRC);
+ else
+ BPF_R0 = (u64) atomic64_cmpxchg(
+ (atomic64_t *)(unsigned long) (DST + insn->off),
+ (u64) BPF_R0, (u64) SRC);
+ break;
+
+ default:
+ goto default_label;
+ }
CONT;
default_label:
@@ -1634,7 +1680,8 @@ out:
*
* Note, verifier whitelists all opcodes in bpf_opcode_in_insntable().
*/
- pr_warn("BPF interpreter: unknown opcode %02x\n", insn->code);
+ pr_warn("BPF interpreter: unknown opcode %02x (imm: 0x%x)\n",
+ insn->code, insn->imm);
BUG_ON(1);
return 0;
}
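[Editor's note: for readers untangling the ATOMIC_ALU_OP macro above,
the semantics of the interpreter's new cases can be modelled in plain
userspace C with the GCC/Clang __atomic builtins. This is a sketch,
not kernel code; 'src', 'r0' and 'addr' stand in for the BPF source
register, R0, and DST + insn->off, and the memory orderings are only
an approximation of the kernel atomics used (the non-fetching add has
no ordering guarantee, the value-returning forms are fully ordered).]

	#include <stdbool.h>
	#include <stdint.h>

	/* imm = BPF_ADD: *(u32 *)(dst + off) += src, result discarded */
	static void model_add_w(uint32_t *addr, uint32_t src)
	{
		__atomic_fetch_add(addr, src, __ATOMIC_RELAXED);
	}

	/* imm = BPF_ADD | BPF_FETCH: the pre-modification value is
	 * written back into the source register.
	 */
	static uint32_t model_fetch_add_w(uint32_t *addr, uint32_t src)
	{
		return __atomic_fetch_add(addr, src, __ATOMIC_SEQ_CST);
	}

	/* imm = BPF_XCHG: unconditionally store src, return the old value */
	static uint32_t model_xchg_w(uint32_t *addr, uint32_t src)
	{
		return __atomic_exchange_n(addr, src, __ATOMIC_SEQ_CST);
	}

	/* imm = BPF_CMPXCHG: compare *addr with R0, store src on a
	 * match; R0 receives the old value either way.
	 */
	static uint32_t model_cmpxchg_w(uint32_t *addr, uint32_t r0,
					uint32_t src)
	{
		__atomic_compare_exchange_n(addr, &r0, src, false,
					    __ATOMIC_SEQ_CST,
					    __ATOMIC_SEQ_CST);
		return r0;
	}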
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index b44d8c447afd..19ff8fed7f4b 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -80,6 +80,13 @@ const char *const bpf_alu_string[16] = {
[BPF_END >> 4] = "endian",
};
+static const char *const bpf_atomic_alu_string[16] = {
+ [BPF_ADD >> 4] = "add",
+ [BPF_AND >> 4] = "and",
+ [BPF_OR >> 4] = "or",
+ [BPF_XOR >> 4] = "xor",
+};
+
static const char *const bpf_ldst_string[] = {
[BPF_W >> 3] = "u32",
[BPF_H >> 3] = "u16",
@@ -153,14 +160,44 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
insn->dst_reg,
insn->off, insn->src_reg);
- else if (BPF_MODE(insn->code) == BPF_XADD)
- verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) += r%d\n",
+ else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+ (insn->imm == BPF_ADD || insn->imm == BPF_AND ||
+ insn->imm == BPF_OR || insn->imm == BPF_XOR)) {
+ verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) %s r%d\n",
+ insn->code,
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->dst_reg, insn->off,
+ bpf_alu_string[BPF_OP(insn->imm) >> 4],
+ insn->src_reg);
+ } else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+ (insn->imm == (BPF_ADD | BPF_FETCH) ||
+ insn->imm == (BPF_AND | BPF_FETCH) ||
+ insn->imm == (BPF_OR | BPF_FETCH) ||
+ insn->imm == (BPF_XOR | BPF_FETCH))) {
+ verbose(cbs->private_data, "(%02x) r%d = atomic%s_fetch_%s((%s *)(r%d %+d), r%d)\n",
+ insn->code, insn->src_reg,
+ BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
+ bpf_atomic_alu_string[BPF_OP(insn->imm) >> 4],
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->dst_reg, insn->off, insn->src_reg);
+ } else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+ insn->imm == BPF_CMPXCHG) {
+ verbose(cbs->private_data, "(%02x) r0 = atomic%s_cmpxchg((%s *)(r%d %+d), r0, r%d)\n",
insn->code,
+ BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
insn->dst_reg, insn->off,
insn->src_reg);
- else
+ } else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+ insn->imm == BPF_XCHG) {
+ verbose(cbs->private_data, "(%02x) r%d = atomic%s_xchg((%s *)(r%d %+d), r%d)\n",
+ insn->code, insn->src_reg,
+ BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->dst_reg, insn->off, insn->src_reg);
+ } else {
verbose(cbs->private_data, "BUG_%02x\n", insn->code);
+ }
} else if (class == BPF_ST) {
if (BPF_MODE(insn->code) != BPF_MEM) {
verbose(cbs->private_data, "BUG_st_%02x\n", insn->code);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ae2aee48cf82..0f82d5d46e2c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3604,13 +3604,30 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
return err;
}
-static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
+static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
{
+ int load_reg;
int err;
- if ((BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) ||
- insn->imm != 0) {
- verbose(env, "BPF_XADD uses reserved fields\n");
+ switch (insn->imm) {
+ case BPF_ADD:
+ case BPF_ADD | BPF_FETCH:
+ case BPF_AND:
+ case BPF_AND | BPF_FETCH:
+ case BPF_OR:
+ case BPF_OR | BPF_FETCH:
+ case BPF_XOR:
+ case BPF_XOR | BPF_FETCH:
+ case BPF_XCHG:
+ case BPF_CMPXCHG:
+ break;
+ default:
+ verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm);
+ return -EINVAL;
+ }
+
+ if (BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) {
+ verbose(env, "invalid atomic operand size\n");
return -EINVAL;
}
@@ -3624,6 +3641,13 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
if (err)
return err;
+ if (insn->imm == BPF_CMPXCHG) {
+ /* Check comparison of R0 with memory location */
+ err = check_reg_arg(env, BPF_REG_0, SRC_OP);
+ if (err)
+ return err;
+ }
+
if (is_pointer_value(env, insn->src_reg)) {
verbose(env, "R%d leaks addr into mem\n", insn->src_reg);
return -EACCES;
@@ -3633,21 +3657,38 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
is_pkt_reg(env, insn->dst_reg) ||
is_flow_key_reg(env, insn->dst_reg) ||
is_sk_reg(env, insn->dst_reg)) {
- verbose(env, "BPF_XADD stores into R%d %s is not allowed\n",
+ verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
insn->dst_reg,
reg_type_str[reg_state(env, insn->dst_reg)->type]);
return -EACCES;
}
- /* check whether atomic_add can read the memory */
+ /* check whether we can read the memory */
err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
BPF_SIZE(insn->code), BPF_READ, -1, true);
if (err)
return err;
- /* check whether atomic_add can write into the same memory */
- return check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
- BPF_SIZE(insn->code), BPF_WRITE, -1, true);
+ /* check whether we can write into the same memory */
+ err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
+ BPF_SIZE(insn->code), BPF_WRITE, -1, true);
+ if (err)
+ return err;
+
+ if (!(insn->imm & BPF_FETCH))
+ return 0;
+
+ if (insn->imm == BPF_CMPXCHG)
+ load_reg = BPF_REG_0;
+ else
+ load_reg = insn->src_reg;
+
+ /* check and record load of old value */
+ err = check_reg_arg(env, load_reg, DST_OP);
+ if (err)
+ return err;
+
+ return 0;
}
static int __check_stack_boundary(struct bpf_verifier_env *env, u32 regno,
@@ -9524,14 +9565,19 @@ static int do_check(struct bpf_verifier_env *env)
} else if (class == BPF_STX) {
enum bpf_reg_type *prev_dst_type, dst_reg_type;
- if (BPF_MODE(insn->code) == BPF_XADD) {
- err = check_xadd(env, env->insn_idx, insn);
+ if (BPF_MODE(insn->code) == BPF_ATOMIC) {
+ err = check_atomic(env, env->insn_idx, insn);
if (err)
return err;
env->insn_idx++;
continue;
}
+ if (BPF_MODE(insn->code) != BPF_MEM || insn->imm != 0) {
+ verbose(env, "BPF_STX uses reserved fields\n");
+ return -EINVAL;
+ }
+
/* check src1 operand */
err = check_reg_arg(env, insn->src_reg, SRC_OP);
if (err)
@@ -10008,13 +10054,6 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
return -EINVAL;
}
- if (BPF_CLASS(insn->code) == BPF_STX &&
- ((BPF_MODE(insn->code) != BPF_MEM &&
- BPF_MODE(insn->code) != BPF_XADD) || insn->imm != 0)) {
- verbose(env, "BPF_STX uses reserved fields\n");
- return -EINVAL;
- }
-
if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
struct bpf_insn_aux_data *aux;
struct bpf_map *map;
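[Editor's note: on the compiler side, the Clang support mentioned in
the cover letter (llvm-project commit 286daafd6512) maps the __sync_*
builtins onto these encodings. A minimal BPF-side sketch follows,
assuming libbpf's bpf_helpers.h and a new-enough Clang; the section,
function, and variable names are illustrative, not taken from the
selftests.]

	// SPDX-License-Identifier: GPL-2.0
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	__u64 counter = 0;	/* shared with userspace via the skeleton */
	__u64 flags   = 0;
	__u64 owner   = 0;

	SEC("raw_tp/sys_enter")
	int gen_cookie(void *ctx)
	{
		/* Result is used, so this compiles to the fetch form
		 * (BPF_ADD | BPF_FETCH) - the globally-unique cookie
		 * pattern that motivated the series.
		 */
		__u64 cookie = __sync_fetch_and_add(&counter, 1);

		/* Result discarded: the plain BPF_OR form suffices */
		__sync_fetch_and_or(&flags, 1);

		/* BPF_XCHG: store the new value, get the old one back */
		__u64 prev = __sync_lock_test_and_set(&owner, cookie);

		/* BPF_CMPXCHG: restore 'prev' only if we still own the
		 * slot; R0 carries the comparison value implicitly.
		 */
		__sync_val_compare_and_swap(&owner, cookie, prev);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";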