aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/utsname.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-09-29Merge tag 'namespace-6.18-rc1' of ↵Linus Torvalds1-22/+9
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains a larger set of changes around the generic namespace infrastructure of the kernel. Each specific namespace type (net, cgroup, mnt, ...) embedds a struct ns_common which carries the reference count of the namespace and so on. We open-coded and cargo-culted so many quirks for each namespace type that it just wasn't scalable anymore. So given there's a bunch of new changes coming in that area I've started cleaning all of this up. The core change is to make it possible to correctly initialize every namespace uniformly and derive the correct initialization settings from the type of the namespace such as namespace operations, namespace type and so on. This leaves the new ns_common_init() function with a single parameter which is the specific namespace type which derives the correct parameters statically. This also means the compiler will yell as soon as someone does something remotely fishy. The ns_common_init() addition also allows us to remove ns_alloc_inum() and drops any special-casing of the initial network namespace in the network namespace initialization code that Linus complained about. Another part is reworking the reference counting. The reference counting was open-coded and copy-pasted for each namespace type even though they all followed the same rules. This also removes all open accesses to the reference count and makes it private and only uses a very small set of dedicated helpers to manipulate them just like we do for e.g., files. In addition this generalizes the mount namespace iteration infrastructure introduced a few cycles ago. As reminder, the vfs makes it possible to iterate sequentially and bidirectionally through all mount namespaces on the system or all mount namespaces that the caller holds privilege over. This allow userspace to iterate over all mounts in all mount namespaces using the listmount() and statmount() system call. Each mount namespace has a unique identifier for the lifetime of the systems that is exposed to userspace. The network namespace also has a unique identifier working exactly the same way. This extends the concept to all other namespace types. The new nstree type makes it possible to lookup namespaces purely by their identifier and to walk the namespace list sequentially and bidirectionally for all namespace types, allowing userspace to iterate through all namespaces. Looking up namespaces in the namespace tree works completely locklessly. This also means we can move the mount namespace onto the generic infrastructure and remove a bunch of code and members from struct mnt_namespace itself. There's a bunch of stuff coming on top of this in the future but for now this uses the generic namespace tree to extend a concept introduced first for pidfs a few cycles ago. For a while now we have supported pidfs file handles for pidfds. This has proven to be very useful. This extends the concept to cover namespaces as well. It is possible to encode and decode namespace file handles using the common name_to_handle_at() and open_by_handle_at() apis. As with pidfs file handles, namespace file handles are exhaustive, meaning it is not required to actually hold a reference to nsfs in able to decode aka open_by_handle_at() a namespace file handle. Instead the FD_NSFS_ROOT constant can be passed which will let the kernel grab a reference to the root of nsfs internally and thus decode the file handle. Namespaces file descriptors can already be derived from pidfds which means they aren't subject to overmount protection bugs. IOW, it's irrelevant if the caller would not have access to an appropriate /proc/<pid>/ns/ directory as they could always just derive the namespace based on a pidfd already. It has the same advantage as pidfds. It's possible to reliably and for the lifetime of the system refer to a namespace without pinning any resources and to compare them trivially. Permission checking is kept simple. If the caller is located in the namespace the file handle refers to they are able to open it otherwise they must hold privilege over the owning namespace of the relevant namespace. The namespace file handle layout is exposed as uapi and has a stable and extensible format. For now it simply contains the namespace identifier, the namespace type, and the inode number. The stable format means that userspace may construct its own namespace file handles without going through name_to_handle_at() as they are already allowed for pidfs and cgroup file handles" * tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits) ns: drop assert ns: move ns type into struct ns_common nstree: make struct ns_tree private ns: add ns_debug() ns: simplify ns_common_init() further cgroup: add missing ns_common include ns: use inode initializer for initial namespaces selftests/namespaces: verify initial namespace inode numbers ns: rename to __ns_ref nsfs: port to ns_ref_*() helpers net: port to ns_ref_*() helpers uts: port to ns_ref_*() helpers ipv4: use check_net() net: use check_net() net-sysfs: use check_net() user: port to ns_ref_*() helpers time: port to ns_ref_*() helpers pid: port to ns_ref_*() helpers ipc: port to ns_ref_*() helpers cgroup: port to ns_ref_*() helpers ...
2025-09-25ns: move ns type into struct ns_commonChristian Brauner1-1/+0
It's misplaced in struct proc_ns_operations and ns->ops might be NULL if the namespace is compiled out but we still want to know the type of the namespace for the initial namespace struct. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-22ns: simplify ns_common_init() furtherChristian Brauner1-1/+1
Simply derive the ns operations from the namespace type. Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19ns: add ns_common_free()Christian Brauner1-1/+1
And drop ns_free_inum(). Anything common that can be wasted centrally should be wasted in the new common helper. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19nscommon: simplify initializationChristian Brauner1-1/+1
There's a lot of information that namespace implementers don't need to know about at all. Encapsulate this all in the initialization helper. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19ns: add to_<type>_ns() to respective headersChristian Brauner1-5/+0
Every namespace type has a container_of(ns, <ns_type>, ns) static inline function that is currently not exposed in the header. So we have a bunch of places that open-code it via container_of(). Move it to the headers so we can use it directly. Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19uts: support ns lookupChristian Brauner1-1/+6
Support the generic ns lookup infrastructure to support file handles for namespaces. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19uts: use ns_common_init()Christian Brauner1-14/+2
Don't cargo-cult the same thing over and over. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01copy_process: pass clone_flags as u64 across calltreeSimon Schuster1-1/+1
With the introduction of clone3 in commit 7f192e3cd316 ("fork: add clone3") the effective bit width of clone_flags on all architectures was increased from 32-bit to 64-bit, with a new type of u64 for the flags. However, for most consumers of clone_flags the interface was not changed from the previous type of unsigned long. While this works fine as long as none of the new 64-bit flag bits (CLONE_CLEAR_SIGHAND and CLONE_INTO_CGROUP) are evaluated, this is still undesirable in terms of the principle of least surprise. Thus, this commit fixes all relevant interfaces of callees to sys_clone3/copy_process (excluding the architecture-specific copy_thread) to consistently pass clone_flags as u64, so that no truncation to 32-bit integers occurs on 32-bit architectures. Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com> Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-2-53fcf5577d57@siemens-energy.com Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2020-08-19uts: Use generic ns_common::countKirill Tkhai1-5/+2
Switch over uts namespaces to use the newly introduced common lifetime counter. Currently every namespace type has its own lifetime counter which is stored in the specific namespace struct. The lifetime counters are used identically for all namespaces types. Namespaces may of course have additional unrelated counters and these are not altered. This introduces a common lifetime counter into struct ns_common. The ns_common struct encompasses information that all namespaces share. That should include the lifetime counter since its common for all of them. It also allows us to unify the type of the counters across all namespaces. Most of them use refcount_t but one uses atomic_t and at least one uses kref. Especially the last one doesn't make much sense since it's just a wrapper around refcount_t since 2016 and actually complicates cleanup operations by having to use container_of() to cast the correct namespace struct out of struct ns_common. Having the lifetime counter for the namespaces in one place reduces maintenance cost. Not just because after switching all namespaces over we will have removed more code than we added but also because the logic is more easily understandable and we indicate to the user that the basic lifetime requirements for all namespaces are currently identical. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/159644978167.604812.1773586504374412107.stgit@localhost.localdomain Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-05-09nsproxy: add struct nssetChristian Brauner1-2/+3
Add a simple struct nsset. It holds all necessary pieces to switch to a new set of namespaces without leaving a task in a half-switched state which we will make use of in the next patch. This patch switches the existing setns logic over without causing a change in setns() behavior. This brings setns() closer to how unshare() works(). The prepare_ns() function is responsible to prepare all necessary information. This has two reasons. First it minimizes dependencies between individual namespaces, i.e. all install handler can expect that all fields are properly initialized independent in what order they are called in. Second, this makes the code easier to maintain and easier to follow if it needs to be changed. The prepare_ns() helper will only be switched over to use a flags argument in the next patch. Here it will still use nstype as a simple integer argument which was argued would be clearer. I'm not particularly opinionated about this if it really helps or not. The struct nsset itself already contains the flags field since its name already indicates that it can contain information required by different namespaces. None of this should have functional consequences. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441Thomas Gleixner1-5/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 of the license extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 315 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-11uts: create "struct uts_namespace" from kmem_cacheAlexey Dobriyan1-4/+16
So "struct uts_namespace" can enjoy fine-grained SLAB debugging and usercopy protection. I'd prefer shorter name "utsns" but there is "user_namespace" already. Link: http://lkml.kernel.org/r/20180228215158.GA23146@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-02sched/headers: Prepare to move the task_lock()/unlock() APIs to ↵Ingo Molnar1-0/+1
<linux/sched/task.h> But first update the code that uses these facilities with the new header. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>Ingo Molnar1-0/+1
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22Merge branch 'nsfs-ioctls' into HEADEric W. Biederman1-0/+6
From: Andrey Vagin <avagin@openvz.org> Each namespace has an owning user namespace and now there is not way to discover these relationships. Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships too. Why we may want to know relationships between namespaces? One use would be visualization, in order to understand the running system. Another would be to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? One more use-case (which usually called abnormal) is checkpoint/restart. In CRIU we are going to dump and restore nested namespaces. There [1] was a discussion about which interface to choose to determing relationships between namespaces. Eric suggested to add two ioctl-s [2]: > Grumble, Grumble. I think this may actually a case for creating ioctls > for these two cases. Now that random nsfs file descriptors are bind > mountable the original reason for using proc files is not as pressing. > > One ioctl for the user namespace that owns a file descriptor. > One ioctl for the parent namespace of a namespace file descriptor. Here is an implementaions of these ioctl-s. $ man man7/namespaces.7 ... Since Linux 4.X, the following ioctl(2) calls are supported for namespace file descriptors. The correct syntax is: fd = ioctl(ns_fd, ioctl_type); where ioctl_type is one of the following: NS_GET_USERNS Returns a file descriptor that refers to an owning user names‐ pace. NS_GET_PARENT Returns a file descriptor that refers to a parent namespace. This ioctl(2) can be used for pid and user namespaces. For user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same meaning. In addition to generic ioctl(2) errors, the following specific ones can occur: EINVAL NS_GET_PARENT was called for a nonhierarchical namespace. EPERM The requested namespace is outside of the current namespace scope. [1] https://lkml.org/lkml/2016/7/6/158 [2] https://lkml.org/lkml/2016/7/9/101 Changes for v2: * don't return ENOENT for init_user_ns and init_pid_ns. There is nothing outside of the init namespace, so we can return EPERM in this case too. > The fewer special cases the easier the code is to get > correct, and the easier it is to read. // Eric Changes for v3: * rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: "W. Trevor King" <wking@tremily.us> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@canonical.com>
2016-09-22kernel: add a helper to get an owning user namespace for a namespaceAndrey Vagin1-0/+6
Return -EPERM if an owning user namespace is outside of a process current user namespace. v2: In a first version ns_get_owner returned ENOENT for init_user_ns. This special cases was removed from this version. There is nothing outside of init_user_ns, so we can return EPERM. v3: rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22userns: When the per user per user namespace limit is reached return ENOSPCEric W. Biederman1-1/+1
The current error codes returned when a the per user per user namespace limit are hit (EINVAL, EUSERS, and ENFILE) are wrong. I asked for advice on linux-api and it we made clear that those were the wrong error code, but a correct effor code was not suggested. The best general error code I have found for hitting a resource limit is ENOSPC. It is not perfect but as it is unambiguous it will serve until someone comes up with a better error code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-08-08utsns: Add a limit on the number of uts namespacesEric W. Biederman1-5/+29
Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-12-04copy address of proc_ns_ops into ns_commonAl Viro1-0/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-04new helpers: ns_alloc_inum/ns_free_inumAl Viro1-2/+2
take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-04make proc_ns_operations work with struct ns_common * instead of void *Al Viro1-9/+3
We can do that now. And kill ->inum(), while we are at it - all instances are identical. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-04switch the rest of proc_ns_operations to working with &...->nsAl Viro1-6/+9
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-04common object embedded into various struct ....nsAl Viro1-3/+3
for now - just move corresponding ->proc_inum instances over there Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-07-29namespaces: Use task_lock and not rcu to protect nsproxyEric W. Biederman1-3/+3
The synchronous syncrhonize_rcu in switch_task_namespaces makes setns a sufficiently expensive system call that people have complained. Upon inspect nsproxy no longer needs rcu protection for remote reads. remote reads are rare. So optimize for same process reads and write by switching using rask_lock instead. This yields a simpler to understand lock, and a faster setns system call. In particular this fixes a performance regression observed by Rafael David Tinoco <rafael.tinoco@canonical.com>. This is effectively a revert of Pavel Emelyanov's commit cf7b708c8d1d7a27736771bcf4c457b332b0f818 Make access to task's nsproxy lighter from 2007. The race this originialy fixed no longer exists as do_notify_parent uses task_active_pid_ns(parent) instead of parent->nsproxy. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-08-30userns: Kill nsown_capable it makes the wrong thing easyEric W. Biederman1-1/+1
nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and CAP_SETGID. For the existing users it doesn't noticably simplify things and from the suggested patches I have seen it encourages people to do the wrong thing. So remove nsown_capable. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-05-01proc: Split the namespace stuff out into linux/proc_ns.hDavid Howells1-1/+1
Split the proc namespace stuff out into linux/proc_ns.h. Signed-off-by: David Howells <dhowells@redhat.com> cc: netdev@vger.kernel.org cc: Serge E. Hallyn <serge.hallyn@ubuntu.com> cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-27kernel/utsname.c: fix wrong comment about clone_uts_ns()Yuanhan Liu1-1/+1
Fix the wrong comment about the return value of clone_uts_ns() Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-14userns: Require CAP_SYS_ADMIN for most uses of setns.Eric W. Biederman1-1/+2
Andy Lutomirski <luto@amacapital.net> found a nasty little bug in the permissions of setns. With unprivileged user namespaces it became possible to create new namespaces without privilege. However the setns calls were relaxed to only require CAP_SYS_ADMIN in the user nameapce of the targed namespace. Which made the following nasty sequence possible. pid = clone(CLONE_NEWUSER | CLONE_NEWNS); if (pid == 0) { /* child */ system("mount --bind /home/me/passwd /etc/passwd"); } else if (pid != 0) { /* parent */ char path[PATH_MAX]; snprintf(path, sizeof(path), "/proc/%u/ns/mnt"); fd = open(path, O_RDONLY); setns(fd, 0); system("su -"); } Prevent this possibility by requiring CAP_SYS_ADMIN in the current user namespace when joing all but the user namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20proc: Usable inode numbers for the namespace file descriptors.Eric W. Biederman1-1/+16
Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowd by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-20userns: Make create_new_namespaces take a user_ns parameterEric W. Biederman1-5/+4
Modify create_new_namespaces to explicitly take a user namespace parameter, instead of implicitly through the task_struct. This allows an implementation of unshare(CLONE_NEWUSER) where the new user namespace is not stored onto the current task_struct until after all of the namespaces are created. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20userns: Allow unprivileged use of setns.Eric W. Biederman1-1/+6
- Push the permission check from the core setns syscall into the setns install methods where the user namespace of the target namespace can be determined, and used in a ns_capable call. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-04-07userns: Use cred->user_ns instead of cred->user->user_nsEric W. Biederman1-1/+1
Optimize performance and prepare for the removal of the user_ns reference from user_struct. Remove the slow long walk through cred->user->user_ns and instead go straight to cred->user_ns. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-10-31kernel: Map most files to use export.h instead of module.hPaul Gortmaker1-1/+1
The changed files were only including linux/module.h for the EXPORT_SYMBOL infrastructure, and nothing else. Revector them onto the isolated export header for faster compile times. Nothing to see here but a whole lot of instances of: -#include <linux/module.h> +#include <linux/export.h> This commit is only changing the kernel dir; next targets will probably be mm, fs, the arch dirs, etc. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-05-10ns proc: Add support for the uts namespaceEric W. Biederman1-0/+39
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-03-23userns: allow sethostname in a containerSerge E. Hallyn1-5/+7
Changelog: Feb 23: let clone_uts_ns() handle setting uts->user_ns To do so we need to pass in the task_struct who'll get the utsname, so we can get its user_ns. Feb 23: As per Oleg's coment, just pass in tsk, instead of two of its members. Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23userns: add a user_namespace as creator/owner of uts_namespaceSerge E. Hallyn1-0/+4
The expected course of development for user namespaces targeted capabilities is laid out at https://wiki.ubuntu.com/UserNamespace. Goals: - Make it safe for an unprivileged user to unshare namespaces. They will be privileged with respect to the new namespace, but this should only include resources which the unprivileged user already owns. - Provide separate limits and accounting for userids in different namespaces. Status: Currently (as of 2.6.38) you can clone with the CLONE_NEWUSER flag to get a new user namespace if you have the CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID capabilities. What this gets you is a whole new set of userids, meaning that user 500 will have a different 'struct user' in your namespace than in other namespaces. So any accounting information stored in struct user will be unique to your namespace. However, throughout the kernel there are checks which - simply check for a capability. Since root in a child namespace has all capabilities, this means that a child namespace is not constrained. - simply compare uid1 == uid2. Since these are the integer uids, uid 500 in namespace 1 will be said to be equal to uid 500 in namespace 2. As a result, the lxc implementation at lxc.sf.net does not use user namespaces. This is actually helpful because it leaves us free to develop user namespaces in such a way that, for some time, user namespaces may be unuseful. Bugs aside, this patchset is supposed to not at all affect systems which are not actively using user namespaces, and only restrict what tasks in child user namespace can do. They begin to limit privilege to a user namespace, so that root in a container cannot kill or ptrace tasks in the parent user namespace, and can only get world access rights to files. Since all files currently belong to the initila user namespace, that means that child user namespaces can only get world access rights to *all* files. While this temporarily makes user namespaces bad for system containers, it starts to get useful for some sandboxing. I've run the 'runltplite.sh' with and without this patchset and found no difference. This patch: copy_process() handles CLONE_NEWUSER before the rest of the namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS) the new uts namespace will have the new user namespace as its owner. That is what we want, since we want root in that new userns to be able to have privilege over it. Changelog: Feb 15: don't set uts_ns->user_ns if we didn't create a new uts_ns. Feb 23: Move extern init_user_ns declaration from init/version.c to utsname.h. Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18utsns: extract creeate_uts_ns()Alexey Dobriyan1-2/+11
create_uts_ns() will be used by C/R to create fresh uts_ns. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-23removed unused #include <linux/version.h>'sAdrian Bunk1-1/+0
This patch lets the files using linux/version.h match the files that #include it. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29kernel: explicitly include required header files under kernel/Robert P. J. Day1-0/+1
Following an experimental deletion of the unnecessary directive #include <linux/slab.h> from the header file <linux/percpu.h>, these files under kernel/ were exposed as needing to include one of <linux/slab.h> or <linux/gfp.h>, so explicit includes were added where necessary. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19Fix UTS corruption during clone(CLONE_NEWUTS)Alexey Dobriyan1-0/+2
struct utsname is copied from master one without any exclusion. Here is sample output from one proggie doing sethostname("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"); sethostname("bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"); and another clone(,, CLONE_NEWUTS, ...) uname() hostname = 'aaaaaaaaaaaaaaaaaaaaaaaaabbbbb' hostname = 'bbbaaaaaaaaaaaaaaaaaaaaaaaaaaa' hostname = 'aaaaaaaabbbbbbbbbbbbbbbbbbbbbb' hostname = 'aaaaaaaaaaaaaaaaaaaaaaaaaabbbb' hostname = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaabb' hostname = 'aaabbbbbbbbbbbbbbbbbbbbbbbbbbb' hostname = 'bbbbbbbbbbbbbbbbaaaaaaaaaaaaaa' Hostname is sometimes corrupted. Yes, even _the_ simplest namespace activity had bug in it. :-( Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16namespace: ensure clone_flags are always stored in an unsigned longEric W. Biederman1-1/+1
While working on unshare support for the network namespace I noticed we were putting clone flags in an int. Which is weird because the syscall uses unsigned long and we at least need an unsigned to properly hold all of the unshare flags. So to make the code consistent, this patch updates the code to use unsigned long instead of int for the clone flags in those places where we get it wrong today. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16fix create_new_namespaces() return valueCedric Le Goater1-4/+6
dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong, create_new_namespaces() uses ERR_PTR() to catch an error. This means that the subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns() or copy_utsname(). Modify create_new_namespaces() to also use the errors returned by the copy_*_ns routines and not to systematically return ENOMEM. [oleg@tv-sign.ru: better changelog] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Merge sys_clone()/sys_unshare() nsproxy and namespace handlingBadari Pulavarty1-37/+4
sys_clone() and sys_unshare() both makes copies of nsproxy and its associated namespaces. But they have different code paths. This patch merges all the nsproxy and its associated namespace copy/clone handling (as much as possible). Posted on container list earlier for feedback. - Create a new nsproxy and its associated namespaces and pass it back to caller to attach it to right process. - Changed all copy_*_ns() routines to return a new copy of namespace instead of attaching it to task->nsproxy. - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines. - Removed unnessary !ns checks from copy_*_ns() and added BUG_ON() just incase. - Get rid of all individual unshare_*_ns() routines and make use of copy_*_ns() instead. [akpm@osdl.org: cleanups, warning fix] [clg@fr.ibm.com: remove dup_namespaces() declaration] [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval] [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <containers@lists.osdl.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-10-02[PATCH] namespaces: utsname: implement CLONE_NEWUTS flagSerge E. Hallyn1-0/+53
Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare. [clg@fr.ibm.com: IPC unshare fix] [bunk@stusta.de: cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: implement utsname namespacesSerge E. Hallyn1-0/+42
This patch defines the uts namespace and some manipulators. Adds the uts namespace to task_struct, and initializes a system-wide init namespace. It leaves a #define for system_utsname so sysctl will compile. This define will be removed in a separate patch. [akpm@osdl.org: build fix, cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>