
Cannot open uid_map for writing from an app with cap_setuid capability set

While toying around with an example from user_namespaces(7), I've come across some strange behaviour.

What the application does

The application user-ns-ex calls clone(2) with CLONE_NEWUSER, thus creating a new process in a new user namespace. The parent process writes a map (0 1000 1) to the /proc/<pid>/uid_map file and tells the child (via a pipe) that it can proceed. The child process then execs bash.

I've copied the source code here.
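
For readers without the linked source at hand, the parent's map-writing step might look roughly like the sketch below. This is not the program's actual code: the helper names uid_map_path and write_map are hypothetical, and the use of O_RDWR is an assumption on my part.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build "/proc/<pid>/uid_map" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int uid_map_path(pid_t pid, char *buf, size_t size)
{
    int n = snprintf(buf, size, "/proc/%ld/uid_map", (long)pid);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: write one mapping line (e.g. "0 1000 1") to map_file.
 * Returns 0 on success, -1 on failure with errno describing the error. */
int write_map(const char *map_file, const char *mapping)
{
    /* This is the open(2) call that reports "Permission denied"
     * in the failing case described in this question. */
    int fd = open(map_file, O_RDWR);
    if (fd == -1)
        return -1;

    size_t len = strlen(mapping);
    ssize_t written = write(fd, mapping, len);

    int saved_errno = errno;   /* close(2) may overwrite errno */
    close(fd);
    errno = saved_errno;

    return (written == (ssize_t)len) ? 0 : -1;
}
```

The parent would call uid_map_path() with the PID returned by clone(2) and then write_map() with the string passed to -M, before signalling the child through the pipe.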

The problem

The application can open /proc/<pid>/uid_map for writing if I give it either no capabilities or all of them.

When I set only cap_setuid, cap_setgid and optionally cap_sys_admin, the call to open(2) fails:

Set caps:

arksnote linux-namespaces   # setcap 'cap_setuid,cap_setgid,cap_sys_admin=epi' ./user-ns-ex
arksnote linux-namespaces   # getcap ./user-ns-ex
./user-ns-ex = cap_setgid,cap_setuid,cap_sys_admin+eip

Try to run:

kamyshev@arksnote ~/workspace/personal/linux-kernel/linux-namespaces  $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19666
ERROR: open /proc/19666/uid_map: Permission denied
About to exec bash

And now a successful case:

No capabilities:

arksnote linux-namespaces   # setcap '=' ./user-ns-ex
arksnote linux-namespaces   # getcap ./user-ns-ex
./user-ns-ex =

Runs OK:

 kamyshev@arksnote ~/workspace/personal/linux-kernel/linux-namespaces  $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19557
About to exec bash
arksnote linux-namespaces   # exit

I've been trying to find the reason in the man pages and playing with different capabilities, but with no luck so far. What puzzles me the most is that the application runs with fewer capabilities and fails with more.

Can someone help me and clarify the issue?

The research

I have found the reason. During my research I found that the uid_map file cannot be opened because its ownership has changed to root.

Unprivileged process, no capabilities:

parent(m): capabilities: '='
parent(m): file /proc/4644/uid_map owner uid: 1000
parent(m): file /proc/4644/uid_map owner gid: 1000

Unprivileged process, capabilities are set (cap_setuid=pe):

parent(m): capabilities: '= cap_setuid+ep'
parent(m): file /proc/4644/uid_map owner uid: 0
parent(m): file /proc/4644/uid_map owner gid: 0
ERROR: open /proc/4668/uid_map: Permission denied
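
The owner lines in the traces above can be obtained with stat(2). The following is a minimal sketch of such a check, not the code that produced the output; the helper names uid_map_owner and uid_map_owned_by_caller are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: look up the owner of /proc/<pid>/uid_map.
 * Returns 0 and fills *uid and *gid on success, -1 on failure. */
int uid_map_owner(pid_t pid, uid_t *uid, gid_t *gid)
{
    char path[64];
    struct stat st;

    snprintf(path, sizeof path, "/proc/%ld/uid_map", (long)pid);
    if (stat(path, &st) == -1)
        return -1;

    *uid = st.st_uid;
    *gid = st.st_gid;
    return 0;
}

/* Hypothetical helper: 1 if the file belongs to the caller's effective UID,
 * 0 if it belongs to someone else (e.g. root), -1 on error. */
int uid_map_owned_by_caller(pid_t pid)
{
    uid_t uid;
    gid_t gid;

    if (uid_map_owner(pid, &uid, &gid) == -1)
        return -1;
    return uid == geteuid() ? 1 : 0;
}
```

In the failing case, a check like this run by the unprivileged parent against the child's PID would be expected to return 0, because the file is owned by UID 0 rather than 1000.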

The following research has led me to this topic: what causes proc pid resources to become owned by root?

The rules on "dumpable" flag

This is what happens:

1) When a process is not dumpable, its /proc/<pid> inodes are owned by root:

// fs/proc/base.c

struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *task)
...
        if (task_dumpable(task)) {
                rcu_read_lock();
                cred = __task_cred(task);
                inode->i_uid = cred->euid;
                inode->i_gid = cred->egid;
                rcu_read_unlock();
        }

2) The process is dumpable only when its "dumpable" attribute has the value 1 (SUID_DUMP_USER). See ptrace(2).

3) prctl(2) clarifies the situation further:

  Normally, this flag is set to 1. However, it is reset to the current value contained in the file /proc/sys/fs/suid_dumpable (which by default has the value 0), in the following circumstances:

  * The process's effective user or group ID is changed.
  * The process's filesystem user or group ID is changed (see credentials(7)).
  * The process executes (execve(2)) a set-user-ID or set-group-ID program, resulting in a change of either the effective user ID or the effective group ID.
  * The process executes (execve(2)) a program that has file capabilities (see capabilities(7)), but only if the permitted capabilities gained exceed those already permitted for the process.

Thus my problem arose from the last of the above rules, which is implemented in commit_creds() (kernel/cred.c):

int commit_creds(struct cred *new)
<...> 
    /* dumpability changes */
    if (!uid_eq(old->euid, new->euid) ||
        !gid_eq(old->egid, new->egid) ||
        !uid_eq(old->fsuid, new->fsuid) ||
        !gid_eq(old->fsgid, new->fsgid) ||
        !cred_cap_issubset(old, new)) {
            if (task->mm)
                    set_dumpable(task->mm, suid_dumpable);
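
To confirm that the flag really has been reset after execve(2) of a binary with file capabilities, the process can query it with prctl(PR_GET_DUMPABLE). A minimal sketch follows; the helper names and the trace format are my own, chosen only to resemble the output shown earlier.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Returns the calling process's "dumpable" value as reported by
 * prctl(2), or -1 on error. */
int get_dumpable(void)
{
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Hypothetical helper: print the value with a label such as "parent(m)". */
void report_dumpable(const char *who)
{
    printf("%s: dumpable: %d\n", who, get_dumpable());
}
```

With suid_dumpable left at its default of 0, I would expect this to report 0 in the cap_setuid+ep case and 1 when the binary has no file capabilities.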

Fixes

There are a number of ways to overcome the issue:

  1. Globally change /proc/sys/fs/suid_dumpable:

echo 1 > /proc/sys/fs/suid_dumpable

  2. Set the "dumpable" flag just for the process:

prctl(PR_SET_DUMPABLE, 1, 0, 0, 0)
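
A slightly fuller sketch of the per-process fix, with error checking, is shown below. The helper name make_dumpable is hypothetical. Where to call it is also my assumption: the file that fails to open belongs to the child, so calling this in the parent before clone(2) (relying on the child starting with the parent's flag) or at the start of the child both seem plausible, but the source does not say which was used.

```c
#include <sys/prctl.h>

/* Hypothetical helper: re-enable the "dumpable" flag for the calling
 * process and verify that the change took effect.
 * Returns 0 on success, -1 on failure. */
int make_dumpable(void)
{
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == -1)
        return -1;

    /* Read the flag back; 1 corresponds to SUID_DUMP_USER. */
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1 ? 0 : -1;
}
```

Unlike the global suid_dumpable change, this affects only the process that makes the call, which makes it the narrower of the two fixes.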

