Groups IDs as userspace capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Talking to certain kernel subsystems is only possible if the calling
process has the right capability bit set, as defined by capabilities(7).
For instance, reconfiguring the kernel IP stack requires CAP_NET_ADMIN.
What if the subsystem happens to reside in the user space?
Take for instance a high-level network configuration manager (wimon, connman,
NetworkManager). It makes sense to allow non-privileged users to issue
commands to the manager, but there should be at least some restrictions:
not all users, and not in all cases. In particular, it is highly desirable
to distinguish between the human operator and the stuff running on his
behalf, like say Steam or Chromium.
In modern (and not so modern) Linux systems this problem is usually solved
with tools like sudo, which is fundamentally very different from the
capability-based approach the kernel takes. There is also dbus with its own
set of issues.
And there is a simple elegant solution to this problem that does not involve
any tools and only relies on the basic kernel interfaces: using auxiliary
group ids as userspace capability tokens. If implemented, it would provide
reliable, flexible IPC permission control. However, current kernels lack
two relatively minor functions crucial for this approach to work well, and
their absence cripples it so much that it's not viable even compared to sudo.
So this is an outline of how the solution would work, which changes are
necessary for it to work, and why.
How it should work
~~~~~~~~~~~~~~~~~~
On login, the top spawned user process gets several auxiliary gids
representing the user's capabilities. Whenever a process tries to contact
a service that requires certain capabilities, its current auxiliary gids
are checked against the service's requirements.
For instance, having "network" group among gids means the process
can ask the high-level network configuration manager to switch networks,
but cannot directly configure the kernel IP stack (which would require
CAP_NET_ADMIN).
Like with kernel capabilities, the user is free to set up the environment
to drop unnecessary capabilities before spawning certain processes.
In the example above, the user drops the "network" gid before spawning Steam,
preventing the latter from reconfiguring the network. However, the desktop
applet that shows networking status is spawned with "network" still in gids,
so it is still allowed to issue network configuration commands.
Problem 1. GID checks are too difficult to configure in current kernels
Problem 2. Users *cannot* drop gids in current kernels
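The gid-dropping step described above could be sketched as follows, assuming
the launcher already holds its auxiliary gids in an array (for instance as
returned by getgroups(2)). The helper name drop_gid is made up for this
illustration.

```c
#include <stddef.h>
#include <sys/types.h>

/* Remove every occurrence of `drop` from gids[0..n), keeping the order
   of the remaining entries. Returns the new number of entries. */
size_t drop_gid(gid_t *gids, size_t n, gid_t drop)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (gids[i] != drop)
			gids[kept++] = gids[i];

	return kept;
}
```

The launcher would then hand the shortened list to setgroups(2) right before
exec. On current kernels that call fails with EPERM unless the process has
CAP_SETGID, which is exactly Problem 2.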
Context: passing commands to a daemon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let's use wimon as an example. The long-running, highly privileged manager
process (wimon) creates a control socket on startup and listens for
connections:
	srwxr-xr-x 1 root root     0 ... /run/wimon
A control tool, wictl, opens the socket and sends some command there.
How should wimon ensure that the command is ok to execute?
Case 1: sudo (privilege escalation for client code)
  Let wimon check effective uid of the connecting process, and reject
  the connection unless it's 0. Then use sudo to run wictl, so that all
  permitted commands would be issued with effective uid 0.
  It is up to sudo to decide whether the user is allowed to control wimon,
  ask for some sort of password if necessary and so on.
Case 2: dbus (dedicated authentication broker).
  Instead of connecting to wimon directly, wictl, running with non-privileged
  uid and euid=uid, connects to a privileged broker service and asks it to
  send the given command to wimon. The broker checks permissions based on the
  euid of wictl, and relays the command to wimon. Like in case 1, wimon only accepts
  commands sent from euid 0, or some other uid reserved for the broker.
  It is up to the broker to check user's permission to run a given command.
Case 3: server-side credentials check
  wictl sends its uid/gid(s) over the socket, or wimon asks the kernel for
  them with getsockopt(SO_PEERCRED). Either way, wimon decides whether to
  allow the command based on the ids it gets.
  This allows really fine control over permissions since the checks
  happen as close to the command parsing code as possible. Also, there's
  no privilege escalation and no additional services.
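A minimal sketch of the server-side check in case 3, assuming Linux and the
SO_PEERCRED socket option. The helper name peer_has_gid is made up. Note that
SO_PEERCRED reports only the pid, effective uid and effective gid of the peer.
Auxiliary gids are not included, so a real service would need another channel
for them (newer kernels are believed to offer SO_PEERGROUPS for that).

```c
#define _GNU_SOURCE   /* struct ucred needs this with glibc */
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper. Returns 1 if the effective gid of the peer on the
   connected AF_UNIX socket fd equals `want`, 0 if not, -1 on error.
   The credentials are the ones the kernel recorded when the connection
   (or socket pair) was set up. */
int peer_has_gid(int fd, gid_t want)
{
	struct ucred cred;
	socklen_t len = sizeof(cred);

	if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
		return -1;

	return cred.gid == want;
}
```

wimon would call this on each accepted connection, before reading any command
from it.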
Case 4: kernel-mediated permissions check
  File permissions on /run/wimon are used to control access to wimon.
  The socket gets chmod srwxr-x--- early, and its gid gets set somehow.
  When wictl attempts to connect to the socket, the kernel checks its
  uids and gids and either allows or disallows the connection.
  If implemented, this would require almost no effort from the userspace
  once the file ownership has been set.
Problem: setting permissions for a socket
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Because of the weird ways local sockets work, the only reliable way to
do it is from the service process, with a sequence like this:
	unlink(path); /* or otherwise make sure $path does not exist */
	fd = socket(...);
	bind(fd, { AF_UNIX, path });
	chown(path, uid, gid);  /* note: not fchown(fd) or fchmod(fd) */
	chmod(path, mode);
	listen(fd);
For this to work, the service itself must know uid and gid.
How do we configure that?
One option is to hard-code them. It's not a good option.
Another option is to pass them to the daemon somehow, naturally as names
and not as ids:
	./server -u user -g group 
The service -- any service -- will then need to convert "user" and "group"
into uid and gid by parsing /etc/passwd and /etc/group. This may be viable
for regular large-libc settings, but for minitools, it's just not right.
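To give an idea of what every service would have to carry, here is a rough
sketch of matching a single /etc/group line ("name:password:gid:members")
against a group name. The function name group_line_gid is hypothetical, and a
real implementation would also need the file reading loop and the same again
for /etc/passwd.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Match one /etc/group line against `name`.
   Returns 0 and stores the gid on a match, -1 otherwise. */
int group_line_gid(const char *line, const char *name, gid_t *gid)
{
	size_t len = strlen(name);
	const char *p;
	char *end;
	unsigned long val;

	if (strncmp(line, name, len) != 0 || line[len] != ':')
		return -1; /* different group name */
	if (!(p = strchr(line + len + 1, ':')))
		return -1; /* no gid field */
	p++;
	if (*p < '0' || *p > '9')
		return -1; /* gid field does not start with a digit */
	val = strtoul(p, &end, 10);
	if (*end != ':' && *end != '\n' && *end != '\0')
		return -1; /* trailing garbage after the number */

	*gid = (gid_t)val;
	return 0;
}
```

With a full libc the same lookup is a single getgrnam(3) call, which is the
"large-libc" route mentioned above.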
The root of the problem above is the unlink() call at the top.
Were the service capable of binding on a pre-made socket, it would
be possible to use chown(1) to set up permissions, and the service
would not need to care about that at all.
Solution: sticky sockets
~~~~~~~~~~~~~~~~~~~~~~~~
Linux already allows creating sockets with mknod, even though the resulting
FS nodes are completely useless. The idea is to fix this, in a way that won't
break existing code too much.
Let's call a local socket with the sticky bit set a "sticky socket".
This combination is meaningless and should never happen with conventional
use of local sockets, so we can assign pretty much any semantics to it.
Trying to bind() a sticky socket should
	* fail with EBUSY if it is already bound by another process
	* fail with EPERM unless the calling process owns the socket
	  (euid = uid of the socket), or it has CAP_CHOWN
	* succeed otherwise
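The rules above can be modelled in userspace as a small decision function.
This is not kernel code, only a sketch of the proposed semantics. The function
and parameter names are made up, and checking EBUSY before EPERM is an
assumption based on the order of the list.

```c
#include <errno.h>
#include <sys/types.h>

/* Userspace model of the proposed bind() rules for a sticky socket.
   Returns 0 on success or a negative errno value. */
int sticky_bind_check(int already_bound, uid_t caller_euid,
                      uid_t socket_uid, int has_cap_chown)
{
	if (already_bound)
		return -EBUSY; /* another process holds the socket */
	if (caller_euid != socket_uid && !has_cap_chown)
		return -EPERM; /* neither the owner nor CAP_CHOWN */
	return 0;
}
```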
With sticky sockets, permission setup becomes simple and straightforward:
	mknod /run/wimon s
	chown wimon:network /run/wimon
	exec wimon +/run/wimon
The service only needs to know the name of the socket to bind.
All passwd/group parsing code remains in chown where it belongs.
Problem: no way to drop gids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This one is simple: setgroups(2) unconditionally requires CAP_SETGID.
In contrast, capset(2) allows dropping bits for non-privileged users.
There are no real alternatives either: setgroups should be allowed for
non-privileged users as long as no attempt is made to gain gids.
For a process having groups C, setgroups(G) should succeed
if (g in C) for all (g in G).
This sounds simple, but turns out much worse than expected upon closer
examination. In Linux, gid space is huge (2^32) and the allowed size
of both sets is quite large as well (2^16). For comparison, there
are at most 64 capability bits and no more than 64 of them per set.
Luckily setgroups is rarely used even outside of minitools context,
and even then, it's mostly to impersonate a user, which means it reads
and applies whatever is written in /etc/group.
Solution: setgroups() semantics change
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let setgroups() work as is for privileged users. For non-privileged users,
compare C and G, assuming both are sorted, and return
	* -EPERM if an inversion is detected in either C or G
	* -EPERM if there is some g in G that is not in C
	* 0, and update C, otherwise
Those privileged uses of setgroups that matter (i.e. may be followed by
non-privileged use) will be in minitools-specific code anyway, so making
sure C is ordered should be possible.
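With both sets sorted, the check becomes a single merge-style pass over C and
G, linear in their combined size. A userspace sketch of that check follows.
The names are made up, and treating repeated gids as acceptable (only a
strictly descending pair counts as an inversion) is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Returns 1 if a[0..n) is in non-descending order. */
static int sorted(const gid_t *a, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (a[i] < a[i - 1])
			return 0;
	return 1;
}

/* Model of the proposed non-privileged setgroups() check:
   c is the current group set, g is the requested one.
   Returns 0 if g may replace c, -EPERM otherwise. */
int setgroups_check(const gid_t *c, size_t nc, const gid_t *g, size_t ng)
{
	size_t i = 0, j;

	if (!sorted(c, nc) || !sorted(g, ng))
		return -EPERM; /* inversion in C or G */

	for (j = 0; j < ng; j++) {
		while (i < nc && c[i] < g[j])
			i++; /* skip gids that are being dropped */
		if (i == nc || c[i] != g[j])
			return -EPERM; /* g[j] is not in C */
	}
	return 0;
}
```

An empty G always passes, which corresponds to dropping all auxiliary gids.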
Sub-socket level permission control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What if the service needs finer access control than just connect/no-connect?
Something like allowing some commands to all users and some to those in
a specific group only.
Sure, it can't be done with a single sticky socket, but it's not like sticky
sockets prevent the service from using any of the cases above. Sending a set
of pre-configured groups over the socket in particular should work well
and may be freely combined with socket permissions.
Another possible alternative is to make one sticky socket per role.
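For the in-service variant, a per-command check might be sketched like this,
assuming the service has already obtained the client's gids by one of the
means above. The rule table, the command names and gid 90 are all made up for
illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define ANY_GID ((gid_t)-1) /* marks a command open to every client */

struct cmd_rule {
	const char *name;
	gid_t gid; /* gid the client must hold, or ANY_GID */
};

/* Hypothetical rule table. */
static const struct cmd_rule rules[] = {
	{ "status",  ANY_GID },
	{ "connect", 90 },
};

/* Returns 1 if a client holding gids[0..n) may run cmd, 0 otherwise.
   Commands missing from the table are denied. */
int command_allowed(const char *cmd, const gid_t *gids, size_t n)
{
	size_t i, j;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		if (strcmp(rules[i].name, cmd) != 0)
			continue;
		if (rules[i].gid == ANY_GID)
			return 1;
		for (j = 0; j < n; j++)
			if (gids[j] == rules[i].gid)
				return 1;
		return 0;
	}
	return 0;
}
```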