-
Notifications
You must be signed in to change notification settings - Fork 15
Expand file tree
/
Copy pathbaselib.txt
More file actions
239 lines (174 loc) · 9.18 KB
/
baselib.txt
File metadata and controls
239 lines (174 loc) · 9.18 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
Quick introduction to the base library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This project comes with a custom bundled C library and does not rely on
an external one like most userspace C projects do. The whole thing builds
with a freestanding compiler, like the Linux kernel but unlike say
GNU coreutils or busybox. All library functions used in the project
are provided by the project.
The base library is not POSIX compliant. Nor does it follow the ANSI C
standard. It is designed to mesh well with the underlying Linux OS and
not to provide a translation layer on top of it like a POSIX libc would.
It is much simpler than a POSIX libc could possibly be, too.
System calls
~~~~~~~~~~~~
A major feature of the base library is the way it deals with syscalls:
if((fd = sys_open(name, flags)) < 0)
/* fd is negative error code, e.g. -ENOENT */
else
/* fd is a valid descriptor */
Syscall names are prefixed with sys_, unlike in POSIX, so there is a clear
distinction between syscalls and library functions. Syscall code is inlined,
that sys_open above is not a function call. There is no global errno, the
calls return error code on failure.
The library does (almost) no wrapping for syscalls. Whatever the kernel
interface and semantics for the call is, it's the same in the userspace.
For instance, sys_ppoll does update its timestamp argument.
What is struct top* ctx?
~~~~~~~~~~~~~~~~~~~~~~~~
The lack of errno in the library allows for a particular memory-saving trick
with globals in small applications.
Global variables declared the usual way force at least a page to be allocated
for .bss segment, even though the total size of the variables may be way less
than a whole page. To avoid wasting a whole page, the variables get moved to
the stack. There's always at least one dirty page of stack so no loss happens.
Busybox has `struct G` which is the same exact thing.
POSIX libc applications cannot drop .bss completely because of errno (and other
accidentally linked global stuff, but mostly errno because errno is essentially
unavoiable). In this project, getting away with no .bss is possible, and quite
a lot of smaller tools have no .bss/.data segments whatsoever.
Unlike .bss/.data, the stuff in the stack cannot be addressed statically.
The workaround is to pass the pointer to every single function that uses them:
struct top {
int foo;
};
void do_blah(struct top* ctx, ...)
{
ctx->foo++;
}
int main(...) {
struct top context = { ... }, *ctx = &context;
do_blah(ctx, ...);
...
}
This way, first-argument register holds the base address and all globals
are accesses via offsets to that address. The base address is not known
at link time but once set early in main(), it does not change.
Extended syscalls
~~~~~~~~~~~~~~~~~
There are several groups of "syscall extensions" in Linux when a newly
introduced syscall supersedes an older one, like so:
sys_open(fd, name, mode) <- sys_openat(AT_FDCWD, fd, name, mode)
Pretty much al at-file syscalls operate like that.
In a lot of cases, the code using them looks better with the original calls.
The preferred approach taken in this project is define older syscalls
as macros over the newer ones. Even though there is NR_open, sys_open()
calls NR_openat with at=AT_FDCWD to emulate the behavior of the original
call.
Rationale: on the kernel side of things, using newer NR-s seems to be the
preferred approach, to the point they are willing to make older NR-s defines
conditional in the headers. There are no real downsides to doing this either,
except maybe for strace not spelling the exact syscalls used.
Prototype for main()
~~~~~~~~~~~~~~~~~~~~
The entry trampoline `_start.s` only supplies argc and argv to the C code:
int main(int argc, char** argv);
There is no envp there. If needed, it can be derived like this:
char** envp = argv + argc + 1;
It's rarely used and is easy to do in C. No point in doing it in assembly.
Handling signals
~~~~~~~~~~~~~~~~
The way handlers work in Linux seems to be designed around some compatibility
issues (POSIX sigaction perhaps?) and makes little sense out of that context.
Normally it is all heavily wrapped by a POSIX libc, and most users aren't even
aware of the issue. In this project, wrappers are much thinner, so more of this
stuff is visible.
Signal handlers are not really functions. They cannot return, instead they are
supposed to call sys_sigreturn() to restore process context and continue
with the regular program flow. Linux sys_sigaction however is designed to use
regular functions as signal handlers.
struct sigaction sa = {
.handler = sighandler,
.flags = SA_RESTORER | ...,
.restorer = sigrestorer
}
With this setup, sighandler is allowed to return, and the return address is
sigrestorer, which has to be a chunk of assembly invoking NR_sigreturn.
Pretty much none of this is ever relevant in the code that uses signals.
The library defines a macro, SIGHANDLER, to create this structure and fill
the .restorer field so that the .handler would work as expected.
# Note this is another case of extended syscalls, the actual syscalls used
# are NR_rt_sigaction and NR_rt_sigreturn respectively.
There is no way to pass any additional data to the handlers, like for instance
struct top* ctx pointer. Tools that use struct top generally cannot use signal
handlers, and must rely on signalfd instead. That said, signalfd often results
in better code and fewer syscalls anyway.
Memory allocation
~~~~~~~~~~~~~~~~~
There is no universal malloc/free in the base library. None of the library
functions rely on being able to allocate arbitrary amounts of memory either.
Memory management is always left to the application.
Whenever necessary, library functions expect to be provided with pre-allocated
memory space to work with.
The way applications from this project manage memory varies. Some have
constant memory footprint. Some use heap (via sys_brk) in data-stack mode,
without conventional free(). Some use sys_mmap for large buffers.
Formatted output
~~~~~~~~~~~~~~~~
There is no general-purpose printf in the library. The problem with printf
is that it's a runtime interpreter, which means the compiler does not know
which sequences have been used and has to compile support for all of them.
Even if that wasn't an issue, removing stuff from printf is not an easy task,
not something a compiler or linker can do. This all really doesn't bode well
with the whole small and static idea.
So instead, the library has a set of explicit formatting functions that work
like this:
char buf[N];
char* p = buf; /* current pointer, initially at the start of buf */
char* e = buf + sizeof(buf) - 1; /* end of buffer, 1 char to spare */
p = fmtstr(p, e, "Some number: ");
p = fmtint(p, e, 123);
*p = '\0'; /* terminate the string using the spare char */
All fmt* functions advance p, so that it always points to remaining space in
buf, while also ensuring p <= e.
The linker can then easily pick only the functions used, making sure there's
no dead formatting code in the executable, like it would be with printf. And
this scheme also make it very easy to write application-specific formatters,
something that's very difficult with printf.
# Why (p, e) tuple instead of a struct? It actually makes code simpler.
# The first argument and the return value share registers on all supported
# targets, so fmt* effectively mutate the first arg register in place.
#
# The second pointer, e, could be left in place as well across several fmt*
# calls, but the language is not expressive enough to describe it.
Formatting is always done into a buffer of fixed size, similar to snprintf.
There are no corresponding stream-printing functions similar to printf proper
or fprintf. Buffered output, whenever necessary, is always explicit.
Note there is a simple printf implementation defined in printf.h, but it is
only meant to be used for printf debugging. Check the comments there on its
limitations.
Error reporting
~~~~~~~~~~~~~~~
Almost all cases where errors are reported to the user in this project are
handled by the following two functions:
warn(msg, arg, ret);
fail(msg, arg, ret);
The first one just prints the message, the second one also does _exit(-1).
The message printed is always "tag: msg arg: err", but if msg or arg are NULL
or ret is 0 relevant parts are skipped, resulting in "tag: msg: err" etc.
This may sound extremely limiting compared to say BSDish err() family, but
actually turns out fine.
The last argument is the negative error from syscall:
if((ret = sys_foo(blah)) < 0)
fail("foo", blah, ret);
Error codes are reported as literally "ENOENT", "EINVAL" in cases where
convential libc would use "No such file or directory", "Invalid argument"
respectively. Any error code that has no corresponding string in the list
gets reported as a raw integer (e.g. -111), the user may look it up later
using the `errno` tool.
This setup allows executables to link only the strings for the set of codes
that syscalls the application uses can actually return. Linux defines a lot
of error codes, but the vast majority of syscalls only return a handful.
Linking the rest in would result in dead data.
ERRTAG and ERRLIST macros define the static data needed for warn and fail,
namely the leading tag and the list of error strings. These should always
be in the same file with main().