-
Notifications
You must be signed in to change notification settings - Fork 15
Expand file tree
/
Copy pathbaselib.txt
More file actions
224 lines (162 loc) · 8.43 KB
/
baselib.txt
File metadata and controls
224 lines (162 loc) · 8.43 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
Quick introduction to the base library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This project comes with a custom bundled C library and does not rely on
an external one like most userspace C projects do. It builds more like
the Linux kernel than like say GNU coreutils or even busybox, using
a freestanding compiler. All library functions used in the project are
provided by the project.
A major feature of the base library is the way it deals with syscalls:
if((fd = sys_open(name, flags)) < 0)
/* fd is negative error code, e.g. -ENOENT */
else
/* fd is a valid descriptor */
Syscall names are prefixed with sys_, unlike in POSIX, so there is a clear
distinction between syscalls and library functions. Syscall code is inlined,
that sys_open above is not a function call. There is no global errno, the
calls return error code on failure.
The library does (almost) no wrapping for syscalls. Whatever the kernel
interface and semantics for the call is, it's the same in the userspace.
For instance, sys_ppoll does update its timestamp argument.
What is struct top* ctx?
~~~~~~~~~~~~~~~~~~~~~~~~
Global variables declared the usual way force at least a page to be allocated
for .bss segment, even though the total size of the variables may be way less
than a whole page. To avoid wasting a whole page, the variables get moved to
the stack. There's always at least one dirty page of stack so no loss happens.
Busybox has `struct G` which is exactly the same thing.
POSIX libc applications cannot drop .bss completely because of errno (and other
accidentally linked global stuff). In this project, it is possible, and quite
a lot of smaller tools have no .bss/.data segments.
Unlike .bss/.data, the stuff in the stack cannot be addressed statically.
The workaround is to pass the pointer to every single function that uses
them:
struct top {
int foo;
};
void do_blah(struct top* ctx, ...)
{
ctx->foo++;
}
int main(...) {
struct top context = { ... }, *ctx = &context;
do_blah(ctx, ...);
...
}
This way, first-argument register holds the base address and all globals
are accesses via offsets to that address. The base address is not known
at link time but once set early in main(), it does not change.
Extended syscalls
~~~~~~~~~~~~~~~~~
There are several groups of "syscall extensions" in Linux when a newly
introduced syscall supersedes an older one, like so:
sys_open(fd, name, mode) = sys_openat(AT_FDCWD, fd, name, mode)
Pretty much al at-file syscalls operate like that.
In a lot of cases, the code using them looks better with the original calls.
The preferred approach taken in this project is define older syscalls
as macros over the newer ones. Even though there is NR_open, sys_open()
calls NR_openat with at=AT_FDCWD to emulate the behavior of the original
call.
Rationale: on the kernel side of things, using newer NR-s seems to be the
preferred approach, to the point they are willing to make older NR-s defines
conditional in the headers. There are no real downsides to doing this either,
except maybe for strace not spelling the exact syscalls used.
Prototype for main()
~~~~~~~~~~~~~~~~~~~~
The entry trampoline `_start.s` only supplies argc and argv to the C code:
int main(int argc, char** argv);
There is no envp there. If needed, it can be derived like this:
char** envp = argv + argc + 1;
It's rarely used and easy to do in C. No point in doing it in assembly.
Handling signals
~~~~~~~~~~~~~~~~
The way signal handlers work in Linux is weird. It seems to be designed around
some compatibility issues (POSIX sigaction perhaps?) and makes little sense out
of that context. Normally it is all heavily wrapped by a POSIX libc, and most
users aren't even aware of the issue. In this project, wrappers are much
thinner, so more of this stuff is visible.
Signal handlers are not really functions. They cannot return, instead they are
supposed to call sys_sigreturn() to restore process context and continue
with the regular program flow. Linux sys_sigaction however is designed to use
regular functions as signal handlers.
struct sigaction sa = {
.handler = sighandler,
.flags = SA_RESTORER | ...,
.restorer = sigrestorer
}
With this setup, sighandler is allowed to return, and the return address is
sigrestorer, which has to be a chunk of assembly invoking NR_sigreturn.
Pretty much none of this is ever relevant in the code that uses signals.
The library defines a macro, SIGHANDLER, to create this structure and fill
the .restorer field so that the .handler would work as expected.
# Note this is another case of extended syscalls, the actual syscalls used
# are NR_rt_sigaction and NR_rt_sigreturn respectively.
Memory allocation
~~~~~~~~~~~~~~~~~
There is no universal malloc/free in the base library. None of the library
functions rely on being able to allocate arbitrary amounts of memory either.
Memory management is always left to the application.
Whenever necessary, library functions expect to be provided with pre-allocated
memory space to work with.
The way applications from this project manage memory varies. Some have
constant memory footprint. Some use heap (via sys_brk) in data-stack mode,
without conventional free(). Some use sys_mmap for large buffers.
Formatted output
~~~~~~~~~~~~~~~~
There is no general-purpose printf in the library. The problem with printf
is that it's a runtime interpreter, which means the compiler does not know
which sequences have been used and has to compile support for all of them.
Even if that wasn't an issue, removing stuff from printf is not an easy task,
not something a compiler or linker can do. This all really doesn't bode well
with the whole small and static idea.
So instead, the library has a set of explicit formatting functions that work
like this:
char buf[N];
char* p = buf; /* current pointer, initially at the start of buf */
char* e = buf + sizeof(buf) - 1; /* end of buffer, 1 char to spare */
p = fmtstr(p, e, "Some number: ");
p = fmtint(p, e, 123);
*p = '\0'; /* terminate the string using the spare char */
All fmt* functions advance p, so that it always points to remaining space in
buf, while also ensuring p <= e.
The linker can then easily pick only the functions used, making sure there's
no dead formatting code in the executable, like it would be with printf. And
this scheme also make it very easy to write application-specific formatters,
something that's very difficult with printf.
# Why (p, e) tuple instead of a struct? It actually makes code simpler.
# The first argument and the return value share registers on all supported
# targets, so fmt* effectively mutate the first arg register in place.
#
# The second pointer, e, could be left in place as well across several fmt*
# calls, but the language is not expressive enough to describe it.
Formatting is always done into a buffer of fixed size, similar to snprintf.
There are no corresponding stream-printing functions similar to printf proper
or fprintf. Buffered output, whenever necessary, is always explicit.
Note there is a simple printf implementation defined in printf.h, but it is
only meant to be used for printf debugging. Check the comments there on its
limitations.
Error reporting
~~~~~~~~~~~~~~~
Almost all cases where errors are reported to the user in this project are
handled by the following two functions:
warn(msg, arg, ret);
fail(msg, arg, ret);
The first one just prints the message, the second one also does _exit(-1).
The message printed is always "tag: msg arg: err", but if msg or arg are NULL
or ret is 0 relevant parts are skipped, resulting in "tag: msg: err" etc.
This may sound extremely limiting compared to say BSDish err() family, but
actually turns out fine.
The last argument is the negative error from syscall:
if((ret = sys_foo(blah)) < 0)
fail("foo", blah, ret);
Error codes are reported as literally "ENOENT", "EINVAL" in cases where
convential libc would use "No such file or directory", "Invalid argument"
respectively. Any error code that has no corresponding string in the list
gets reported as a raw integer (e.g. -111), the user may look it up later
using `errno` tool.
This setup allows executables to link only the strings for the set of codes
that syscalls the application uses can actually return. Linux defines a lot
of error codes, but the vast majority of syscalls only return a handful.
Linking the rest in would result in dead data.
ERRTAG and ERRLIST macros define the static data needed for warn and fail,
namely the leading tag and the list of error strings. These should always
be in the same file with main().