flowd
Directory actions
More options
Directory actions
More options
flowd
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
parent directory.. | ||||
Synopsis:
Flow blob request records into a directed acylic graph of processes and SQL queries
Description:
While reading blob request records the flowd server will order the
execution of unix process or database queries by qualifying on the
exit status or query results. The purpose of flowd is to derive
immutable facts about the blobs scanned on the input stream,
starting with unix processes than analyze the blobs and ending with
facts populated into a relational database.
The design inspiration of flowd is a mashup of the famous unix commands
"make" and "awk". The process/query order is syntacally a directed
acyclic graph. The side affects of the invoked unix processes/queries
is hidden from flowd, other than the exit status or query results.
The invocation of a particular process is determined by
qualifying on the exit status of older processes in the the
same flow. A flow on a particular brr tuple will always
terminate, since the call order is a DAG. However,
infinite loops are possible for processes that always reread a
blob, thus generating more brr records.
Flows are independent of other flows. In other words, the
results of a particular flow can not be referenced with in flowd by
a different flow. Such logical independence yields natural parallelism.
Flow Detail Record:
A flow detail record, named a "fdr", summarizes the outcome
of invoking a set of related calls, a "flow", on a particular blob
request record.
The format of an "fdr" is the typical unixy new-line separated row
of tab separated fields. The layout of the fields are as follows:
start_time # RFC3339Nano:YYYY-MM-DDThh:mm:ss.ns9[+-]HH:MM
blob # udig of the blob
ok_count # count of ok process exits <=255
fault_count # count of process faults <=255
wall_duration # wall clock duration seconds.nsec
log_sequence # service flow sequence, <= 2^63 - 1
The combination of "start_time" and "log_sequence" can safely be
assumed to be unique for records associated with this service.
Pedantly speaking, the uniqueness constraint could be violated by
EXTREMELY rapid process bouncing.
Process Execution Detail Record:
A process execution detail record, abbreviated as "xdr", summarizes
the outcome of calling a particular command on a particular blob.
The format of an "xdr" is the typical unixy new-line separated row
of tab separated fields. The layout of the fields are as follows:
start_time # YYYY-MM-DDThh:mm:ss.ns9[+-]ZZ:ZZ
log_sequence # sequence number for log file
command_name # command name in config file
exit_class # termination class: OK, ERR, SIG, NOPS
blob # udig of blob being processed
exit_status # process/signal exit staus <= 255
wall_duration # wall clock duration in seconds
system_duration # os duration in seconds
user_duration # user/kids total duration in seconds
The combination of start_time and log_sequence can safely assumed to
be unique within the log stream, although, in theory, uniqueness
could be compromised by EXTREMELY rapid process bouncing. See
comment for "fdr"
Wall/System/Users duration is float64 seconds formated as %.9f.
Query Detail Record:
A query detail record, abbreviated as "qdr", summarizes the
invocation of an sql database query
start_time # YYYY-MM-DDThh:mm:ss.ns9[+-]ZZ:ZZ
log_sequence # sequence number in current log file
query_name # name of query, unique across flow
termination_class # termination class: OK, ERR
blob # udig of blob
sqlstate # sql termination state
rows_affected # Rows fetched/merged/updated
wall_duration # wall clock duration as seconds
query_duration # db query duration as seconds
As with the "xdr" records, uniqueness can be determined by start_time
and log_sequence.
Wall/Query durations are float64 seconds formated as %20.9f.
Note:
- inbound{} ought to tie together various sources of brr records.
- should tail{} explcitly indicate a brr record.
tail brr ss_inbound
{
...
}
- add to command{}
exit_status is true;
exit_status is null;
to force always true and always null. always null effectively
disabled a command.
- think about an idle{} block added to flow language. idle{}
would execute code when no brr tuples are flowing
- should [fxq]dr logs be rolled on flow boundaries?
- think about defaulting to single defined database{} in query{}
statements, thus eliminating the redundant 'database is ...'
statement.
- think about how to handle sql database restarts.
- should the statement 'exit_status is OK when ...' be rewritten like
OK when exit_status in {...}' for a more query like syntax?
- unfortunatly the flow language is ascii only, except for strings.
being ascii only, particualr in command and query names, makes
identifying exec and query detail records much easier, at the cost
of completness. investigate the meaning of the functions
golang.unicode.IsLetter() and glolang.unicode.IsDigit().
if command/query names are allowed to be full unicode, then maybe a
field like ascii_name = "..." ought to be required in the definition.
- insure that tail, command & sql query ascii names match the pattern
^[a-zA-Z0-9][a-zA-Z0-9_]{0,63}$.
- verify that all 64 bit counts and sequences are no bigger than
2^63 - 1
- log when a tailed file rolls.
- Why is the statement 'database is ...' required in the query
definition? when the statement is missing why not just default to
the only database defined previously in the file?
- Should the statement 'sql query <name> row' be
'sql query row <name>'>? Also, what changing the
'result row is(answer bool)' to 'result row is(answer in bool)'
- does flowing {bool,string,ui64}_value full structures instead
of pointers lessen garbage collection overhead.
- think about a channel that traces i/o, possibly using reflection.
- What about a daily summary of the stats, written as the
last line and the first line of every rolled log file.
distinct udigs ought to be tracked in stats in log file.
also, maximum number of open queries.
- On every daily log roll, add dump of environment variables and
any other bootup configs that dumped upon startup to newly
rolled log file.
- If rolling a log file for which a boot up occured, then
write the sha of that file into next log file.
- Should flowd serialize the firing of flow to a particular brr?
For example, suppose the following 2 brr records.
10:00:00am sha:abc ... eat==no
10:00:01am sha:abc ... put==yes
if the 10:00:00am eat fires after the 10:00:01 put, then
could that problematic. ordering for a single udig may help
for independent facts about that udig, but will it help for
interelated facts, like graph dependency? not so clear.
- Think about adding code to catch runaways; i.e, commands that keep
failing to frequently. Another approach penalize blobs which keep
reappearing via separate queues or weights.
- randomize the initial flow sequence value