blobio/flowd at master · jmscott/blobio

History

Name		Name	Last commit message	Last commit date
parent directory ..
.gitignore		.gitignore
Makefile		Makefile
README		README
bio-dr2rrd		bio-dr2rrd
blobio-flowd.mkmk		blobio-flowd.mkmk
blobio.flowd.plist.example		blobio.flowd.plist.example
blobio.flowd.service.example		blobio.flowd.service.example
brr.go		brr.go
command.go		command.go
compile.go		compile.go
cron-dr2rrd		cron-dr2rrd
doc.go		doc.go
fdr-template.rrd		fdr-template.rrd
fdr.go		fdr.go
fdr2pg-copy-stage		fdr2pg-copy-stage
fdr2rrd-minute		fdr2rrd-minute
file.go		file.go
flow.go		flow.go
flowd-execv.c		flowd-execv.c
flowd.go		flowd.go
godoc-httpd		godoc-httpd
graphviz.go		graphviz.go
launchd-flowd		launchd-flowd
log.go		log.go
parser.go		parser.go
parser.y		parser.y
pid-log.go		pid-log.go
qdr.go		qdr.go
rrd-create-fdr		rrd-create-fdr
rrd-graph-fdr		rrd-graph-fdr
rrd-template-fdr		rrd-template-fdr
run-stat-flowd-tuple		run-stat-flowd-tuple
schema-rrd-stage.sql		schema-rrd-stage.sql
server.go		server.go
sql.go		sql.go
stage-fdr2rrd.sql		stage-fdr2rrd.sql
start-flowd		start-flowd
stop-flowd		stop-flowd
sync.go		sync.go
tail.go		tail.go
tas_lock.go		tas_lock.go
tsort.go		tsort.go
xdr.go		xdr.go
xdr2pg		xdr2pg
xdr2psql		xdr2psql

README

Synopsis:
	Flow blob request records into a directed acylic graph of processes and SQL queries

Description:
	While reading blob request records the flowd server will order the
	execution of unix process or database queries by qualifying on the
	exit status or query results.  The purpose of flowd is to derive
	immutable facts about the blobs scanned on the input stream,
	starting with unix processes than analyze the blobs and ending with
	facts populated into a relational database.

	The design inspiration of flowd is a mashup of the famous unix commands
	"make" and "awk".  The process/query order is syntacally a directed
	acyclic graph.  The side affects of the invoked unix processes/queries
	is hidden from flowd, other than the exit status or query results.

	The invocation of a particular process is determined by
	qualifying on the exit status of older processes in the the
	same flow.  A flow on a particular brr tuple will always
	terminate, since the call order is a DAG.  However,
	infinite loops are possible for processes that always reread a
	blob, thus generating more brr records.

	Flows are independent of other flows.  In other words, the
	results of a particular flow can not be referenced with in flowd by
	a different flow. Such logical independence yields natural parallelism.

Flow Detail Record:
	A flow detail record, named a "fdr", summarizes the outcome
	of invoking a set of related calls, a "flow", on a particular blob
	request record.

	The format of an "fdr" is the typical unixy new-line separated row
	of tab separated fields.  The layout of the fields are as follows:

		start_time	#  RFC3339Nano:YYYY-MM-DDThh:mm:ss.ns9[+-]HH:MM
		blob		#  udig of the blob
		ok_count	#  count of ok process exits <=255
		fault_count	#  count of process faults <=255
		wall_duration	#  wall clock duration seconds.nsec
		log_sequence	#  service flow sequence, <= 2^63 - 1 

	The combination of "start_time" and "log_sequence" can safely be
	assumed to be unique for records associated with this service.
	Pedantly speaking, the uniqueness constraint could be violated by
	EXTREMELY rapid process bouncing.

Process Execution Detail Record:
	A process execution detail record, abbreviated as "xdr", summarizes
	the outcome of calling a particular command on a particular blob.

	The format of an "xdr" is the typical unixy new-line separated row
	of tab separated fields.  The layout of the fields are as follows:

		start_time		#  YYYY-MM-DDThh:mm:ss.ns9[+-]ZZ:ZZ
		log_sequence		#  sequence number for log file
		command_name		#  command name in config file
		exit_class		#  termination class: OK, ERR, SIG, NOPS
		blob			#  udig of blob being processed
		exit_status		#  process/signal exit staus <= 255
		wall_duration		#  wall clock duration in seconds
		system_duration		#  os duration in seconds
		user_duration		#  user/kids total duration in seconds

	The combination of start_time and log_sequence  can safely assumed to
	be unique within the log stream, although, in theory, uniqueness
	could be compromised by EXTREMELY rapid process bouncing.  See
	comment for "fdr"
	
	Wall/System/Users duration is float64 seconds formated as %.9f.

Query Detail Record:

	A query detail record, abbreviated as "qdr", summarizes the
	invocation of an sql database query 

		start_time		#  YYYY-MM-DDThh:mm:ss.ns9[+-]ZZ:ZZ
		log_sequence		#  sequence number in current log file
		query_name		#  name of query, unique across flow
		termination_class	#  termination class: OK, ERR
		blob			#  udig of blob
		sqlstate		#  sql termination state
		rows_affected		#  Rows fetched/merged/updated
		wall_duration		#  wall clock duration as seconds
		query_duration		#  db query duration as seconds

	As with the "xdr" records, uniqueness can be determined by start_time
	and log_sequence.

	Wall/Query durations are float64 seconds formated as %20.9f.

Note:
	- inbound{} ought to tie together various sources of brr records.

	- should tail{} explcitly indicate a brr record.

		tail brr ss_inbound
		{
			...
		}

	- add to command{}

		exit_status is true;
		exit_status is null;

	  to force always true and always null.  always null effectively
	  disabled a command.

	- think about an idle{} block added to flow language.  idle{}
	  would execute code when no brr tuples are flowing

	- should [fxq]dr logs be rolled on flow boundaries?

	- think about defaulting to single defined database{} in query{}
	  statements, thus eliminating the redundant 'database is ...'
	  statement.

	- think about how to handle sql database restarts.

	- should the statement 'exit_status is OK when ...' be rewritten like
	  OK when exit_status in {...}' for a more query like syntax?

	- unfortunatly the flow language is ascii only, except for strings.
	  being ascii only, particualr in command and query names, makes
	  identifying exec and query detail records much easier, at the cost
	  of completness.  investigate the meaning of the functions
	  golang.unicode.IsLetter() and glolang.unicode.IsDigit().
	  if command/query names are allowed to be full unicode, then maybe a
	  field like ascii_name = "..." ought to be required in the definition.

	- insure that tail, command & sql query ascii names match the pattern
	  ^[a-zA-Z0-9][a-zA-Z0-9_]{0,63}$.

	- verify that all 64 bit counts and sequences are no bigger than
	  2^63 - 1

	- log when a tailed file rolls.

	- Why is the statement 'database is ...' required in the query
	  definition?  when the statement is missing why not just default to
	  the only database defined previously in the file?

	- Should the statement 'sql query <name> row' be 
	  'sql query row <name>'>?  Also, what changing the
	  'result row is(answer bool)' to 'result row is(answer in bool)'

	- does flowing {bool,string,ui64}_value full structures instead
	  of pointers lessen garbage collection overhead.

	- think about a channel that traces i/o, possibly using reflection.

	- What about a daily summary of the stats, written as the
	  last line and the first line of every rolled log file.

	  distinct udigs ought to be tracked in stats in log file.
	  also, maximum number of open queries.

	- On every daily log roll, add dump of environment variables and
	  any other bootup configs that dumped upon startup to newly
	  rolled log file.

	- If rolling a log file for which a boot up occured, then
	  write the sha of that file into next log file.

	- Should flowd serialize the firing of flow to a particular brr?
	  For example, suppose the following 2 brr records.

		10:00:00am	sha:abc ...	eat==no
		10:00:01am	sha:abc ...	put==yes

	  if the 10:00:00am eat fires after the 10:00:01 put, then
	  could that problematic.  ordering for a single udig may help
	  for independent facts about that udig, but will it help for
	  interelated facts, like graph dependency?  not so clear.

	- Think about adding code to catch runaways; i.e, commands that keep
	  failing to frequently.  Another approach penalize blobs which keep
	  reappearing via separate queues or weights.

	- randomize the initial flow sequence value

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README

FilesExpand file tree

flowd

Directory actions

More options

Directory actions

More options

Latest commit

History

flowd

Folders and files

parent directory

README