fix: pagerduty on potential sla breach #3
Open
deryrahman wants to merge 696 commits into deryrahman:main from
* feat: add job upsert in job repository
* test: add cases in job repository upsert
* feat: add job upsert in job service
* fix: issue when differentiating spec when upserting
* Revert "test: add cases in job repository upsert" This reverts commit 43f1c3c.
* Revert "feat: add job upsert in job repository" This reverts commit 6dc7bb6.
* feat: add upsert job specifications in job handler
* refactor: use upsert in job deploy command
* refactor: reword log messages in job upsert
* feat: add batch size in job deploy command
* chore: update to latest proto
* chore: update to latest proto with unary changes
* refactor: add successful job names in job upsert response
* fix: include job name information in error message when fail to generate jobs
* fix: lint issue in job service
* fix: wrong spec mentioned in log in job service parallel runner
* fix: should not log job creation/update in upsert, if there is 0 affected job
* chore: remove unnecessary server logging for catchup in job add and upsert
* fix: avoid changelog insertion if there is no change
* refactor: checking spec difference on job upsert
* test: fix assertion to handle intermittent test failures in job upsert
* refactor: break down job service interface
* refactor: return job names as part of job upsert response
* chore: update proton commit

* feat: add upsert resource in resource service
* feat: add upsert resource in resource handler
* feat: accept multiple resources in upsert resource api
* feat: add resource upload command
* feat: add error message to the upsert response
* fix: return error when failure found in resource upload command
* chore: update latest proto
* chore: update to latest proto with unary changes
* test: fix broken test cases in resource upsert
* refactor: reuse resource upsert & diff detection logic from deploy, in new upsert api
* chore: reword resource upload command examples
* feat: add successful resource names in resource upsert response
* chore: update proton commit

* fix: skip plan migrate on same namespace
* feat: improve handle for multiple namespace same name
* feat: support resource migration based on datastore
* fix: update unit test name based on assertion

* feat: auto dependency capability when destination is changed
* refactor: appending job to be resolved
* test: add test when add new upstream
* test: add test when update upstream
* feat: support downstream auto deps on upsert api
* fix: support downstream autodeps on replace all
* fix: test on replace all
* feat: deduplicate jobs before resolve
* test: add testcase for deduplicate + resolve deps
* test: fix ordering issue on test

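The diff for "deduplicate jobs before resolve" is not shown on this page. As a rough illustration of the idea only, the following sketch assumes jobs are identified by name and that the first occurrence wins; the `Job` record and the `dedupe_jobs` helper are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; the project's real job type is not shown in this PR page.
    name: str
    destination: str


def dedupe_jobs(jobs):
    """Return jobs with duplicate names removed, keeping the first occurrence in input order."""
    seen = set()
    unique = []
    for job in jobs:
        if job.name in seen:
            # A job with this name was already queued for dependency resolution.
            continue
        seen.add(job.name)
        unique.append(job)
    return unique
```

Deduplicating before resolution would mean each job's upstreams are resolved once, even if the same job is appended more than once (for example, as a changed job and again as a downstream of another changed job).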
* refactor: only show direct downstream when deleting jobs
* refactor: add 2 case unit test
* refactor: use list of list of downstreams
* refactor: skip lint

* feat: add resource changelog API service & repo
* feat: unit test for service
* feat: move to a separate service
* feat: add handler & proto
* feat: reorganize functions & track resource updates
* feat: fix metric counter & fix job changelog query
* feat: skip when there's no changes
* feat: update queries structure
* feat: fix query & handler function name
* fix: handler naming
* feat: update on code review
* feat: remove sorting from get query
* fix: lint
* feat: update proton commit

* fix: duplicate unique constraint issue on job deployment
* chore: fix lint issue in job service test

* fix: re-create deleted resource fetch
* feat: passing onlyActive param as const

…255)
* feat: add service logic
* feat: integrate bulk delete to client
* feat: update job service logic & handler
* feat: update proton
* feat: fix client
* feat: fix test
* feat: refactor approach
* feat: restructure downstream deletion
* feat: fix after sync with master
* feat: fix tracker already exists
* feat: unit test
* feat: fix variable naming & remove excessive logs
* feat: update logic to only use deletionTrakerMap
* fix: intermittent unit test fix
* fix: handling empty job
* feat: update proton commit

* fix: add logical date validation for replay
* feat: add bq model exist capability
* feat: add unit test on BQ Model part

* feat: add DAG template for Airflow 2.9.x
* fix: update test
* feat: change config kubernetes to kubernetes_executor
* feat: update scheduler version docs for supported versions
* feat: add support for multiple airflow version for a single DAG template
* fix: magic number moved to constant
* feat: refactor config approach
* feat: update docs

* fix: replay ignore upstreams
* fix(optimus): get replay with filter
* fix(optimus): get replay by filter cleanup

* fix: refactor metrics
* fix: lint
* fix: refactor metric collection

* feat: emit events on replay create, job update/delete and resource update/delete
* fix: add logging in __lib.py
* feat: add 1st version of resource spec v2 parser
* feat: restructure
* feat: add ReadByURN method for resource, update GetByURN to use it
* feat: restructure function, add unit tests
* feat: use urncomponent for dataset and resource name
* feat: add unit test for update & create
* feat: restore empty name checking
* fix

* fix: watch replay across restarts
* fix: add test

* feat: multi team alert
* refactor: support multi team alert
* feat: support multiple notify

* feat: add sensor support for schedule changes
* add test & fix code
* also fetch older runs on schedule change
* fix linter
* split get old runs schedule into multiple functions
* restructure
* add indexing on changelog project name, name, and created_at
* fix structure for lint
* restructure for early return

* fix: use correct alert in alertManager
* use local reference in goroutine

* feat: validate job with upstreams schedule
* only validate for more than daily cron, and use jobs in request if any
* move validation config parsing to server level
* use a granular function
* fix linter
* adjust time reference
* provide reference timezone in the message

* debug: disable db migrate
* feat: initial dummy dex resolver
* feat: initial configurable dummy 3rd party resolvers
* feat: resolve dex by string matching + resolve it on different function call
* feat: add placeholder for creating sensor in lib.py
* feat: persist third party type
* fix: lint
* fix: column naming for 3rd party
* fix: test + resource urn naming
* feat: use dedicated third party table
* fix: lint
* fix: lint
* fix: bring back migrate db
* refactor: rename table migration for 3rd party resolver
* Dex api integration (#489)
  * feat: resolve dex by string matching + resolve it on different function call
  * feat: add placeholder for creating sensor in lib.py
  * feat: persist third party type
  * fix: test + resource urn naming
  * feat: use dedicated third party table
  * fix: lint
  * fix: integrate dex API
  * fix
  * refactor: use url parser + producer type check
  * refactor: abstracting out sensor service
  * refactor: returning the response object for third party client
  ---------
  Co-authored-by: Dery Rahman Ahaddienata <[email protected]>
  Co-authored-by: Yash Bhardwaj <[email protected]>
* refactor: move window logic to optimus server (#496)
  * refactor: remove unnecessary code
  * refactor: reuse window interval calculation on server side
  * refactor: lowercase response
  * add feature flag, and metrics
  * fix: return iscomplete status
  * refactor: const status, remove identifier, refactor poke
  * update: proto
  ---------
  Co-authored-by: Yash Bhardwaj <[email protected]>
* fix: lint
* fix: test case
* fix: lint
* feat: add auth email support
* fix: integration test
* fix: duplicate db migrate file
* fix: config parsing
* fix: parseDateTime properly
* fix: __lib.py
* fix: add more logs
* fix
* fix: test
---------
Co-authored-by: Yash Bhardwaj <[email protected]>
Co-authored-by: Yash Bhardwaj <[email protected]>

* feat: enhance potential sla log + response
* feat: enhance with job run information
* feat: update proto commit
---------
Co-authored-by: Yash Bhardwaj <[email protected]>

* fix: dex resolver
* refactor: fix dex resolver
* feat: improve parallel bulk resolve 3rd party
* refactor: enhance upstream error
* refactor: append unresolved
* fix: add test cases for DEX resolver
* fix: lint
* fix: add test for bulk resolve
---------
Co-authored-by: Yash Bhardwaj <[email protected]>

…each (#500)
* feat: add necessary logs
* feat: update proto + add dampering coeff through api
* feat: include target job as calculation
* feat: deduplicate if alert already exist
* fix: linter
* refactor: full causes to store in persistent repo
* feat: skip detection for job on the list
* fix: return no breach
* fix: temp solution to skip preprod project
* feat: bump proton version

* fix: only fill end times if job/operators succeed
* fix query
* group runs by project
* explicitly exclude preprod
* add brackets for or cases

* fix: temp solution ignore backfill alert
* fix: prevent nil pointer
* refactor: use alerrules and expose it on constructor
* fix: linter
* test: add test case detect backfill

* fix: dex interval fix
* fix: tests
---------
Co-authored-by: Yash Bhardwaj <[email protected]>

Co-authored-by: Yash Bhardwaj <[email protected]>
* fix: log typo error
* fix: set dex request interval in UTC timezone
---------
Co-authored-by: Yash Bhardwaj <[email protected]>

* fix: use timezone
* feat: add targeted sla as column
* feat: storing full lineage
* refactor: full root cause to all upstream state
* fix: remove unused var + compacting the full lineage
* refactor: rename breaches state
* refactor: adjust test case
* fix: linter
* feat: store targeted sla
* refactor: remove unnecessary debug

* feat: add buffer padding when checking the breach
* fix: skip if the job is disabled
* fix: remove padding
* fix unit test on lineage resolver
* fix: sla predictor
---------
Co-authored-by: Ahmad N. F. <[email protected]>

Co-authored-by: Yash Bhardwaj <[email protected]>
…lude latest run for non-success cases (#516)
* fix(lineage-resolver): update job identifier by target job name & include latest run for non-success cases
* fix linter
* fix query job end time & add job status

* fix: sla alert schedule time handling
* fix: rename properties
---------
Co-authored-by: Yash Bhardwaj <[email protected]>

* chore: expose GetRecentScheduleChange for potential sla breach
* fix test
* feat: check scheduled change when detecting breach (#520)
---------
Co-authored-by: Dery Rahman Ahaddienata <[email protected]>

* fix: false sla alert check
* fix: add skip sla alerts for backfill temp fix in direct pager and slack integration flow
* fix: lint
---------
Co-authored-by: Yash Bhardwaj <[email protected]>

* feat: add reading kubernetes metadata sa from job spec
* feat: add storing kubernetes metadata sa
* feat: interpolating k8s sa on dag file
* feat: inject svc acc only on pod operator
* refactor: new kubernetes metadata
* test: fix job spec suite test

No description provided.