- WARNING: I've built a more polished version of this agent in Django. The code can be found here: Duality
- Otherwise, if you don't like Django or prefer working without abstractions, this repo is for you.
Duality is an AI agent crew that can take over your browser and complete tasks for you.
This repo contains two types of agent logic:
- Semi-Autonomous (main branch): An AI agent that can learn to complete computer tasks from a simple video recording of the task.
- Autonomous (full_branch): An AI agent that can autonomously complete computer tasks with no demonstration. (STILL UNDER CONSTRUCTION)
- Create a Screen Recording:
- Record how you complete a task
- Transcription:
- The recording is parsed into actionable steps using GPT-4o.
- Interaction:
- The agent then starts a browser session, parses the HTML, finds the relevant page elements, and interacts with them according to the query.
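To make the semi-autonomous flow above more concrete, here is a minimal sketch of how transcribed steps might be turned into structured data before the browser acts on them. The `Step` class, the `parse_steps` helper, and the `<n>. <action>: <target> = <value>` text format are all hypothetical and chosen for illustration; the repo's actual transcription output may look different.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical step format (an assumption, not the repo's real output):
#   "<n>. <action>: <target>" with an optional " = <value>" suffix.
STEP_PATTERN = re.compile(r"^\s*\d+\.\s*(\w+)\s*:\s*(.+?)(?:\s*=\s*(.+?))?\s*$")


@dataclass
class Step:
    action: str                  # e.g. "click" or "type"
    target: str                  # plain-language description of the page element
    value: Optional[str] = None  # text to enter, if the action needs one


def parse_steps(transcript: str) -> List[Step]:
    """Turn a numbered, line-based transcript into Step objects.

    Lines that do not match the assumed format are skipped.
    """
    steps = []
    for line in transcript.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            action, target, value = match.groups()
            steps.append(Step(action.lower(), target, value))
    return steps


example = "1. click: Login button\n2. type: Email field = user@example.com"
for step in parse_steps(example):
    print(step)
```

Keeping each step as a small record like this would let the interaction stage look up one page element at a time rather than re-reading free text.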
- Constructing a Plan:
- Based on the provided text query, the agent constructs a plan to achieve the specified goal.
- Browser Session and Transcription:
- The agent begins a browser session and transcribes its screenshots using GPT-4o.
- Parsing to Memory:
- The agent then saves the screen content into episodic and semantic memory, and takes action based on the context.
- Analysis & Action:
- The agent analyzes the web state against the goal and its memory, and takes iterative actions until the goal is achieved.
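The autonomous flow above amounts to an observe, remember, decide, act loop. The sketch below shows one possible shape for that loop. It is not the code on `full_branch`: the `Memory` class, `run_agent`, and the callback names are hypothetical, and a toy counter stands in for the browser and the GPT-4o calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Memory:
    episodic: List[str] = field(default_factory=list)      # ordered log of observed states
    semantic: Dict[str, str] = field(default_factory=dict)  # facts the agent chooses to keep


def run_agent(
    goal: str,
    observe: Callable[[], str],
    decide: Callable[[str, str, Memory], str],
    act: Callable[[str], None],
    is_done: Callable[[str, str], bool],
    max_steps: int = 10,
) -> Tuple[bool, Memory]:
    """Loop until the goal is reached or the step budget runs out."""
    memory = Memory()
    for _ in range(max_steps):
        state = observe()              # e.g. a transcribed screenshot
        memory.episodic.append(state)
        if is_done(goal, state):
            return True, memory
        action = decide(goal, state, memory)
        act(action)
    return False, memory


# Toy environment standing in for a browser: a counter that must reach 3.
counter = {"n": 0}


def observe() -> str:
    return f"count={counter['n']}"


def decide(goal: str, state: str, memory: Memory) -> str:
    memory.semantic["last_seen"] = state  # keep one fact for later decisions
    return "increment"


def act(action: str) -> None:
    if action == "increment":
        counter["n"] += 1


def is_done(goal: str, state: str) -> bool:
    return state == goal


achieved, memory = run_agent("count=3", observe, decide, act, is_done)
print(achieved, memory.episodic)
```

The `max_steps` cap is a design choice for the sketch: an experimental agent that never recognizes its goal would otherwise loop forever.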
Create a conda environment
conda create -n agent_env python=3.10 -y
conda activate agent_env
Install dependencies
pip install -r requirements.txt
Set up the API keys
AGENTQL_API_KEY=<AGENTQL_API_GOES_HERE>
OPENAI_API_KEY=<OPENAI_API_GOES_HERE>
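This README does not say where these values are read from. Assuming they are exposed as environment variables, a small check like the one below could fail early with a clear message when a key is missing. The `load_api_keys` helper is hypothetical and not part of the repo.

```python
import os
from typing import Dict, Mapping

REQUIRED_KEYS = ("AGENTQL_API_KEY", "OPENAI_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required keys, or raise if any is missing or empty.

    Assumption: the keys are available as environment variables.
    """
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Calling `load_api_keys()` once at startup, before a browser session is opened, would surface configuration problems before any task begins.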
To use the application:
- Run main.py to host a local server.
- Open the HTML script to begin using the application in the browser.
P.S.: The autonomous logic is extremely novel and therefore experimental. It makes mistakes, so please use it at your own peril.