Data-Analyst-Agent

🚀 Data Analyst Agent


An API service that uses LLMs to fetch, parse, analyze, and visualize data through tool calls (web scraping, DuckDB, pandas, plotting), all executed in a sandbox for safety. The server also runs concurrent backup and fake-response workflows so a fallback result is available if the primary path fails or times out.
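The fallback behavior can be pictured with a small asyncio sketch. All names here (primary, backup, answer_with_fallback) are hypothetical stand-ins; the actual workflow code lives in the app package and differs in detail:

```python
import asyncio

# Hypothetical sketch of the fallback pattern: race the full tool-calling
# run against a cheap backup answer, and return whichever is appropriate.
async def primary():
    await asyncio.sleep(0.2)            # stands in for the full agent run
    return {"source": "primary"}

async def backup():
    await asyncio.sleep(0.01)           # stands in for the backup model call
    return {"source": "backup"}

async def answer_with_fallback(timeout: float):
    backup_task = asyncio.create_task(backup())
    try:
        result = await asyncio.wait_for(primary(), timeout=timeout)
        backup_task.cancel()            # primary won; discard the backup
        return result
    except asyncio.TimeoutError:
        return await backup_task        # primary too slow; serve the backup

print(asyncio.run(answer_with_fallback(timeout=2.0))["source"])   # primary
print(asyncio.run(answer_with_fallback(timeout=0.05))["source"])  # backup
```

Starting the backup task before awaiting the primary is what makes the fallback cheap: by the time the timeout fires, the backup answer is usually already done.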

📚 Table of Contents

  - Features
  - Getting Started
  - Configuration
  - Run locally
  - API usage
  - Tools and shared_results
  - Testing
  - Sandbox
  - Cost controls and token tips
  - Extending & Security
  - License
  - Contributing
  - Code of Conduct

✅ Features


Getting Started

Prerequisites

Quick Start

Windows (cmd.exe)

  1. Create venv and install dependencies:
    python -m venv venv
    venv\Scripts\activate
    pip install -r requirements.txt
    playwright install
    REM Optional: install uv only if you will use SANDBOX_MODE=uv instead of Docker
    pip install uv
    
  2. Set provider and API key:
    set LLM_PROVIDER=openai
    set OPENAI_API_KEY=YOUR_KEY
    
  3. Start API:
    uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
    

Linux/macOS

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install
# Optional: install uv only if you will use SANDBOX_MODE=uv instead of Docker
pip install uv
export LLM_PROVIDER=openai
export OPENAI_API_KEY=YOUR_KEY
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Docker Sandbox Images (Optional)

docker build -t data-agent-api .
docker build -f Dockerfile.sandbox -t data-agent-sandbox .
# If you build the sandbox with the tag above, set:
# export SANDBOX_DOCKER_IMAGE=data-agent-sandbox:latest

โš™๏ธ Configuration

Detailed Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| LLM_PROVIDER | LLM provider to use | openai |
| OPENAI_API_KEY | OpenAI API key | - |
| OPENROUTER_API_KEY | OpenRouter API key | - |
| GEMINI_API_KEY | Gemini API key | - |
| OPENAI_MODEL | OpenAI model to use | gpt-4o-mini |
| OPENROUTER_MODEL | OpenRouter model to use | google/gemini-flash-1.5 |
| GEMINI_MODEL | Gemini model to use | gemini-1.5-flash-latest |
| OPENAI_BASE_URL | OpenAI base URL | https://api.openai.com/v1 |
| OPENROUTER_BASE_URL | OpenRouter base URL | https://openrouter.ai/api/v1 |
| USE_SANDBOX | Enable sandbox for code execution | true |
| SANDBOX_MODE | Sandbox mode (docker or uv) | docker |
| SANDBOX_DOCKER_IMAGE | Docker image for sandbox | myorg/data-agent-sandbox:latest |
| REQUEST_TIMEOUT | Request timeout in seconds | 170 |
| LLM_MAX_OUTPUT_TOKENS | Max output tokens for LLM | 8192 |
| OPENAI_MAX_OUTPUT_TOKENS | Max output tokens for OpenAI | 8192 |
| OPENROUTER_MAX_OUTPUT_TOKENS | Max output tokens for OpenRouter | 8192 |
| GEMINI_MAX_OUTPUT_TOKENS | Max output tokens for Gemini | 8192 |
| MAX_FUNCTION_RESULT_CHARS | Max characters for a function result | 20000 |
| LARGE_FUNCTION_RESULT_CHARS | Threshold for a large function result | 10000 |
| MINIMIZE_TOOL_OUTPUT | Minimize tool output in context | true |
| AUTO_STORE_RESULTS | Automatically store tool results | true |
| MAX_INLINE_DATA_CHARS | Max characters for inline data | 4000 |
| MAX_OUTPUT_WORDS | Max words for the final answer | 200 |
| BACKUP_RESPONSE_OPENAI_BASE_URL | Base URL for the backup OpenAI endpoint | https://api.openai.com/v1 |
| BACKUP_RESPONSE_OPENAI_API_KEY | API key for the backup OpenAI endpoint | - |
| BACKUP_RESPONSE_OPENAI_MODEL | Model for the backup OpenAI endpoint | openai/gpt-4.1-nano |

Example .env (snippet)

LLM_PROVIDER=openai
OPENAI_API_KEY=...
OPENAI_MODEL=gpt-4o-mini
USE_SANDBOX=true
SANDBOX_MODE=docker
SANDBOX_DOCKER_IMAGE=data-agent-sandbox:latest
REQUEST_TIMEOUT=170
LLM_MAX_OUTPUT_TOKENS=800
MAX_FUNCTION_RESULT_CHARS=12000
MINIMIZE_TOOL_OUTPUT=true
AUTO_STORE_RESULTS=true
MAX_INLINE_DATA_CHARS=4000
MAX_OUTPUT_WORDS=200
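One way to consume these variables is sketched below with a hypothetical load_settings helper. The variable names and defaults follow the table above, but the project's actual loading code may differ:

```python
import os

# Hypothetical helper mirroring the variables above; defaults follow the
# "Detailed Environment Variables" table.
def load_settings(env=None):
    env = os.environ if env is None else env
    return {
        "provider": env.get("LLM_PROVIDER", "openai"),
        "model": env.get("OPENAI_MODEL", "gpt-4o-mini"),
        "use_sandbox": env.get("USE_SANDBOX", "true").lower() == "true",
        "sandbox_mode": env.get("SANDBOX_MODE", "docker"),
        "request_timeout": int(env.get("REQUEST_TIMEOUT", "170")),
        "max_output_words": int(env.get("MAX_OUTPUT_WORDS", "200")),
    }

print(load_settings({"USE_SANDBOX": "false"})["use_sandbox"])  # False
print(load_settings({})["request_timeout"])                    # 170
```

Note that boolean flags such as USE_SANDBOX arrive as strings, so they must be compared explicitly rather than passed to bool().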

โ–ถ๏ธ Run locally

Windows (cmd.exe)

set LLM_PROVIDER=openai
set OPENAI_API_KEY=...
set USE_SANDBOX=true
set SANDBOX_MODE=docker
uvicorn app.main:app --host 0.0.0.0 --port 8000

Linux/macOS

export LLM_PROVIDER=openai
export OPENAI_API_KEY=...
export USE_SANDBOX=true
export SANDBOX_MODE=docker
uvicorn app.main:app --host 0.0.0.0 --port 8000

📄 API usage

Endpoint: POST /api/

This endpoint expects multipart/form-data with a required file field named questions.txt. Additional files are optional and will be saved to a per-request temp folder; their absolute paths are appended to the question text for tool code to access.

curl (multipart)

curl -X POST "http://localhost:8000/api/" \
	-F "questions.txt=@your_question.txt" \
	-F "data.csv=@data.csv" \
	-F "image.png=@image.png"

Python requests

import requests

with open("your_question.txt", "rb") as q:
    files = {"questions.txt": q}
    # optionally add other files
    resp = requests.post("http://localhost:8000/api/", files=files, timeout=200)

print(resp.json())
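If requests is unavailable, the multipart body the endpoint expects can be assembled with the standard library alone. This sketch (build_multipart is a hypothetical helper, not part of the project) shows the wire format: the questions.txt field name and per-part Content-Disposition headers:

```python
import uuid
import urllib.request

def build_multipart(fields):
    """Build a multipart/form-data body from {field_name: (filename, bytes)}."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, (filename, data) in fields.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        parts.append(head + data + b"\r\n")
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart(
    {"questions.txt": ("questions.txt", b"What is the mean of column A?")}
)
req = urllib.request.Request(
    "http://localhost:8000/api/", data=body,
    headers={"Content-Type": content_type}, method="POST",
)
# urllib.request.urlopen(req)  # uncomment with the server running
print(content_type.split(";")[0])  # multipart/form-data
```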

Response


🧰 Tools and shared_results

Exposed tool calls in the current orchestrator:

Notes

Referencing Saved Results in Later Tool Code


🧪 Testing

Quick Test

python test_api.py

Run Test Suite

pytest -q

Manual Test

The API expects multipart/form-data (raw JSON bodies are not accepted), so write the prompt to a file and upload it:

echo "Scrape ... Return a JSON array ..." > question.txt
curl -X POST "http://localhost:8000/api/" -F "questions.txt=@question.txt"


๐Ÿ›ก๏ธ Sandbox

Modes

  - docker (default): tool code runs inside the container image set by SANDBOX_DOCKER_IMAGE.
  - uv: tool code runs in an isolated uv-managed environment on the host (requires pip install uv).

Env Vars

  - USE_SANDBOX (default true): enable sandboxed code execution.
  - SANDBOX_MODE (default docker): docker or uv.
  - SANDBOX_DOCKER_IMAGE (default myorg/data-agent-sandbox:latest): image used in docker mode.


💸 Cost controls and token tips

  - Lower LLM_MAX_OUTPUT_TOKENS (the example .env uses 800) to cap per-call output cost.
  - Reduce MAX_FUNCTION_RESULT_CHARS and MAX_INLINE_DATA_CHARS to keep bulky tool output out of the model context.
  - Keep MINIMIZE_TOOL_OUTPUT=true and AUTO_STORE_RESULTS=true so large results are stored instead of echoed back to the model.
  - MAX_OUTPUT_WORDS (default 200) caps the length of the final answer.

🧩 Extending & Security

Extend Tools

  1. Add schema in Orchestrator.functions
  2. Implement in app/tools/
  3. Map in function_map and handler
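As a rough sketch of the three steps above (the real Orchestrator.functions schema format and function_map live in the app package; the word_count tool here is invented purely for illustration):

```python
# Step 1: an OpenAI-style function schema, as would go in Orchestrator.functions.
word_count_schema = {
    "name": "word_count",  # hypothetical example tool
    "description": "Count the words in a text string.",
    "parameters": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}

# Step 2: the implementation, as would live under app/tools/.
def word_count(text: str) -> dict:
    return {"words": len(text.split())}

# Step 3: map the schema name to the implementation for the handler.
function_map = {"word_count": word_count}

print(function_map["word_count"]("hello sandboxed world"))  # {'words': 3}
```

Keeping the schema name, the function_map key, and the implementation in sync is the main thing the handler relies on.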

Security

  - Keep USE_SANDBOX=true so LLM-generated tool code never runs directly on the host.
  - Treat uploaded files and scraped web content as untrusted input.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


๐Ÿค Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Please read our Contributing Guidelines for details on our code of conduct, and the process for submitting pull requests to us.


📖 Code of Conduct

We have a Code of Conduct that we expect all contributors and community members to adhere to. Please read it to understand the expectations.


Made with โค๏ธ by Varun Agnihotri