An API service that uses LLMs to fetch, parse, analyze, and visualize data via tool calls (web scraping, DuckDB, pandas, plotting), executed inside a sandbox for safety. The server also starts concurrent backup and fake-response workflows so it can return a fallback result if the primary path fails or times out.
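For intuition, the fallback behaves roughly like the sketch below (a minimal illustration, not the project's actual code; `run_primary_agent` and `run_backup_response` are hypothetical stand-ins for the real workflows):

```python
import asyncio

# Hypothetical stand-ins for the real primary and backup workflows.
async def run_primary_agent(question: str) -> dict: ...
async def run_backup_response(question: str) -> dict: ...

async def answer_with_fallback(question: str, timeout: float = 170.0) -> dict:
    # Start both workflows up front so a fallback answer is already warm.
    primary = asyncio.create_task(run_primary_agent(question))
    backup = asyncio.create_task(run_backup_response(question))
    try:
        result = await asyncio.wait_for(primary, timeout=timeout)
        backup.cancel()
        return result
    except Exception:
        # Primary failed or timed out: return the backup answer instead.
        return await backup
```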
`OPENAI_API_KEY` or `GEMINI_API_KEY` is required to start the server (OpenRouter is supported as a provider, but server startup currently checks only for an OpenAI or Gemini key).

### Windows (cmd.exe)

```
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
playwright install
REM Optional: needed only for the uv sandbox (skip if you'll use Docker)
pip install uv
set LLM_PROVIDER=openai
set OPENAI_API_KEY=YOUR_KEY
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
### Linux/macOS

```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install
# Optional: needed only for the uv sandbox (skip if you'll use Docker)
pip install uv
export LLM_PROVIDER=openai
export OPENAI_API_KEY=YOUR_KEY
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
### Docker Sandbox Images (Optional)

```
docker build -t data-agent-api .
docker build -f Dockerfile.sandbox -t data-agent-sandbox .
# If you build the sandbox with the tag above, set:
# export SANDBOX_DOCKER_IMAGE=data-agent-sandbox:latest
```
### Environment Variables

| Variable | Description | Default |
|---|---|---|
| `LLM_PROVIDER` | LLM provider to use | `openai` |
| `OPENAI_API_KEY` | OpenAI API key | - |
| `OPENROUTER_API_KEY` | OpenRouter API key | - |
| `GEMINI_API_KEY` | Gemini API key | - |
| `OPENAI_MODEL` | OpenAI model to use | `gpt-4o-mini` |
| `OPENROUTER_MODEL` | OpenRouter model to use | `google/gemini-flash-1.5` |
| `GEMINI_MODEL` | Gemini model to use | `gemini-1.5-flash-latest` |
| `OPENAI_BASE_URL` | OpenAI base URL | `https://api.openai.com/v1` |
| `OPENROUTER_BASE_URL` | OpenRouter base URL | `https://openrouter.ai/api/v1` |
| `USE_SANDBOX` | Enable sandbox for code execution | `true` |
| `SANDBOX_MODE` | Sandbox mode (`docker` or `uv`) | `docker` |
| `SANDBOX_DOCKER_IMAGE` | Docker image for sandbox | `myorg/data-agent-sandbox:latest` |
| `REQUEST_TIMEOUT` | Request timeout in seconds | `170` |
| `LLM_MAX_OUTPUT_TOKENS` | Max output tokens for LLM | `8192` |
| `OPENAI_MAX_OUTPUT_TOKENS` | Max output tokens for OpenAI | `8192` |
| `OPENROUTER_MAX_OUTPUT_TOKENS` | Max output tokens for OpenRouter | `8192` |
| `GEMINI_MAX_OUTPUT_TOKENS` | Max output tokens for Gemini | `8192` |
| `MAX_FUNCTION_RESULT_CHARS` | Max characters for a function result | `20000` |
| `LARGE_FUNCTION_RESULT_CHARS` | Threshold for a large function result | `10000` |
| `MINIMIZE_TOOL_OUTPUT` | Minimize tool output in context | `true` |
| `AUTO_STORE_RESULTS` | Automatically store tool results | `true` |
| `MAX_INLINE_DATA_CHARS` | Max characters for inline data | `4000` |
| `MAX_OUTPUT_WORDS` | Max words for the final answer | `200` |
| `BACKUP_RESPONSE_OPENAI_BASE_URL` | Base URL for the backup OpenAI endpoint | `https://api.openai.com/v1` |
| `BACKUP_RESPONSE_OPENAI_API_KEY` | API key for the backup OpenAI endpoint | - |
| `BACKUP_RESPONSE_OPENAI_MODEL` | Model for the backup OpenAI endpoint | `openai/gpt-4.1-nano` |
### .env (snippet)

```
LLM_PROVIDER=openai
OPENAI_API_KEY=...
OPENAI_MODEL=gpt-4o-mini
USE_SANDBOX=true
SANDBOX_MODE=docker
SANDBOX_DOCKER_IMAGE=data-agent-sandbox:latest
REQUEST_TIMEOUT=170
LLM_MAX_OUTPUT_TOKENS=800
MAX_FUNCTION_RESULT_CHARS=12000
MINIMIZE_TOOL_OUTPUT=true
AUTO_STORE_RESULTS=true
MAX_INLINE_DATA_CHARS=4000
MAX_OUTPUT_WORDS=200
```
### Windows (cmd.exe)

```
set LLM_PROVIDER=openai
set OPENAI_API_KEY=...
set USE_SANDBOX=true
set SANDBOX_MODE=docker
uvicorn app.main:app --host 0.0.0.0 --port 8000
```
### Linux/macOS

```
export LLM_PROVIDER=openai
export OPENAI_API_KEY=...
export USE_SANDBOX=true
export SANDBOX_MODE=docker
uvicorn app.main:app --host 0.0.0.0 --port 8000
```
### Endpoint: `POST /api/`
This endpoint expects `multipart/form-data` with a required file field named `questions.txt`. Additional files are optional and will be saved to a per-request temp folder; their absolute paths are appended to the question text so tool code can access them.
### curl (multipart)

```
curl -X POST "http://localhost:8000/api/" \
-F "questions.txt=@your_question.txt" \
-F "data.csv=@data.csv" \
-F "image.png=@image.png"
### Python requests

```python
import requests

with open("your_question.txt", "rb") as q:
    files = {"questions.txt": q}
    # optionally add other files, e.g.:
    # files["data.csv"] = open("data.csv", "rb")
    resp = requests.post("http://localhost:8000/api/", files=files, timeout=200)
print(resp.json())
```
### Response
{"error": "..."}shared_resultsExposed tool calls in the current orchestrator:
fetch_and_parse_html(url, selector?, max_elements?, method?, headers?, timeout_seconds?, result_key?)run_duckdb_query(sql, result_key?)generate_plot(code, result_key?) โ should save a matplotlib figure to output.png; returns { data_uri }run_python_with_packages(code, packages, result_key?) โ executes with uv; print your final resultrun_python_with_uv(code) is also supported internally; prefer run_python_with_packages.result_key in tool calls. When MINIMIZE_TOOL_OUTPUT=true, full results are stored in shared_results and only a compact stub is added to the model context.AUTO_STORE_RESULTS=true and result_key is omitted, the orchestrator will generate a key like fetch_and_parse_html_1.json.loads(...) in Python)rows = json.loads()import pandas as pd; df = pd.DataFrame(rows)Quick Test
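For example, the `code` string passed to `generate_plot` might look like the following (a minimal sketch with made-up data; in practice the data would come from earlier tool results):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for sandboxed execution
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4]          # hypothetical data
ys = [1.1, 1.9, 3.2, 3.9]

fig, ax = plt.subplots()
ax.scatter(xs, ys)
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("output.png")  # generate_plot reads output.png and returns { data_uri }
```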
### Quick Test

```
python test_api.py
```
### Run Test Suite

```
pytest -q
```
### Manual Test

```
curl -X POST "http://localhost:8000/api/" -H "Content-Type: application/json" -d '{"question":"Scrape ... Return a JSON array ..."}'
```

Note: the API expects `multipart/form-data`; the JSON example above only illustrates a prompt and will not work against this server (use the multipart curl example above instead).
### Sandbox Modes

- `docker` (default): strong isolation, best for production
- `uv`: fast local isolation without Docker

Relevant settings:

```
USE_SANDBOX=true|false
SANDBOX_MODE=docker|uv
SANDBOX_DOCKER_IMAGE=data-agent-sandbox:latest
```

A rough sketch of what the `uv` mode does follows below.
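Illustratively, the `uv` mode amounts to executing generated code in a throwaway environment with ad-hoc dependencies, roughly like this sketch (the project's actual sandbox module may differ):

```python
import subprocess

code = 'import pandas as pd; print(pd.__version__)'  # code produced by the LLM

# "uv run --with <pkg>" creates an isolated environment with the requested
# packages and runs the command inside it.
result = subprocess.run(
    ["uv", "run", "--with", "pandas", "python", "-c", code],
    capture_output=True, text=True, timeout=60,
)
print(result.stdout)
```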
### Performance Tips

- Set `LLM_MAX_OUTPUT_TOKENS=300-1000` for concise answers; use provider-specific caps as needed.
- Use `result_key` (and the `shared_results` store) to reference data instead of pasting it into prompts.
- Prefer `fetch_and_parse_html` over separate fetch and parse calls to reduce turns.
- Tune `MAX_FUNCTION_RESULT_CHARS` down (e.g., 8000-12000) if tool outputs are still large.
- Choose fast, inexpensive models (e.g., `gpt-4o-mini`, `gemini-1.5-flash-latest`).

### Extending

To add a new tool, declare it in `Orchestrator.functions` and register its handler in `app/tools` (`function_map` and handler); a hypothetical sketch follows.
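As a purely hypothetical illustration of that registration (the real `function_map` layout and handler signatures live in `app/tools` and may differ):

```python
# Hypothetical new tool: count words in a piece of text.
def word_count(text: str, result_key: str | None = None) -> dict:
    return {"words": len(text.split())}

# 1) Map the tool name to its handler so the orchestrator can dispatch calls.
function_map = {
    # ...existing tools (fetch_and_parse_html, run_duckdb_query, ...)
    "word_count": word_count,
}

# 2) Declare the schema the LLM sees (OpenAI-style function spec) in
#    Orchestrator.functions.
word_count_spec = {
    "name": "word_count",
    "description": "Count words in a piece of text.",
    "parameters": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}
```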
### License

This project is licensed under the MIT License - see the LICENSE file for details.

### Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Please read our Contributing Guidelines for details on our code of conduct, and the process for submitting pull requests to us.
We have a Code of Conduct that we expect all contributors and community members to adhere to. Please read it to understand the expectations.
Made with ❤️ by Varun Agnihotri