Objective
- The objective of this project is to build a home assistant, similar to Alexa/Google Home, with Raspberry Pi (RPi) and ChatGPT.
- The idea is to eventually develop a completely private assistant powered by an SLM running on a local server, and interface it with additional sensors like cameras etc for a unified home automation assistant.
- In this article, I shared the details of the first phase of this project, where I set up a simple end-to-end RPi interface with a microphone, speaker, and ChatGPT API to answer simple questions.
Hardware
Product |
Purpose |
Raspberry Pi 5 |
Main SBC (Single Board Computer) to run the assistant |
Power Supply |
Needed for steady power supply |
Any bluetooth speaker |
For audio output, ideally with AUX cable |
SD card |
|
Microphone |
|
|
|
Optional |
|
Active Cooler |
To help with keeping the RPi from overheating |
Micro HDMI |
Needed for interfacing with monitor for easier development |
Code
Workflow
Initial Setup
- Install Raspberry Pi OS on the SD card and insert it in the RPi.
- Interface with the RPi:
- GUI: You can connect the RPi with keyboard/mouse/monitor and work on it directly.
- SSH: After your RPi is connected to the local WiFi, you can just SSH into it using:
ssh <username>@raspberrypi
- This way, the RPi can be accessed via terminal/VSCode etc.
Hardware Setup
- Install the active cooler on the RPi.
- Set up the speaker and microphone.
Python Setup
- Clone the github repo.
- Create a new python environment and install packages in requirements.txt
- Setup OPENAI key in
.bashrc
as export OPENAI_API_KEY=<OPEN_AI_KEY>
.
Running the script
cd PATH_TO_REPO
source PATH_TO_PYTHON_ENVIRONMENT/bin/activate
python main.py
- This will run the script in an infinite loop.
- It will continuously listen for audio input and provide a response when an input is detected.
Audio Interface
- This script has 2 ways to interact with the assistant.
- Using RPi microphone + Hot-word Detection
- Current keyword is
corgi
. This can be easily changed in the script.
- Just say the hot-word and the RPi should respond with a corgi bark! That's your cue to start speaking your query.
- After speaking out the query and pausing for a few seconds, the script will convert the audio recording to text and send it to the ChatGPT API.
- Using Audio Recording app from phone
- This approach facilitates using your phone itself as a microphone. However, the setup is slightly involved.
- Install Easy Voice Recorder on phone
- Minimal voice recorder app. Has a nice widget to start/stop recording.
- Install Syncthing on both RPi and phone.
- With this app, the recordings folder on the phone can be synced with the RPi.
- Find the folder where the Easy Voice Recorder app stores all its recordings. Sync that folder with the RPi using Syncthing.
- Add the Easy Voice Recorder widget on homescreen.
- Now, you can record your query on phone by using the homescreen widget. It will immediately get synced and downloaded onto the RPi. The script keeps checking if any new recording is added to the synced folder. Once its downloaded, the script will pick it up, convert it to text, and send it to ChatGPT API.
High-level Python Script Overview
- The script workflow is as follows;
- Waits for an audio input from either of the 2 sources:
- via microphone and hot-word detection
- via phone voice recorder app
- Once an audio recording is received, it is transcribed to text via OpenAI
whisper-1
API.
- The text query is then sent to a OpenAI ChatGPT
gpt-3.5-turbo
API call to get the response.
- The response is converted to audio via OpenAI
tts-1
model API call.
- The audio is played on the connected speaker.
- All temporary files are deleted and loop goes back to waiting again for audio input.
Next Steps
- I plan to expand this work to ideally incorporate the following changes:
- I want to remove the reliance on ChatGPT API. Instead, I want to use an SLM running on my local laptop for the task. This will make it completely secure.
- Implement a RAG pipeline + web scrapper to get up-to-date information.