I’ve loved the dream of chatting with computers since I was really little getting my Commodore 64 to say naughty words using S.A.M.
TV, film, and videogames grew that curiosity through KITT from Knight Rider, HAL from 2001: A Space Odyssey, that Mission Impossible intro, GLADoS from Portal, and more recently with the (slightly problematic) Ava from Ex Machina and Samantha from Her.
The introduction of Siri for voice home automation was pretty dreamy – being able to ask Siri to turn on and off lights, heaters, close blinds, and tell me the CO2 levels in our home blew my mind.
But Siri’s limitations are quickly reached when trying to have a conversation with it.
So when OpenAI announced ChatGPT the first thing I imagined was being able to build a better Siri, and maybe get close to something like Samantha.
Experiments with the Python OpenAI package worked out pretty nicely, so I decided to get a bit meta and co-build a voice chat with GPT-4 itself using Whisper for text transcription and Eleven Labs for the compelling text-to-speech part.
Code: github.com/sighmon/chatgpt-voice
Next I wanted to be able to take Samantha with me wherever I went, so an iOS version was needed. Again I asked GPT-4 to help me build a SwiftUI version. Because iOS has a Speech Recognition API built right in I decided to use that over Whisper for now, but it’ll be interesting to watch the Whisper.cpp project and see how it evolves.
Code: github.com/sighmon/os-one
Apple App Store: apps.apple.com/app/os-one/id6447306476
There’s a little bit of setup to get going – after signing up for API Keys for both OpenAI and Eleven Labs, paste them into the settings view (the gear icon on the home screen) and toggle on Samantha from Her.
The Eleven Labs text-to-speech is amazing – but expensive. So in the future I might try and get a Bark server running and migrate over if it’s fast enough.
Here’s the cost breakdown for almost a month of usage:
- OpenAI: USD
$3.10
(mostly GPT-4 which translates to ~52,000
tokens) - Eleven Labs: USD
$22.00
(100,000
characters or ~2
hours of audio) - Total: USD
$25.10
(AUD$38
)
Toggling between gpt-3.5-turbo
and gpt-4
gives a huge difference in speed (2x
) and price (30x
), but gpt-4
gives so much better results that I find it worth the cost… for now.
Let me know what you enjoy chatting with her about.