{"id":349,"date":"2023-04-30T18:03:42","date_gmt":"2023-04-30T08:33:42","guid":{"rendered":"https:\/\/sighmon.com\/says\/?p=349"},"modified":"2023-04-30T18:03:44","modified_gmt":"2023-04-30T08:33:44","slug":"chatgpt-with-your-voice","status":"publish","type":"post","link":"https:\/\/sighmon.com\/says\/chatgpt-with-your-voice\/","title":{"rendered":"ChatGPT with your voice"},"content":{"rendered":"\n<p>I&#8217;ve loved the dream of chatting with computers since I was really little, getting my Commodore 64 to say naughty words using <a href=\"https:\/\/discordier.github.io\/sam\/\">S.A.M.<\/a><\/p>\n\n\n\n<p>TV, film, and video games grew that curiosity through <a href=\"https:\/\/www.youtube.com\/watch?v=dANY3uk7lxc\">KITT<\/a> from Knight Rider, <a href=\"https:\/\/www.youtube.com\/watch?v=Wy4EfdnMZ5g\">HAL<\/a> from 2001: A Space Odyssey, that <a href=\"https:\/\/www.youtube.com\/watch?v=i1_fDwX1VVY\">Mission: Impossible intro<\/a>, <a href=\"https:\/\/www.youtube.com\/watch?v=8tg5f09itnI\">GLaDOS<\/a> from Portal, and, more recently, the (slightly problematic) <a href=\"https:\/\/www.youtube.com\/watch?v=sDkEF7Db7Gw\">Ava<\/a> from Ex Machina and <a href=\"https:\/\/www.youtube.com\/watch?v=ne6p6MfLBxc\">Samantha<\/a> from Her.<\/p>\n\n\n\n<p>The introduction of <a href=\"https:\/\/support.apple.com\/en-au\/HT208280\">Siri for voice home automation<\/a> was pretty dreamy &#8211; being able to ask Siri to turn lights and heaters on and off, close the blinds, and tell me the CO2 levels in our home blew my mind.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-vimeo wp-block-embed-vimeo wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"CO2 sensor x Siri\" src=\"https:\/\/player.vimeo.com\/video\/752832227?dnt=1&amp;app_id=122963\" width=\"640\" height=\"360\" frameborder=\"0\" allow=\"autoplay; fullscreen; picture-in-picture; clipboard-write\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>But 
Siri&#8217;s limitations show up quickly when you try to have a conversation with it.<\/p>\n\n\n\n<p>So when OpenAI announced <a href=\"https:\/\/openai.com\/blog\/chatgpt\">ChatGPT<\/a>, the first thing I imagined was building a better Siri, and maybe getting close to something like Samantha.<\/p>\n\n\n\n<p>Experiments with the OpenAI Python package worked out pretty nicely, so I decided to get a bit meta and co-build a voice chat with GPT-4 itself, using <a href=\"https:\/\/github.com\/openai\/whisper\">Whisper<\/a> for speech-to-text transcription and <a href=\"https:\/\/beta.elevenlabs.io\">Eleven Labs<\/a> for the compelling text-to-speech part.<\/p>\n\n\n\n<p>Code: <a href=\"https:\/\/github.com\/sighmon\/chatgpt-voice\">github.com\/sighmon\/chatgpt-voice<\/a><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-vimeo wp-block-embed-vimeo\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Conversation history, honey bees\" src=\"https:\/\/player.vimeo.com\/video\/813802660?dnt=1&amp;app_id=122963\" width=\"640\" height=\"400\" frameborder=\"0\" allow=\"autoplay; fullscreen; picture-in-picture; clipboard-write\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>Next, I wanted to take Samantha with me wherever I went, so I needed an iOS version. Again, I asked GPT-4 to help me build it, this time in SwiftUI. 
Because iOS has a <a href=\"https:\/\/developer.apple.com\/tutorials\/app-dev-training\/transcribing-speech-to-text\">Speech Recognition API<\/a> built right in, I decided to use that instead of Whisper for now, but it&#8217;ll be interesting to watch how the <a href=\"https:\/\/github.com\/ggerganov\/whisper.cpp\">Whisper.cpp<\/a> project evolves.<\/p>\n\n\n\n<p>Code: <a href=\"https:\/\/github.com\/sighmon\/os-one\">github.com\/sighmon\/os-one<\/a><\/p>\n\n\n\n<p>Apple App Store: <a href=\"https:\/\/apps.apple.com\/app\/os-one\/id6447306476\">apps.apple.com\/app\/os-one\/id6447306476<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/sighmon.com\/says\/wp-content\/uploads\/2023\/04\/os-one-v1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"496\" src=\"https:\/\/sighmon.com\/says\/wp-content\/uploads\/2023\/04\/os-one-v1-1024x496.png\" alt=\"Screenshots of the OS One app for iOS that simulates Samantha from the movie Her. Shown are the home screen, settings screen, list of conversations, and record of a conversation.\" class=\"wp-image-353\" srcset=\"https:\/\/sighmon.com\/says\/wp-content\/uploads\/2023\/04\/os-one-v1-1024x496.png 1024w, https:\/\/sighmon.com\/says\/wp-content\/uploads\/2023\/04\/os-one-v1-300x145.png 300w, https:\/\/sighmon.com\/says\/wp-content\/uploads\/2023\/04\/os-one-v1-768x372.png 768w, https:\/\/sighmon.com\/says\/wp-content\/uploads\/2023\/04\/os-one-v1-1536x744.png 1536w, https:\/\/sighmon.com\/says\/wp-content\/uploads\/2023\/04\/os-one-v1-2048x993.png 2048w, https:\/\/sighmon.com\/says\/wp-content\/uploads\/2023\/04\/os-one-v1-1568x760.png 1568w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>There&#8217;s a little bit of setup to get going &#8211; after signing up for API keys for both <a href=\"https:\/\/platform.openai.com\/\">OpenAI<\/a> and <a href=\"https:\/\/beta.elevenlabs.io\/\">Eleven Labs<\/a>, paste them into the settings view (the 
gear icon on the home screen) and toggle on Samantha from Her.<\/p>\n\n\n\n<p>The Eleven Labs text-to-speech is amazing &#8211; but <a href=\"https:\/\/beta.elevenlabs.io\/pricing\">expensive<\/a>. So in the future I might try to get a <a href=\"https:\/\/github.com\/serp-ai\/bark-with-voice-clone\">Bark<\/a> server running and migrate to it if it&#8217;s fast enough.<\/p>\n\n\n\n<p>Here&#8217;s the cost breakdown for almost a month of usage:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenAI: USD <strong><code>$3.10<\/code><\/strong> (mostly GPT-4, which works out to ~<code>52,000<\/code> tokens)<\/li>\n\n\n\n<li>Eleven Labs: USD <strong><code>$22.00<\/code><\/strong> (<code>100,000<\/code> characters or ~<code>2<\/code> hours of audio)<\/li>\n\n\n\n<li>Total: USD <strong><code>$25.10<\/code><\/strong> (AUD <strong><code>$38<\/code><\/strong>)<\/li>\n<\/ul>\n\n\n\n<p>Toggling between <code>gpt-3.5-turbo<\/code> and <code>gpt-4<\/code> makes a huge difference in speed (<code>gpt-3.5-turbo<\/code> is about <code>2x<\/code> faster) and price (about <code>30x<\/code> cheaper), but <code>gpt-4<\/code> gives so much better results that I find it worth the cost&#8230; for now.<\/p>\n\n\n\n<p>Let me know what you enjoy chatting with her about.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>From S.A.M. 
on the Commodore 64 to a SwiftUI iOS app that simulates Samantha from Her.<\/p>\n","protected":false},"author":1,"featured_media":353,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[77,31],"tags":[81,80,33,79,78],"class_list":["post-349","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-chatgpt","category-ios","tag-chatbot","tag-chatgpt","tag-ios","tag-speech-to-text","tag-text-to-speech","entry"],"_links":{"self":[{"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/posts\/349","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/comments?post=349"}],"version-history":[{"count":5,"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/posts\/349\/revisions"}],"predecessor-version":[{"id":355,"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/posts\/349\/revisions\/355"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/media\/353"}],"wp:attachment":[{"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/media?parent=349"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/categories?post=349"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sighmon.com\/says\/wp-json\/wp\/v2\/tags?post=349"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}