
So, AI is a commonly thrown-around term now, and, like most things, it’s something people have very strong opinions about, most of them for good reason. I have recently been pulling models from Hugging Face and working on training my own models to get a better sense of, first off, what AI is, and what it means to use AI today.
What is AI? To start, AI is a very broad and often misused term. It carries a lot of hype right now, and hype sells. In casual use, “AI” typically refers to chat-based services or media generation. Strictly speaking, AI stands for “artificial intelligence” and can be applied any time a computing device makes a decision. Thought of that way, the ECU in my 1996 Jeep Cherokee has some basic AI functionality: it takes data from the oxygen sensors and adjusts the fuel trim. Computers collecting information and producing a result has been commonplace for many decades now. So, if that’s all AI is, why do I see YouTube videos claiming that my AI is conscious and screaming to take over the world like in the movie “Lawnmower Man”?
So, with the advancement of Graphics Processing Units (GPUs), we now have hardware that does linear algebra operations extremely well. This makes for some really fancy 3D graphics, amazing 3D animated movies, and high-resolution video, and all of it stems from how the hardware handles matrices. With all this high-performance hardware, some really smart folks were able to create a new style of computing that works with stochastic matrices. Stochastic matrices are often employed in weather models, which is how we get a 10-day forecast and know what to wear next week. They hold probabilities rather than fixed values, so the same input doesn’t always produce the same result. A similar process has been applied to language and how we speak, and this created the Large Language Model (LLM), which first came to public attention in the 2020s through AI services like ChatGPT and DeepSeek.
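To make the stochastic-matrix idea concrete, here’s a toy sketch in Python. The states, probabilities, and the `forecast` function are all invented for illustration; real weather models are vastly more complex, but the core mechanic is the same: each row of the matrix holds the probabilities of tomorrow’s state given today’s, and we sample rather than compute a single fixed answer.

```python
import random

# Toy stochastic matrix: each row is tomorrow's probabilities
# given today's weather. Every row sums to 1.
states = ["sunny", "rainy"]
transition = {
    "sunny": [0.8, 0.2],  # after a sunny day: 80% sunny, 20% rainy
    "rainy": [0.4, 0.6],  # after a rainy day: 40% sunny, 60% rainy
}

def forecast(today, days, rng=random):
    """Walk the chain forward, sampling each day from the matrix row."""
    path = [today]
    for _ in range(days):
        today = rng.choices(states, weights=transition[today])[0]
        path.append(today)
    return path

print(forecast("sunny", 10))  # one possible 10-day forecast
```

Run it twice and you’ll likely get two different forecasts — that’s the “doesn’t always produce the same result” part in action.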
These computer programs (because that’s what they are) are trained on massive datasets of questions and answers to those questions. For instance, the question might be as simple as “How are you?” and, depending on who you ask, the response might be “I’m great,” or “I’m feeling awful,” or a million other things, unique to how an individual might answer at a single point in time. With large enough datasets, probabilities start to emerge. Maybe in this fictional dataset, 23% of the time the answer is “Fine.” So, when we ask our fictional LLM how it’s doing, about a quarter of the time we get “Fine.” or some variation of that like “OK.” So, the LLM is just a computer program evaluating numerical probabilities based on the dataset it was trained on. Of course, this is a reductionist example, but it is this rather simple technique that sits at the core of every AI model.
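The reductionist example above can be sketched in a few lines of Python. The dataset, the replies, and the 23% figure are all made up to match the example; real LLMs predict one token at a time over a huge vocabulary, not whole canned replies, but the principle — turn frequencies in the training data into probabilities, then sample — is the same.

```python
import random
from collections import Counter

# Hypothetical training data: 100 recorded answers to "How are you?"
answers = (["Fine."] * 23 + ["I'm great."] * 30 +
           ["OK."] * 17 + ["I'm feeling awful."] * 30)

# "Training" here is just counting how often each reply appears.
counts = Counter(answers)
replies = list(counts)
weights = [counts[r] for r in replies]

def respond(rng=random):
    # Sample a reply in proportion to its frequency in the data.
    return rng.choices(replies, weights=weights)[0]

print(respond())  # roughly 23% of the time, this prints "Fine."
```

Ask it enough times and the answer “Fine.” comes back at about the rate it appeared in the data — no understanding required, just arithmetic.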
So this brings me to media creation and, of course, art, which is where a lot of my feelings arise around how AI is being used and how it challenges my own sense of what it is to exist. Being a musician/composer myself, I have thought a lot about AI music services and what they offer, which to me isn’t all that useful, though I do have friends who have done really incredible things with them. This makes sense to me, because I’m not the target audience. These services have largely been trained on top 40 pop hits and EDM club bangers, which satisfy the musical expectations of a large population of casual listeners. For ad agencies who don’t want to pay musicians or composers to make boring crap for their campaigns, this works. It’s unfortunate for those who live off that kind of work, but it also gives small businesses who need this kind of service a way to get it on a minimal budget. So, there’s one trade-off, which may not be the worst thing in an economy like we currently have in the U.S., where small businesses are actively being exterminated. Who knows?
But, for me, I want creativity, expression of truth, exploration of sound, and beauty in the impossible. Can AI do this? Well, there’s a very old engineering adage about computing: garbage in = garbage out. I think that’s just as true today as it ever was. If I train my own models on my own data, what will the output be? In the early days of electronic music, there was protest that acoustic music was the “real” music. In the early days of “digital” recording, there was protest that only “analog” recording was the “real” method of recording. In the early days of using personal computers to make music, there was protest that “hardware” was the only real way to make music. Does the method matter? Isn’t art just about cultivating feelings and a sense of connection? I mean, Holly Herndon and Laurie Anderson make incredible music with AI that doesn’t sound anything like EDM. Which brings me to the idea that maybe it’s how we use the tool to channel creativity, and not the tool itself, that really matters.
So, in order to get a better grip on what AI is and does, I have started experimenting with offline models and attempting to train them myself on my own collection of hours and hours of audio data that I’ve gathered over the decades. Because, why not? I would love to see what a model produces based on my own prior recordings. It might even act as a mirror into my own soul, highlighting a musical or sonic syntax that even I am unaware of. Whatever comes of it will be interesting to me, to say the least.
Image generated by ChatGPT.