Is AI running out of data?

An interesting video was posted here on that subject

I’m curious about this announced shortage of reliable data, because I don’t understand the reasoning about a “copy of a copy of a copy”, as the author puts it near the end of his video.

In fact, in our field (go), I understood that the next generation after the victory against Lee Sedol was built without data, from scratch, with only the rules of the game, and this proved to be even stronger than the one built with data from thousands of years of play. I think there is some misunderstanding, at least in this video, about how AI works.

So what is your opinion?

4 Likes

The video is about language models that are trained on material generated by humans (texts, images, videos). That’s not like AlphaZero, which improved by itself without human interaction. So far, AI can learn go from scratch, but it cannot learn math, music, painting, etc. from scratch.

2 Likes

But that’s all about being able to define the most basic set of rules in a field and letting AI do its job, which proved to generate a better result in the field of go. Isn’t that astonishing?

This side is neglected most of the time when reporting on AI’s abilities, like when only the poor quality of the sources (YouTube…) is mentioned. This self-generation is what intrigued me the most.

I am from the pre-AI generation. I was very sceptical about the birth of software able to beat a pro, and I never thought software could become even better than the knowledge we inherited from so many years of practising the game. That was my belief.

I’m afraid that what you’re telling me now may also be just a belief.

1 Like

Yes and no. The main problem is defining a good loss function, i.e. a mathematical definition of improvement. For instance, in go you can define the loss as the percentage of lost games in a certain number of games against all the other models of this generation. That’s why the AI can improve itself. If you take mathematics, for instance, you can define basic rules (i.e. axioms), but then you don’t have a criterion for the AI to improve at making new theorems.
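
To make that concrete, here is a minimal sketch in Python of what such a loss could look like; the `play_game` stub is hypothetical (a real setup would actually play out full games between the models):

```python
import random

def play_game(candidate, opponent):
    """Hypothetical game runner: returns True if candidate wins.
    Stubbed as a coin flip; a real version would play a full game."""
    return random.random() < 0.5

def self_play_loss(candidate, peers, games_per_peer=100):
    """Loss = fraction of games lost against all peer models of this generation.
    Lower is better, so minimizing this is a usable definition of improvement."""
    lost = total = 0
    for opponent in peers:
        for _ in range(games_per_peer):
            if not play_game(candidate, opponent):
                lost += 1
            total += 1
    return lost / total

print(self_play_loss("candidate_model", ["peer_1", "peer_2"]))  # about 0.5 with the coin-flip stub
```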

My point is, we use the word “AI” for AlphaZero, but it is vastly different from LLMs (ChatGPT) or diffusion models (image generators), in the sense that AlphaZero doesn’t generate anything; it “just” computes the move with the best probability of winning in a given configuration of the game. In this sense AlphaZero is kinda “easy” to train.

Regarding the vid, I don’t think Rick Beato has a good understanding of AI models; it sounds like he makes the very common mistake of thinking “AI is a big database of the internet”, as if, when AI gets something wrong, it’s because it hasn’t been trained on the right part of the internet or whatever.

And his “copy of a copy of a copy” thing is a gross misunderstanding of “synthetic data”. He thinks it means “oh no, there is now so much AI content on the internet that AI is training itself on it” (which is somewhat true, btw; nowadays models trained on the internet are less accurate because of that). But when scientists refer to “synthetic data”, they mean a made-up, statistically accurate database. It is very common and doesn’t mean AI learns from itself.
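
For what it’s worth, here is a toy illustration of that meaning of “synthetic data”: fit a simple statistical model to real measurements, then sample as many made-up but statistically faithful records as you like (the numbers below are invented for the example):

```python
import random
import statistics

# A small "real" dataset, e.g. heights in cm (values invented for the example).
real_heights = [158.2, 171.5, 165.0, 180.3, 174.9, 162.8, 169.4]

# Fit a simple statistical model to the real data...
mu = statistics.mean(real_heights)
sigma = statistics.stdev(real_heights)

# ...then sample as many new, statistically similar records as you like.
synthetic_heights = [random.gauss(mu, sigma) for _ in range(10_000)]
```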

8 Likes

I didn’t watch this video. I find Rick Beato’s opinion pieces usually excruciatingly single-minded, although I think his interviews are great.

When it comes to art, I am of the opinion that AI actually does a fine job already, even though I hate it.

100,000 new tracks are uploaded to Spotify every day, and that’s only a fraction of what’s being made. 99.99% of that is either a commercial attempt or otherwise unoriginal artistry, meaning they made a copy of a copy of a copy. Or it’s just so far out there that no listener has the specific artistic references to appreciate it in the way the original artist does.

AI already makes music that is not at the bottom of those 100,000.

I produce music for myself and others, sometimes commercially, and I am of the opinion that AI is already great at commercial music (advertisement drivel) and better at songwriting than half of all amateur songwriters.

Considering much of music and art is very much a copy of a copy with only a small original twist, AI has, on paper, enough data to work with to improve beyond the level of a good amateur.

However, most people want to see emotion in art. Since we know it isn’t there in AI, it’ll never be appreciated as much as an equivalent piece made by a human, unless the origin of the song is “made human”, by pretending a human made it or by creating a convincing AI avatar artist.

4 Likes

Not only was this not at all the topic of the discussion here, but you completely miss the point of the issue with AI in creative arts.

The sole use of AI in art is for big companies to stop paying artists and fatten themselves further. What if AI makes “better” music than humans? Not only do I strongly disagree, but it wouldn’t even matter.

And yes, humans make copies of copies of copies, as you say; that’s just how the creative arts work. Music is not filled with plagiarism as you try to imply, but rather with people who take their inspirations and try to make something new out of them and their experiences. No one ever invented something brand new? What a shocker, damn. Even your “original artists” are actually inspired by other artists.
If you think like that, why do you play go at all, since you just play the moves stronger players taught you?

We don’t listen to songs because we like pretending there is a human backstory behind them. We like them because they share an experience. And even for more commercial songs that were clearly written by producers to make money, there are still artists and performers behind them. If you prefer listening to AI songs, you do you, but don’t insult musicians and listeners who have taste. “AI music is good on average”? Who gives a fuck; people listen to music that makes them feel things. Your message is not only off-topic and ignorant, but also pretentious towards listeners and artists.

By supporting this kind of thought you not only support greedy companies, but also undermine the work and experience of artists and discourage amateurs from trying to publish out there. Once again, you wanna listen to AI music? Suit yourself, but don’t go “people would be okay with it if we made up an artist behind it; all music is plagiarism anyway”.

(And btw, ALL music is a commercial attempt if it’s out on platforms and not free; any distinction is made up. Some people are just greedier than others.)

3 Likes

For certain sub-goals (reasoning, coding) of LLMs there are zero-like approaches that AI researchers experiment with. Not sure if that will lead to anything, but personally I wouldn’t rule it out.

2 Likes

Zero-shot and few-shot approaches are already working pretty well with GANs (even if that’s not really the same thing). I don’t really know about LLMs, though, but it might lead to interesting results. Do you have some papers?

1 Like

Data is both singular and plural. There is no such thing as datas.

2 Likes

Thx, I edited the title.

4 Likes

Here is one: [2505.03335v1] Absolute Zero: Reinforced Self-play Reasoning with Zero Data

But I came across multiple such papers when skimming the Hugging Face daily papers list over the last months.

1 Like

Thx, I’ll check it out!

1 Like

You have completely misunderstood my post. So either I misrepresented my opinion or you didn’t interpret it right, but either way your response is so far off from what I think I said that I see no reason to respond further.

3 Likes

What makes AI so impressive when playing Go reflects the limitations of the human brain as compared to the scope and scale of the game’s challenge. That’s why we’re dazzled by its prowess. We don’t marvel much at how powerful AI is at playing tic-tac-toe, or even checkers.

Creating (and enjoying) music as a form of human expression is a rather different category. Equating the two invites confusion.

Go is, at heart, a math problem. Sure, we humans find it fascinating, challenging, and socially engaging. But as a visual math problem, it is ripe for a computational algorithm to tackle without much opportunity for “hallucination”… which is what happens when you copy a copy of a copy.

4 Likes

How do you come to this conclusion?

2 Likes

Not all problems are the same. Of course the holy grail is to build a model without data because data is expensive. That’s why AlphaZero was seen as such a massive breakthrough.

But that worked because Go is just a game tree which can be traversed by a computer program. All the “data” can be computed.
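
As a toy illustration of “all the data can be computed”, here is single-heap Nim (take 1 to 3 stones; whoever takes the last stone wins) solved exhaustively from nothing but its rules:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(stones):
    """+1 if the player to move wins with best play, -1 otherwise.
    Every position's value is derived from the rules alone; no examples needed."""
    if stones == 0:
        return -1  # the previous player took the last stone, so the side to move has lost
    return max(-value(stones - take) for take in (1, 2, 3) if take <= stones)

print(value(4))  # -1: with 4 stones, the side to move loses against best play
print(value(7))  # +1: take 3 stones, leaving the opponent with a losing 4
```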

The problems that LLMs solve require some humans in the loop:

  • Participate in a conversation in a convincingly human way
  • Write an opera in the style of Mozart

Without human data, the bot will be totally lost on these tasks.

Programming is an interesting one, in that code can be represented as an abstract syntax tree, and a bot can traverse it without guidance from humans. There is also non-human feedback: compilation and tests. I think this is one reason LLMs perform so well at programming. But even there, code should still be human-readable, and therefore human data is needed.
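
A crude sketch of that non-human feedback loop: run a candidate solution against a test and use the exit code as the reward signal. (The `add` task and its single assert are made up for the example.)

```python
import subprocess
import sys
import tempfile
from pathlib import Path

TEST = "\nassert add(2, 3) == 5  # the 'test suite'\n"

def feedback(candidate_source: str) -> bool:
    """Non-human reward signal: does the candidate run and pass the test?"""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(candidate_source + TEST)
        result = subprocess.run([sys.executable, str(path)], capture_output=True)
        return result.returncode == 0  # 0 means it ran and the assert held

print(feedback("def add(a, b):\n    return a + b"))  # True
print(feedback("def add(a, b):\n    return a - b"))  # False
```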

6 Likes

I find it interesting that no one has commented on what strikes me as the most revealing and disturbing part of the video. I’m referring to ChatGPT’s answers regarding the sound-engineering questions that Beato posed.

ChatGPT answers the technical mixing question and follow-ups in some detail, which Beato does not dispute. I take this as indicating the answer is correct as far as it goes. However, as Beato notes, the answer fails to provide any context, and as he explains later, the process involves a vast number of variables that must be considered by the mixer. In other words, the answer is a generality that is practically useless. That would be fine if ChatGPT had identified the generality of its answer, but it didn’t.

Next Beato asks if it is an expert at mixing music. ChatGPT dodges around the term “expert” and claims it got its knowledge from “working with countless mixers and producers,” which would be a lie if a human had said that. Beato asks “what records” it has worked on. ChatGPT admits it has no personal experience and compares itself to a “walking encyclopedia,” even though it is not walking (an interesting self-anthropomorphism).

Beato asks where ChatGPT learned its knowledge. ChatGPT says it learned from everywhere and rattles off a list of famous names. It admits it did not intern with these people, but claims it has “all their wisdom in hand.”

Beato drills down, asking, “Where did you get it from?” ChatGPT says, “All that knowledge comes from thousands of interviews, tutorials, and resources they’ve shared over the years.” Beato torpedoes this answer by citing several top people and noting “very little information [is available] on their craft.”

In a human, we would call these answers lying by airy generalities, misdirection, and omission. However, since I don’t think ChatGPT has consciousness, I won’t call it a liar. I will call it a bullshit artist, perhaps programmed to imitate the dodge and smoke tactics of politicians and bureaucrats. If it has discovered these tactics on its own for its own purposes, that is even more disturbing.

6 Likes

If you do not mind an interjection, I’d say that AI can learn language, painting, music, and almost everything else from scratch, but the result will not be “what humans want or find pleasing”; therefore, training them that way is counterproductive.

AlphaGo could train on its own because the rules of Go are the same for it as they are for humans, and indeed, once it trained on its own, it came up with moves that humans had not thought of and in many cases didn’t find pleasing. However, since Go is a game, we humans had to reconsider the value and merit of our moves and ideas and consider why the AI thought its new moves were better.

Comparatively, you will find people mostly unwilling to reconsider their language, their music, or their art, even if the AI came up with languages/music/art completely of its own, because those things are not a game and the result cannot be objectively judged (unlike a Go move, which can be researched to see whether it is good or not, with some observable measurements by which one could claim that a move is good).

So, I think that what @jlt is saying is practically correct. :slight_smile:

Tell that to the fanboys/fangirls of various artists around the globe. A lot of them are the definition of a “parasocial relationship” and no longer care about the produced music… :sweat_smile:

You can see that with actors or even athletes as well. A lot of them tend to have fans way beyond their career’s end, and some of them even despite their career, which dispels any illusion about the purity of the “art adoration motive” in the humans who consume said art/entertainment.

A lot of popular songs fail to even convey their own lyrics, considering the wide usage of things like autotune or the singers’ lack of proper enunciation. Sometimes that can be part of the appeal (see Weird Al’s “Smells like Nirvana”), but let’s not pretend that every song somehow shares an experience and that this is what everyone likes about music.

Do they? How do you know they are not just using AI now and plastering their name over it? :wink:

Did you hear about the recent scandal where the editors of actual published books forgot to proofread the content and left the AI prompts in the published text? :rofl:

And if text has some tells, how can you tell whether a piece of music is AI-generated, when for the past couple of decades music production has been invaded by corrective algorithms and autotuning software? :thinking:

Calm down, please.

3 Likes

You nailed it.

What’s more, most LLMs are trained to say whatever it takes to please the prompter. Hence, sycophantic responses that lead to dangerous outcomes when believed by gullible humans.

I’m afraid it’s going to get worse before it gets worst.

3 Likes

ChatGPT can have self-awareness without consciousness, so it can be an intentional liar.

1 Like