I think you are not appreciating how few people are interested in evaluating AI summaries.
If you want to use AI to summarise something for yourself, you do that. If you think the thread could use a post that is the summary, then you can read the AI summary, decide which parts you agree with and which ones are silly, do your own reading and contemplating, and then write a post yourself, summarising the thread.
I’m definitely going to just skip any AI summaries, whether you’ve included the prompt or not. And I guess lots of other people are not interested in them either.
It’s like you’re posting the results page from a Google search on the topic of the thread, without reading/considering/selecting the results yourself. It’s an intermediate research step, not a useful piece of information or valuable perspective that would be worth sharing.
It sounds like you object to the AI summaries on aesthetic grounds. Fair enough, everyone is entitled to their own aesthetic preferences!
But other than that, I think this sort of hybrid approach is the worst of all possible compromises. We’re still offloading the cognitive load onto AI, which the anti-AI crowd doesn’t like, but since we’re rewriting the AI’s summary we lose the advantages associated with summaries written by non-participants in the thread.
You also myopically focus on “bias”. My table is unbiased, yet there are still reasons I don’t ask for its opinion. You may point out that you did say “all things being equal”. But aside from the fact that the LLM is not a participant in the debate, while we are, do you really think all things are equal here?
If I object to taking Alex Jones at his word because of his bias and factual errors are you going to hold your hand up and go “well, hang on, all humans have some bias and make errors. So if your reasoning is correct you should, by the same token, disregard any other human the same way.” Do you really think my entire reasoning for dismissing AI is simply that it is not perfect? That’s such a strawman.
Your comparison doesn’t make sense. If someone makes a claim and shows their reasoning and cites where they got their information, then yes, anyone who disagrees should state and explain why. If someone makes a claim with nothing else to back it up, the burden should not be on someone else to disprove it. Short of manually going through all the comments and categorising them myself, how am I meant to assess ChatGPT’s claims in the summary you gave? If it has made mistakes, they took seconds to generate and would take considerably longer to refute. There’s a good reason why we get people to explain how they arrived at a conclusion.
This is absolute rubbish. Yes, you mentioned you have experience with the tools, but you explicitly said “the numbers and summary looked more or less accurate to me, and that’s why I felt comfortable posting it” (emphasis mine). Your vibe about the state of the conversation up to that point is what justified your use of the AI, yet it is also the very thing you are trying to use the AI to prove. You said you double-checked, but if you had, you’d know whether it was accurate. Saying it “looked more or less accurate” suggests you did not really double-check.
I have used ChatGPT (both paid and free) extensively, and in my experience I’ve repeatedly run into issues with it. The same is true of Grok and Gemini, although I’ve only used the free versions of those. I do some hobby coding and, even though there is great benefit to getting help from it, I have seen it absolutely butcher code. With arguments I have seen it repeatedly misrepresent my points or not even understand them. With factual information I’ve seen it make up information as well as sources. Should this not inform my opinion? Should we just assume I’m a liar because you’ve read “relevant scientific literature”? Do I not actually understand my own points, while this state-of-the-art LLM does?
The fact is these tools are not flawed in the same ways humans are. If I want to write a program, I have a pretty good idea of what will happen if I ask ChatGPT to write the whole thing for me. Same thing if I want to refactor the code for a whole project. It will just absolutely shit the bed, and as bad a coder as I am, I’m able to succeed where it fails. Are you going to push back and go “well, you could just as well say people write bad code that doesn’t run, so if we accept that reasoning why should we let a human write it?” This might sound reasonable on the surface, but it ignores the fact that the nature of the mistakes and limitations of these LLMs is not the same as a human’s.
I am actually flabbergasted at people who have used LLMs and act as though, or even explicitly state, that the default assumption is to trust what they say until shown otherwise.
How am I meant to argue against the summary? Can you post your prompt?
Oof, no. I object on the grounds that they don’t add anything valuable into the conversation and are just useless noise.
If you offload your cognition to an AI then I’m afraid there will be no meaningful cognition happening between the two of you.
Which is not to say that LLMs won’t get to a point where they can meaningfully contribute to conversations. That might even happen soon. Or maybe it will take a long time. Either way, the models that you or I currently have access to are only really useful as tools to help the human who’s driving them understand things.
If they help you understand something, then share the insight you gained, not the detailed notes of the journey you took to reach the insight, unless you really think the journey itself was enlightening in some way.
There is a joke to be found here on why AI is really bad when it comes to arguing, compared to humans.
Even if the people who dislike AI were right that current versions are weaker at logic than humans - sooner rather than later they will be stronger.
This discussion won’t get anywhere if it keeps being about the strength of AI.
What it comes down to is: Do you like AI? Yes or no?
That’s a fair question to ask and based on that result, we should make a decision.
(Instead of arguing here over another 100 posts, where neither side will convince the other.)
Yes, I do. LLMs are routinely validated on a variety of text summarization tasks, both by the teams building them and by third-party researchers. Generally speaking, LLMs perform on par with or better than humans across most metrics for standard text summarization tasks (e.g. corpora of news articles). They lag somewhat behind expert humans when summarizing corpora of niche technical documents, like cancer research or astrophysics papers.
In addition to reviewing this body of academic literature, I have previously helped run a domain-specific text summarization experiment comparing an automated solution to humans. Based on that I would expect an LLM to be pretty reliable at summarizing threads in a forum such as this unless the thread is very large (tens of thousands of posts) or extremely technical (e.g. complex sequences of go moves in every post).
I’m not strawmanning your reasoning - I’m questioning its existence. At best you’ve given reasons for not uncritically accepting AI-based conclusions, a practice which nobody in this discussion is advocating.
In the previous thread that you referenced, another commenter opined that I was “creating fictitious subtext that wasn’t really there”. So I scoured the thread in search of specific quotes which proved that I was responding to actual claims made by others. This took considerably more work than it took the other user to claim that I was making it all up. Should that user be sanctioned by the moderators?
What would you expect a human explanation for how they arrived at a text summary to look like? Something like “I looked at each of the posts and kept track of recurring themes”? Would you find AI summaries to be any more valid if you were convinced that they were generated similarly?
I have a set of beliefs about the quality of AI text summarization that have been formed through my experiences. Other people have different experiences, and I guessed (correctly, as it turned out) that their standards were likely higher than mine. So I did not feel comfortable posting the summary until I had applied greater scrutiny than I personally thought necessary.
Measuring the accuracy of text summarization systems is complicated. You can measure reliability, i.e. how well themes were extracted from individual posts and merged across posts. You can measure completeness, i.e. what portion of the corpus is covered by the summary. You can look at information-theoretic efficiency, essentially how well a person can answer questions about the original corpus using only the summary, relative to the length of the summary. There’s also sampling accuracy: whether or not the summary emphasizes “large” topics in the corpus over “small” topics. And so on.
Measuring any one of these quantities rigorously is tricky and requires multiple annotators to be at all useful, and choosing which metrics to measure depends on the problem at hand. I did not feel it was appropriate to go into these nuances in the thread, so I made a non-rigorous attempt to estimate reliability and completeness and wrote “more or less accurate” to reflect the fact that my results were positive but there were gaps in my methodology.
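To make one of those metrics concrete, here’s a toy sketch in Python of what a crude completeness check could look like. To be clear, this is just an illustration of the shape of the computation, not the methodology I used - the stopword list, the top_k cutoff, and the word-overlap shortcut are all arbitrary simplifications of mine:

```python
from collections import Counter
import re

def tokens(text):
    """Crude lowercase word tokenizer; good enough for a toy coverage check."""
    return re.findall(r"[a-z']+", text.lower())

def coverage(posts, summary, top_k=50):
    """Rough completeness proxy: the fraction of the corpus's top_k most
    frequent content words that also appear somewhere in the summary."""
    stopwords = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
                 "it", "that", "this", "for", "on", "are", "be", "with"}
    corpus = Counter(t for post in posts for t in tokens(post)
                     if t not in stopwords)
    top_terms = [term for term, _ in corpus.most_common(top_k)]
    summary_terms = set(tokens(summary))
    return sum(term in summary_terms for term in top_terms) / len(top_terms)

posts = [
    "AI summaries add noise and don't contribute anything to the thread.",
    "Summaries help newcomers catch up on very long threads.",
    "Prompt transparency should be required for any AI-generated post.",
]
summary = ("For: summaries help newcomers with long threads. "
           "Against: noise, lack of transparency.")
print(f"coverage of top terms: {coverage(posts, summary, top_k=10):.2f}")
```

A rigorous version would replace the word-overlap shortcut with human annotations of themes, which is exactly why multiple annotators are needed.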
I’m sure you have - I have too! I would propose different standards if the question at hand were about allowing AI-generated code, AI-generated solutions to math problems, AI fact-checking, or AI-generated citations. AI is not as strong on the benchmarks for those tasks, though even that requires nuance - e.g. I would be more likely to trust an AI-generated sorting algorithm than a sorting algorithm written by a human without much programming experience.
I am not proposing any standard of the form “We should not allow X to perform a task because X is known to make errors at that task”. Through all this I still don’t know what position you’re actually taking, but my best guess is that you are proposing this standard for X = AI but not X = humans.
I agree, that would be a very weird and questionable argument to make!
Sure, requiring prompt transparency would be a reasonable policy decision. But it’s worth noting that anyone who has read and followed a thread (true for most active participants) will have some intuitive sense of whether a summary is egregiously flawed. E.g. they will likely detect topics in the summary that were not discussed, or major topics in the discussion that do not appear in the summary. This falls short of rigorous evaluation, but text summarization carries fewer risks when there are people familiar with the corpus being summarized.
Honestly, I pretty much agree with this take. I think a number of people here simply do not like AI-generated content on purely aesthetic grounds. It’s not even the aesthetics of the content itself - they would accept the same paragraph if it were written by a person but not by an LLM.
The thread is dragging on and on because they are trying to rationally justify their aesthetic preferences, and naturally these justifications are rather flimsy. But they aren’t even necessary - if the forum is at risk of losing users in good standing if the moderators allow AI-generated content, then they probably shouldn’t.
You mean the prompt used to generate the summary that I posted in the other thread? I didn’t save it, but it was something like “Please give a short list of brief bullet points summarizing the arguments for and against the proposal being discussed in the following thread”. I probably added some additional clarifications regarding what I meant by “summary”, but that was the gist of it.
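If anyone wants the programmatic equivalent, it would look roughly like the sketch below. To be clear, I used a chat interface rather than the API, so the client, the model name, and the idea of pasting the thread text straight into the prompt are all my assumptions, not what I actually ran:

```python
# Sketch only: I used a chat interface, not the API, so treat the
# client, model name, and prompt wording here as illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

thread_text = "...full text of the forum thread pasted here..."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # arbitrary choice; any capable model would do
    messages=[{
        "role": "user",
        "content": (
            "Please give a short list of brief bullet points summarizing "
            "the arguments for and against the proposal being discussed "
            "in the following thread:\n\n" + thread_text
        ),
    }],
)
print(response.choices[0].message.content)
```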
“AI-generated content can be allowed but must be clearly identified as such, and have some relevance to the thread in which it appears” is diplomatic but vague. It strikes a false balance - see “argument to moderation”. How is “relevance to the thread” defined? I thought every post in a thread was already supposed to be relevant to it; if that’s the rule, why does AI content need its own relevance clause? And how will you define “AI-generated” content? For instance, I sometimes use AI to correct my grammar; technically that’s AI-generated, so I should always identify it as such, right? Another example: I use an AI assistant to correct my writing in German, and it suggests alternative sentence constructions, different words, style changes, etc., so I pick some of that up and use it in my next writing attempt. How much does that make my content “AI-generated”?
People here are appealing to the Nirvana fallacy: AI is useful but makes mistakes, thus it should not be allowed. A human answer is not perfect either, and there are degrees of truth in an answer; otherwise an AI content detector would be a truth detector. It’s not black and white (no pun intended). @Samraku’s poll is more sophisticated, in my opinion, and gives realistic cases as to when that might occur. But again, it’s difficult to define what percentage of a piece of content is AI, and moreover AI is getting better and at some point it will blur the line between AI and non-AI content.
I value human-to-human interaction more, and I cut down on social media use precisely when I felt it was becoming more and more rife with AI content (dead internet theory and so on). I do use AI sometimes for various purposes, but there is a time for that and a time for other things. The line between AI and non-AI content is, for me, aesthetic more than anything else; I’m honest with myself about that, and it would be disingenuous for me to say otherwise.
TL;DR: People need to define things. “The devil is in the details.”
TBH my response to this ^^ is strong disappointment at the level of cynicism about this consultation process.
There didn’t have to be any consultation. There nearly wasn’t.
But here we are: I asked the question and gave everyone a chance to speak on it.
No poll is perfect: in fact, it’s a well-established phenomenon that the only purpose of an initial poll is to find out what the real questions should be.
But to respond to this by suggesting that this is a “make it look like the will of the people” exercise is really sad. It’s also simply not true.
The subsequent questions also demonstrate a complete lack of awareness of how moderation happens, as well as missing the point of the thread, which I clarified a number of times.
BECAUSE PEOPLE WERE FLAGGING IT.
The purpose of this thread was to establish whether “it is AI generated” was ALONE enough to warrant removal. BECAUSE PEOPLE WERE RAISING FLAGS AS IF IT WERE.
Another way of saying it is “Do we, OGS, want a policy of no AI generated content, irrespective of what the content is?”
My takeaway from the poll inputs is that the community is NOT that opposed to some AI generated content. “This is AI therefore remove it” is not a well supported idea.
The polls and discussion give a flavour of what people do and do not like.
Moderation works based on this kind of thing: it is always a case-by-case judgement. There is no lawyer-like, contract-based answer to the (stupid) question “how will you determine if a post is relevant?”
This question is stupid because the obvious answer is “using judgement, just like any other moderation decision”.
I don’t think cynicism is necessarily bad per se. Cynicism is deeply ingrained in human nature. I don’t try to judge people; I just try to come to an understanding with them. In retrospect the Elon Musk analogy was terrible; I admit that. My whole point was that moderation should be as objective as possible, so that people can read the rules and infer for themselves what they are and are not allowed to do. That’s a better approach for everybody. I’m glad to know that your post here is just preliminary and you will provide more guidance on AI content rules! I didn’t know that this is just a survey of opinions. But I respectfully disagree with the idea that things should be decided on a case-by-case basis, because subjectivity can sometimes lead to confusion given the vast worldwide user base of this forum or any other open forum.
I’m not trying to disappoint anybody; I’m just trying to give unfiltered feedback. I didn’t expect this all-caps onslaught; perhaps I was a bit too blunt. I do sometimes give stupid takes; unfortunately, my level of intelligence is barely average. Sorry for that.
Thank you for consulting us on this matter – and thank you for your valuable time. I greatly appreciate your efforts!
Thanks - it’s good to thrash it out, and to find that you have a “come to an understanding” attitude.
Cynicism may not be bad per se, but coming out of the blue and saying “this moderator is just doing a ‘will of the people’ exercise” is very rude.
It conveys that you have a completely mistaken idea of how the place works, and of the attitude and goals of the people running it.
Let’s take this at face value (while noting that it certainly did not come across as your “whole point” in your post).
It simply does not work like that.
We are not going to be writing rules about each situation, and refining them as cases come up, so that later lawyers can debate whether a given behaviour meets the rules or not.
Which is what happens if you follow the path you are advocating.
Underlying your suggestion is the idea that people can’t determine for themselves what is appropriate without a detailed rule book.
This is not true.
People are very good at seeing rough guidelines, watching how others behave, and aligning themselves.
Our moderation is in line with that. We don’t hold court and fine or send people to jail when they break a law.
Rather, if some behaviour is not in line with what we judge that we want to see here, we point that out and ask people to refrain from whatever that was.
If we (the moderation team) do this out of line with community expectation, it’s not long before we hear about that and need to make a change or discuss it.
And that is what this thread is about. Finding out what people think, judging the mood of the community, so that we have a better idea about where to steer.
Here’s a point that I think people probably widely overlook:
When you signed up to the forum, do you remember agreeing that anything you post may be used to train LLMs for future use?
When someone takes anything you’ve posted on the forums and feeds it into an LLM, either to make summaries or to judge the sentiment etc., are they generally doing it with your consent? Are they guaranteeing that even if you now delete your post from the OGS forums, they will somehow also exclude it from the future training data that OpenAI, Google, Grok, or the like use to improve their LLMs, from now until who knows when?
We can’t control what users do with the posts on OGS, they’re posted on the internet like anything else.
But are you comfortable if the moderation team decides that yes, it’s OK for people to potentially feed everything you write into an LLM on a regular basis, when it’s unclear what agreement governs what is done with that data?
I’m sorry to say, but any content you post here is implicitly available to be fed into an LLM.
This (the internet) is where training data comes from.
There is absolutely no point in us saying “you’re not allowed to take other users’ content and feed it into an LLM because they didn’t give you permission”. Actually, no-one needs permission to feed publicly available content into LLMs: they (LLMs) already take it themselves.
We the moderation team don’t get to say whether or not content on this forum goes anywhere else, or where it may be stored or re-published. It’s out there already.
I guess we could take the stance that says “we will protect, to the best of our ability, our users’ content from being fed into LLMs by disallowing the output of that (summarisation)”.
That would be a narrow-scope question we could collect input on.
I see this as different to the broad question of “is the community willing to see AI generated content in the forum in some way or other, where that content is judged as ‘constructive or contributing in some way’?”.
For the public parts of the forum, at least, consent is not needed for noncommercial purposes, because (as written in the ToS) user content here is licensed under Creative Commons BY-NC-SA 3.0. As far as I know, this should not allow commercial LLMs to use the data here.
Although, I think it’s common knowledge that the big LLMs have a complete disregard for copyright and somehow still go unpunished for it.