It's Machine Learning, yo

Crick
Posts: 272
Joined: Wed Mar 23, 2022 10:01 pm

Re: It's Machine Learning, yo

Postby Crick » Wed Nov 22, 2023 5:16 pm

I've heard the same. It's a ton of people obsessed with how they can best bring about AGI (using language models?). Some want to take it slow; others want to get there as fast as possible. Seems a bit untethered from reality. Very cyberpunk, though.

Brantly B.
Woah Dangsaurus
Posts: 3679
Joined: Mon Jan 20, 2014 2:40 pm

Re: It's Machine Learning, yo

Postby Brantly B. » Wed Nov 22, 2023 5:37 pm

OpenAI was created as a think tank by a number of interested parties essentially taking a gamble on them ever producing something of monetary value. Now that they, surprisingly, actually have, those parties are expecting their big payout, and Sam and his crew just want to keep being a think tank and not worry about metrics and profits and all that... corporate stuff.

Presumably they've finished taking Sam somewhere very private and having a very long and pleasant chat with him, and now they're putting him back into the chair to very slowly, very quietly, but also very definitely turn the culture in the direction they would like it to go.

Grath
Posts: 2392
Joined: Mon Jan 20, 2014 7:34 pm

Re: It's Machine Learning, yo

Postby Grath » Wed Nov 22, 2023 6:45 pm

[Image: original post linked under image since no embeds]

As far as I can tell:
Some of the particularly brainwormed OpenAI board members (largely Effective Altruism people, kinda like Sam Bankrupt-Fraud the Crypto grifter/convicted fraudster) tried to do a coup, and forced Sam Altman out; Microsoft promptly offered him a job. Most of OpenAI's employees threatened to quit and follow Sam (likely in pursuit of an equity deal, not actually Cult of the Founder) and now they've gone back and re-hired Sam Altman to OpenAI, with a new board who are more likely to be sycophants.

Mongrel
Posts: 21354
Joined: Mon Jan 20, 2014 6:28 pm
Location: There's winners and there's losers // And I'm south of that line

Re: It's Machine Learning, yo

Postby Mongrel » Wed Nov 22, 2023 8:39 pm

I just... I don't get how brainwormed you have to be to think we'll get AGI out of this, especially when - still - we don't understand basic fundamental principles of consciousness, and even before that the general structure and functioning of the brain. Sure there's the "you don't actually have to be able to tell - it doesn't matter!" crowd (uh, yes it does?), but even then none of these chat programs hold their shit together for long under any real scrutiny, especially without hand-coded rails.

But then I guess I do get how you can imagine that if you just look at everything like it's some kind of fucking handwavey magic.

Büge
Posts: 5484
Joined: Mon Jan 20, 2014 6:56 pm

Re: It's Machine Learning, yo

Postby Büge » Wed Nov 22, 2023 9:23 pm

Mongrel wrote:I just... I don't get how brainwormed you have to be to think we'll get AGI out of this, especially when - still - we don't understand basic fundamental principles of consciousness, and even before that the general structure and functioning of the brain.


Which reminds me, isn't Elon Musk forcing that brain-computer surgery that drove those monkeys insane into human trials?

Thad
Posts: 13250
Joined: Tue Jan 21, 2014 10:05 am
Location: 1611 Uranus Avenue
Contact:

Re: It's Machine Learning, yo

Postby Thad » Wed Jan 10, 2024 12:21 pm

It's extremely frustrating seeing how many people are falling for the framing that the problem with OpenAI et al is a copyright issue rather than a capitalism issue.

Copyright is manifestly the wrong tool for addressing LLMs' economic impact on creators.

"But they're scraping everything indiscriminately off the web without permission!" Yeah, so do search engines. If you're arguing that mass scraping should, in itself, be copyright infringement, then you're arguing that search engines should be illegal.

So it follows that merely scraping data, in itself, shouldn't be illegal; infringement should depend on what you do with that data after you scrape it. I don't think it should be controversial to say that the output of an LLM should be subject to the same fair use analysis as anything else, but apparently it is.

As for the problem that companies are laying off workers and replacing them with shitty algorithms: that's not a copyright problem, it's a capitalism problem. People who accept the narrative that it's the former instead of the latter are getting rolled.

Anyone who thinks expanding copyright enforcement will help individual creators rather than corporate publishers hasn't been paying attention the last...every single time we've ever tried that.

Copyright's not a solution to automation and layoffs. Corporations are going to do those things any chance they get, because that's how they function; this didn't just suddenly start last year with the Spicy Autocomplete fad.

Artists, journalists, creators of all types need to be able to create and make a living. That doesn't mean making OpenAI pay royalties to the New York Times. The conversation we need to be having is about worker protections and the social safety net. Strong unions, UBI, government funding for the arts and for journalism. Instead, people are falling for a corporate narrative that the problem is that corporations are profiting without giving other corporations a taste.

KingRoyal
Posts: 759
Joined: Wed Jan 13, 2016 11:32 am

Re: It's Machine Learning, yo

Postby KingRoyal » Wed Jan 10, 2024 12:39 pm

I agree, and you're absolutely right that there is the broader issue of capitalism at play, since it's always the broader issue at play

But what I disagree with is this comparison

Thad wrote:"But they're scraping everything indiscriminately off the web without permission!" Yeah, so do search engines. If you're arguing that mass scraping should, in itself, be copyright infringement, then you're arguing that search engines should be illegal.

So it follows that merely scraping data, in itself, shouldn't be illegal; infringement should depend on what you do with that data after you scrape it. I don't think it should be controversial to say that the output of an LLM should be subject to the same fair use analysis as anything else, but apparently it is.


The purpose of a search engine is to create an index of other sites and direct users there. They do scrape and reproduce content, but as a way of showing users whether it's what they're looking for before they go there. In this way it functions more like a telephone book directing users to where they want to go, which helped justify the advertising appended to it

The LLMs and the image generators are not doing this. They're scraping copyrighted material (which includes people's personal and professional blogs, their digital art portfolios, etc) and then presenting that copyrighted material as their final product to make a profit from. And they're pretending that having to pay the people they take the data from is too much hassle, which is them asking the government to legalize their theft

In a lot of ways this is similar to when it came out how much Google and Facebook were harvesting users' private data, and the response was a form of regulatory capture that helped them define the proper way to invade our lives for their profit without paying for it

OpenAI isn't some scrappy underdog. They're backed by Microsoft to the tune of tens of billions. They absolutely can pay for the works they scrape, and if history has taught us anything, the cost would be a fraction of the VC cash they pull in. They just don't want to, and I don't think we should let them get away with it

Thad
Posts: 13250
Joined: Tue Jan 21, 2014 10:05 am
Location: 1611 Uranus Avenue
Contact:

Re: It's Machine Learning, yo

Postby Thad » Wed Jan 10, 2024 1:49 pm

KingRoyal wrote:The purpose of a search engine

But that's my point right there. It's not the scraping itself that is infringing; the purpose of the use must be considered.

In fact the purpose and character of the use is one of the four factors of fair use analysis.

KingRoyal wrote:The LLMs and the image generators are not doing this. They're scraping copyrighted material (which includes people's personal and professional blogs, their digital art portfolios, etc) and then presenting that copyrighted material as their final product to make a profit from.

They're not, though. If what goes out is the same as what went in, then that's not an LLM, it's `cp`.

Again, fair use has to be considered here -- if they really are spitting out something that is identical to the training inputs, then yeah, that should be infringing. If they're outputting something that's substantially different from the training inputs, then where's the infringement?

KingRoyal wrote:And they're pretending that having to pay the people they take the data from is too much hassle, which is them asking the government to legalize their theft

I didn't buy this line when Lars Ulrich was saying it about Napster, and that was a case where people actually were redistributing copyrighted works in their entirety, as a substitute for the commercially-available versions, and there was no feasible fair use defense.

There are plenty of instances where it's legally and/or morally defensible to copy something without paying the rightsholders (and let's not forget for a goddamn second that "rightsholder" and "creator" are not synonyms). Legally, that's what fair use analysis is for. As for whether it's moral, that's an entirely different question and one that can't be answered through copyright litigation.

KingRoyal wrote:In a lot ways this is similar to when it came out how much Google and Facebook were harvesting users' private data, and the response was a form of regulatory capture that helped them define the proper way for them to invade our lives for their profit without pay

And that's the likely result of an OpenAI defeat: they work out a deal with the plaintiffs, they get to keep doing what they're doing, the publishers get paid, but the creators don't.

KingRoyal wrote:OpenAI isn't some scrappy underdog.

And neither is the New York Times. This is a fight between billionaire corporations over how to split the money they get from screwing over creators. The New York Times winning isn't going to mean better journalism. It's going to mean OpenAI keeps doing what it's doing but now Sulzberger gets his beak wet and hey maybe we need a news snippet tax here in America too, because did you know Google is stealing content from newspapers and not paying for it? And have you seen the mass scraping those filthy pirates over at archive.org have been doing for decades?

And yeah, that's a slippery-slope argument, but that's because we're talking about legal precedent here. I'd love to see OpenAI and ChatGPT and the rest shut down, but the ends don't justify the means; you have to consider what the NYT and similar media outlets will do next if the court grants them a favorable precedent here.

KingRoyal wrote:They absolutely can pay for the works they scrape from, and if history has taught us anything it would be a fraction of the VC cash they pull in. They just don't want to do it and I don't think we should let them get away it

But it's not just about them. Copyright enforcement isn't just about one case. Lots of people are scraping data for lots of reasons; some of them are assholes, but the view that we should enforce the law harshly because the defendant is unsympathetic makes for terrible policy outcomes. There are going to be other defendants.

KingRoyal
Posts: 759
Joined: Wed Jan 13, 2016 11:32 am

Re: It's Machine Learning, yo

Postby KingRoyal » Wed Jan 10, 2024 2:09 pm

I mean, they can be sympathetic, but that doesn't change the fact that they're taking copyrighted works and then presenting everything underlying them as their own. There has been no shortage of examples of models repeating sentences from people's written works verbatim. There's even documented evidence of watermarks from images showing up in generated works

When an artist submits their work to a portfolio website, the ToS is specifically designed to grant the website legal permission to reproduce the work in certain ways. OpenAI is arguing they don't even have to do that, and can freely take and use works from authors as they see fit without bothering with consent

At the end of the day, OpenAI has admitted they know they can pay the people they scrape from. Their whole argument is that it's too much of a hassle and they don't want to do it because it ultimately hurts their bottom line. It's not even that difficult of a problem to solve, since the idea of data brokers who can provide them training data with proper legal clearances isn't an idea that's unique to this century

To throw a little more context out there, SAG-AFTRA just negotiated a deal mandating that all actors are entitled to fair contracts for the use of AI likenesses and voices in productions. This is part of why celebrities like Tom Hanks and Mark Hamill are on board: they were effectively approached as individuals whose consent should be sought and compensated, even though studios could use that data in ways that might qualify as fair use.

Truthfully, I would agree that I don't think copyright is the most pressing issue. I ultimately don't think that OpenAI and a bunch of other LLMs or image generators would be detrimentally affected in their progress if they had to make sure the data sets they used to train were vetted and approved, even by third party brokers. That stuff is relatively cheap in the face of a $40 billion backing

The real issue I see is the gulf between what these things can do and what the AI leaders have been hyping, and the very real scenarios where this technology is used in mission-critical operations where people's lives are at stake, while the companies pursue a legal framework in which responsibility is murky and may fall on the user, even when the device they were sold is called something like "Full Self Driving" or "Autopilot"

Brantly B.
Woah Dangsaurus
Posts: 3679
Joined: Mon Jan 20, 2014 2:40 pm

Re: It's Machine Learning, yo

Postby Brantly B. » Mon Jan 22, 2024 3:19 pm

The emotional reactions I'm encountering to this sort of thing are getting me way down. We can all agree that the core issue with ML is people not using it responsibly. The problem is that people will get the same feedback whether they attempt to use it responsibly or irresponsibly, because reactions from the general public are gut and uninformed, and reactions from experts are based on an agenda. It's worse than politics. How are you supposed to train the humans in the room to use the machines "right" if they can't even find a defining line between "right" and "wrong"? You almost need to build a machine to run the machines. We as a species are just not fit for managing the future.

Mongrel
Posts: 21354
Joined: Mon Jan 20, 2014 6:28 pm
Location: There's winners and there's losers // And I'm south of that line

Re: It's Machine Learning, yo

Postby Mongrel » Mon Jan 29, 2024 9:35 pm

Should we start keeping track of how many times this happens? Cause it's gonna be a lot.

AI'll see you in court — Following lawsuit, rep admits “AI” George Carlin was human-written (via Ars)

Mongrel
Posts: 21354
Joined: Mon Jan 20, 2014 6:28 pm
Location: There's winners and there's losers // And I'm south of that line

Re: It's Machine Learning, yo

Postby Mongrel » Fri Feb 16, 2024 5:17 pm

Love this: Air Canada's chatbot gives false advice on booking a bereavement ticket, a guy gets screwed, and Air Canada then argues its chatbot is a separate legal entity, which is also entirely responsible for its own actions. Air Canada did not explain why it believes that is the case.

In addition, they argued that it was okay for one part of their site to lie because another part of their website had the correct information, to which the court replied, in the finest of legalese: "How the fuck was the customer supposed to know THAT, you assholes?"

Not only did these arguments not fly, the National Transportation Safety Board is now laying out the carefully-numbered remaining small burnt pieces of it on a tarp in a hangar.

Mongrel
Posts: 21354
Joined: Mon Jan 20, 2014 6:28 pm
Location: There's winners and there's losers // And I'm south of that line

Re: It's Machine Learning, yo

Postby Mongrel » Fri Feb 16, 2024 5:34 pm

I'm beginning to think that the upshot of so-called AI is not us ending up with self-aware machines but decreased self-awareness in humans.
