OpenAI challenges New York Times lawsuit on fair use and copyright
Media Landscape
This story is a Media Miss by the right, as only 3% of the coverage is from right-leaning media.

[LAUREN TAYLOR]
OPEN A-I IS CHALLENGING A LAWSUIT FROM THE NEW YORK TIMES IN THE COURT OF PUBLIC OPINION. IN A BLOG POST, THE COMPANY SAID THE LAWSUIT LACKS MERIT, SUGGESTING THE TIMES ISN’T PROVIDING A COMPLETE PICTURE OF THE SITUATION.
THE TIMES CLAIMS COPYRIGHT INFRINGEMENT, ASSERTING THAT OPEN A-I AND MICROSOFT USED ITS ARTICLES TO TRAIN THEIR CHATBOT, CHATGPT.
THE LAWSUIT ARGUES THAT THE TIMES STANDS TO LOSE CUSTOMERS AND REVENUE IF IT’S FORCED TO COMPETE WITH CHATGPT AS A NEWS SOURCE.
BUT WHAT IS THE FULL PICTURE? IS OPEN A-I WORKING WITHIN THE CONFINES OF FAIR USE, OR COULD THIS LITIGATION OPEN THE DOOR TO A NEW FRAMEWORK FOR SUCH ASSESSMENTS?
[MATTHEW SAG]
As a copyright lawyer and an academic, this is the first thing that I wanted to know.
[LAUREN TAYLOR]
THAT’S MATTHEW SAG, A PROFESSOR OF LAW AT EMORY UNIVERSITY WHO SPECIALIZES IN THE INTERSECTION OF INTELLECTUAL PROPERTY AND GENERATIVE A-I.
[MATTHEW SAG]
Generative A.I. is kind of a slippery term. I mean, what we’re really talking about is sort of a subset of machine learning programs. And the way machine learning works is that rather than starting with a theory and then, you know, testing that like a normal statistician, you basically throw an incredible amount of data at a model, and the model keeps tweaking itself in successive rounds of training, trying to get better.
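The "successive rounds of training" Sag describes can be illustrated with a toy sketch (purely illustrative; a real language model has billions of parameters, but the loop has the same shape): the model starts with an arbitrary parameter and repeatedly nudges it to reduce its error on the data.

```python
def train(data, rounds=1000, lr=0.01):
    """Fit y ~ w * x by successive rounds of self-correction
    (gradient descent on squared error)."""
    w = 0.0  # the model starts out knowing nothing
    for _ in range(rounds):
        # measure how wrong the current model is on the data
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad  # tweak the parameter to do slightly better
    return w

# Toy data generated by the hidden rule y = 3x; after enough rounds
# of tweaking, the model should recover w close to 3.
data = [(x, 3 * x) for x in range(1, 6)]
learned_w = train(data)
```

No theory is stated up front; the parameter simply converges toward whatever pattern the data contains, which is exactly why the content of the training data matters so much in this dispute.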
[LAUREN TAYLOR]
COURTS HAVE INDICATED THAT LARGE LANGUAGE MODELS, WHICH PROCESS HUGE AMOUNTS OF DATA TO DERIVE ABSTRACT INFORMATION FROM COPYRIGHTED MATERIAL, MAY QUALIFY FOR FAIR USE UNDER U.S. LAW.
HOWEVER, THIS IS A NUANCED AND DEBATED ISSUE, PARTICULARLY WHERE POTENTIAL INFRINGEMENT IS CONCERNED.
[MATTHEW SAG]
One of the things that’s really impressive about The New York Times complaint is that they show, like a lot of examples of, hey, you didn’t just learn abstract things…you kind of seem to have learned how to copy our works exactly. And quite frankly, I was shocked at how impressive the evidence was, but that evidence has not been tested.
[LAUREN TAYLOR]
FOR EXAMPLE, YOU CAN HEAD TO CHATGPT AND ASK IT TO SUMMARIZE SHAYS’ REBELLION, A FARMER-LED UPRISING IN MASSACHUSETTS IN 17-86. THE GENERATIVE A-I SPITS OUT 150 WORDS SUMMARIZING THE MOVEMENT WITHIN SECONDS.
OR YOU COULD PROMPT IT TO WRITE A SONG IN THE STYLE OF YOUR FAVORITE ARTIST ABOUT DRINKING A CUP OF COFFEE ON A SNOWY DAY.
THAT RAISES THE QUESTION: WHERE DID THAT INFORMATION COME FROM, AND HOW CAN YOU BE SURE IT ISN’T A DIRECT COPY FROM THE ORIGINAL SOURCES?
THESE ARE CALLED REGURGITATIONS OR MEMORIZATIONS. THIS MEANS THE MODEL MIGHT GENERATE TEXT THAT IS SIMILAR OR IDENTICAL TO PHRASES, SENTENCES, OR PASSAGES FROM THE DATA IT WAS TRAINED ON. IT’S A PHENOMENON WHERE THE MODEL APPEARS TO REPRODUCE OR MEMORIZE SPECIFIC PATTERNS FROM ITS TRAINING SET RATHER THAN GENERATING NOVEL OR CONTEXTUALLY APPROPRIATE RESPONSES.
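The regurgitation idea can be made concrete with a toy sketch. The function below is our illustration only, not either party's actual methodology, and real evaluations involve far longer texts; it measures the longest word-for-word run an output shares with a source text. A short overlap is consistent with paraphrase; a long verbatim run suggests memorized copying.

```python
def longest_shared_ngram(training_text, model_output):
    """Length (in words) of the longest word sequence appearing
    verbatim in both the training text and the model's output."""
    a, b = training_text.split(), model_output.split()
    # try the longest possible match first, shrinking until one is found
    for n in range(len(b), 0, -1):
        output_grams = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
        if any(tuple(a[i:i + n]) in output_grams
               for i in range(len(a) - n + 1)):
            return n
    return 0

# Hypothetical texts for illustration:
article = "the quick brown fox jumps over the lazy dog near the river"
novel_output = "a slow red fox walks by the river bank"    # paraphrase-like
copied_output = "the quick brown fox jumps over a fence"   # partial copy
```

Here `longest_shared_ngram(article, novel_output)` finds only a two-word overlap, while `longest_shared_ngram(article, copied_output)` finds a six-word verbatim run — the kind of signal the Times' exhibits point to, at much larger scale.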
IN RESPONSE TO THE TIMES’ LAWSUIT, OPEN A-I SAYS, QUOTE:
The regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate.
[MATTHEW SAG]
You know, the NYT complaint, it’s impressive. And if what they’re showing really is representative of what GPT-4 is doing, then, you know, they’re hard put to argue that it’s a non-expressive use. I’m still skeptical, but I think we have to wait and see how it plays out.
[LAUREN TAYLOR]
OPEN A-I AND MICROSOFT HAVE NOT YET SUBMITTED FORMAL COUNTERARGUMENTS IN THE NEW YORK CASES. THEY ARE REQUIRED TO ANSWER THE SUMMONS BY JANUARY 18.