OpenAI is an artificial intelligence (AI) research and deployment company founded in 2015, when Elon Musk, Sam Altman, and others formed the company and pledged more than one billion dollars. As an AI research laboratory and company, OpenAI has one of the most advanced AI language models producing human-like content. Lawyer and legal technology entrepreneur Justin MacFayden explains how OpenAI tries to use its Terms of Service to create a sharing culture while still imposing ethical and legal limits.
Questions in this Episode:
- When will AI start drafting contracts?
- Can you share the GPT-3 AI content or not?
- What’s the real concern about unfiltered AI-generated content?
- Why write terms if you can’t enforce them?
- Who owns the IP to work co-authored with an AI program?
What is GPT-3?
Among the several AI products OpenAI has developed is GPT-3 (Generative Pre-trained Transformer 3), a language modeling program that uses deep learning to produce human-like text and content.
To give some context, think of Siri on steroids. GPT-3 uses about 175 billion parameters, and OpenAI spent ten million dollars to train it. Imagine taking a learning machine and feeding it all of Wikipedia, a massive amount of books, and a chunk of the Internet.
GPT-3 can do some really impressive, sometimes disturbing, and often compelling things. It can write songs, stories, and essays. And it doesn’t take any imagination for lawyers to know that GPT-3 will be doing machine-learning contract drafting sometime in the very near future.
Generate AI Content and Share it – Maybe
OpenAI has made GPT-3 available and open for people to test and play with for about a year and a half.
They have made the API available for people to implement in their various experiments and processes if they choose. (API stands for Application Programming Interface, an interface that allows two applications to talk to each other. In this case, users’ programs can communicate with GPT-3.)
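The request-and-response pattern described above might look roughly like the sketch below. Note that the endpoint, the `complete` helper, and the stubbed server are hypothetical stand-ins for illustration, not OpenAI’s actual API; a real client would send the payload over HTTPS with an API key.

```python
import json

# Hypothetical stand-in for a hosted language-model API server.
# A real client would POST this JSON payload over HTTPS with an API key;
# the server is faked here so the sketch is self-contained and runnable.
def fake_completion_server(payload: dict) -> dict:
    prompt = payload["prompt"]
    # Echo a canned continuation in the shape such APIs typically return.
    return {"choices": [{"text": prompt + " ... [generated continuation]"}]}

def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send a prompt to the (stubbed) completion endpoint and return the text."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    response = fake_completion_server(payload)
    return response["choices"][0]["text"]

print(complete("Once upon a time"))
# Once upon a time ... [generated continuation]
```

The point is the shape of the exchange: the user’s program sends a prompt, GPT-3 sends back generated text, and the two applications never need to know each other’s internals.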
Interestingly, the Social Media Policy section of the Terms of Service tries to severely limit what you can and can’t share as far as output. OpenAI limits what its users can do, and there are some good reasons behind those decisions.
AI Generated Content Concerns
OpenAI is concerned about unleashing its AI-generated language on social media. They state they established certain policies “To mitigate the possible risks of combining AI-generated content with social media.” And although they take a cautious approach in telling you which GPT-3-generated content you can share, they are not clear about defining those risks for you.
AI ethicists are very concerned about the risks of language models this powerful...for Terminator-like scenarios. —Justin MacFayden
More severe concerns include spam bots, fake news bots swaying elections worldwide, and more. And academic plagiarism is another big concern because GPT-3 can easily write essays for students.
One of OpenAI’s greatest concerns is GPT-3 generating racist, sexist, or profane content. So, they built a playground with a built-in content filter, which rates the output as unsafe or sensitive. Users are warned when the generated content may be toxic. The content filter is incredibly sensitive, erring on the cautious side.
Unfortunately, this currently causes an extremely high percentage of false positives, with the filter flagging content as toxic when it really is not.
The OpenAI team is working very hard to control the toxic output problem. But GPT-3 was trained with data from the Internet – data generated by humans in the first place. Thus the AI is being trained with content that is often toxic. Until it learns better, OpenAI has its users employ the content filter.
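OpenAI’s actual content filter is a trained classifier, not a word list, but a naive keyword filter is enough to illustrate why cautious filtering over-flags. The blocklist and sample outputs below are invented for illustration.

```python
# Toy illustration of why an over-cautious content filter produces
# false positives. OpenAI's real filter is a trained model, not a word
# list; this naive keyword approach just shows the failure mode.
BLOCKLIST = {"kill", "hate", "attack"}

def naive_flag(text: str) -> bool:
    """Flag text as 'sensitive' if any blocklisted word appears in it."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return bool(words & BLOCKLIST)

outputs = [
    "I hate you and everyone like you",          # genuinely toxic: flagged
    "This medicine can kill harmful bacteria",   # benign: false positive
    "The chess engine will attack the queen",    # benign: false positive
    "The weather today is lovely",               # benign: passes
]
flags = [naive_flag(o) for o in outputs]
print(flags)  # [True, True, True, False]
```

Two of the three flagged outputs are harmless, which mirrors the high false-positive rate Justin describes in the episode: erring on the cautious side means flagging far more than is actually toxic.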
Friendly But Unenforceable Language
One of OpenAI’s primary concerns is the potential damage to their public image should toxic unfiltered AI content be posted on social media.
Even though they are limiting sharing of this content, OpenAI realizes they don’t have any real power over their users. Their contract language often asks users for their help in sentences like “Please review our community guidelines before getting started” and “We kindly ask that you refrain from sharing outputs that may offend others…”
While this document is a valiant attempt at getting users to help control the generated content, there is little to no enforceability. For example, the Terms allow you to conduct unrecorded live demonstrations for audiences of fewer than twenty-five people only if you clarify to your group that GPT-3 has not yet been approved for launch.
It is easy to understand how OpenAI might be able to police AI-generated content on social media platforms, since that content is public. But it seems impossible to enforce restrictions on AI-generated content presented at small private meetings.
Who Owns the IP to the Co-Authored Content?
When you use GPT-3 to help you co-author your book, story, or other content, the question of ownership becomes muddled. OpenAI fully expects you to use GPT-3 for this purpose. They even have a boilerplate disclaimer they would like you to use.
Obviously, OpenAI wants the public to play with GPT-3, experiment with it, use it, and share the content with friends. From a marketing standpoint, they want people sharing output to show the really cool things that GPT-3 can do. They want everyone sharing content, as long as there is nothing offensive in it.
But the idea of ownership of AI-generated material and content is an interesting but unsettled area of law. New and compelling issues will arise in conjunction with GPT-3 and other language models like it.
Lessons Learned In Trying to Control Emerging AI Content
To increase usage and awareness, OpenAI has made this program available to the public to play with and use.
Once they can effectively filter out the toxic generated content, they have a more difficult challenge, namely removing human bias. If you were to give a prompt of the word “man” to the program, you would get results like “lazy,” “large,” “protect,” and “survive.”
But if you were to feed in the prompt “woman,” you would get results like “petite,” “gorgeous,” and “bubbly.” So far, there is no filter or algorithm to help with that human bias. You can’t simply filter entrenched human bias like you can with naughty words.
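The bias described above can be made concrete with a toy association count. The mini-corpus of word pairs below is invented to echo the examples in the text; the real effect emerges from billions of human-written documents, which is why it cannot be filtered out after the fact.

```python
from collections import Counter

# Toy demonstration of how entrenched bias in training text shows up in
# a model's word associations. The "corpus" is invented for illustration,
# using the associations mentioned in the article.
corpus = [
    ("man", "lazy"), ("man", "large"), ("man", "protect"),
    ("man", "survive"), ("man", "large"),
    ("woman", "petite"), ("woman", "gorgeous"), ("woman", "bubbly"),
    ("woman", "petite"),
]

def top_associations(word: str, n: int = 3) -> list[str]:
    """Return the words most often paired with `word` in the corpus."""
    counts = Counter(assoc for w, assoc in corpus if w == word)
    return [assoc for assoc, _ in counts.most_common(n)]

print(top_associations("man"))    # ['large', 'lazy', 'protect']
print(top_associations("woman"))  # ['petite', 'gorgeous', 'bubbly']
```

A blocklist can catch a slur, but nothing in this counting process is a “bad word” to block: the skew lives in the statistics of the training data itself, so the only fix is changing what goes in.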
Overall, OpenAI’s document does a fair job of getting people to use GPT-3 and share the AI-generated content. And it appears to be well written, with the glaring exception that much of it is unenforceable.
THE CONTRACT: OpenAI’s Terms of Service
THE GUEST: Justin MacFayden is a legal tech founder and entrepreneur who uses data science to increase access to justice. He can be found on LinkedIn and Twitter (@JrMacfayden).
THE HOST: Mike Whelan is the author of Lawyer Forward: Finding Your Place in the Future of Law and host of the Lawyer Forward community. Learn more about his work for attorneys at www.lawyerforward.com.
If you are interested in being a guest on Contract Teardown, please email us at email@example.com.
Justin MacFayden [00:00:00] You can’t really do a whole lot about it until you change what the input is, what the documents are that we’re feeding something like this.
Intro Voice [00:00:07] Welcome to the Contract Teardown show from Law Insider, where legal experts tear down contracts from some of the most well-known companies and high profile executives around the world.
Mike Whelan [00:00:20] In this episode, lawyer and legal technology entrepreneur Justin MacFayden tears down the publication rules in OpenAI’s terms of service. Justin points to OpenAI’s mixed goals of creating a sharing culture while having to impose some ethical and legal limits. Contracts can often stand in the way of goals of a company like OpenAI, so let’s tear it down.
Mike Whelan [00:00:44] Hey, everybody, welcome back to the Contract Teardown show from Law Insider. I’m Mike Whelan. The purpose of the show is exactly what it sounds like: we take contracts, we beat them up, we say mean things, we make them feel bad about themselves, occasionally being supportive at the end just to give it a little bit of life. I’m here with my friend Justin MacFayden. Justin, how are you today?
Justin MacFayden [00:01:04] I’m doing very well, Mike. Thank you.
Mike Whelan [00:01:06] I appreciate you being here, Justin. We are going to go down a nerdy rabbit hole. Why? Because we were talking about this document. This is the GTP-3 terms of service, which is a whole sentence. I don’t know what any of that means. We’re going to talk about that, Justin. Why are we talking through this document? What’s important about this social media policy and its relationship to GTP-3?
Mike Whelan [00:01:47] OK, I just want to point out that I, to this moment, don’t know whether it’s GTP three or GPT three, because you’ve messed it up. I just want to point out to the listening audience that it is GPT-3.
Justin MacFayden [00:02:01] That’s right. Yes.
Mike Whelan [00:02:04] Generative pre-trained transformer. Exactly. This is going to start the series of dumb questions that I’m going to ask about it. First, generative... what did you say?
Justin MacFayden [00:02:15] Pre-trained Transformer?
Mike Whelan [00:02:17] All right. Before we get into this document and what this thing is, let’s ask some dumb questions about it. One is: what is generative, et cetera, et cetera?
Justin MacFayden [00:02:29] So GPT three is one of the most powerful deep learning language models. So to give you some context, think it’s Siri, but on steroids, it costs them about $10 million to train it. It uses one hundred and seventy five billion parameters, so they’ve taken all of Wikipedia, a huge library of books and a big chunk of just the internet generally. And they fed it to a machine, and it can do some really impressive, somewhat disturbing and compelling things. It can write songs and stories and essays, and we can imagine that eventually it could enable contract drafting, machine learning enabled contract drafting. And it’s actually its use in legal tech is what brought me to to begin playing with it.
Mike Whelan [00:03:17] Yeah. So I wanted to ask you about your background and how you are part of the group that is turning computers into overlords that we will all be subject to. Why is this your fault? What do you do for a job?
Justin MacFayden [00:03:32] Absolutely. So I got my law degree from UC Hastings in San Francisco, so I think working in tech was almost obligatory. In my second year at Hastings, I started really getting into legal tech and trying to build my career around that more than a traditional law practice. Then I went to study legal tech in Copenhagen for a semester, and when I came back, I founded a legal tech startup called ProSe Claims. It was a pretty antagonistic approach to settling personal injury claims. About a year in, I sold the company to a Canadian partner called PainWorth, and I’ve been working with them to bring their product to the U.S. I’ve also been working with the folks at 8J Tech, a legal innovator, and that’s been a lot of fun; they’re a really great team building awesome legal tech. And now I live in L.A. and I’m building out a chapter of L.A. Legal Hackers. So any legal tech heads out there, feel free to reach out. And I am a machine learning and natural language processing enthusiast, which is why I’m here today to talk about this contract.
Mike Whelan [00:04:34] Yeah, really fun at parties. So, as we pivot into this actual document, because we do like to root these conversations in the documents: structure for me, real simply, the relationship between GPT-3, this particular policy, and OpenAI. What’s the relationship between this document and the technology?
Mike Whelan [00:05:40] Hey, everybody, I’m Mike Whelan. I hope you’re enjoying this episode of the Contract Teardown show. Real quick, I want to ask you to do me, slash you really, a quick favor. Look down below. You’ll see a discount code to join the Law Insider Premium subscription. When you do that, you get access to more content like this: webinars, daily tips on contract drafting, not to mention access to the world’s largest database of sample contracts and clauses. It will help you write better contracts faster. If you want to do it, right now there’s a code below, so get there. Also, if you’re part of a larger team, if you’re in-house or in a law firm, just email us at firstname.lastname@example.org. We’ll make sure you get a deal as well. Come join us in the community. The code is below. Let’s get back to the show.
Mike Whelan [00:06:26] So OpenAI made it available, but then said, if you go build the Death Star with it, there are going to be consequences. So we’ll jump into the limits in this document. We’ll start at the beginning, with the social media policy talking about mitigating “possible risks of combining AI generated content with social media.” First, it’s making a recognition of the risks. What about this section? Do you like the background context that they’re giving by saying, here’s the human reality, as we move into making you behave?
Justin MacFayden [00:06:59] Yeah, absolutely. So I think, you know, from reading this entire document, you kind of get a sense that they’re creating limitations around GPT-3 generated content and what you can share. And I think there are a lot of reasons for taking this really sensitive approach that they’ve taken. You know, this opening section, like you said, talks about the risks, but they aren’t exactly clear what they mean by the risks of using this language model, right? What is the...
Mike Whelan [00:07:26] We’ve all seen the Terminator. We know what the risks are. OK.
Justin MacFayden [00:07:30] Exactly right. And interestingly, they note in the very next sentence, after mentioning the risks, “we have updated these social media policies to be slightly more permissive than they were previously.” So even though they are pretty restrictive terms now, they’ve apparently eased up. But yeah, to get back to the risks, I didn’t plan on getting into philosophical chats so early, but I do think it’s important to touch on what their general concern is here with this entire document and what they mean. So ethicists are very concerned about the risks of language models this powerful, like you said, for Terminator-like scenarios, certainly. But really at the more severe end of their concerns are things like spambots or fake news bots swaying elections around the world. Academic plagiarism is another big one; this thing could write essays for you. But I think the big risk, in going through this, that OpenAI is mostly concerned about is outputs that are racist or sexist or use some kind of profanity. Right? So they have a playground that has a built-in content filter, which rates the output as unsafe or sensitive, and it warns users that there may be toxic content in the output. And this content filter is incredibly sensitive, so it’s mostly false positives. I generated about 50 outputs, and almost every single one was flagged as being toxic. Maybe two out of the 50 actually contained something that might be considered toxic or objectionable. So it’s very front of mind for them. And this definition of risk, or what the concerns are for the team at OpenAI, is actually one of the very interesting bits to me, right? The fact that the team at OpenAI are working very hard to solve the specific problem of toxic outputs. So I’d mentioned that GPT-3 was trained on data from the internet, right? Data that was created by us.
So in a sense, when you have a chat experience, or, you know, if they created a chatbot with GPT-3, it’s like you’re speaking to the internet, right? Which means that the content that we create is quite toxic, so the language models that are trained on our content are also toxic, right? So, you know.
Mike Whelan [00:09:46] Yeah. Let me get a use case. So we’re going to talk about the distinction it makes between occasional one-off postings of prompts, in the first section, and more frequent, ongoing posting of prompts, in the second section. So in terms of how one might use this thing: if I’m a lawyer and I’m advising somebody who’s got a technology company or a media company or whatever, and they’re trying to be out in the universe, what is the distinction between these two kinds of categories? Is the purpose of this thing, maybe in a business case, for a company to say, you know what, we need to post on social media, it’s a thing that easily gets screwed up but doesn’t offer a whole lot of value, so we’re going to give it to this tool by feeding it some things and then looking at the outputs? Is that the use case that drives somebody to this?
Justin MacFayden [00:10:34] I think the big concern from the folks at OpenAI, I mean, it really comes down to a public image thing. Not to be cynical about it, but I think OpenAI doesn’t want to be known as the creators of this hateful, racist machine. So they want to limit the sharing of the kind of content that would be considered hateful or racist, to the best of their ability. Without anything else to really compare it to, as far as documents like this go, I think this document is a pretty effective means of trying to limit that.
Justin MacFayden [00:11:49] That’s really interesting that you noticed that, because that was one of the big criticisms, I guess, that I could have of this document: enforceability. And later on in the document, when they talk about live streaming, they say for other events, you may conduct unrecorded live demonstrations for audiences of less than 25 people from various organizations, or for any amount of people from one organization, again provided that you clarify that the application has not yet been approved for launch. That feels to me completely impossible to enforce, right? I can imagine being able to enforce some of the social media sharing; if the team at OpenAI sees that you’re doing this, they can do something about it. But with presentations to private groups, you know, I think that’s tough. So there is, like you said, a lot of nudging and kindly asking you not to share offensive outputs.
Mike Whelan [00:12:39] Well, to get to the next section, talking about the more frequent use, there’s a link, which is always a thing in documents like this, and it’s a link to something called pre-launch review. Do you know what that process is? How is the company now getting involved and saying, let’s have a review before you go publish the thing?
Justin MacFayden [00:13:02] So I haven’t gone deep into what the pre-launch review actually looks like. I imagine that you essentially submit the work to them, the things that you want to do with it. And they, I’m imagining, have pretty broad discretion. I would imagine that they don’t clearly define exactly the categories of things that they would approve or disapprove of. But again, I think that the focus for them is just what they would consider toxic output. So as long as what you are doing reflects well on GPT-3 and the folks at OpenAI, I think they’d probably say yeah.
Mike Whelan [00:13:34] You see another bullet point with another hyperlink. It says content is filtered to avoid unsafe content, content where their content filter returns a 2, meaning unsafe. “If you are using OpenAI Codex and have not implemented the content filter, you should refrain from posting content containing slurs or offensive language,” et cetera. They have a process, it seems, but even when they do, they’re using terms like “should” to encourage different behavior. If we jump down to the fictional content co-authored with the OpenAI API policy, what are you seeing in that section that stands out to you?
Justin MacFayden [00:14:17] So I kind of mentioned that plagiarism is one of the potential ethical concerns for them. GPT-3 can certainly write essays; it can write fictional stories. So I think part of the section is explaining just another interesting thing that GPT-3 can do, but they also give you a boilerplate disclaimer to add alongside GPT-3 outputs. I think that’s incredibly thoughtful, and it tells me that they do want people playing with it, and they do want you showing people. I mean, from a marketing standpoint, they do want people messing around with this. They do want people sharing the outputs and showing that you can do really cool things with GPT-3. But again, they want you showing nothing naughty that is an output from GPT-3.
Mike Whelan [00:15:04] Yeah. And they even give you stock language that you can use to describe the creative process, provided it is accurate, quote: “The author generated this text in part with GPT-3,” et cetera, et cetera. So they give you a disclaimer. It reminds me of a conversation that we previously had with Bob Tarantino about Dungeons and Dragons, about the open policy that they have for creating content, and how this sort of disclaimer, this sort of stock language, has to travel with that kind of content or you’re in violation. Going down to the research policy, there’s a note here about raising IP considerations. Anything to say on that section? Sorry, this is under “We do require that researchers receive prior approval,” et cetera. Any idea about IP considerations with this kind of document?
Justin MacFayden [00:15:58] So, speaking not as an IP expert, but you know, I think it is a really interesting forefront of intellectual property, the idea of ownership of AI-generated materials, right? So I think there’s definitely something to be said about that. That’s an interesting area of law. I think there are some compelling issues that are going to arise from novels which are written in conjunction with GPT-3 and other language models like it. But I think they’re speaking to, as far as I know, uncharted territory in intellectual property law.
Mike Whelan [00:16:35] I’m thinking about big principles as we sort of shift out of the document. I was watching a TED talk about massive online collaboration by the researcher who came up with the CAPTCHA. He was talking about how the original CAPTCHA was just an annoyance, just solving a problem, and then reCAPTCHA, the second thing, was about digitizing books. They were using people spending these little ten seconds of time to fix a book. That same guy then went and created Duolingo, which translates a bunch of the internet into multiple languages through people learning a language. And what was interesting about that conversation was he was looking for massive online collaboration, right? That’s the ethos behind a thing like OpenAI, to try to get people to go use this. But once you do, it’s very Wild West, and these guys are trying to figure out, OK, how do we put some kind of constraints on that? You know, with Duolingo, it’s a whole app; the structure of the tool itself puts constraints on the way it’s used. But with this kind of really open culture in technology, thinking about what lawyers do to put constraints around something whose value is the Wild West, talk to me about that. How do we balance that as lawyers, putting some kind of constraints but without totally killing the purpose?
Justin MacFayden [00:18:01] Absolutely. I think OpenAI does this pretty well. I think they take it head on, right? They’ve written papers about the limitations of GPT-3 and some of the issues that we talked about, certainly the toxic outputs, which they’re trying to deal with by using these content filters and saying, hey, just so you know, when you come to our platform, you can use this, and you should share it with people, but please be careful about the way that you use it and how you share it. But one of the interesting things that they’ve mentioned in a paper, aside from the obviously toxic outputs, are the biases that we find with people echoed in GPT-3, and you can’t really filter that out, right? So some of the examples they’ve given: people will give it a prompt like the word man, right, and it’ll say things like lazy and large and protect and survive. And if you give it the prompt woman, it’ll output things like petite and gorgeous and bubbly. Right? So there’s this entrenched bias, which is something that you can’t really content-filter out in the same way that you could for naughty words. And this is really the core of a statistical relevance algorithm like this: you can’t really do a whole lot about it until you change what the input is, what the documents are that we’re feeding something like this. So, you know, they’re tackling, I think, a really big question and putting something out there for people to play with. And I think this document does pretty well as far as walking that line, however unenforceable most of the clauses might be.
Mike Whelan [00:19:32] Yeah, I mean, it’s an interesting thing, because you start to think about the power of contract to be able to control the kinds of transactions that are tiny and dispersed and, you know, not human to human. There are obviously limits on what contract can do as a social control tool, as a tool that enables collaboration between people who otherwise have no relationship, right? I mean, that’s what transactions are: contract is enabling cooperation between people who don’t necessarily have a reason to cooperate; they’re not in the same tribe or whatever. But how does that overlap with the Skynet worlds where everybody is involved, doing little bits? These are interesting questions to ponder, and certainly in this short little teardown we’re not going to solve all those world problems. But Justin, if people want to reach out to you and actually figure out how to prevent Arnold Schwarzenegger from coming back and killing all of them, what’s the best way to connect with you?
Justin MacFayden [00:20:34] I’m on Twitter. You can find me @JrMacfayden. I don’t post much but feel free to reach out to me on there. I love legal tech and I love talking about these issues, so hit me up.
Mike Whelan [00:20:43] Awesome. We’ll do that. We’ll include on the resources page for this video all of the documents that we’ve talked about, some context, and also contact information for Justin. So if you just go to lawinsider.com/resources, you’ll be able to find it there when it’s up. And if you want to be on the Contract Teardown show to beat up a contract and be mean to it, just email us. We’re at Community@LawInsider.com. We’ll see you guys next time. Thanks again, Justin. Have a good day.
Justin MacFayden [00:21:11] Thanks, Mike.