Fair Use vs. Consent enters a whole new realm

Bruce Schneier is a long-time advocate for digital privacy and security who’s written some of the most important and influential books on those and related subjects.

He posted this brief article yesterday, nicely summarizing the grey area issue surrounding what constitutes fair use of material readily available to the public vs. what people should have to give consent for others to use.

The reason this case has gotten the attention it has is that it involves OpenAI (which you may not have heard of) and Microsoft (which you’ve definitely heard of). Specifically, the suit holds the latter responsible as one of the main contributors funding the research into and training of the former.

Some background: OpenAI is the organization put together to create an “A.I.” system. I use quotes because it’s not really A.I.: it’s not sentient and self-aware, which is a key part of what actual A.I. means in science fiction, yet software developers have long used the term anyway for anything that mimics human intelligence. Software “A.I.” was pretty dodgy for decades, but these days people can use this OpenAI system to, for instance, produce specific content to very specific parameters. ChatGPT is the name given to this particular system (and that, you may have heard of, since it’s been getting a lot of press lately), and it’s been praised by some but criticized by others for what it can do: Everyday users simply type what they want into a basic command line, and in literally just seconds, it will start to produce the results.

ChatGPT can, as just some examples, write lyrics for a song about any subject in the style of any singer/songwriter you wish. As a test, I had it write a song about cheeseburgers in the style of Jack Johnson, and damn if you couldn’t hear Jack singing those lyrics in his catchy song style when you read what it cranked out.
You can ask it to write a speech for you. I asked it to write a short speech praising Democrats in the style of Donald Trump. It had everything perfect, down to Trump’s signature verbal styles and flourishes.
And you can ask it to come up with ad copy of certain lengths for specific media and specific clients. The ad copy samples I’ve been reading and mucking around with in post-production as I edge my way toward doing voice over work were all written by ChatGPT.

Some of the concern about what ChatGPT can do is that, for instance, a high school student could go into ChatGPT and type in “write a ten page essay on the influence of Impressionism on artwork in the twentieth century”, and it would produce that content for him in a couple of minutes while he went and made himself a sandwich. Suddenly, to write a paper of any length on any subject, no research is needed, and no sweat or effort or time is spent on writing the paper personally. The result certainly won’t be flawless, and will need some editing and revision, but that’s of course a drop in the bucket compared to the time spent researching and writing the whole thing from scratch.

Further, educators have long had some tools to help combat students handing in work that isn’t their own. Websites have been around for a while now that let teachers upload essays and compare them against databanks of previously submitted content, catching students who pulled big chunks of unaltered material from some established website or service offering the completed work for a price. With ChatGPT, that opportunity is wholly eliminated. The nature of ChatGPT is that it’s constantly learning, and changing, and growing. What that means is that a teacher can’t verify that a paper was mechanically generated, because ChatGPT creates it on the fly: no version of that paper previously existed. Literally, even if the teacher knew the exact phrase a student typed into ChatGPT to have that material produced, it still wouldn’t write the same material again. More, it wouldn’t write the same material twice even for the same student repeating the same typed request right away. In the minutes since that first request was made, ChatGPT has learned and changed and grown (it also assumes that the same person asking for the same thing again must want something different from what it just produced), and so it would generate a unique essay, again and every time.

My wife, a teacher, pointed out that good teachers who pay attention are of course able to identify which kids write in what way, so they’d be able to flag something that clearly had been written by someone (or something) else. Which is absolutely true. But as kids get into more advanced grades, they write more material but less often, which gives teachers less of a sampling of any one kid’s writing style to gauge other work against. And what if that kid has simply always used something like ChatGPT to produce the bulk of the work he’s handed in? Then there’s no authentic baseline to compare against at all, and that’s something no teacher of any experience would be able to (nor should be expected to) pick up on.

This is going to be a huge issue for educators to handle in the very near future, and is a real-world concern here and now for the Writers Guild of America, whose members are on strike because their incomes have literally gone down in recent years while the production companies they work for, and upper management at those companies, are pulling in huge profits.

More on that in a minute.

How ChatGPT constantly learns and changes and grows is the point of the lawsuit and Schneier’s article.

The problem is that ChatGPT is directed to “scrape” (pull data from) billions of written words across the entire internet. It learns how sentences work. It learns how people write certain things — essays, blog posts, ads, entertainment content — down to how specific people write and, thanks to transcribed interview and speech and song material, how they talk or orate or sing.
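To make “scrape” a bit more concrete, here’s a minimal sketch (my own illustration, nothing to do with OpenAI’s actual pipeline) of what pulling the readable text out of a single web page looks like, using Python’s standard library:

```python
# A toy "scraper": extract the visible text from a page's HTML,
# skipping script/style blocks, so it could be fed into a training corpus.
from html.parser import HTMLParser

class TextScraper(HTMLParser):
    """Collects the text content of a page, ignoring scripts and styles."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def scrape_text(html: str) -> str:
    parser = TextScraper()
    parser.feed(html)
    return " ".join(parser.chunks)

page = ("<html><body><h1>Cheeseburgers</h1>"
        "<p>A song in the style of Jack Johnson.</p>"
        "<script>ignored()</script></body></html>")
print(scrape_text(page))  # → Cheeseburgers A song in the style of Jack Johnson.
```

Run across billions of pages instead of one hardcoded string, that stripped-down text is the raw material the model learns sentence structure and style from — and nothing in the process asks the page’s author for permission.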

And there’s the rub: ChatGPT is using content online to learn. But the lawsuit asks, why does it have the right to do that without the producers of that content giving explicit consent for their work to be swept up and used to teach this system how to recreate what they’ve done?

I suspect that nothing will come of the suit. OpenAI and Microsoft’s lawyers will argue that if content is out in public, there should be nothing stopping a machine from learning from it any more than you’d stop an aspiring songwriter from studying and learning from the lyrics of Taylor Swift or Adele or Ed Sheeran. That this system does it far more quickly and broadly than anything we’ve seen before is something of a counterpoint, but then where is the line drawn between acceptable volumes of learning material and unacceptable ones? With it being a huge grey area in which hard lines can’t readily be agreed upon, I think the suit will fizzle out and ChatGPT will carry on.

I’m of a split mind on whether or not this is a bad thing for students. On the one hand, there’s undeniably something wholesome and maturing about students having to do their own research and writing on a topic. On the other hand, systems like ChatGPT are ultimately tools that will probably change some aspect of the world forever. Again, I used it to crank out some sample ad copy for me to read for some voice over demos, and it did in seconds what would have taken me 10-15 minutes of researching current company tag lines and writing out the content, so why ad copywriters (for one) would or should feel obliged to do without that beneficial tool doesn’t make sense to me. Same with writing lyrics. Same with writing code for computer systems. The list of its uses, at least as a hugely beneficial shortcut, goes on and on. And it seems odd to outright deny students the use of a tool that will help make some (at least narrow) aspects of the world easier from now on.
If I’m paid to make a jingle for a car dealership, is it bad if I use ChatGPT to help me?
If I’m having trouble thinking my way out of writer’s block about a specific scene my characters got themselves into, is it bad if I use ChatGPT to come up with ways they could get out of it?
If I don’t want to write out the lengthy, boring coding I need to for this client’s software, is it bad if I use ChatGPT to write the bulk of it?
And, further to this point, if this is the world of jobs that our kids are going to be getting into anyway, is it bad for them to start using ChatGPT to help them get their work done faster and more efficiently in school as well?

That’s me playing devil’s advocate, at least to some degree. Again, I think there are inherent benefits and character-building to creating one’s own work from one’s own mind and effort. But the line drawn in the sand between what’s acceptable use of ChatGPT and what’s not is, to continue the metaphor, getting eroded by the incoming tide of times a-changin’.

ChatGPT can’t yet deliver quality material of all types and sizes, mind you. The coding it produces is already well known to not be great. And while it’s certainly a concerning new development the WGA members will need to bear in mind going forward — some production companies have verifiably been trying out ChatGPT to write TV shows and movies in order to simply circumvent the need for writers as a whole, which is pretty chilling — the fact is that ChatGPT doesn’t do that kind of thing well. Particularly, I imagine, when it comes to the feel and continuity of characters in the likes of a sitcom, where you also need season- or series-long arcs as well as the situation of the week.

At least, it doesn’t do that kind of thing well yet, which is a bit ominous but is the truth of it. ChatGPT, and other “A.I.” systems like the one Google* has recently confirmed it’s working on, will only improve at what it can do, literally by the second. Content producers of all stripes, from education to entertainment to computer coding, are soon going to be facing a pretty daunting uphill battle to retain their importance, their very relevance, in a world where anyone who wants to can type in a request for any content of any format and, while it will need polishing, have a product within minutes for free instead of paying a professional for work that takes days or weeks or months.

I’m expecting (and okay, hoping) that people, and in particular companies, will retain some appreciation for the human touch in work they need done. A.I. systems can now mimic artwork in the style of anyone whose artwork has been widely published (an issue addressed by some, including science-fiction heavyweight John Scalzi, who informed his publisher in no uncertain terms that he won’t publish material with them unless the original cover art has been verifiably made by a human), can mimic the writing style of songwriters and I imagine certain authors, and — frustratingly for those of us looking to break into the voice over industry — are getting much better at mimicking voices.

Software is now available as a service that lets podcasters sample their voices and can then do a pretty good job of reading copy aloud as they would read it live. Scammers, those scumbags always happy to edge into new technology to get money from people who don’t know better, are now running “virtual kidnapping” scams, where they use tech to mimic the voice of a loved one and call a relative using that voice to say they’ve been kidnapped and demand a ransom be paid. Meanwhile, the person supposedly kidnapped is out and free and totally unaware that any of this is happening. But when you get a call from your child or niece or nephew that sounds and talks exactly the way they do and, crying, asks you to send money now to get them released, “this must be a scam” isn’t the first thought that will occur to many, unless they know about the scheme ahead of time. And reports have said that even for those who know they’re dealing with a scam, in demonstrations of this technology at work, there’s still a real psychological and emotional impact to hearing your loved one’s voice desperate and scared and needing you to do something to help them.

Like any other tool, modern “A.I.” systems can be used for good or ill, and we’re going to see more of both of those before it really settles into its own place in a world that seems determined to barrel forward with what new thing can be created before stopping to give due consideration to whether or not it should be created.

Stay informed, dear reader. And stay safe.

*As mentioned in my last post, Google is an unprecedented collector of volumes of data from literally billions of sources around the world. If you’re building a system to mimic how people do what they do, being probably the world leader in collecting that kind of information puts you in an exceptionally good position to work from.