Last month, cOAlition S released its Rights Retention Strategy to safeguard researchers’ intellectual ownership rights and suppress unreasonable embargo periods—Creative Commons (CC) keenly supports this initiative.
Modernizing an outdated academic publishing system
cOAlition S’ Rights Retention Strategy was developed “to give researchers supported by a cOAlition S Organisation the freedom to publish in their journal of choice, including subscription journals, whilst remaining fully compliant with Plan S.”
Under a traditional publishing model, researchers who want to publish their articles in a journal typically need to assign or exclusively license their copyright in the article to the journal publisher. Basically, they hand over their rights to the publisher in exchange for the opportunity to be published in the publisher’s journal. While this model may have worked several decades ago, it is ill-suited to the ways in which academic research is now funded, conducted, and disseminated. It unjustifiably raises legal, technical, and financial barriers around knowledge and perpetuates unbalanced power relationships among the various players in academia and beyond, from researchers and research institutions to publishers, libraries, and the general public.
Nowadays, with the help of new technologies and the internet, academic knowledge is produced, shared, and built upon at a pace and through methods that call for a completely different approach to publishing—one that favors access, collaboration, and fairness. Many funders (particularly governments and philanthropic foundations) require that research outputs be published openly to guarantee that the public can access, use, reuse, and build upon the knowledge created. This is where open access (OA) publishing comes into play.
Open Access and Creative Commons licenses
OA is a publishing model aimed at making academic and scientific research outputs (publications, data, and software) openly accessible. We are strong supporters of OA and open science and our licenses are the global standard for OA publishing. Our efforts are focused on encouraging and guiding public and private institutions and organizations in creating, adopting, and implementing OA policies. For example, we routinely submit comments to consultations on how to promote better access to publicly funded research, science, and educational content. A few examples include the 2013 White House memorandum on public access to the results of federally funded research, the 2020 US Office of Science and Technology Policy (OSTP) consultation on Research Outputs, and the United Kingdom Research and Innovation (UKRI) consultation on its OA policy.
CC consistently advocates for OA policies on publicly funded research outputs; this has been demonstrated to stimulate knowledge creation and sharing, spur innovation, and provide a better return on investment for funders. Specifically, we advise research funders to require that their grantees publish their research results under the following conditions:
Zero embargo period, so everyone, everywhere can read the research fully and immediately at the moment of publication;
A CC BY license on the article(s), to allow for text and data mining and no-cost access; and
CC0 on the research data, to be clear that the data is in the worldwide public domain to the fullest extent allowed by law.
The COVID-19 crisis has only reinforced the notion that openly sharing research is the best way to do research. How could anyone justify an embargo period on COVID-19-related research articles? Or impose a NoDerivatives condition, thereby preventing translations and other valuable adaptations of important scientific discoveries? In order to solve this crisis, scientific research must be shared as rapidly and as broadly as possible.
In response to the COVID-19 pandemic and guided by these “open” values, we helped develop and are leading the Open COVID Pledge: a global initiative that works with organizations around the world to make their patents and copyrights freely available in the fight against COVID-19. We are also working with international organizations such as the World Health Organization in operationalizing the desire of many to freely share their intellectual property related to COVID-19 with anyone who needs it.
Open access and rights retention: the fundamentals
It’s important to remind ourselves that when researchers publish their articles under an OA model using a CC license, they retain their copyright. They do not give any rights away to anyone, whether in the form of an assignment to a publisher, as is the case under most traditional publishing models, or otherwise. Instead, researchers give several broad permissions to anyone to use and reuse the research article, but they continue to hold their rights and can enforce them in the event the reuser fails to adhere to the license.
Further, all CC licenses include multiple safeguards against reputational and attribution risks. These safeguards, which are in addition to and not in replacement of academic norms and practices, are in place to provide an additional layer of protection for the original researchers’ reputation and to alleviate their concerns over changes to their works that might be wrongly attributed to them. CC licenses are also non-exclusive, which means that researchers publishing their articles under any CC license remain free and legally authorized to enter into different publishing agreements with different parties.
Publishing under an OA model and transferring rights over to a publisher are antithetical. The mere suggestion that a researcher would give away their rights to a publisher defeats the whole purpose of what OA aims to achieve. By retaining their rights, as cOAlition S promotes through the aforementioned Rights Retention Strategy, researchers are empowered and keep their freedom to share their research outputs in ways that benefit the academic community and society as a whole.
For nearly two decades, this organization has worked to make the world a more open and equitable place.
When CC first launched in 2001, I was a recently-elected Member of the European Parliament at a time when copyright and access issues were beginning to receive attention.
But throughout my 20 years as a legislator, directly representing over five million people in Scotland and delivering change for over 500 million Europeans, I took on the task of championing digital policy issues including copyright reform, citizen privacy and data protection, and improving public access to digital tools.
As I reflect, we find ourselves today in a very different world. And as I look to the future, I know the work of CC has never been more important.
We have the opportunity to play a leading role in the global fight to remove obstacles to the sharing of knowledge and creativity.
This matters because of the pressing challenges facing us, as the coronavirus pandemic continues to wreak human and economic devastation across the globe.
Inequality is on the rise, and injustices have been exposed.
The tragic killing of George Floyd sparked the global Black Lives Matter movement, while there have been pro-democracy protests in several countries, including in Belarus only last week.
The challenges and the crises we have witnessed during this extraordinary year have raised legitimate questions about power and privilege.
Who has access to knowledge in our unequal society?
We know that too often it is in the hands of the few, not the many, and access is often denied to women, people of color, LGBTQI communities, and people from the Global South.
We have a role to challenge power and privilege, and the solution to that is to open up access and share knowledge.
During the coronavirus crisis, we saw some progress being made.
It’s a shame that it took a global pandemic to realize this, but I hope the lesson has now been learned.
Yet for every step forward there is also a step backwards.
Some nations have imposed restrictions on the right to information, and not all have restored that right.
And too much knowledge remains out of reach, with museum and library doors still shut in many countries, and digital access not available for so many.
Breaking down barriers is not easy.
Take the example of the National Emergency Library, designed by the Internet Archive to make over 1.3 million e-books available for checkout, free of charge during the pandemic.
I have been a longstanding champion of the need to unlock digital access to drive a new era of development, growth, and productivity for everyone in society.
I’m excited by the opportunity to make a difference.
The work of CC has already proved crucial during this devastating pandemic. The Open COVID Pledge has made it easier for universities, companies, and other holders of intellectual property rights to support the development of medicines, test kits, vaccines, and other scientific discoveries.
Last month we introduced the CC Chapter in Italy to you! This month we’re traveling north to the CC Chapter in The Netherlands! The Creative Commons Global Network (CCGN) consists of 43 CC Country Chapters spread across the globe. They’re the home for a community of advocates, activists, educators, artists, lawyers, and users who share CC’s vision and values. They implement and strengthen open access policies, copyright reform, open education, and open culture in the communities in which they live.
To help showcase their work, we’re excited to continue our blog series and social media initiative: CC Network Fridays. At least one Friday a month, we’ll travel around the world through our blog and on Twitter (using #CCNetworkFridays) to a different CC Chapter, introducing their teams, discussing their work, and celebrating their commitment to open!
Next up is CC Netherlands!
The CC Dutch Chapter was formed in September 2018. Its Chapter Lead is Maarten Zeinstra and its representative to the CC Global Network Council is Lisette Kalshoven. Since the beginning, the Chapter has been involved in promoting and supporting openly licensed music, open GLAM, and open education, but over the last year in particular it has expanded its activities to cover almost all CCGN Platforms. To learn more about their work, we reached out to CC Netherlands to ask a few questions. They responded in both English and Dutch!
CC: What open movement work is your Chapter actively involved in? What would you like to achieve with your work?
CC Netherlands: We like to work together with the whole open sector. Open licenses are awesome, but even more so when applied to sectors that really benefit from knowledge creation and sharing. That’s why we have members from diverse backgrounds. You can see all Open Nederland members here. Are you a person living in the Netherlands? Join us!
CC: Op welke open thema’s is jullie chapter actief? Wat zouden jullie graag willen bereiken?
CC Nederland: Wij werken graag samen met de hele open sector. Open licenties zijn fantastisch, nog meer als ze daadwerkelijk gebruikt worden door de sectoren die kennis creëren en delen. Daarom hebben we leden van diverse sectoren. Op onze site kun je zien wie er allemaal lid is van Open Nederland. Woon je ook in Nederland? Sluit je dan aan!
CC: What exciting project has your Chapter engaged in recently?
CC Netherlands: We are worried about the implementation of the DSM directive in Dutch copyright law. Exceptions and limitations are paramount in a working copyright system, and automatic filtering threatens those. We have been active in working towards a positive implementation of the new ‘Copyright Directive’ (#DSM) – informing government and parliament on the importance of open knowledge, licenses and broad implementation of exceptions and limitations.
CC: Wat is een project waar jullie chapter recent aan gewerkt heeft?
CC Nederland: Wij maken ons zorgen over de manier waarop de Europese richtlijn voor auteursrechten in Nederland wordt geïmplementeerd. Uitzonderingen en beperkingen op het auteursrecht zijn belangrijk voor een goed werkend stelsel. Automatische filters zijn hier een bedreiging voor. De afgelopen tijd hebben we ons ingezet om de implementatie positief te beïnvloeden, o.a. door de overheid en het parlement te informeren over het belang van open kennis, licenties en een juiste implementatie van de uitzonderingen en beperkingen.
CC: What do you find inspiring and rewarding about your work in the open movement?
CC Netherlands: The Dutch Chapter and @OpenNederland, the association that runs the Chapter, bring people together from all corners of the open world in NL: open design, healthcare, heritage, education, and more. Thus far this has led to crossovers that did not take place before, like looking at open education from the user experience of a student: what can open education mean for your entire learning path from toddler to adult?
CC: Wat vinden jullie inspirerend en waar halen jullie voldoening uit bij jullie werk in de open beweging?
CC Nederland: Het Nederlandse chapter en Open Nederland, de vereniging die het chapter ondersteunt, brengen mensen bij elkaar uit alle hoeken van de open beweging. Bijvoorbeeld open design, gezondheidszorg, erfgoed, onderwijs en meer. Dit heeft al geleid tot kruisbestuivingen die niet eerder plaats hebben gevonden, zoals het bekijken van open onderwijs vanuit het perspectief van een leerling. Wat kan open onderwijs betekenen voor iemands onderwijscarrière, van kleuter tot volwassene?
CC: What projects in your country are using CC licenses that you’d like to highlight?
Dutch GLAMs have been active with open licensing for a long time. Beyond our beautiful Rijksmuseum, also have a look at the Re:VIVE project, which invites artists to remix old archival sounds; @benglabs, which aims to make audiovisual heritage open and searchable; or the beautiful collection of the city archive of Den Bosch, with these billiard playing ladies.
Kenny Vleugels, a game developer from NL, creates really cool CC0 game assets.
We like to party when it is Public Domain Day in the Netherlands. We organise a fun and informative day with lectures about the Public Domain, but also about the creators whose work has now entered the Public Domain. See full videos and photos from the 2020 edition here. International coordination takes place through pdday.org.
Did you know the Dutch government uses CC0 as their standard on all text and data on websites? They have been doing so since 2010, and were—as far as we know—the first to do so. See the notice here.
We also have an award for the best re-use of open government data, the Stuiveling Open Data award. The 2019 winners researched fraud in healthcare using open data.
Sharing government news in the current Corona-crisis is more important than ever, but it can be tough to weed through. The Open State Foundation has made all local government news accessible through one platform, all openly licensed.
CC: Wat zijn projecten die CC licenties gebruiken en die je graag onder de aandacht wil brengen?
Nederlandse culturele instellingen delen hun collecties al geruime tijd met open licenties. Naast het welbekende Rijksmuseum zijn er ook initiatieven zoals
Re:VIVE, een project waarbij kunstenaars en muzikanten uitgenodigd worden om geluiden uit archieven te remixen,
@benglabs, dat audiovisuele archieven ontsluit en doorzoekbaar maakt,
Of de fantastische collectie van het archief van Den Bosch, met deze biljartsters.
Kenny Vleugels maakt gave CC0 gelicenseerde game componenten,
We vieren jaarlijks Publiek Domeindag, een leuke en informatieve dag waarbij we aandacht besteden aan de werken die publiek domein zijn geworden en de makers van deze werken. De foto’s en video’s van Publiek Domeindag 2020 zijn hier te zien. Internationale coördinatie van publiek Domeindag vieringen ondersteunen we met pdday.org.
Het delen van nieuws van de overheid is zeer belangrijk in de huidige Corona-crisis, maar het kan lastig zijn om de juiste informatie te vinden. Open State heeft al het nieuws op lokaal niveau op één platform gebundeld, onder een open licentie.
CC: What are your plans for the future?
CC Netherlands: We hope to grow our membership in the coming year, engage more with our community, and do more outward-facing projects.
CC: Wat zijn jullie toekomstplannen?
CC Nederland: We willen nog meer leden aantrekken, onze huidige leden activeren en meer betrekken bij onze werkzaamheden en meer zichtbare projecten doen.
CC: Anything else you want to share?
CC Netherlands: The rise of algorithms determining possible copyright infringement can also have a negative impact on open content, because these algorithms do not take open licensing into account enough. That’s why we’ve started working on “Filter me niet” (Filter me not), in which we look for ways to indicate that you’re purposefully CC licensing to let others remix your work. The first results are in Dutch only, here.
CC: Wat wil je verder nog delen?
CC Nederland: Toenemend gebruik van algoritmes, om potentiële auteursrechtenschendingen te identificeren, heeft negatieve consequenties voor open content. Deze algoritmes houden onvoldoende rekening met open licenties. Daarom zijn we Filter Me Niet begonnen, een project waarin we manieren onderzoeken om actief aan te geven dat je bewust Creative Commons licenties gebruikt om je werk beschikbaar te stellen voor hergebruik. Een eerste resultaat is te zien op www.filtermeniet.nl.
Thank you to the CC Netherlands team, especially Lisette Kalshoven and Sebastiaan ter Burg, for contributing to the CC Network Fridays feature, and for all of their work in the open community! To see this conversation on Twitter, click here. To become a member of the CCGN, visit our website!
The wrap-up party for the annual CC Global Summit is always incredible, featuring local artists and musicians who send us off in style. Of course, things are a little different this year as we’ve transformed our in-person event to an entirely virtual one—but that doesn’t mean we can’t find a way to party together like we usually do!
This year, we want to close the CC Global Summit (19-24 October 2020) by celebrating with musical performances showcasing the artistic talent of our global community. We’re looking for musicians, singers, DJs, dancers, or performance artists! Some things to keep in mind:
The performances must be set to CC-licensed music
We’re prioritizing diversity in languages and are actively seeking non-English performances
You can perform in any genre but it must be in line with our Code of Conduct
The deadline to submit your application is Friday, August 28.
If selected, you’ll work with the CC Summit Production team to record a one-song video performance that will be included in our CC Summit Closing Concert. The concert will be pre-recorded and released at the closing of our virtual event and shared afterward on YouTube. Join us!
In the second part of our series on artificial intelligence (AI) and creativity, we get immersed in the fascinating universe of AI in an attempt to determine whether it is capable of creating works eligible for copyright protection. Below, we present two examples of an AI system generating arguably novel content through two different methods: Markov chains and an artificial neural network. We then apply the copyright eligibility criteria explained in “Artificial Intelligence and Creativity: Why We’re Against Copyright Protection for AI-Generated Output” to each example.
Here’s the gist: Through the Jane Austen examples below, it’s clear that the seemingly “creative” choices made by the AI system are not attributable to any causal link between a human and the result, nor is it a human that defines the final form or expression of the work. The elements of randomness incorporated in an AI program are what give the illusion of creativity, and the closer the output gets to resembling a creative work made by a human, the higher its similarity to the source material and thus the lower its originality. All this leads us to conclude that the copyright protection requirements of authorship and originality are not satisfied.
Method 1 – Markov Chains
Suppose you wanted to develop an AI system that could write like English novelist Jane Austen (1775-1817). To do this, one might model writing a sentence as a Markov chain. Named after Andrey Markov, who first studied it, a Markov chain is a stochastic model describing a sequence of events in which the distribution of possibilities for the next event depends only on the current state of the sequence, not on the full history that led to it. These models were first applied to language by Claude Shannon in his groundbreaking paper, A Mathematical Theory of Communication.
An image of the title page from the first edition of Jane Austen’s “Sense and Sensibility” (1811). This image is in the public domain via the Lilly Library at Indiana University. Access it here.
For example, the word “Mrs.” (capitalized and with punctuation) occurs 2,157 times in the complete works of Jane Austen, and the words following “Mrs.” include “Annesley,” “Gardiner,” “F.”, etc. The AI system would then randomly select from the list of words that follow “Mrs.” to get a possible continuation of a sentence starting with “Mrs.” Because repeated elements are left in the list and the selection is made uniformly at random, the system favors words that occur more frequently after the “seed” (or initial) word.
Let’s say the AI system randomly selects “Annesley” from the list to follow “Mrs.” This process can then be repeated with the list of words that follow “Annesley.” The word “Annesley” is less common (occurring only two times) and is followed by “to” and “is.” This process can be repeated multiple times to create a growing sentence stub and eventually construct something that resembles a sentence, like:
Mrs. Annesley to it, to be mistress as the room for drawing room to be at the best known intimately judge wisely.
This “sentence” uses real words, which are chosen from Austen’s works, but doesn’t make much sense linguistically or grammatically. In order for the AI system to have more context when choosing words, a standard idea is to try to find words that follow multi-word snippets, rather than single words. In this example, you might look at the list of words that follow the two-word snippet “Mrs. Annesley,” which include “to” and “is.” Note: These are the same words that follow the one-word snippet, “Annesley.”
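The one-word procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny corpus is made up for the example (it is not Austen's text), and the function names `build_followers` and `generate` are our own.

```python
import random
from collections import defaultdict

def build_followers(words):
    """Map each word to the list of words that follow it, duplicates kept."""
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, seed, max_words=20, rng=None):
    """Grow a sentence stub by repeatedly picking a follower of the last word."""
    rng = rng or random.Random()
    sentence = [seed]
    while len(sentence) < max_words:
        options = followers.get(sentence[-1])
        if not options:  # the last word was never followed by anything
            break
        # Uniform choice over a list with duplicates favors frequent followers.
        sentence.append(rng.choice(options))
    return " ".join(sentence)

# Toy corpus: an assumption for illustration only, not Austen's actual text.
corpus = ("Mrs. Annesley is here and Mrs. Gardiner is here "
          "and Mrs. Gardiner is away").split()
followers = build_followers(corpus)
print(followers["Mrs."])  # ['Annesley', 'Gardiner', 'Gardiner']
print(generate(followers, "Mrs."))
```

Because “Gardiner” appears twice in the list of followers, it is twice as likely as “Annesley” to be chosen after “Mrs.”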
If you randomly select “to” to follow “Mrs. Annesley,” then you have to find the list of words following the snippet “Annesley to,” and so on. Continuing in this manner, you could construct a sentence like:
Annesley to Miss Thorpe, and then only think! I found that I am sure you would think tall, and on this second interview, such she did not doubt your inclination.
Although this sentence is fragmented, it makes more sense than the previous sentence constructed from one-word snippets. Now let’s look at an example constructed using three-word snippets, starting with “Mrs. Annesley to”:
Mrs. Annesley to Miss Darcy had been standing near enough for her to refuse the correspondence; it was impossible for me not to hear it.
At this level of context, we’re starting to see correct grammar. The sentence almost seems like it could have been written by Austen herself (on a bad day). However, this sentence is completely machine-generated. The longest snippet of words in this sentence that also appears consecutively in Austen’s works is:
…for her to refuse the correspondence; it was impossible for…
However, the context around that snippet is different from Austen’s original work and is actually composed of different sections from several of her works. Here’s an example using four-word snippets, starting with “Mrs. Annesley to Miss”:
Mrs. Annesley to Miss Darcy had been given, to remind her of what anxiety was; but when she saw, on her frequent and minute examination, that every symptom of recovery continued, and saw Marianne at six o’clock sink into a quiet, steady, and to all appearance comfortable, sleep, she silenced every doubt.
In this case, the method is beginning to lose originality. In fact, this sentence is composed of two snippets directly from Austen’s original works:
Mrs. Annesley to Miss Darcy had been given, to remind her of…
…to remind her of what anxiety was; but when she saw, on her frequent and minute examination, that every symptom of recovery continued, and saw Marianne at six o’clock sink into a quiet, steady, and to all appearance comfortable, sleep, she silenced every doubt.
These snippets are stitched together at “to remind her of.” The first snippet is from Austen’s novel Pride and Prejudice and the second is from Sense and Sensibility.
From these examples, it’s clear that expanding the “context” (i.e., the snippet length) increases the probability that the AI system will produce something akin to proper English, but it also decreases the originality of the output. To increase originality, the system requires more text from the original author’s works to be given as input. Even with this simple method, a system can produce fairly realistic English prose. In fact, the actual limit on the quality of content generated by this method turns out to be processing power, computation time, and storage. Also, since the goal is to generate prose only in the style of Jane Austen, the set of possible input text is limited to her works.
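The trade-off between snippet length and originality can be explored with a small sketch. It assumes a made-up toy corpus and hypothetical helper names (`build_model`, `generate_words`, `longest_verbatim_run`); the last helper measures originality crudely, as the longest run of consecutive output words that also appears consecutively in the input.

```python
import random
from collections import defaultdict

def build_model(words, k):
    """Map every k-word snippet to the list of words that follow it."""
    model = defaultdict(list)
    for i in range(len(words) - k):
        model[tuple(words[i:i + k])].append(words[i + k])
    return model

def generate_words(model, seed, max_words=30, rng=None):
    """Extend the seed snippet one word at a time using the last k words."""
    rng = rng or random.Random()
    k = len(seed)
    out = list(seed)
    while len(out) < max_words:
        options = model.get(tuple(out[-k:]))
        if not options:
            break
        out.append(rng.choice(options))
    return out

def longest_verbatim_run(output, source):
    """Length of the longest word run shared verbatim by output and source."""
    best = 0
    prev = [0] * (len(source) + 1)
    for o in output:
        cur = [0] * (len(source) + 1)
        for j, s in enumerate(source, start=1):
            if o == s:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy corpus: an assumption for illustration only, not Austen's actual text.
source = ("she was sure he was right and he was sure she was wrong "
          "and she was right to be sure").split()
for k in (1, 2, 3):
    text = generate_words(build_model(source, k), tuple(source[:k]))
    print(k, longest_verbatim_run(text, source), " ".join(text))
```

On a real corpus, one would expect the reported run length to grow with the snippet length `k`, which is the loss of originality described above.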
The Markov chain described above is just one example of a more general concept called a language model. In technical terms, language models are probability distributions over sequences of words in a language. In our case, we are interested in the probability that a word will occur as the next word, given a sequence of words up to some point. In this model, selecting at random from the probability distribution of possible words following the sequence up to the current point allows us to generate “prose.” As of this writing, one of the most recent large language models is called GPT-3, and was produced by an organization called OpenAI.
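To make the idea of a probability distribution over the next word concrete, here is a minimal sketch that estimates it from word counts. The function name `next_word_distribution` and the toy corpus are assumptions for illustration.

```python
import random
from collections import Counter

def next_word_distribution(words, context):
    """Estimate P(next word | context) from counts in the corpus."""
    context = tuple(context)
    k = len(context)
    counts = Counter(
        words[i + k]
        for i in range(len(words) - k)
        if tuple(words[i:i + k]) == context
    )
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Toy corpus: an assumption for illustration only, not Austen's actual text.
corpus = ("Mrs. Annesley is here and Mrs. Gardiner is here "
          "and Mrs. Gardiner is away").split()
dist = next_word_distribution(corpus, ["Mrs."])
print(dist)  # "Gardiner" gets probability 2/3, "Annesley" gets 1/3

# Sampling from the distribution yields one possible next word.
candidates, weights = zip(*dist.items())
print(random.choices(candidates, weights=weights, k=1)[0])
```

Sampling from this distribution is equivalent to picking uniformly at random from the list of followers with duplicates left in.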
Method 2 – Artificial Neural Network
GPT-3 is a considerably more sophisticated model than the Markov chain. In fact, it’s an example of an Artificial Neural Network (ANN). An ANN model is quite complicated, but here’s the gist: it’s a computational model based on the neural networks of the human brain.[1] Just as our brains are composed of interconnected processing elements (i.e. neurons) that process information, this artificial system also consists of a network of interconnected elements that work together to solve a specific problem. Further, just as humans learn when given more information and subsequently change their actions to solve a problem, this artificial system also learns based on its inputs and outputs.
For example, to train an ANN model to predict the next word in a sequence, we make many predictions from different snippets of text per second and use a mathematical process to adjust the ANN model after each incorrect prediction. The adjustments are in the form of slightly changing the values of different numerical parameters in the model. Because the same parameters are used for each snippet, we need many of them to make a general enough model so that we can make predictions based on any arbitrary input sequence. (The large version of GPT-3 has around 175,000,000,000 parameters!) After several iterations of the process above to improve the model, we can generate new text by feeding the model existing text, appending whatever word it predicts next, and finally feeding the result back into the model. In reality, this process is a bit more complicated than described above but the general idea is that it allows us to generate a novel output on each run, rather than the same thing over and over.
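The train-then-generate loop can be illustrated with a deliberately tiny stand-in. This is not GPT-3's architecture: it is a sketch that assumes one numerical parameter per pair of words and a made-up corpus, and it adjusts the parameters after every prediction, with a larger adjustment the worse the prediction was.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(words, epochs=200, lr=0.5):
    """Fit next-word scores by gradient descent on cross-entropy loss."""
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # One numerical parameter per (previous word, next word) pair.
    params = [[0.0] * n for _ in range(n)]
    pairs = [(index[a], index[b]) for a, b in zip(words, words[1:])]
    for _ in range(epochs):
        for prev, target in pairs:
            probs = softmax(params[prev])
            for j in range(n):
                # Gradient of the loss with respect to each score.
                grad = probs[j] - (1.0 if j == target else 0.0)
                params[prev][j] -= lr * grad  # slightly adjust the parameter
    return vocab, index, params

def generate(vocab, index, params, seed, length=8, rng=None):
    """Predict a word, append it, and feed the result back into the model."""
    rng = rng or random.Random()
    out = [seed]
    for _ in range(length - 1):
        probs = softmax(params[index[out[-1]]])
        out.append(rng.choices(vocab, weights=probs, k=1)[0])
    return out

# Toy corpus: an assumption for illustration only.
words = "the cat sat on the mat and the cat ran".split()
vocab, index, params = train(words)
print(" ".join(generate(vocab, index, params, "the")))
```

Because the next word is sampled rather than fixed, each run can produce a different output, which is the source of the apparent novelty.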
Unfortunately, Brent (CC’s data engineer) couldn’t run the large model on his laptop, so he settled for the small version of GPT-3’s predecessor, GPT-2, which has only about 117,000,000 parameters. The model comes “trained” out of the box, meaning it has already gone through many iterations of the process described above on English text. A user can “fine-tune” the model by performing further iterations on a sample of English text of their choosing. Here is an example of the output after fine-tuning the model for around 10 minutes on Jane Austen’s work:
“Yes, I suppose,” replied Emma, “but I do not think she does a great deal of good, of course, I dare say; but if she could, she might, I must say, but she is a great lady at heart, I do not know whether we know that Mr. Elliot is his kindest sister.”
Note that while it’s not making much sense as a story, there are no real grammatical mistakes, and the “voice” does seem to closely echo Jane Austen’s. In general, every AI method for generating novel content, written or otherwise, involves developing a (potentially quite sophisticated) mathematical model that emulates some intelligent behavior. Then, content can be generated by selecting randomly from a probability space defined by that model.
Applying copyright theories to our AI-generated Jane Austen sentences
On a theoretical level, ideas regarding “authorship” and “originality” as we examined them in the first post of this series appear to be at odds with any conception of AI (i.e. non-human) creativity. As we’ve seen in our Jane Austen example, the seemingly “creative” choices made by the AI system are not attributable to any causal link between a human and the result, nor is it a human that defines the final form or expression of the work. Where humans (such as AI programmers or users) are indeed involved in the creation of AI-generated output in the models described above, this involvement is solely mechanical, and not authorial or creative. The elements of randomness incorporated in an AI program are what give the illusion of creativity, and the closer the output gets to resembling a creative work made by a human, the higher its similarity to the source material and thus the lower its originality. All this leads us to conclude that the copyright protection requirements of authorship and originality are not satisfied.
All said, as much as AI has advanced in the past few years, there exists no clarity, let alone consensus, over how to define the nascent and uncharted field of AI technology. Any attempt at regulation is premature, especially through an already over-taxed copyright system that has been commandeered for ends that extend well beyond its original purpose. AI needs to be properly explored and understood before copyright or any intellectual property issues can be properly considered. That’s why AI-generated outputs should be in the public domain, at least pending a clearer understanding of this evolving technology.
Notes
1. In more technical terms, an ANN can be defined as a class of functions that take vectors in from some vector space and map those vectors to a different vector space. Transitions between functions within the class are defined via an operator which is itself a mathematical function. The operator is designed to “train” the ANN model by minimizing some cost function associated with the output.