Sunday, September 22, 2013

Copyright, Metadata, and Attribution

The Berkeley Center for Law and Technology (BCLT) has done some interesting research on copyright, including a white paper that details the issues of performing "due diligence" in a determination of orphan works.

Recently I attended a small meeting co-sponsored by BCLT and the DPLA to begin a discussion of the issues around copyright in metadata, with a particular emphasis on bibliographic metadata. Much of the motivation for this is the uncertainty in the library and archival community about whether they can freely share their metadata. As long as this question remains unanswered, there are barriers to the free flow of data from and between cultural heritage institutions.

At the conclusion of the meeting it was clear that it will take some research to fully define the problem space. Fortunately for all of us, BCLT may be able to devote resources to undertake such a study, similar to what they have done around orphan works.

One of the first questions to address is whether bibliographic metadata is copyrightable in the first place. If not, then no further steps need to be taken -- not even putting a CC0 license on the data. In fact, some knowledgeable folks worry that using CC0 implies that there do exist intellectual property rights that must be addressed.

However, before you can attempt to determine if bibliographic metadata can be argued to be a set of facts which, under US copyright law, do not enjoy protection, you must be able to define "bibliographic metadata." During the meeting we did not attempt to create such a definition, but discussion ranged from "anything about a resource" to a specific set of descriptive elements. As there were representatives of archives in the room, we also talked about some of the implications of describing unpublished materials, which have a different legal standing but also provide less self-identification than resources that have been published. Drawing the line between fact and embellishment in bibliographic metadata is not going to be easy. Nor will determining the level of creativity in the data, a necessary part of the analysis under US law. Note that other types of metadata were also discussed, such as rights metadata and preservation metadata, and we recognized that the exchange of metadata will of course cross national boundaries. Any study will have to determine where it will draw the "metadata" line, and also whether one can address the question with an international scope.

Another complexity is that bibliographic data is already "crowd-sourced" in a sense. For any given bibliographic record, different contributions have been made by different librarians from different institutions and at different times. This makes it hard to ascribe intellectual ownership to any one party. And while library catalog data may be considered to be factual, it is much more than a simple rendering of facts, as the complexity of the cataloging rules attests. I likened library cataloging to a medical diagnosis: the end result (some scribbles in a file and perhaps a prescription given to the patient) does not reveal all of the knowledge and judgment that went into the decision. Metadata is the tip of an iceberg. That may not change its legal status, but I think that unless you have delved into the intricacies of cataloging it is hard to appreciate all that goes into the fairly simple display that users see on the screen.

The legal question is difficult, and to me it isn't entirely clear that settling the legality of bibliographic data exchange will be sufficient to break the logjam. In a sense, projects like DPLA and Europeana, both of which have declared their metadata to be available with a CC0 license, might have more real impact than a determination based in law. Significant discussion at the meeting was about the need for attribution on the part of cultural heritage institutions. Like academics, such institutions depend for their reputation and standing on getting recognition for their work. Releasing metadata (including thumbnails in the case of visual materials) needs to increase the visibility of those institutions, and to raise public awareness of the value of their collections. It is possible that solving the attribution problem could essentially dissolve the barriers to metadata sharing, since the gain to the institutions would be obvious.

Perhaps my one unique contribution to the group discussion was this:

We all know the © symbol and what it means. What we need now is an equally concise and recognizable symbol for attribution. Something like "(@)The Bancroft Library" or "(@)Dr. Seuss Collection". This would shorten attribution statements but also make them quickly recognizable, and a statement could also be a link to the appropriate web page. Standardizing attribution in this way should make adding attributions easier, and would demonstrate a culture of "giving credit where credit is due." The symbol needs to be simple, and should be easy to understand. It's time to comb through the Unicode charts for just the right one. Any suggestions?
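To make the idea concrete, here is a minimal sketch of how such a statement might be generated. It assumes the "(@)" marker from the examples above as a typed stand-in for whatever symbol is eventually chosen; the function name `attribution` and the example URL are hypothetical, not part of any existing standard or tool.

```python
from html import escape

# Assumption: "(@)" is only a placeholder for the eventual attribution symbol.
ATTRIBUTION_MARK = "(@)"

def attribution(name, url=None, mark=ATTRIBUTION_MARK):
    """Build a short attribution statement, optionally as an HTML link.

    `name` is the institution or collection being credited;
    `url` (optional) is the web page the statement should link to.
    """
    label = f"{mark}{escape(name)}"
    if url is None:
        return label
    # Escape the URL so it is safe inside a double-quoted attribute.
    return f'<a href="{escape(url, quote=True)}">{label}</a>'

print(attribution("The Bancroft Library"))
print(attribution("Dr. Seuss Collection", "http://example.org/seuss"))
```

Passing a different `mark` would let the same statement be rendered with an actual Unicode symbol once one is agreed upon.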

See Also:


Unicode 1F6A9 - Triangular flag meaning "location"

6 comments:

Anonymous said...

Of course, you need something available from the standard keyboard. My vote: ~

Let the symbol wars begin. ;)

Karen Coyle said...

I like the requirement that it has to be on a "standard" keyboard - although of course we could do like we do with the copyright symbol and have a keyboard equivalent - (c) - that some software will render as the actual Unicode symbol. For that, I think Unicode 1F6A9, Triangular flag on a post, meaning location information (I'll put it in the body since I don't know how to add it here in a comment), would be good, and it could be typed as ">>" or "|>".

Becky said...

Writer Maria Popova has proposed a standard symbol for attribution. I don't know to what extent anyone has adopted it, but it's gotten a bit of attention.

http://www.brainpickings.org/index.php/2012/03/09/curators-code/

stone said...

Hi Karen,
Nice post! Thanks for your sharing,

Tom Morris said...

I didn't realize you were going to be in town. Sorry I missed you.

I'd argue that whatever the graphical representation of the attribution mark, the core attribution data should be machine readable.

It's really a failing of our web tooling that automatic attribution wasn't built into our tools from the very beginning. The same ease with which we copy and paste text could also have been applied to automatically include attribution data with no human effort.
