Monday, December 8, 2008

If not Google, who?

There's been some buzz about the Library of Congress putting images on Flickr, and the Life/Google partnership, but now the German Archives is putting 100,000 images up on Wikimedia.

It's an interesting choice. Flickr is popular, but so were AOL and CompuServe once. Google is the big dog, but keeps things in its silos (Google Books). Wikimedia has the advantage of international support and a foundation, so it may be a wise choice. It's certainly cheaper than having your own server farm, and provides redundancy. And I'm sure some of those images will end up on Flickr and other sites, for more redundancy.

I've always been a belt-and-suspenders person, so this looks like a win-win situation.

http://commons.wikimedia.org/wiki/Commons:Bundesarchiv

Sunday, November 23, 2008

I Will Fear No Google

Over at Archives Next, I just took a quick look at the Archives 2.0 Manifesto, and one thing really stood out for me. There's a lot of muttering, some from high places, on primary sources being the next target for Google, fed by the Google/Life project that was just announced.

Well, duh.

I worked in a photographic archive with 1.5M images, and I've freelanced for archives with image collections. I KNOW how hard it is to describe images in a meaningful way if there's no caption info, and even if there is. I blush to think how long it took me to realize that Jack Niles was John Jacob Niles, and the reason he was in Eastern Kentucky was because he was Doris Ulmann's assistant, and suddenly there were research implications that weren't there before.

If I hadn't been there, how long would it have taken for someone else to make the connection? And if they had, would they have told us?

I love the Flickr archives images. The people who know by experience the what and where of the FSA photos are dying off, and the common knowledge will become deep research we can't afford to do.

I love the Google copyright registry. Ditto.

Why should we wait until we gather a group and write a grant and get funded (someday maybe) when someone's willing to do it now, at no cost, and more importantly, make it available? What will happen when Google goes away? The Internet Archive. And then? Someone else. There's always someone else, someone with the public good in mind.

We're supposed to serve the scholarly community, and not just the scholars on our campus. If Google Archives puts us out of a job, what's that compared with the public good? That's like lamenting the loss of catalog card typists and electric erasers.

I'm only the third generation in this country. My grandmother used to say "What you know, no one can take from you". Multiply that by a few billion, and no political or natural disaster can wipe out the knowledge.

Books that can't be censored or banned, images that can be freely seen, archives that can be read any time and any place? What's the problem with that?

Oh, I know there are problems and issues, including me learning still another profession. But if we don't act, if we don't partner with people who can make it happen, if we don't move forward and reinvent ourselves, someone else will do that for us.

Google rules right now. It used to be AOL and Compuserve. Things change all the time.

Why shouldn't we?





http://www.archivesnext.com/?p=64

Wednesday, October 29, 2008

Google settles for - everything

Of course, digitizing everything was the goal. And considering what Google has in the bank, $125M is cheap to settle. The bad news is that it's a settlement, and doesn't make any difference in the law, but it does set a non-legal precedent.

Of course, I'm curious if OCLC's fledgling copyright information registry is partnering with Google; it would make sense not to duplicate effort, and crowdsourcing has made sense in other fields. It certainly hasn't hurt LC, with its Flickr experiment.

I'm not as concerned about Google replacing libraries. It's not like Google is taking the books away from libraries, stopping ILL, or burning them. It certainly has a lower barrier to entry than the aggregators, when an individual can pay for full access to only what they want. Certainly better than Corbis, who bought the photos and has locked them up.

That's not to say that it can't change, but the libraries have gotten their digital copies, presumably the Internet Archive will crawl the open collections, and the libraries will know what was actually used if it comes to them doing the work again.

What's the worst case scenario? Google goes out of business, all the digital files go away, and the libraries are no worse than before, and some publishers and authors have made some money they wouldn't have gotten under the first sale provision.

What's the upside? 20% instead of snippets, payment to authors (if they can be found), fewer dead trees, universal access.

I don't have any problems with that.

Friday, October 24, 2008

Socializing 2.0

Facebook is fun, but low on content. OCLC's WebJunction had content, but was focused on public libraries.

Good news! WJ now has a section for academic libraries, for special libraries, and archives and museums. Now there's a chance for connections between different institutions for collaboration and just getting to know one another.

My Preservation Group is growing, and all are invited to join it. I'm slowly getting some links into the Archives Section, and will be adding to it. Right now there are links to preservation handouts, book repair videos, and disaster plans.

So if you want to meet other archivists and librarians online without the Facebook Fluff (R), check out WebJunction. The more of us who join, the more useful it will be! You do have to join, but it's free and spamless.

Moving my wiki

Which will technically not be a wiki any more, though I'll be glad to add anyone as collaborator who wants to add sites or annotations! PBWiki was getting too weird, with hidden ads; the old wiki will remain, but I won't be adding to it.

The new site is a Google Site, at
http://sites.google.com/site/humanitiesforlibrarians/

Wednesday, October 8, 2008

Out of Context

It's a phrase we hear a lot during the political season - you took my remark out of context. But do archival collections always have context to remove an item from? Is it always bad to scan just part of a collection?

Take photographic studio collections. They were never meant to be documentary: the studio gets a job, shoots it, and doesn't know where it'll be used the next day. There may be some context in that shoot, that series of five street shots of downtown, but they have nothing to do with the passport photo before it or the product shots after, unless you're the odd duck studying the business model of studios. And that's fine.

But most patrons want a portrait of their uncle, not the street shot. Or their house, and don't care about the other thousand. Or a particular building for an article. The context is in the image itself.

Or the family papers, four generations of paper - but only the eyewitness letter about Pearl Harbor has any relevance outside the family. The sons' baby pictures have no relevance to that event, nor do the wedding pictures.

So where's the sin in digitizing what's useful, what's interesting, and just telling people there's more? Or, like LC, posting photos on Flickr and letting people identify them for you?

Let's think about what we're doing, and why we're doing it, and not follow dogmatic policies. The odds are that if a patron looks at the whole collection, they're still going to want that one picture they're looking for. When they post it online or publish it in an article, it's going to be out of context again, just like it was for the studio.

We can keep context in the collection, but we can't send it out into the world with the item. Let's do our job and let the patrons do theirs.

Sunday, September 28, 2008

Everything Old is New Again

We've had a little more excitement in Louisville - first the earthquake that struck while I was here at MAC talking about (what else?) disaster preparedness, then the first ever hurricane to hit town. We lost a couple of trees, a gutter and a downspout, my nerve (I was standing next to one tree, rescuing the downspout when it went), and ten days of electricity.

The good part is that the spoiled food in garbage cans lining the alley was a raccoon buffet, I met a nice neighbor with a chainsaw, and I caught up with some reading by candlelight.

The best was from The Primary Source, by Chatham Ewing, about using NUCMC to catch up on collection level cataloging of special collections.

You remember collection level work, don't you? The big kickoff from ARL at LC in 2003? We all said, yeah, in our spare time, right.

You remember NUCMC, don't you? Well, maybe not. But it will help libraries and archives without the time and expertise create those records and get them into OPACs.

I recommend checking it out. Unless you have too many catalogers and too much time on your hands.

Sunday, August 24, 2008

What can we afford to lose?

Even when the economy is booming, there’s never enough time, money, or people to do everything that needs to be done. We’ve always had to prioritize and make decisions, we just have to make harder decisions now. Today we have two fronts for both knowledge and preservation – traditional paper that lasts for centuries if we care for it, and popular electronic that can be gone in 60 seconds.
We have an obligation to use our resources – cash and people – wisely. We also have an obligation to scholars to preserve both the cultural heritage and new research for future generations. The decisions we make now will determine whether tomorrow’s students and faculty study here, or at a school that thought in the long term.
What's the problem with commercial online journals? Everyone loves them - they're fast, easy to search, printable, you can access them from home, and more than one person can use the same issue. Yes, they're expensive, but so are the print issues, and isn't this better to buy? But we don't buy them - we just rent them, for hundreds of thousands of dollars a year. And we rent them by the package, so often we pay for four or five copies of the same online journal - more than we would buy if it were paper. The more we spend on online resources, the less we have for permanent ones and preservation.
We can expect even the most acid paper to last a few decades, and good paper to last centuries, even if it’s slowly decaying. But digital can be gone in a nanosecond, if you don’t pay the rent. Even if you do, things can change – journals disappear, or like newspapers, the content can disappear. After the Tasini decision on licensing rights, papers just took all their back issues offline, rather than renegotiate rights. That can happen with back issues of anything.
Why don’t we just do it ourselves? We’re forbidden by the license agreement to print out hard copies from the online version. We can’t digitize them ourselves, because of copyright; we could license directly from the journals to do it, but their license with the big aggregators – the businesses who run the big databases – forbids them.
We could buy and preserve a lot of print journals and books for a million dollars. What we can’t do is make them easily accessible by our scholars – they would have to come here and use them, or use ILL, or microform. Those options aren’t nearly as sexy – and not as easy for distant learners and other non-traditional users.
Preservation vs. access is the classic conflict of librarians. You can keep your Edsel or Pinto in great shape – as long as you don’t take them out of the garage and use them for what they were meant for. Books get checked out and have all kinds of adventures – they go to McDonald’s for lunch, they meet dogs, they take baths. Educating users can help, but we can’t turn back the clock a century and close the stacks. The culture of research has changed, and is changing again.
Preservation isn’t just about saving books, it’s about saving the cultural record. We’ve seen revolutionaries destroy national libraries. The photos taken by the FSA were ordered destroyed – who wants to remember the Depression! We don’t know what will be important to future scholars. Small press books, ephemera, wartime editions? These are “medium rare” books – ones that aren’t unique for the contents, but for the design and marketing. They were the paperbacks of their day. Today they’re studied; 25 years ago I bought them for a dime each.
The idea of books is the distribution of knowledge, so one disaster can’t destroy all the copies. But we’re an ARL library because of our unique resources, the special collections and archives. When all our eggs are in one basket – we better take care of that basket!! Security, climate control, insect control, education are all part of preservation.
Can we microfilm? Sure, but grant money for microfilm has dried up in favor of digitization. Can we digitize? Sure, we just need a few hundred manYEARS, a few million dollars, a couple of thousand gigabytes of server space – and the software, and scanners, and technical expertise to keep it all running. And that’s just to scan the title pages! In the meantime, we need to save the originals for the long term. Like the medium rare books, they have value for their form, as well as their content.
Don’t the big online aggregators, the guys who buy the rights to put them online, have to do the same thing? Nope – they get the files in digital form, and just index them. In fact, most book and journal publishers want their incoming material in digital form, so they don’t have that expense. And that’s why so few journals have back issues online, because the works weren’t submitted digitally.
What will it cost to preserve the paper version? That depends. Think of preservation as insurance – costs determine care. Will a band-aid suffice, or do you need a specialist? Is the money better spent on a few important patients, or on community health – preserving by controlling the environment? Lowering the temperature and humidity helps avoid mold and decay – and the patrons happen to like it, too. But the books don’t leave on spring break, and the air conditioning or heat has to stay on, no matter how hard that is to explain to engineers.
We’ve made a start. What we haven’t looked at is how to preserve digital information – across campus, in all disciplines.
Money for microfilming may have dried up, but the model is a good one. We need a central registry for who’s doing what, whether that’s digitization or preservation photocopying. Co-operation will give us the best return for our money. We may only be able to save a few books, but we can prolong the lives of many. We can have a central registry of who is doing what and who has the designated preservation copy – the one that will be saved physically and not used.
Don’t we have brittle books licked? Haven’t we been working on them a long time? Well, most of the world’s publishing was done on acid paper, in the last 150 years; and we’ve only been working on it for the last 20 – and acidic paper is still being used for printing new journals.

What can we do?

We can get input from scholars on what’s vital, what’s important, and what may be useful. We can make educated guesses, but the more input we have, the better our choices.
We can co-operate with other libraries and archives, to maximize the impact of our efforts.
We can invest in preservation. Books – and knowledge – are capital assets, not sunk ones. We have plumbers, electricians, groundskeepers to maintain buildings. But not a single part-time preservation officer for seven libraries and 2 million books. That’s just over 100 million dollars of contents – more than the cost of the building, and not counting irreplaceable unique collections of millions of items. Are we good stewards if we let them disappear through neglect?
We can teach faculty the implications of digital information – not just how to use it, but how the software and formats they use to create new papers influence the way those papers are accessed – or not accessed – from then on.
We can insist that online journals archive their files somewhere safe. It worked with giant Elsevier, it can work with others. And it must, or we’re throwing money away.
We can discuss our options. Everyone is involved in this, not just librarians. Scholars, administrators, and students need to know why prices for journals, costs for tuition, and research costs are going up, not just that they are.
Most importantly – we can do something. Now. Our preservation work has doubled – now we are responsible for the paper copies and the digital ones. There is no time to lose – and millions of books to lose. Even if Google digitizes twenty million volumes, that leaves millions more in our care.
Let us lead the way.

Thursday, August 21, 2008

The new WebJunction

Public librarians know WebJunction well, but academic and special librarians don't, and archivists have no clue. It's no surprise, we all have plenty to do and read as it is. WJ is like Facebook without the annoying pokes and such.

WJ has just added sections to broaden its appeal and widen its usage. There are now sections for all of the above. To get the ball rolling, I've started a preservation group. I've posted a few book repair manuals and just now, a link to three short book repair videos created by Rachel Hoff's students at UNC.

I encourage you to use this. It's a great place for getting special/archives/museums together to talk and share - which may lead to grants, since that's a new focus for federal grants!

There's also a discussion list where we can get together.

Friday, July 25, 2008

Moving on

I moved back to Louisville last week. I loved Ohio U and Athens, but major changes in the Ohio university system (a unified state system, centralized processing) meant that my position was eliminated. Best of luck to all my Ohio friends in the reorganization.

Everything changes. Evolution works.

I'll be job hunting, and teaching again, and looking at outsourcing myself - independent workshops and consulting, grantwriting and preservation, mostly. I'll be spending a lot of time updating my wiki of humanities resources for the class, Humanities for Librarians.

Take a look and contribute. Why a wiki? I can use a linkchecker, which I can't on a closed PHP silo. Plus it's a lot more fun for students (I hope).

Look for more when I get a chance to read my email again, and my RSS feed. It's not quite an intelligent agent, but it'll do for now. I'm old enough to LOL when I read text messages - I wonder how many people realize that they are the venerable IRC chat abbreviations c. 1995?

Sunday, July 6, 2008

Microsoft's out of this picture

Microsoft has dropped out of the Open Content Alliance. I can't say I'm surprised, and not for the reason you think - since they (or more accurately, Gates) have gotten into philanthropy, they have a real point. Since they focus on global health, their considerable money is probably better spent there. After all, the dead can't read online books (or, the cynic in me says, buy MS products).

But in any case, they were gracious enough to leave their equipment and not lock up their content. Good for them!

What will this mean for the OCA? One less partner, less money, neither of which is good. Will it stop the OCA? I doubt it. There are too many well-endowed partners with too wide a base of support, plus considerable technical knowledge.

Will it ever be a rival to Google? Yes. It does have its niche, as does Project Gutenberg; they appeal to open source fans and people with slow connection rates, and they're serious about access. No silo here.

What is seriously threatened are the silos of online books - NetLibrary and its ilk. While Google may not make all public domain works available and only show snippets of copyrighted works, at least they're findable. What good is full text, if you don't know it's there?

Business models will not be the only thing to change in the next few years, merely the early victims. Libraries are in line, as are publishers. Who else? Who knows? Brave new world!

Monday, June 16, 2008

Copyright considered as a standup routine

As I was looking over the many permutations of copyright terms and requirements, ending with the current morass, I suddenly was reminded of Bill Cosby's old routines. Remember "No one will ever touch anybody ever again!" in the car ride with children?

That's what the current scheme is like. It's as if, among all the clamoring voices arguing about whose term is longer and who has to renew, a frustrated Papa Congress said "Okay, everything will be copyrighted forever, now SHUT UP!!!!"

And we have. And we shouldn't.

In reality, only a few people even care about copyright. Those that hold it on commercially viable works do, since they produce income. They should, that's what copyright is for, although the phrases limited terms and public good seem to have been forgotten.

The other people who care are librarians and archivists. We should, our profession depends on the intellectual property of others, and both legally and ethically we should respect it.

But we shouldn't fear it. And we do. Millions of archival items and books languish because people, mostly administrators, fear a lawsuit if they digitize or even just copy. Despite Lolly Gasaway's infinitely useful chart, they don't understand the law and they don't understand risk assessment.

Does it matter that no one has produced the documentation to prove that great-great-great grandad wrote that letter? Let alone got all 187 heirs to agree on who owns what portion of the hypothetical copyright? Would they then be able to agree how to split the 187th of a dollar they earned from the copyright?

In reality, most heirs are just glad to find a digital copy of their ancestor's letter, which they didn't know existed. They understand that they would spend more to research and defend a hypothetical copyright than they would ever earn.

Why don't administrators?

I researched a collection of some academic interest and maybe even a little financial one. The possible corporate owner (after many mergers) has no interest in the rights. There is documentation that they freely gave away the images without restrictions. They were then published in the 40s and 50s without copyright notice. They have been reproduced since they have been in an archive these last 30 years without anyone claiming copyright. Heirs who were contacted say they don't have any rights to them, because they were works for hire. I researched at the Copyright Office and found no records.

Where's the risk here, or in the hundreds of other lesser known collections?

Still, administrators there say they can't be digitized until any possible copyright expires. These are people so risk-averse that when they sneeze they say "Hand me a Kleenex, a trademarked name duly registered and enforced".

ARL is heading up a slow and painful study of the proposed Orphan Works Act, while the Google Guys forge ahead and Brewster Kahle fights on the legal battleground. Meanwhile, millions of people post copyrighted work (perhaps not knowing that it is) online in Flickr and YouTube and blogs and Facebook.

Except administrators and congressmen from the last century. Let's all write them letters and put stamps on them and see if the Pony Express can reach them.

Saturday, May 24, 2008

Where are we going?

That's a collective "we" - what's going to happen to academic libraries - and librarians - in the next five years?

I see us becoming archivists, as the physical book loses importance. We may be the caretakers for the copy of record. And I use "the" advisedly. Will we spend the money to have two copies in the library - or even in the state? Or will distributed copies suffice?

Maybe we'll be responsible for printing a physical copy of electronic books and theses. Why even do that? Because electronic documents have two prime characteristics - they are mutable and they are fugitive. You can change them when you want: something we don't want for official documents like laws and vital records. And while paper copies can and do burn or decay, they are longer lasting than bytes, which can be gone in nanoseconds.

Perhaps we'll be teachers and not caretakers or collection builders. As more information is out there, it takes more skills to locate it and evaluate it. We are good at that, and we have two options (or at least two obvious ones): we can do it as a pro-bono service subsidized by our school, or freelance in a just-in-time pay-as-you-go system. Either way, we are information brokers and not warehouse managers, labeling and stamping books. That's been a waste of our skills for a long time.

We may also be information creators. We've done that for a long time, too. We've indexed and cataloged, in ways that are outdated now, but what about new ways? Can we not build KM systems to synthesize the sources we manage? To build recommender systems? People say that IT folks can do all that, but they can't. They're really good at the software and the hardware, but don't care about the content. We do.

This is just step one in my trying to think ahead of the curve. This week maybe I'll think about how other fields have outrun us in our own field - or what we thought was our field, and synergy with other fields.

Or maybe I'll relax and enjoy the holiday. Naw....

Wednesday, May 14, 2008

Getting rid of originals?

I was reading the notes of the digitization meeting at RBMS, and thought - WHO thought of the "digitize, then dispose" comment? Some very young person? The first digitization project I worked on followed the LC standards of the time - 150 dpi. What were we thinking? Well, we weren't thinking ahead, we weren't thinking in economic terms that mass production means cheaper, we weren't thinking of the future.

What if we had thought of scan&toss then? What would we make the better scans from? Other than deteriorating nitrate and acetate negatives, which can spontaneously combust or can suddenly turn to dust, what would be good enough to scan but not good enough to keep?

I was shocked that this came from RBMS people - what books would we have left if we tossed the shabby used ones? I hope this isn't an offshoot of the googlescan mindset of administrators, or maybe I hope it is and isn't from the librarians and archivists!

I know that archivists are reputed to be packrats, but are RB librarians conspicuous consumers?

Thursday, May 8, 2008

Out of the box software

I've been looking at open source software lately, and I like what I'm looking at. I haven't migrated to Linux yet, but I'm replacing many of the expensive brand names with better open source software. GIMP isn't intuitive, but neither is Photoshop.
Open Office is much better than the built-in boobytraps of MS Office, and I'm in love with Google Docs - not for all purposes, but great for collaboration and easy transfers while traveling.
So now I'm looking at content management systems and thinking of wikis as documents and teaching platforms as well as "wikipedia" clones. Maybe just starting as a supplement to Blackboard, since I get paid to use that, but also as an alternative for people who are having issues with firewalls. That seems to happen once a year, at least!
I'll let you know what the results are; let me know your favorites.

Saturday, April 26, 2008

Shaking things up at MAC

I'm back from Lovely Louisville and the Midwest Archives Conference. It was a good conference, if a little shaky - yes, we felt the second midwest earthquake in the middle of a session. And no, it wasn't ours, on disaster recovery, and we had to spend Friday apologizing for mis-cuing the special effects.

We had a good response on our workshop on co-operative disaster recovery, and we hope to take it on the road and online, so feel free to give us feedback! And feel free to contact us if you're interested in a session where you are. Have powerpoint and manuals, will travel.

Thanks to all of you who gave us such a warm welcome back in the old stamping grounds - no, I haven't lost all my Kentuckisms yet!

Saturday, April 5, 2008

Taxing our patience

As I worked on my taxes and looked at the piles and piles of forms, I had a thought.

There are boxes to donate a dollar to a political party - no big surprise. Why can't we have a box to donate a dollar to: a library? PBS? NEH? A national college scholarship fund?

These are all non-political non-sectarian good things, right? The public good? And ignored by the present administration.

So let's start a movement. Let's let people make a difference in a positive way!

Monday, March 17, 2008

Workshop at Midwest Archives Conference in April

If you're going to be at MAC this spring, come to the "Strength in Numbers: Collaborative Disaster Planning" workshop on Saturday. I'll be presenting with long-time collaborator BettyLyn Parker, and we'll cover local and regional co-operatives, as well as larger options, such as state level and FEMA programs.

There will be tons of handouts, so don't worry about taking notes. Just open your mind and think about things that archivists don't usually think about - like cost effectiveness and return on investment. Believe me, you won't convince beancounters to rescue your collection if you don't know how to count beans, and where they are!

If you haven't been to MAC, it's the best archival conference in the country - not on a high-price coast, friendly, and with many useful sessions - and all at a bargain price.

If you don't come to the workshop, look at nametags and say hi!

Sunday, March 9, 2008

The catalog isn't broken. Really.

The catalog isn't broken. Really. It finds just what you type in the search box.

And that's the problem.

Try searching for a book on typography. According to LCSH, it doesn't exist, even though designers and publishers use it every day. Who (except librarians) would search for "Type and type-founding" or "Graphic design (Typography)"? And why would Hersey's Hiroshima show up in the results? I'm confused, and I'm a librarian.

The catalog works. What doesn't work is the search.

The thesauri are so outdated that they might as well be chiseled in stone. Who searches for a term that's 30 years outdated?

The thesauri assume too much. You go to the catalog to find out about something, you shouldn't have to know about it before you search. Why even go to the catalog when you have to Google it first?

WorldCat on Google makes things easier to find, but not more accessible. You still have to know the secret code. Sure, breaking the facets makes it better, but still not good, or especially usable. Search for Princeton in the default search box, and the second result is The essential Jung.

Huh? I'm still confused.

So why don't we change the terms? It ain't easy. There's no simple way to update the terms, and when they do get updated, they're usually outdated again. It's not LC's fault, it's not OCLC's fault, it's the whole system. The whole 20th century we-use-one-letter-codes-because-it-saves-a-byte system.

So what's the answer? Tagging, which is fun and helps you find your stuff, but doesn't help anyone else find your stuff? Another thesaurus? Hierarchy? Or dumping it all out and unstructuring it, mixing in the tag clouds, and sorting it with a Page-rank type ranking?

I don't know, but if you do, email me and we'll look for capital, because the demand is out there, and if libraries don't fix it soon, people will get used to going elsewhere.

They could come back; texting is just the old IRC abbreviations resurrected, so it could happen. And they could fall in love with telnet and Wordstar, too. But I'm not betting on it.

What's my wishlist? The new system should be flexible, capable of near-real-time updates (or at least updates within a year), allow uncontrolled terms as well as controlled vocabularies, and allow relationships (similar to, related term).

Like RDF.

Like the semantic web. Just a corner of it. Just for now.

Please?
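
For the curious, here's roughly what I mean, as a toy sketch in Python. It is not real RDF and not anybody's actual catalog system: the class name, the relationship names (similar_to, related_term) and the book titles are all made up for illustration. The point is only that statements are simple subject/relationship/object triples, that a patron's word and an official heading can sit side by side, and that adding a new link takes one line instead of a committee.

```python
class SubjectGraph:
    """Toy store of (subject, relationship, object) statements, RDF-style."""

    # Hypothetical relationship names, taken from the wishlist above.
    LINKS = ("similar_to", "related_term")

    def __init__(self):
        self.triples = set()
        self.controlled = set()  # headings from an official vocabulary

    def add_controlled(self, heading):
        self.controlled.add(heading)

    def relate(self, subject, relationship, obj):
        # Either end may be a controlled heading or a patron's own word.
        self.triples.add((subject, relationship, obj))

    def expand(self, term):
        """Return the term plus everything directly linked to it."""
        found = {term}
        for subject, relationship, obj in self.triples:
            if relationship in self.LINKS:
                if subject == term:
                    found.add(obj)
                elif obj == term:
                    found.add(subject)
        return found

    def search(self, term, catalog):
        """Titles whose headings match the term or any linked term."""
        terms = self.expand(term)
        return sorted(
            title for title, headings in catalog.items()
            if terms & set(headings)
        )


graph = SubjectGraph()
graph.add_controlled("Type and type-founding")
graph.add_controlled("Graphic design (Typography)")

# The everyday word is uncontrolled; one statement ties it to each heading.
graph.relate("typography", "similar_to", "Type and type-founding")
graph.relate("typography", "related_term", "Graphic design (Typography)")

# Made-up records: title -> assigned subject headings.
catalog = {
    "Hypothetical Type Manual": ["Type and type-founding"],
    "Hypothetical Cookbook": ["Cooking"],
}

results = graph.search("typography", catalog)
print(results)
```

Search for "typography" and the type manual comes back, even though nobody ever cataloged it under that word. The patron doesn't have to know the secret code.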

Sunday, February 24, 2008

Civil disobedience

I've followed the progress of the Copyright Wars with great interest: Eldred v. Ashcroft, which went to the Supreme Court, Kahle v. Gonzales, and so on, along with the quieter resistance of Google Books.

Kahle is the man behind the Internet Archive and a partner in the OCA. He started archiving the internet in 1996. For the young people out there, in 1996, less than 1% of the population used the net. AOL was a big player, Google was still two years down the road. And Kahle started saving those few early web pages.

Without Kahle, there would be no record of the online world in those formative days. The lifespan of a web page is about the same as a fruit fly's, we hear; two years is a geologic age in web years. So Kahle is the hero of our age; he is, in effect, the man with the fire extinguisher at the Library of Alexandria. He's fighting for the right to save our cultural heritage from the copyright sharks.

Kahle is doing it in the legal arena, without much success. He has some high-profile partners here, too, like the Library of Congress. So Kahle is taking the polite path - if you object, and can prove it's your intellectual property, he'll take it down from the IA.

That's the same path that the Google guys are taking - if we scan your book, and you object, we'll take it down. And while there have been some challenges, no one has stopped them.

So in their quiet way, they have stopped Mickey Mouse from stopping progress. The endless extensions of copyright terms have made lawbreakers out of many of us, without our knowledge (which is not a legal excuse). The purpose of copyright law is twofold, according to the Constitution - to protect the rights of the creator for a limited term, and to end that term to foster progress. Current law reverses that intent.

So here's to three guys who have taken the path of most resistance, who have stood up for the rights of people everywhere to know their history.

Tuesday, February 12, 2008

On mass digitization

I love reading about the perils of mass digitization. They're the same things that were said about the OPAC, computers in general, and (a little before my time) the printing press.

What will technology bring? It's too new! Too untried! And most of all - what will happen to my job?

What brought this on? A recent tirade I heard about Google cutting the librarian out of the process. They're doing everything! There's no librarian doing the selection!

It does seem that there were many librarians in the selection process, over the course of many years. These huge academic libraries didn't build themselves. While not everything may be as useful as when it was selected, it may well be useful in a different way.

What's missing is the intermediation - the librarian as middleman. There's something lost here, mostly in the cataloging process - but cataloging as we know it is broken. Of course, tagging isn't all that healthy itself, but at least it doesn't rely on antiquated vocabularies and concepts. The semantic web may still be in our future; I'm looking forward to Google (or the next guys) distilling it all into a giant thesaurus.

It may be the fruit of reading way too much science fiction in my youth, but what's wrong with all the books being online? People complained about Project Gutenberg (maybe) having inaccurate keyboarding; now they have the page image (with the stray thumb). Now they complain about the thumb - the same people who study typos in old texts to determine the pagination and foliation of rare editions.

Mistakes have their uses, too.

I'll be the first person to agree that a full text search of Othello doesn't tell you it's about jealousy. On the other hand, Shakespeare didn't call Hamlet the Melancholy Dane; we stuck that label on him. Is he really the Schizo Dane, or the Teenage Dane, or the Ironic Dane?

So let us not stick our labels on for eternity. This is a new age of scholarship, where you don't have to have a Columbia ID to see the actual text, where the ivory tower casts a fainter shadow, and fresh eyes are welcome.

Yeh, we were going to do all this ourselves. Eventually, when we had the time and the money.

We still can, when we have the time and the money, and do it our way.

Right.

So stop kvetching and let's do what we do best - intellectual access. It's what we've always done, in theory, at least. We help people find information.

We have a few million books to work on, plus the whole web. That should keep us busy for a while. So we might have to do things differently. Well, we don't type catalog cards anymore, and we got over it (at least, most of us).

Meanwhile, no censors, no closed stacks, no geographic limits (though there are still economic ones).

So what's so bad about Google Books? It's a gift! Open it up and enjoy!

Saturday, January 26, 2008

Why am I flexible? And who am I?

And why should a librarian be flexible?

Hello! Where have you been? The world is changing, and librarians are (or should be) part of that world. Mass digitization, the internet, subscription databases, and both the digital and analog generations of patrons and students have changed what we do, who we do it for, and what we need to know. But more on those later.

As for who I am, I'm:
A librarian (always in beta)
An archivist (with a background in photographic archives)
A teacher (graduate and undergrad archives and library science online, and workshops in person)
A freelancer (on and off, there aren't a lot of rich librarians)
A preservation administrator (by training)
Curious (If you're not growing, you're dying)
Flexible (If you're not part of the solution, you're part of the problem)

I've been:
An elementary teacher
A rehab contractor
A theatrical designer
A business person
A bookseller
A burger slinger
A paralegal

I love libraries because they're the one place where every weird thing I've done over time is useful, either to me or someone else.

We are walking encyclopedias, always needing revision
We are human search engines
We are not the gatekeepers of information, but the doormen

As long as we're willing to be flexible, to be lifelong learners, to grow.

In upcoming weeks, look for things to read, things to think about, things to try, things that are new and exciting and some that aren't worth the effort. They're all out there. Some will come here.