Monday, November 30, 2009
DRAMBORA
DRAMBORA is a two-phase self-assessment toolkit. The first phase results in a comprehensive organizational overview; the second involves risk identification, assessment, and management. Through this bottom-up approach, repositories evaluate themselves within their own contextual environment and can then compare their characteristics to those of other repositories. According to this article, DRAMBORA's creators are also developing "key lines of enquiry," sets of questions that will guide auditors within an organization toward significant issues or risk factors.
The overall idea is that a successful digital repository is one that plans for uncertainties, converts them into risks, and then manages those risks. It involves a cyclical process by which a repository reduces its level of risk with each iteration. Participation is not rewarded with a certification or endorsement; repositories benefit instead by strengthening their self-awareness and their ability to identify and manage risks. This enables them to present information about their repository that makes them more approachable and trustworthy. The creators argue that DRAMBORA "offers benefits to repositories both individually and collectively" in that it opens up lines of communication between repositories. DRAMBORA also facilitates classification of digital repositories so that services and characteristics are more easily communicated to an audience.
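To make the assessment stage a little more concrete, here is a minimal sketch of the kind of risk register and scoring the toolkit implies. It assumes a simple probability-times-impact severity score on made-up scales; the field names and numbers are my own illustration, not DRAMBORA's actual schema.

```python
# Hypothetical sketch of a DRAMBORA-style risk register entry and score.
# The field names and 1-6 scales are illustrative assumptions, not the
# toolkit's actual schema.
from dataclasses import dataclass

@dataclass
class Risk:
    identifier: str
    description: str
    probability: int       # e.g., 1 (rare) to 6 (frequent)
    potential_impact: int  # e.g., 1 (negligible) to 6 (catastrophic)

    @property
    def severity(self) -> int:
        # One common convention: severity = probability x potential impact.
        return self.probability * self.potential_impact

risks = [
    Risk("R01", "Loss of key technical staff", probability=3, potential_impact=5),
    Risk("R02", "Storage media obsolescence", probability=4, potential_impact=4),
]

# Rank risks so the self-audit can prioritise management measures.
for risk in sorted(risks, key=lambda r: r.severity, reverse=True):
    print(f"{risk.identifier} (severity {risk.severity}): {risk.description}")
```

Each pass through the cycle would then revisit the register and, ideally, bring the severity totals down.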
The website states that the purpose of the DRAMBORA toolkit is to facilitate the auditor in:
- Defining the mandate and scope of functions of the repository
- Identifying the activities and assets of the repository
- Identifying the risks and vulnerabilities associated with the mandate, activities and assets
- Assessing and calculating the risks
- Defining risk management measures
- Reporting on the self-audit
There is an offline version of DRAMBORA that you can download from the website, but both options require registration. While it is a European initiative, DRAMBORA has been implemented at numerous repositories in the US and Europe.
Tuesday, November 24, 2009
Two Views on Digital Curation
The first article is written by Steve Rubel, a marketing strategist and blogger. In his article he talks about how the internet offers an endless supply of choices that far exceeds our ability to pay attention to it all (basically, he is saying that supply is outstripping demand). He mentions how dominant both Facebook and Google have become, then goes on to say that no matter how dominant these two sites are, they can't hold our complete attention, in part because they "are often a mile wide and half-an-inch deep." This leads him into digital curation and a discussion of how brands (like IBM, UPS, and Microsoft, some of the example sites and "curation" projects he gives) are starting to curate digital information to help people "find the good stuff." He doesn't spend much time discussing his thoughts on brands being curators, though, which is disappointing because the idea is intriguing.
Rubel closes by saying that both human-powered and automated digital curation “will be the next big thing to shake the web.”
I enjoyed seeing a non-information professional’s take on the internet and curation. And it was especially interesting to me how Rubel was using the term digital curation. He never really gives a definition, and seems to make it sound like a fairly simple thing, when in fact it is pretty confusing and complex. I also find it perplexing that in Rubel’s take on digital curation it is journalists who will be playing a key part in it all. I have a journalism background (and a degree in the subject that I will never use), and I can’t imagine journalists doing half the things we have discussed in class about curating information. Rubel says that journalists won’t be the only ones taking part, but he completely fails to mention information professionals at all in his article, which is what the Resource Shelf blog entry responded to.
The Resource Shelf writer says they were sad to see that librarians and information professionals were left out of Rubel's article. Sadder still, Resource Shelf says that librarians and information professionals are often forgotten by those outside of our field when it comes to discussions like these – which seems ridiculous, because who would be better at curating information than information professionals? The response also goes on to talk about how librarians have been "curating" digital information for years (which we've learned) and about how collection development will become a form of digital curation in the future, which I hadn't given much thought to before.
Apparently the blogger at Resource Shelf actually emailed Rubel about his article to invite him to take "a virtual tour of some of the resources librarians have been curating for years." Hopefully Rubel responds to the blog, because that could result in an enlightening discussion.
Anyway, it was just interesting to read these two different takes on digital curation. Both writers agree that digital curation is a worthy goal to work towards.
Shared Names Project: Linking Biomedical Databases
Monday, November 23, 2009
Digital Curation & User Testing
Marchionni, Paola. “Why Are Users So Useful?: User Engagement and the Experience of the JISC Digitisation Programme.” Ariadne, no. 61 (October 2009). http://www.ariadne.ac.uk/issue61/marchionni/.
This recent article by JISC's Paola Marchionni refocuses attention on the purpose of curated digital collections: that is, their use by different user groups. Marchionni begins by noting that many digitization projects are still not paying enough attention to their users and their users' needs. They become so caught up in trying to make their content accessible online that they don't adequately research their key users. As a result, many publicly funded projects are going un- (or under-) used. Marchionni illustrates the insight users can provide by presenting two case studies of projects that incorporated users into their development process: the British Library's Archival Sound Recordings 2 (ASR2) project (a collection consisting of over 25,000 recordings) and Oxford University's First World War Poetry Digital Archive (WW1PDA) (a collection that contains over 7,000 items pertaining to WWI poets, including digitized images of materials held at UT's own Harry Ransom Center).
Though Marchionni's article helpfully reminds digitization (and digital curation) projects to keep their electronic eyes on the prize and really take their users into account, I'm not sure that much of what Marchionni presents in her list of suggestions for user engagement is particularly surprising. She recommends first recognizing the importance of interacting with users and even having an "Engagement Officer" position, as the ASR2 project did. She also advises establishing an early and ongoing relationship with users. The WW1PDA project, for example, developed a typology of users, with a steering committee of scholars in the field of WWI literature advising which materials should be digitized and participating in quality control, and a separate group of secondary school and higher education instructors helping to develop and offer feedback on the education section of the project. Marchionni also emphasizes the importance of knowing what to do with user feedback. When users expressed anxieties about the integration of Web 2.0 tools out of fear that they might undermine the authority of the WW1PDA archive, the project decided to integrate such functionality in a way that made the lines between the archivists' and the users' contributions clearer.
Some of the more interesting lessons regarding users came from the WW1PDA's approach to educational resources. The project held workshops for teachers in order to discover what functionality this group would like to see on the site. In a rather ballsy move, the project then asked the workshop members to help author a number of learning resources for the website. Though this did result in the creation of some resources, ultimately the project realized that perhaps it had overreached in what it was asking its busy users to produce. (Frankly, I would be a little annoyed if I agreed to participate in a workshop on a new resource and then came away having been assigned the time-consuming "homework" of creating a bunch of resources for that project.)
Though asking users to create lessons plans and other teaching materials was not as successful as the WW1PDA project might have hoped, users were willing (and excited!) to contribute materials from their own familial archives to the project. In fact, the project received such a high level of response to their requests that they held extra workshops to help the public digitize their items.
Some of Marchionni's suggestions seem to blend user engagement and marketing. For example, the WW1PDA's teachers' workshops seem to have functioned in part as a source of user feedback, but also as a forum for promoting and publicizing the resource. Teachers were seen as the key to two user groups: teachers and students. Similarly, Marchionni also suggests targeting any information dissemination activities at specific user groups. The ASR2 project, for example, publicized its Holocaust collection by contacting networks for historians and those in the field of Jewish and Theological Studies. Though this may seem more like advertising than user engagement, it's nevertheless important to remember that we sometimes need to market our resources if we want them to be used. Finally, after highlighting the importance of user engagement, Marchionni ends with a reminder not to lose sight of the project's mission: though it's important to listen to user feedback, we shouldn't be bullied by it. Focus on the needs of your primary users and keep in mind that you can't satisfy everyone.
One thing that I wish Marchionni had addressed in greater detail is the expense involved in maintaining a high level of user engagement. Obviously it's more expensive in the long run to pour money into a resource that doesn't get used than to devote some money to engaging users, but nonetheless, creating sustained relationships with users can be a drain on already strained budgets and staff schedules. I'd love to hear more about how small projects, or ones with meager financial resources, might effectively develop ongoing relationships with users.
Friday, November 20, 2009
The Relevance of Twitter
Leslie Carr, on his blog RepositoryMan, recently confessed to having similar doubts about the utility of Twitter, asking himself if it wasn't just "some gratuitous teenager technology." So he conducted a study. At a recent CETIS conference, Carr used the Twitter API to aggregate all of the tweets from the conferees. From these tweets he wanted to determine how many were "technical/academic/professional" on the one hand, and "personal/informal/gossipy" on the other. Although he created other categories for the tweets, Carr was clearly interested in quantitatively studying the relative "significance" of the informational value of the tweets from the CETIS conference.
From his analysis, Carr determined that 70% of the tweets provided the sort of "informational" value he was looking for and that about 41% of the conference attendees contributed tweets that were either "entirely" or "mainly" informational. Carr doesn't go into much depth about the criteria by which he determined the relative informational value of the tweets, though he did admit that "useful information" was information that was useful "to him."
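As I understand it, Carr's method boils down to collecting the conference tweets and tallying how many fall into each category. A toy sketch of that kind of tally is below; the sample tweets, keyword hints, and classification rule are my own illustrative assumptions (Carr coded his tweets by hand against his own criteria).

```python
# Rough sketch of tallying conference tweets by "informational" value.
# The sample tweets and keyword heuristics are illustrative assumptions;
# Carr categorised his tweets manually against his own criteria.
tweets = [
    "Great talk on repository interoperability, slides at http://example.org/slides",
    "The wifi in this venue is terrible again",
    "Anyone want to grab coffee before the next session?",
    "New report on preservation metadata announced in the keynote",
]

INFORMATIONAL_HINTS = ("repository", "preservation", "metadata", "slides", "report", "keynote")

def is_informational(tweet: str) -> bool:
    text = tweet.lower()
    return any(hint in text for hint in INFORMATIONAL_HINTS)

informational = sum(is_informational(t) for t in tweets)
share = 100 * informational / len(tweets)
print(f"{informational} of {len(tweets)} tweets ({share:.0f}%) look informational")
```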
It's an interesting study, though after reading this post, I have to say that it seems that even the tweets which Carr did not think had "informational value" (e.g., tweets about the poor quality of wireless connectivity at the conference) could be very useful for other audiences. The conference organizers might have found the gripey tweets about the wireless issues very useful, as do businesses which are now mining Twitter for information regarding reactions to their products. The take-away from reading this study, for me, was actually that tweets can provide useful information to any Twitter user given the right circumstances and conditions. This is not to say that I think every tweet is useful. Tweets about Megan Fox can almost always be ignored. Although, I don't know. Maybe in twenty years, a Media Studies or Gender Studies researcher might even find the Megan Fox tweets to be of some value.
Thursday, November 19, 2009
Google Swirl
Bridging Picasa's face recognition and Google's Similar Images, the search results depend on both image metadata and computer vision research. There are comparisons to Google's Wonder Wheel (which displays search results graphically) and Visual Thesaurus. One enters a search term and twelve groupings of images appear, visualized as photo stacks. Choosing a particular image makes the experimental Flash interface "swirl" to display that image and branch out to numerous other images with varying degrees of relationship to it.
Wednesday, November 18, 2009
Automated Data Processing: Too Big for Our Puny Brains
Digital Scholarly Communication Projects List
- gpeerreview: Google's unfinished answer to peer review, involving getting endorsements from "endorsement organizations," graphing those endorsements, and then providing some kind of credibility ranking
- Faculty of 1000: online research tool that highlights the most interesting papers in biology, based on the recommendations of over 1000 "leading scientists"
- MONK: an open source "digital environment" that humanities scholars can use to analyze patterns in texts
- www.myexperiment.org: scientists contribute their scientific workflows (I assume this is like sharing their lab notebooks) so that others can use them, also see UsefulChem
- SciLink: Facebook for scientists, but instead of using email contacts to mine your connections like Facebook does, it uses bibliographies from articles
PLANETS: integrated services for digital preservation
Planets is not a repository project but expects each participating institution to maintain storage for their digital data. The goal is to work toward preserving entire collections, not just creating stand-alone applications that can handle one aspect of data preservation, such as migration or emulation. The project is a collaborative effort based on the idea that no single institution is going to be able to handle the level of development needed. The initiative is drawing from the expertise and experience of numerous partners in different countries.
The website explains the deliverables (a small sketch of the "characterisation" idea follows the list):
- Preservation Planning services that empower organisations to define, evaluate, and execute preservation
- Methodologies, tools and services for the Characterisation of digital objects
- Innovative solutions for Preservation Actions tools which will transform and emulate obsolete digital assets
- An Interoperability Framework to seamlessly integrate tools and services in a distributed service network
- A Testbed to provide a consistent and coherent evidence-base for the objective evaluation of different protocols, tools, services and complete preservation plans
- A comprehensive Dissemination and Takeup program to ensure vendor adoption and effective user training.
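To give a concrete sense of what "characterisation" means in practice: a characterisation tool inspects a digital object and records technical properties (format, size, fixity) that preservation planning can act on. The sketch below is my own minimal illustration using only the Python standard library, not an actual PLANETS tool.

```python
# Minimal illustration of what a characterisation step records about a
# digital object: identity, format guess, size, and a fixity checksum.
# Real characterisation tools go much deeper; this is only a sketch.
import hashlib
import mimetypes
import os

def characterise(path: str) -> dict:
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "path": path,
        "mime_type": mime_type or "application/octet-stream",
        "size_bytes": os.path.getsize(path),
        "sha256": digest,
    }

if __name__ == "__main__":
    # Characterise this script itself as a quick demonstration.
    print(characterise(__file__))
```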
After hearing a presentation on digital preservation initiatives in the US by classmates in another course, I am interested in what differences in the systems in place in the EU make this kind of large-scale collaborative project possible. Reflecting on the Larsen article, On the Threshold of Cyberscholarship, I realize that PLANETS falls clearly into the research stage of activity, where tool development is key to the success of a collective infrastructure for access to digital materials. I get the impression that European institutions, and maybe scholars, are more likely to achieve success in the area of cyberinfrastructure development. Is this because of funding, social behavior, scholarly expectations?
Free Access to the Web
Another point that Arms makes is that digital libraries can be nearly completely automated. He asserts that a brute-force search, such as that provided by Google, with enough information and in the hands of a good researcher, can actually be much more powerful than an intelligent search by trained librarians. (While I don't like the implications for library services, it does seem like a lot of the focus on the need for reference librarians is framed in terms of novice or amateur researchers...) This automation actually increases access, as you no longer need to work through a small group of homogeneously trained elites.
Arms identifies two issues or potential problems for further research. One is ensuring the quality of information. This role was traditionally performed by the publishing process, but with the self-publishing afforded by the web, we can no longer count on good publishing practices. The second is permanence. Flip a switch on a server, and its information vanishes.
I found this article interesting mostly because of its now somewhat historic outlook. Some things have not turned out as Arms saw them in 2000, namely the level of free access. As we see with the Google Books project, proprietary interests are finding their way into the new cyber-reality, and the Great Copyright War has yet to be fought. Interestingly, though, Arms did identify two key issues that continue to be relevant: trusting found information, and ensuring its permanence. I suspect that the best answer to the former is via education of the public. At some point, the onus has to be on the searcher. The second problem, in my mind, is much more problematic, and it is one that plagues the physical as well as virtual information worlds. All in all, I found this early article quite interesting.
Tuesday, November 17, 2009
Aquatic and Riparian Effectiveness Monitoring Program
The goal of AREMP is to use a decision support model to evaluate overall watershed condition. A number of attributes are assigned to each watershed, and once all the attributes for a watershed have been sampled, the data are aggregated to determine a watershed score. To aggregate the data and find a score, AREMP uses software called Ecosystem Management Decision Support (EMDS), which creates the model and then assesses the condition of the watersheds based on the data. AREMP says that they would be happy to share their data with anybody who would like to see it.
EMDS is pretty interesting; the EMDS document says, "EMDS does contain tools for conducting 'what if' scenarios. For example, one can estimate how watershed condition will improve if 500 pieces of large wood were added to the stream."
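As I understand the aggregation step, each watershed's sampled attributes are combined into a single condition score, and a "what if" scenario simply re-runs that aggregation with one attribute changed. The toy sketch below illustrates the idea; the attribute names, weights, and simple weighted average are invented for the example (EMDS itself uses a much more sophisticated knowledge-base model).

```python
# Toy illustration of aggregating watershed attributes into a condition score
# and running a "what if" scenario. Attribute names, weights, and the 0..1
# normalisation are invented for the example.
WEIGHTS = {"large_wood_pieces": 0.4, "pool_frequency": 0.3, "road_density": 0.3}

def condition_score(attributes: dict) -> float:
    # Each attribute is assumed to be pre-normalised to a 0..1 "goodness" scale.
    return sum(WEIGHTS[name] * value for name, value in attributes.items())

baseline = {"large_wood_pieces": 0.2, "pool_frequency": 0.5, "road_density": 0.7}
scenario = dict(baseline, large_wood_pieces=0.6)  # e.g., after adding large wood

print(f"baseline condition:  {condition_score(baseline):.2f}")
print(f"'what if' condition: {condition_score(scenario):.2f}")
```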
I wasn't able to find anything connecting AREMP with LTER, but I did discover more problems with ecological data. For instance, the EPA and AREMP both use probability sampling designs, but "indicator and sampling methods differ from those used by the EPA, and these differences hinder collaboration and data comparison" (Hughes, 2008, p. 853).
I would still like to know more about AREMP's data and how their data collection methods compare to other ecological studies.
Hughes, R., & Peck, D. (2008). Acquiring data for large aquatic resource surveys: The art of compromise among science, logistics, and reality. Journal of the North American Benthological Society, 27(4), 837-859.
OCRIS: Online Catalogue and Repository Interoperability Study
This study examined the Library Management Systems (and their associated OPACs) and the Institutional Repositories (IRs) of Higher Education Institutions in the UK.
The goals: determine whether repository content falls within the scope of the institutional OPAC (and the extent to which it is recorded there); examine the interoperability of OPAC and repository software; list the services offered by OPACs and repositories; identify potential for improved links to other institutional services; and make recommendations for the development of further links between OPACs and repositories.
The primary findings are distressing. Only 2 percent of the study respondents stated that their systems were definitely interoperable, and 14 percent stated that interoperability was pending. There was an 81 percent overlap in scope between the items in IRs and OPACs; generally, the OPACs contained the bibliographic data and the IRs contained the full text.
Clearly, the differences in scope and policy between the two systems are not well defined, and the result is either uncoordinated effort, which hinders interoperability, or duplicated effort and redundant information. In order to provide a more feasible and appropriate long-term vision for IRs and OPACs, institutions should take a structured look at the goals of each service (IR vs. OPAC) and coordinate efforts to provide interoperability and reduce duplication of effort.
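One concrete place such coordination could start is with the machine interfaces the two systems already expose: most IR platforms, for example, publish their metadata over OAI-PMH, which an OPAC (or any other service) could harvest. Below is a minimal sketch of such a harvest; the base URL is a placeholder and the example is my own illustration, not something OCRIS prescribes.

```python
# Minimal OAI-PMH harvest sketch: pull Dublin Core records from a repository's
# OAI endpoint, the kind of feed an OPAC could ingest. The base URL below is a
# placeholder, not a real repository.
import urllib.request
import xml.etree.ElementTree as ET

OAI_BASE = "https://repository.example.ac.uk/oai/request"  # placeholder endpoint
DC = "{http://purl.org/dc/elements/1.1/}"

url = f"{OAI_BASE}?verb=ListRecords&metadataPrefix=oai_dc"
with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# Print the title of each harvested record.
for title in tree.iter(f"{DC}title"):
    print(title.text)
```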
This was interesting... OPACs and IRs arguably have different intents (generically speaking, circulating collections versus long-term preservation and access), but they are really not so different. While neither necessarily stores the information itself, each provides a service for locating it. In light of the volumes of money spent on IR development, OPACs, and interoperability, and the number of established systems available to study, it is a very good time to step back and consider how these services might be better managed and coordinated to provide the best available service to the end user and the institution. I would like to see a corresponding study of American institutions.
Competing Requirements for Self-archiving
First, Harnad clarifies the opt-out clauses in self-archiving mandates. The opt-out clause concerns whether or not you need to persuade the journal to accept your addendum, thereby formalizing your right to deposit your article. While this is worthwhile, it is not essential, so authors can choose to opt out if they cannot persuade the journal or simply don't wish to try. But regardless of the presence of an opt-out clause, authors still deposit their articles immediately; it is not necessary to find another publisher if the publisher denies the request.
Second, if the publisher has an embargo period, and the author wishes to honor it, they simply deposit the article as closed access for the appropriate period of time. There is no need to talk to the publisher about the embargo period.
Harnad suggests that depositing an article is not as confusing as it seems. An author simply needs to deposit all drafts as soon as an article is accepted for publication. According to his post, 63% of the top journals allow such a deposit. If your journal is one of the other 37%, simply set the article to closed access.
The issues surrounding depositing a published article are complex and probably result in many authors choosing to do nothing rather than try to navigate through all the competing requirements. Though closed access is not ideal, having a copy in a repository is better than not having anything stored. It seems that the more an author knows about the process of depositing an article and what they are allowed to do, the more likely it is that they will go to the trouble of depositing their article. The way things currently stand, most authors choose not to mess with online repositories because they don't even know where to find out what they are allowed to do.
Monday, November 16, 2009
The Editorial Board for JoVE boasts members of the scientific community from the best institutions in the world... Harvard, Mount Sinai School of Medicine, Princeton, University of Zurich... the list illustrates an impressive number of highly qualified board members. The project was begun at Harvard in 2006 by a post-doc, Moishe Pritsker, now CEO and editor-in-chief of JoVE.
Access:
Initially, JoVE was conceived of as an open-access project; however, that model proved impossible given the high costs of producing the videos. According to Pritsker,
"The reason is simple: we have to survive. To cover costs of our operations, to break even, we have to charge $6,000 per video article. This is to cover costs of the video-production and technological infrastructure for video-publication, which are higher than in traditional text-only publishing. Academic labs cannot pay $6,000 per article, and therefore we have to find other sources to cover the costs." (http://scholarlykitchen.sspnet.org/2009/04/06/jove/)
Thus, a pricing structure was created to cover these costs: "$1,000 for small colleges to $2,400 for PhD-granting institutions, prices which are in league with other commercial scientific journals. In addition, authors are charged $1,500 per article for video production services ($500 without), and there are open access options: $3,000/article with production services ($2,000 without)." (http://scholarlykitchen.sspnet.org/2009/04/06/jove/)
Taking a look at JoVE's Press section, I was surprised to find they had posted articles criticizing their decision to go closed access. While these criticisms are valid, so are JoVE's explanations of why open access wasn't a possibility. Given the newness of this "product" and the fact that there is no existing model, it makes sense that charging for access is the only way to offset the price of video production without all of the costs falling on the research institutions.
How it looks:
Videos are very high quality and accompanied by a complete scholarly article that acts as a transcript of sorts to the video.
Overall, I'm very impressed by JoVE, and while for now it is closed access, it seems that as more stakeholders buy in and the model becomes more widely accepted, open access may become possible in the future.
Monkeying Around with Twitter Data
InfoChimps' mission "is to increase the world's access to structured data". The company appears to offer much data for free and prefers to be a platform where people may "post data under an open license". The data is available for browsing, and if the site doesn't actually have the data, it will point the user to where they can get the data for free. This is excellent for data sharing and access to large data sets. The site's homepage lists "Interesting Datasets" for perusal. I clicked on the first one, which was College Enrollment of Recent High School Completers 1960-2005. There is an intro paragraph to the data, where it's from, and an example of it. This set was prefaced with the caveat that the "files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns." Kind of funny, yet good information to know!
So, InfoChimps offers free datasets - fabulous. But their recent announcement that they would sell Twitter data was met with some skeptical questions. The Read Write Web blog discusses this development in depth, describing the data, which isn't the full tweets but rather hashtags, RTs, @ messages, and other associated info. This is apparently really useful information to have, and the developers at InfoChimps are hoping that people create interesting apps with this data. InfoChimps mined this data themselves by hitting the Twitter Developer API 20,000 times per hour (I almost know what this means). That's a LOT of data. Marshall Kirkpatrick, the author at RWW, questions the complete legality of selling this data and worries Twitter is going to come a-knockin'. Many commenters on the blog entry also thought so. InfoChimps (last week) swore they were on solid legal ground, but a new post from yesterday on their blog revealed that Twitter had asked them to remove the datasets. While InfoChimps swears their data had nothing personally revealing, privacy concerns came from commenters and apparently from Twitter, who claims they just want to prevent any 'malicious use' of the data.
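For a sense of what a dataset of hashtags, RTs, and @ messages might look like, here is a small sketch that extracts those elements from raw tweet text. The sample tweets and regular expressions are my own illustration, not InfoChimps' actual pipeline.

```python
# Small sketch of pulling hashtags, @-mentions, and retweet markers out of raw
# tweet text: the kinds of derived fields in the datasets discussed above.
# The sample tweets and patterns are illustrative only.
import re
from collections import Counter

tweets = [
    "RT @datafan: the #opendata panel was great",
    "Loving the #opendata discussion with @infochimps",
    "@infochimps when does the dataset drop? #twitterdata",
]

hashtags = Counter(tag.lower() for t in tweets for tag in re.findall(r"#(\w+)", t))
mentions = Counter(m.lower() for t in tweets for m in re.findall(r"@(\w+)", t))
retweets = sum(1 for t in tweets if t.startswith("RT @"))

print("top hashtags:", hashtags.most_common(3))
print("top mentions:", mentions.most_common(3))
print("retweets:", retweets)
```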
I gleaned from the blog entries related to this issue that Twitter isn't very forthcoming with their data, which is making them a new Bad Guy of social media and data sharing. It's interesting that people are upset that Twitter won't share, but probably don't care as much if a university won't share some research datasets? Shouldn't data be data? Who cares if one is more sexy than another? Access is still important. InfoChimps sounds a little naive to think that selling Twitter datasets for $9000 wouldn't cause a stir, but maybe that's what they actually intended! At least it brought some mild attention to the issues at play here.
Sunday, November 15, 2009
Gordon
