A Method for Measuring “Thanks” Part 1: A Search for Thankful Candidates


Recently, Chris Bourg shared on her blog how we will be presenting the findings from our “proof of concept” inquiry into assessing library impact by text mining acknowledgements at DLF in Austin, Texas, November 4-6, 2013. We have been challenged to present this paper in only seven minutes. Needless to say, that leaves no time for the wonderful, nitty-gritty details of our text analysis/mining methodology, so I thought it would be nice to outline some of those details here before we share our really fascinating results at DLF: an appetizer before the main course, if you will.

Before we could analyze who was being thanked in our library and the nature of those acknowledgements, we literally had to search for acknowledgements. Again, because this was a proof-of-concept endeavor, we decided to limit our inquiry to books and to find them using Google Books. Through a series of experiments, we found that the most robust and effective syntax for identifying candidates for our corpus was the following Google Books query:

(~thank | ~acknowledge) & (“Green” | “Cecil H. Green” | “Stanford University” | “Stanford” | “Stanford University Library”) & (“Special Collections” | archives | ~library)

I will walk you through this syntax step by step and explain why we constructed it the way we did. First, let’s look at the individual operators.

  • & can be used interchangeably with AND, and signifies words or expressions that all must be found in the search
  • | can be used interchangeably with OR to include more than one term to be found in your search
  • “” are used to identify words or phrases that must be found exactly as they are typed
  • ~ indicates a word that can be matched with flexibility (e.g. librarian in addition to library)
  • () group terms or expressions so that they are evaluated first; this is also known as nesting
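You may want to keep the three groups of terms in a script rather than retyping them. The minimal Python sketch below assembles the same query from its parts, applying the operators listed above. The helper names (`exact`, `similar`, `group`, `build_query`) are hypothetical. The sketch emits straight double quotes, the form you would type into a search box.

```python
def exact(phrase: str) -> str:
    """Exact-phrase term: wrap it in double quotes."""
    return f'"{phrase}"'


def similar(word: str) -> str:
    """Flexible term: prefix it with ~ (words similar to this one)."""
    return f"~{word}"


def group(terms: list[str]) -> str:
    """Join alternatives with | (OR) and nest them in parentheses."""
    return "(" + " | ".join(terms) + ")"


def build_query(groups: list[list[str]]) -> str:
    """Require every nested group by joining the groups with & (AND)."""
    return " & ".join(group(g) for g in groups)


THANKS = [similar("thank"), similar("acknowledge")]
PLACES = [exact(p) for p in ("Green", "Cecil H. Green", "Stanford University",
                             "Stanford", "Stanford University Library")]
UNITS = [exact("Special Collections"), "archives", similar("library")]

query = build_query([THANKS, PLACES, UNITS])
print(query)
```

With this approach, adding another variant (say, a further library name) means editing one list rather than rebalancing parentheses by hand.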

Thus, our expression can be read as the following narrative:

Find words similar to thank or find words similar to acknowledge, and find exactly Green or find exactly Cecil H. Green or find exactly Stanford University or find exactly Stanford or find exactly Stanford University Library, and find exactly Special Collections or find archives or find words similar to library.
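The same AND-of-ORs logic can be used to double-check a candidate passage locally, for example an acknowledgements page you are reading. The sketch below is our own approximation, not how Google matches. It rests on several assumptions:

- It approximates `~` with a word stem followed by any trailing letters. The stems `thank`, `acknowledg`, and `librar` are our choice.
- It matches exact phrases case-sensitively.
- It matches `archives` as a case-insensitive whole word.

```python
import re


def similar(stem: str) -> re.Pattern:
    """Stand-in for ~ (an approximation, not Google's expansion):
    the stem plus any following word characters, in any case."""
    return re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)


def exact(phrase: str) -> re.Pattern:
    """Exact phrase as a whole-word, case-sensitive match."""
    return re.compile(r"\b" + re.escape(phrase) + r"\b")


def bare(word: str) -> re.Pattern:
    """Plain term: whole word, any case."""
    return re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)


# One inner list per parenthesized group in the query.
GROUPS = [
    [similar("thank"), similar("acknowledg")],
    [exact(p) for p in ("Green", "Cecil H. Green", "Stanford University",
                        "Stanford", "Stanford University Library")],
    [exact("Special Collections"), bare("archives"), similar("librar")],
]


def is_candidate(text: str, groups=GROUPS) -> bool:
    """True if every group (AND) has at least one matching term (OR)."""
    return all(any(p.search(text) for p in g) for g in groups)


print(is_candidate("I thank the staff of Special Collections at Stanford."))  # True
print(is_candidate("I thank my family for their patience."))                 # False
```

Because exact phrases are case-sensitive here, a sentence mentioning a "green-thumbed" friend does not count as a mention of Green Library. This check is stricter than Google's quoted search.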

We included Special Collections because our initial, experimental queries showed that authors most frequently referred to that part of our library, as well as to specific archives. The same logic led us to include Green and Cecil H. Green, along with various combinations of Stanford (University Library/ies).

When I submitted this query to Google Books just now, I received “About 981,000 hits.” For the purposes of our inquiry, however, we limited results to the last ten years, from 2003 to 2013. To restrict Google results to a specific range of years, select “Search Tools,” then click “Any Time” and choose “Custom Range.”
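If you run the query repeatedly, it can help to build the search URL in code rather than clicking through the menus each time. The sketch below assumes Google's undocumented URL parameters as they appear in result pages:

- `tbm=bks` selects Books.
- `tbs=cdr:1` with `cd_min` and `cd_max` (month/day/year) sets a custom date range.

These are not an official API and may change or stop working. `books_search_url` and `mdy` are hypothetical helpers.

```python
from datetime import date
from urllib.parse import urlencode


def mdy(d: date) -> str:
    """Format a date as M/D/YYYY, the form assumed for cd_min/cd_max."""
    return f"{d.month}/{d.day}/{d.year}"


def books_search_url(query: str, start: date, end: date) -> str:
    """Build a Google Books search URL limited to [start, end].
    Assumption: tbm=bks and tbs=cdr:1,cd_min:...,cd_max:... are
    undocumented Google parameters and may change."""
    params = {
        "q": query,
        "tbm": "bks",
        "tbs": f"cdr:1,cd_min:{mdy(start)},cd_max:{mdy(end)}",
    }
    # urlencode percent-encodes the quotes, pipes, colons, commas and slashes.
    return "https://www.google.com/search?" + urlencode(params)


QUERY = ('(~thank | ~acknowledge) & ("Green" | "Cecil H. Green" | '
         '"Stanford University" | "Stanford" | "Stanford University Library") & '
         '("Special Collections" | archives | ~library)')

url = books_search_url(QUERY, date(2003, 1, 1), date(2013, 12, 31))
print(url)
```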

[Screenshot: “Search Tools” opened, showing the “Any Time” menu and its “Custom Range” option]

A form with a calendar view will appear; this is where you enter the date range you are interested in. The tool is extremely useful because you can specify date ranges down to the day. Keep in mind that if you enter only a year, as in the picture below, the search will run from January 1 of the start year to December 31 of the end year.
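The year-only behavior described above amounts to a simple rule. Here it is as a minimal Python sketch using hypothetical `year_range` and `in_range` helpers. A start year expands to January 1 and an end year to December 31, so both endpoint years are included in full.

```python
from datetime import date


def year_range(start_year: int, end_year: int) -> tuple[date, date]:
    """Expand a year-only range to January 1 of the start year
    through December 31 of the end year (both inclusive)."""
    if start_year > end_year:
        raise ValueError("start year must not be later than end year")
    return date(start_year, 1, 1), date(end_year, 12, 31)


def in_range(d: date, start: date, end: date) -> bool:
    """Inclusive check: a date on either boundary counts."""
    return start <= d <= end


start, end = year_range(2003, 2013)
print(start, end)                                # 2003-01-01 2013-12-31
print(in_range(date(2013, 12, 31), start, end))  # True
print(in_range(date(2014, 1, 1), start, end))    # False
```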

[Screenshot: the Custom Range form with only years entered]

Your search is now limited to the range you specified.

[Screenshot: search results limited to the specified date range]

By default, these results are sorted by relevance, but you may find it useful to switch to “Sort by Date.”
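If you copy candidate books into a local list, you can also re-sort them by date yourself. This sketch assumes hypothetical records with a `publishedDate` string. That string may be a bare year, a year and month, or a full date, which are the shapes used by the Google Books API's `volumeInfo.publishedDate` field.

Partial dates are treated as the first day of their period. In the default newest-first order, records without a date sort last. The helper names and sample records are illustrative only.

```python
def date_key(published: str) -> tuple[int, int, int]:
    """Turn 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a sortable (y, m, d) tuple.
    A missing month or day defaults to 1; a missing date becomes (0, 0, 0)."""
    if not published:
        return (0, 0, 0)
    parts = [int(p) for p in published.split("-")[:3]]
    parts += [1] * (3 - len(parts))
    return tuple(parts)


def sort_by_date(records: list[dict], newest_first: bool = True) -> list[dict]:
    """Return a new list ordered by publication date."""
    return sorted(records,
                  key=lambda r: date_key(r.get("publishedDate", "")),
                  reverse=newest_first)


# Hypothetical sample records, for illustration only.
books = [
    {"title": "Book A", "publishedDate": "2005"},
    {"title": "Book B", "publishedDate": "2012-06-30"},
    {"title": "Book C", "publishedDate": "2009-03"},
    {"title": "Book D"},
]
print([b["title"] for b in sort_by_date(books)])
# ['Book B', 'Book C', 'Book A', 'Book D']
```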

[Screenshot: switching the sort order to “Sort by Date”]

This completes Part 1 of “A Method for Measuring Thanks.” Part 2 will cover the automated harvesting of our Google Books query results, so that they can be verified before data and text analysis begins.

I hope you have found this little tutorial useful. Be on the lookout for Part 2 of this methodological series in the coming weeks.
