
Reclaim assignment #3: personal knowledge organization (aggregate and tag stuff)

2 min read

Ok, now you have some kind of personal website, and you have a section of it devoted to posts about your personal project's progress. Nice job!

The next step goes backwards a bit in terms of workflow. I want you to come up with some web-based method for collecting and aggregating interesting things you find and thoughts you might have on a number of topics. One easy way to do this is to make a Known site (like this one, but using a single-author theme) as a subdomain or page of your domain. But other things might work - a wiki, for example, or a blog theme, or.... Whatever you use, think about its organization and how you will be able to search it. Something that supports tagging might be a good idea - even if you don't regularly use tags, this is a good place to try them out. This is also a good place to capture fugitive, fragmentary

This will be, in fact, something like the commonplace books we've been discussing - most recently with Amanda.

The main point is that your personal website is one presentation of yourself (those aspects you choose to present/represent); the project progress site is a project-based site that is relatively formal - formal in that you are authoring and posting semi-regular, fully coherent pieces of writing. This new site you will create is less formal - it is a place for you to experiment with doing for yourself what we do on this END site. Questions? I'm here!

What the heck is linked data?

1 min read

Three quick reads:

Still trying to figure out what linked data is? Me, too. But these quick explanations help. One by Roy Tennant:

One by Ruth Tillman:

And some caution from Jonathan Rochkind:

Reclaim assignment #2: project log

3 min read

Now that you have a shiny new domain of your own and you have created a basic landing page, you have a new task: create a place to write informally about your END personal project progress. Please create your project log and post a link to it, along with a few sentences about the experience of creating it, here by Monday, June 13th.

This project log could be as simple as a new page on a Wordpress site, a wiki, or a Known site, or as complicated as you would like it to be. Browsing through the available applications via your Reclaim account might give you ideas. (It is also possible to install another application, though this will take a bit more playing around.) What platform or application you choose will depend in part on how you plan to use it - will you try out a full open research notebook or commonplace book on the model of Whitney Trettien's Whiki? A wiki like MediaWiki (the infrastructure behind Wikipedia) might be right. Do you plan on a simpler series of posts? Some kind of blog infrastructure - a series of posts with some organizational principle - will work. Will you want to be pulling together lots of different kinds of media - images, videos, audio files? A Known site might be what you want. Plan on collaborating? You might want a site that allows others to create accounts and contribute. These are just a few ideas - there are lots of possibilities you might explore.

All of your project progress posts can live here if you would like - think about it as a place that might gather together notes and scraps and links and writing primarily for yourself, as well as the periodic project update posts you write for the END team. Going forward, when we ask you to post to, you can publish your post on your project log and then cross-post to

You will want a distinct URL and title for your END project progress site. You will want it to be something that is easy to use. You will want to think about how you - and possibly others - will navigate it. You will have to make a decision about where to locate your project log: in a subdomain or a subfolder?

(If you are still having trouble getting up and running with your Reclaim account or landing page, email me or check in with me if you are at Penn, or Amanda at NYU; the Reclaim folks are also a good resource if we can't solve your problem.)

Questions or ideas? Go ahead and comment! And thanks to Lindsay and Amanda for collaborating with me to spin up this assignment.

your own domain

1 min read

Welcome to the open web! Today, you begin experimenting with self-hosting and your own domain.

By Monday, please set up your own domain using your shiny new Reclaim Hosting account, create some kind of basic landing page, and post a link to it and a sentence or two about the experience here. Please add the tag and any other relevant tags - in general, using these on this site will make it much more navigable.

Reclaim Hosting has four short videos that will take you through each step: just let us know if you have questions.

topic modeling: introductory resources

3 min read

A great tutorial for getting MALLET installed and running is Shawn Graham, Scott Weingart, and Ian Milligan's Getting Started with Topic Modeling and MALLET. I recommend working through the "using the command line" tutorial they link to at the beginning of the post if you aren't familiar with the command line yet. Take your time, go step by step, ask for help if you need it - I think you will find it easy and fun.

To be able to use MALLET effectively, you'll need to know a bit about both the theory and the practice of topic modeling. Below are some introductory resources that people tend to find useful; there are many others out there that may better suit your particular learning style. The key is to read a few of them, then begin messing around with MALLET, and then continue to read around to deepen your understanding as you begin to get some hands-on experience.

Topic Modeling and Digital Humanities by David Blei in the special topic modeling issue of the Journal of Digital Humanities; it offers an accessible introduction. (Blei's article Probabilistic Topic Modeling has more detail but is also really useful. You need not understand it all to get something out of it.)

In the same issue of JDH I also recommend reading Megan Brett’s Topic Modeling: A Basic Introduction and then looking through/skimming the “applications” essays by Lisa Rhody and others.

After doing this to orient yourself, you may wish to read Ted Underwood’s more technical blog post Topic modeling made just simple enough for a more detailed perspective. (Note that this post, like the JDH articles, is from 2012 and some information may be out of date.)

David Mimno's video "The Details: Training and Validating Big Models on Big Data" is very useful to view (maybe a few times) at some point; even if you don't understand it all, you will get something out of it.

You may also want to take a look at the chapter on topic modeling from Matt Jockers's book on text-mining Macroanalysis - see our "readings" Dropbox for a pdf. I will also add some more examples of good recent articles using topic modeling methods for you to browse through.

There are also a fair number of critique-of-topic-modeling articles and blog posts out there, most of them warning you against too quickly interpreting the MALLET output as semantically meaningful. I'm not worried that you will do this. I will say that I've found it useful to think of this kind of topic modeling almost as a kind of fiction; I like to think of topics as a counterfactual, almost fictional set of materials out of which your documents might have been created - but clearly weren't. This helps me stay away from thinking about "topics" as any kind of simplistic map of the "contents" of a set of documents.

The other thing I like to remember is that topic modeling is probabilistic in a number of senses - keep this in mind as you learn about it.
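To make the "fictional set of materials" idea concrete, here is a minimal Python sketch of the story a topic model tells about how a document might have been written. It is an illustration only, not what MALLET does internally: the two topics, their word weights, and the function names are all invented for the example. Each run draws a random mix of topics for the document, then picks every word by first drawing a topic and then drawing a word from it.

```python
import random

def sample_dirichlet(alpha, size, rng):
    """Draw one probability vector from a symmetric Dirichlet(alpha)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, length, alpha, rng):
    """Generate a fictional document from fixed topics."""
    # 1. Pick this document's mix of topics.
    proportions = sample_dirichlet(alpha, len(topics), rng)
    words = []
    for _ in range(length):
        # 2. For each word slot, pick a topic according to that mix...
        topic = rng.choices(topics, weights=proportions, k=1)[0]
        # 3. ...then pick a word according to that topic's word weights.
        vocabulary = list(topic)
        weights = [topic[word] for word in vocabulary]
        words.append(rng.choices(vocabulary, weights=weights, k=1)[0])
    return proportions, words

# Hypothetical topics: each maps words to probabilities (values invented).
TOPICS = [
    {"letter": 0.5, "sister": 0.3, "journey": 0.2},
    {"castle": 0.5, "ghost": 0.3, "journey": 0.2},
]

rng = random.Random(42)  # fixed seed so the same "fiction" is drawn each run
proportions, words = generate_document(TOPICS, length=12, alpha=0.5, rng=rng)
print([round(p, 2) for p in proportions])
print(" ".join(words))
```

Topic modeling runs this story in reverse: given only the documents, it estimates topics and mixes that could plausibly have produced them. Changing the seed changes the output, which is one of the senses in which the method is probabilistic.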


END metadata brainstorming

1 min read

My own contribution to our brainstorming about ways of seeing our :

  • Thinking about 500 notes (presence, number, length?) as indicators of cataloger interest; connect 500 note frequency with genre words in titles, with decade, with author gender, with....

  • most popular genre words in titles with epigraphs (989 + 591) vs. most popular genre words in titles without epigraphs (i.e., do novels with epigraphs tend to fall into certain genres?)

  • genre words in titles (989) and footnote frequency (520a)

  • dominant narrative form (592a) mapped over year or decade of publication (008 or 260c)

  • dominant narrative form (592a) + author gender (599 5)

  • dominant narrative form (592a) + secondary narrative form correlations (592b)
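As a rough sketch of how one of these ideas might be tried out, the Python below counts dominant narrative form (592a) per decade of publication (260c). The records, the narrative form values, and the function names are all invented for illustration; real END data would need to be exported and cleaned first, and actual date fields are likely to be messier than this handles.

```python
from collections import Counter, defaultdict

# Hypothetical records: keys are the field codes named above, values invented.
RECORDS = [
    {"260c": "1771", "592a": "Epistolary"},
    {"260c": "1778", "592a": "Epistolary"},
    {"260c": "[1794]", "592a": "Third-person"},
    {"260c": "1796", "592a": "First-person"},
    {"260c": "1799", "592a": "Third-person"},
]

def decade_of(year_text):
    """Return the decade (e.g. 1790) from the first four digits, else None."""
    digits = "".join(ch for ch in year_text if ch.isdigit())
    if len(digits) < 4:
        return None
    return int(digits[:4]) // 10 * 10

def forms_by_decade(records):
    """Count each dominant narrative form within each decade."""
    counts = defaultdict(Counter)
    for record in records:
        decade = decade_of(record.get("260c", ""))
        form = record.get("592a")
        if decade is None or form is None:
            continue  # skip records missing either field
        counts[decade][form] += 1
    return counts

for decade, forms in sorted(forms_by_decade(RECORDS).items()):
    print(decade, dict(forms))
```

The same pattern (group by one field, count another) would cover most of the pairings in the list, swapping in whichever two fields are being compared.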

R for Beginners fellow travelers, welcome to END's internal project site!

1 min read

You can post questions, comments, and answers about the R for Beginners workshop here; please use the hashtag somewhere in all of your workshop-related posts so that we can easily pull up the stream to monitor it. Glad to have you aboard!

Formatting Known posts with Markdown

1 min read

Markdown is a lightweight markup standard useful for plain-text writing for the web. The nice thing about Markdown is that it is easy for both humans and machines to read.

Instead of altering the look of your writing through varying the typography or spacing, you use the Markdown standard to indicate the function you need (different forms of emphasis, quote indication, different levels of headers).

So instead of using the "italicize" or "bold" function to add emphasis to a word in a post, you surround the word with



will render as



Instead of specifying formatting, you specify meaning:

#Header 1
##Header 2
###Header 3 etc

And the Markdown-supporting platform you are using renders it according to its standards:

Header 1

Header 2

Header 3 etc

>If you want to format a quote

You do it as above.

Here is an overview of Markdown formatting for other functions, including how to embed a link.
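To show what "rendering" means in practice, here is a toy Python sketch that turns a few Markdown lines into HTML. It is not how Known or any real Markdown library works, and it handles only headers, single-line quotes, and asterisk emphasis (one asterisk pair for emphasis, two for strong emphasis, as in standard Markdown); the function names are made up for the example.

```python
import html
import re

def render_inline(text):
    """Escape HTML, then convert **strong** and *emphasis* markers."""
    text = html.escape(text, quote=False)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text

def render_line(line):
    """Convert one line of simple Markdown into one HTML element."""
    line = line.strip()
    # Headers: one to six '#' characters; the space after them is optional here.
    header = re.match(r"(#{1,6})\s*(.+)", line)
    if header:
        level = len(header.group(1))
        return f"<h{level}>{render_inline(header.group(2))}</h{level}>"
    # Quotes: a leading '>' marks the line as a blockquote.
    if line.startswith(">"):
        return f"<blockquote>{render_inline(line[1:].strip())}</blockquote>"
    # Anything else is treated as a plain paragraph.
    return f"<p>{render_inline(line)}</p>"

for sample in ["##Header 2", ">If you want to format a quote", "This is *important*"]:
    print(render_line(sample))
```

The point of the sketch is the division of labor: you mark what a piece of text is, and the platform decides how it looks.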

Welcome to END's Known! Today we want you to play with the possibilities of this platform.