Architectures for Data Security

This is a summary of one section of my workshop on Data Architectures at the SSIR Data on Purpose workshop.

Data security is a tricky concept for organizations large and small.  In this post I’m going to lay out how I approach helping these groups come up with a comprehensive strategy that meets their needs.

Core Questions

There are a few questions you need to ask yourself before you can think about what security means for your data and organization:

  • what does security mean for us?
  • what level of data security is right for us?
  • what kind of protections do we need in place?

These questions are as much about social processes as they are about technological solutions.  Security is fraught with problems, and I’m by no means an expert.  However, I want to share some frameworks that might help you get started.  I’ll use two ways to think about security: access and longevity.

Access as a Security Issue

Most folks approach security from this perspective.  Who is allowed to add, see, and manage the data?  You can think about four issues within this:

  • technical vulnerabilities – This is about software and hardware systems you put in place to protect your data.  Can your systems be broken into?
  • social vulnerabilities – This issue is about how people themselves can create problems for security.  How could someone be tricked into giving up a key that gets past the technical defenses?
  • external threats – This issue is about the classic model of someone “hacking” into your systems to get your data.  You need to understand where the threats might come from, and how they might try to get in.
  • internal threats – This is about understanding your organization.  What’s the risk that someone inside your organization will, due to ignorance or malice, give out some of your sensitive data?

The conversations tend to revolve around technical vulnerabilities from external threats… so I’ll focus on the opposite.  You need to remember that sometimes your data can get out by accident!

For instance, the Basecamp project management software had an accidental leak a few years ago. They wanted to celebrate their 100 millionth file upload, so one of their staff shared the name of the file.  That might, at first, seem innocuous. However, this symbolic release of information that should have been private led to outrage from their community of users. If they released this simple filename, what might they release next?  This social vulnerability from an internal staff member created a serious breach of trust.  You need to think about these less-commonly considered security issues to really understand what security means for you.

Longevity as a Security Issue

Working with social change organizations, I find it is useful to remind folks that data has a lifespan.  The longevity of your data is a big security issue that you need to consider.  Who manages it in the long term?  What are your commitments to honor data retention and access policies over time? You need to consider:

  • secondary uses: What future uses might your data lend itself to?
  • data validity: Is the time of your data collection clear?  What should people who try to use it in the future be aware of?
  • data integrity: Does your data change over time?  Do you have a way to tell when it was last updated?  Are you clear about its context?
  • data ownership: Who owns your data? Is there a period of time after which you plan to release it? What happens to it if your organization disappears?

Here’s an example: a 1980s research paper looked back at the archives of the 1964 Freedom Summer project.  The researchers went through the enrollment forms of the people who volunteered to try to determine what the best predictors of participation were.  This kind of re-use of data 20 years after the fact is the kind of usage you need to consider.
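
One way to make the longevity questions above concrete is to keep a small record alongside each dataset that answers them explicitly.  The sketch below is only an illustration, not a standard: the DatasetRecord class, its field names, and the sample values are all hypothetical, and a real policy would likely need more nuance than a single retention period.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata kept next to a dataset (all fields are assumptions)."""
    name: str
    owner: str                       # data ownership: who is responsible for it
    collected_on: date               # data validity: when it was gathered
    last_updated: date               # data integrity: when it last changed
    retention_days: int              # how long you have committed to keep it
    successor: Optional[str] = None  # who takes over if the organization disappears
    context_note: str = ""           # what future users should be aware of

    def retention_ends(self) -> date:
        # Retention is counted from the collection date in this sketch.
        return self.collected_on + timedelta(days=self.retention_days)

    def is_past_retention(self, today: date) -> bool:
        # True once the committed retention period has elapsed.
        return today > self.retention_ends()


# Made-up sample values, for illustration only.
record = DatasetRecord(
    name="volunteer_signups",
    owner="Example Org",
    collected_on=date(2015, 6, 1),
    last_updated=date(2015, 9, 15),
    retention_days=365 * 5,
    successor="Partner archive (hypothetical)",
    context_note="Collected through paper forms at summer events.",
)
```

Even if you never write any code, filling in these fields for each dataset forces the conversation about who owns it, how long you keep it, and what someone picking it up years later would need to know.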

Policies & Practices

So how do you craft policies and put them into place?  The key consideration is that they need to match your needs.  You have to take stock of the existing patterns people have and try to accommodate and build off of them.  It’s best to engage the key players in your data’s lifecycle early, so they have ownership of the system you put in place.  This “meeting people where they are” approach doesn’t mean you can’t create a strict policy about data use, but it does create an environment where your policies are more likely to succeed.