Cyber Security

Cyber threats are constantly evolving, which means policy must also be re-examined regularly. This policy should be reviewed and revised at least once per year. It was last reviewed on May 25th, 2022.

Threat Model

Our cyber risk primarily stems from bulk, financially motivated cyber threats. We don’t currently have the size, profile, or political valence to justify positioning against targeted attacks or non-financial bad actors.

This stance will be continually adjusted based on the size, financial scope, and ideological sensitivities of our clients and the projects we take on for them.

Prevention

Access Management

The overwhelming majority of incidents are precipitated by credential mismanagement, and we can minimize that possibility through a bouquet of simple principles.

  • Logins
    • We use mandatory 2-factor authentication for our Google Workspace domain, and encourage folks to have multiple second factors available, preferring a security key or phone sign-in over an SMS-based second factor.
    • All employees must use a password manager (of their choosing).
    • Passwords must be globally unique and have high entropy (in whatever flavor the employee prefers). Passwords should be generated through the password manager.
    • We centralize risk and protection on the Google identity, so when given the “Login with Google” option, take it.
  • Tokens
    • All machine-authentication tokens should be stored exclusively in git/GitHub, encrypted using sops. Our configuration for sops is described in outline. Since we use a GCP keyring, Google authentication centralizes our access here (a minimal Terraform sketch follows this list).
    • When generating machine-authentication credentials (API credentials, SSL certificates, etc.), document the process used to generate the credential (including the site, approximate time, commands run, etc.), omitting sensitive data. This is critical for reproducibility and ease of key rotation.
    • Rotate keys through automated mechanisms when technologically feasible.
  • ACLs
    • When setting up cloud infrastructure, use standard IAM tools to authenticate and authorize machines to talk to one another. Avoid custom code.
    • Configure ACLs through Terraform, and do not override or augment them through UI interactions that aren’t then pulled back into Terraform. We will enforce this with the use of automated tools. This allows our permission hierarchy to be code reviewed.
    • ACLs should be narrow and reflect a business purpose. During initial infrastructure creation, imprecise authorization is inevitable - that’s OK. Accompany every instance of potentially over-broad permission granting with a TODO in the code that references a GitHub issue tagged #ACL to rectify it.
    • Use tooling like GCP’s policy analyzer to detect overbroad ACLs, paying particular attention to this prior to public launch/access.
    • We’re less concerned with document/email access management, but follow sensible sharing practices in everything you create.
  • Devices
    • Employees should either have a dedicated machine for work-related software development, or treat their personal device as they would a work device.
    • Employees should only download and use software that is broadly trustworthy. If you have questions on how this should be interpreted, ask.
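
For illustration, here is a minimal Terraform sketch of the token setup described above: a GCP KMS keyring and key that sops can encrypt against, plus a single narrow IAM grant so only the identity that needs to decrypt can do so. The project ID, names, rotation period, and service account are placeholders, not our real configuration.

    # Illustrative sketch only - project, names, and service account are placeholders.
    resource "google_kms_key_ring" "sops" {
      project  = "example-project-dev"
      name     = "sops"
      location = "global"
    }

    resource "google_kms_crypto_key" "sops" {
      name            = "sops-key"
      key_ring        = google_kms_key_ring.sops.id
      rotation_period = "7776000s" # ~90 days, rotated automatically by KMS
    }

    # Narrow, single-purpose grant: only the deploy service account can decrypt.
    resource "google_kms_crypto_key_iam_member" "deployer_decrypt" {
      crypto_key_id = google_kms_crypto_key.sops.id
      role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
      member        = "serviceAccount:deployer@example-project-dev.iam.gserviceaccount.com"
    }

sops can then point at this key through the gcp_kms entry in its creation rules, and because the grant lives in Terraform it gets code reviewed like any other ACL change.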

Code

  • Isolated Test, Dev, and Prod environments.
    • Resources in one environment should never talk to resources in another.
    • Create symmetrical ACL structures in every environment (again, using Terraform). This allows issues to be discovered before prod, prevents overbroad access in Test, and makes ACL changes simpler and higher-confidence (see the sketch after this list).
    • In the rare instances where you do want shared state between environments, use a distinct, explicitly-shared environment.
  • Do not use third-party libraries that are designed to handle sensitive data (ex: github.com/[randomuser]/paypalclient) - this includes payments, authentication, or authorization.
    • This is important because it’s a bulk attack vector. Though both are plausible, it’s easier to harvest a bunch of sites’ PayPal credentials by publishing a malicious PayPal client than it is to target those same credentials through a non-PayPal-specific third-party library.
    • If there is valuable code that demonstrates or provides a layer on top of these services, clone it, audit it (yes, that includes its dependencies, and yes, that is time-intensive), and use the cloned version.
    • This restriction does not apply to libraries produced by very large, very technologically sophisticated companies (ex: Salesforce, PayPal, Mozilla) that have significant adoption.
  • Make defensive assumptions about git: treat it as immutable, and assume every git repo will eventually become public. Even though neither is technically true, these assumptions provide strategic value.
    • Example: when you accidentally commit a secret, even if you overwrite it prior to PR, discard that secret and generate a new one. Once the compromised secret is no longer in use, use a tool like git filter-branch or BFG Repo Cleaner to remove any trace of the secret from the repo history.
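
The environment symmetry above is easiest to keep when each environment is the same Terraform module instantiated with different inputs. A minimal sketch, with placeholder module path and project IDs:

    # Illustrative sketch only - the module path and project IDs are placeholders.
    # One module body defines the environment's resources and ACLs, so Test, Dev,
    # and Prod stay structurally identical and every change is reviewed once.
    module "test" {
      source     = "./modules/environment"
      project_id = "example-project-test"
    }

    module "dev" {
      source     = "./modules/environment"
      project_id = "example-project-dev"
    }

    module "prod" {
      source     = "./modules/environment"
      project_id = "example-project-prod"
    }

Where a -shared environment is genuinely needed, it gets its own module instance rather than cross-environment references.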

Clients

  • Have regular conversations with clients about their cybersecurity risk profile, and the highest value steps they can take to mitigate it.
  • Develop a long term vision with the client about how they will manage secrets and ACLs for the infrastructure we build for them.
  • Use a distinct keyring per project (prefer this over per-client keyrings).
  • Encourage clients to have their own Local, Dev, and Prod equivalents.
  • In designing projects, avoid collecting sensitive data, and design out indefinite data retention. This minimizes the surface area of data exposure or compromise.

Culture

  • Ask questions about things you don’t understand, and give feedback and share information with kindness and an assumption of good intent.
  • Get a formal review on every security-pertinent choice you make.
  • Write documentation liberally.
  • Security by obscurity should never have any weight as an argument.
  • Expenses incurred by employees in the maintenance of these preventative measures can be reimbursed. This includes security keys, password manager fees, etc.

Detection

Since we’re centralizing our assets on GCP and Google, it’s where we’ll invest the most energy detecting issues. We’ll focus detection where we focus mitigation: at the authentication layer.

  • Google Workspace OAuth - We’ll use the standard Google security dashboarding and reporting tools for monitoring and alerting on suspicious authentication.
  • GCP Standard Tooling - We’ve enabled Security Command Center and Security Health Analytics, and will continue to use and expand standardized tooling for risk detection and notification. We can periodically audit resource definitions, permissions, and activity at an organizational level. We have data access and audit logs enabled for all of our projects.
  • Resource Limits + Alerts - We impose reasonable limits and budget alerts on all infrastructure - cost and bandwidth limits in particular are powerful tools to detect and combat abuse and infiltration (see the sketch after this list).
  • Resource Standardization - We use resource naming conventions (with a secret methodology) to allow for detection of resources created by malicious actors.
  • Resource Inventory - Track our set of deployed assets to catch malicious actors, misconfigurations, and obsolete resources.
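
As an illustration of the budget alerts above, a minimal Terraform sketch; the billing account, project, amount, and thresholds are placeholders:

    # Illustrative sketch only - billing account, project, amount, and thresholds
    # are placeholders. Spend spiking past a threshold is often the first visible
    # symptom of abuse or a compromised credential.
    resource "google_billing_budget" "prod_budget" {
      billing_account = "000000-000000-000000"
      display_name    = "example-project-prod monthly budget"

      budget_filter {
        projects = ["projects/example-project-prod"]
      }

      amount {
        specified_amount {
          currency_code = "USD"
          units         = "500"
        }
      }

      threshold_rules {
        threshold_percent = 0.5
      }
      threshold_rules {
        threshold_percent = 0.9
      }
      threshold_rules {
        threshold_percent = 1.0
      }
    }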

Response

We’ll follow standard Data Breach Response best practices. If we think there may have been a data breach, we will:

  • Assign a response coordinator responsible for managing the response
  • Create two tracking documents for the incident: an internal doc to record investigation, mitigation, and notification, and an updates doc where we can post notifications to potentially impacted parties.
  • Proactively notify potentially impacted clients, regardless of incident severity, and point them to the updates doc for up-to-date information
  • Use identity as the basis on which to determine the scope of potentially breached information: what could the compromised accounts and credentials access?
  • If cause can’t be identified in-house, seek external expertise until cause can be determined.
  • Once cause has been identified, perform key-rotation on all potentially impacted resources.
  • Notify impacted end-users if sensitive personal information has been breached.

We will test this plan with a realistic tabletop exercise to validate what works and what doesn’t.

Recovery

  • Every project that we work on will perform regular backups on a cadence that is sensible for the project.
  • We’ll test the restore process for each backed up resource on a regular basis.
  • We’ll store backups in at least two distinct cloud provider zones. We’re unlikely to require this level of replication for serving data stores.
  • We disable deletes of backups, relying exclusively on TTLs to initiate backup deletion (see the sketch below).
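
As a sketch of what "no manual deletes, TTL-driven expiry" can look like when backups land in Cloud Storage (an assumption - the same idea applies to other backup targets), with placeholder names and retention windows:

    # Illustrative sketch only - bucket name, location, and windows are placeholders.
    resource "google_storage_bucket" "backups" {
      name     = "example-project-prod-backups"
      location = "US" # multi-region, so backups are not tied to a single zone

      # A retention policy blocks deletes and overwrites until the period expires;
      # expiry then happens via the lifecycle TTL below, not via anyone's delete.
      retention_policy {
        retention_period = 2592000 # 30 days, in seconds
      }

      lifecycle_rule {
        condition {
          age = 30 # days
        }
        action {
          type = "Delete"
        }
      }

      uniform_bucket_level_access = true
    }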