<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Rohan Chakravarthy]]></title><description><![CDATA[Rohan Chakravarthy]]></description><link>https://rohanc.me/</link><image><url>https://rohanc.me/favicon.png</url><title>Rohan Chakravarthy</title><link>https://rohanc.me/</link></image><generator>Ghost 5.2</generator><lastBuildDate>Mon, 01 Sep 2025 07:04:47 GMT</lastBuildDate><atom:link href="https://rohanc.me/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Threat Modeling (for beginners)]]></title><description><![CDATA[You need to identify threats before you can secure your application. This post covers the fundamentals of threat modeling and how to incorporate it into your existing software development lifecycle]]></description><link>https://rohanc.me/threat-modeling-beginners/</link><guid isPermaLink="false">62a822d764859a000170fee6</guid><category><![CDATA[security]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Fri, 23 Sep 2022 07:11:30 GMT</pubDate><media:content url="https://rohanc.me/content/images/2022/09/tis_but_a_scratch-3.gif" medium="image"/><content:encoded><![CDATA[<img src="https://rohanc.me/content/images/2022/09/tis_but_a_scratch-3.gif" alt="Threat Modeling (for beginners)"><p>A threat modeling exercise is an important step while designing secure applications. This was an extremely daunting process when I first started working on consumer-facing products. There is a lot of information available and no clear starting point. This post aims to provide that starting point.</p><p>As one of the first engineers on Amazon Care&apos;s core infra/security team, I worked closely with Amazon&apos;s Application Security (AppSec) org. 
I built critical authorization systems and libraries used across the organization. And as you might imagine, this required extensive security reviews and a LOT of threat models along the way.</p><p>This post will help you understand threat modeling fundamentals and how to incorporate it into your existing software development lifecycle. It is by no means a comprehensive guide, but you should get pretty far if you think through all the questions and suggestions below. I&apos;ll include a detailed worksheet in my next post.</p><p>If you&apos;d like a deeper dive, <a href="https://shostack.org/books/threat-modeling-book">Threat Modeling by Adam Shostack</a> is one of my favorite resources on this topic.</p><!--kg-card-begin: markdown--><h4 id="table-of-contents">Table of Contents</h4>
<ul>
<li><a href="#structuring-threats">Structuring Threats</a></li>
<li><a href="#trust-boundaries">Defining Trust Boundaries</a></li>
<li><a href="#two-phases-start-top-down-end-bottom-up">Two Phases: Start top-down, end bottom-up</a></li>
<li><a href="#defense-in-depth">Defense in Depth</a></li>
<li><a href="#threat-identification-model">Threat Identification Model</a></li>
<li><a href="#you-are-more-qualified-than-you-think">You are more qualified than you think</a></li>
</ul>
<!--kg-card-end: markdown--><h2 id="structuring-threats">Structuring Threats</h2><p>Let&apos;s start with a template for documenting threats. I find it helpful to create a table with the columns defined below. Every identified threat is a single row in this table.</p><p>Let&apos;s use a simple example to understand each of these columns: an attacker attempting to exfiltrate customer data through an administrative API.</p><!--kg-card-begin: html--><div style="overflow: scroll;">
<table style="white-space:nowrap; width: 100%">
  <tr>
    <th></th>
    <th>Description</th>
    <th>Example</th>
  </tr>
  <tr>
    <th>Attacker Goal</th>
      <td>What is the attacker attempting to do?</td><td>Exfiltrating all our customer profiles</td>
  </tr>
  <tr>
    <th>Threat Description / Attack runbook</th>
    <td>How will the attacker accomplish their goal?</td>
    <td>Steal an administrative user&apos;s credentials and query the admin APIs</td>
  </tr>
    <tr>
    <th>Business Impact</th>
    <td>What is the business impact of the attacker achieving their goal?</td><td>Loss of customer trust, brand risk</td>
  </tr>
    <tr>
    <th>Risk Category</th>
    <td>What category of risk is this?</td><td>Information Disclosure (I&apos;ll cover specific risk categories in my next post)</td>
  </tr>
    <tr>
    <th>Mitigation(s)</th>
    <td>How does your application mitigate this attack?</td><td>IdP with hardware MFA, short-lived credentials, limited access to admin APIs, rate limiting</td>
  </tr>
    <tr>
    <th>Verification strategy for mitigation(s)</th>
    <td>How will you validate that your mitigations work?</td><td>Manual tests, automated tests</td>
  </tr>
    <tr>
    <th>Incident Discovery Mechanism</th>
    <td>How will you know if this attack was successful despite your mitigations?</td><td>Alarms and anomaly detection on sensitive APIs</td>
  </tr>
<tr>
    <th>Incident Response Plan</th>
    <td>How will you respond to the incident?</td><td>Revoke all tokens. Look at logs to identify which records were exfiltrated. Work with Legal and Compliance teams to identify next steps</td>
  </tr>
</table>
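If you like keeping threat models next to your code, the columns above also map naturally onto a small data type. The following TypeScript sketch is purely illustrative (the interface and field names are mine, not part of any standard), using the admin-API example from the table as the row:

```typescript
// Illustrative shape for one row of the threat table above.
// Field names are invented for this example, not a standard.
interface Threat {
  attackerGoal: string;
  attackRunbook: string;
  businessImpact: string;
  riskCategory:
    | "Spoofing" | "Tampering" | "Repudiation"
    | "Information Disclosure" | "Denial of Service"
    | "Elevation of Privilege"; // the six STRIDE categories
  mitigations: string[];
  verificationStrategy: string[];
  incidentDiscovery: string[];
  incidentResponsePlan: string;
}

// The worked example from the table, expressed as data
const adminApiExfiltration: Threat = {
  attackerGoal: "Exfiltrating all our customer profiles",
  attackRunbook: "Steal an administrative user's credentials and query the admin APIs",
  businessImpact: "Loss of customer trust, brand risk",
  riskCategory: "Information Disclosure",
  mitigations: [
    "IdP with hardware MFA",
    "short-lived credentials",
    "limited access to admin APIs",
    "rate limiting",
  ],
  verificationStrategy: ["Manual tests", "automated tests"],
  incidentDiscovery: ["Alarms and anomaly detection on sensitive APIs"],
  incidentResponsePlan:
    "Revoke all tokens; identify exfiltrated records from logs; engage Legal and Compliance",
};
```

Storing rows like this makes it easy to review threat models in pull requests alongside the code they describe.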
</div><!--kg-card-end: html--><h2 id="trust-boundaries">Trust Boundaries</h2><p>A trust boundary is a logical demarcation in your system beyond which all principals (users, applications, systems) require additional checks. Everything inside the boundary has the same trust level. Everything outside the boundary requires additional (or different) checks. Common examples include VPNs, Virtual Private Clouds (VPCs), and even AWS accounts. There can be many layers of trust boundaries. For example, each of the boxes below could be a trust boundary:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://rohanc.me/content/images/2022/06/trust_boundary.drawio-2-.png" class="kg-image" alt="Threat Modeling (for beginners)" loading="lazy" width="1740" height="1084" srcset="https://rohanc.me/content/images/size/w600/2022/06/trust_boundary.drawio-2-.png 600w, https://rohanc.me/content/images/size/w1000/2022/06/trust_boundary.drawio-2-.png 1000w, https://rohanc.me/content/images/size/w1600/2022/06/trust_boundary.drawio-2-.png 1600w, https://rohanc.me/content/images/2022/06/trust_boundary.drawio-2-.png 1740w" sizes="(min-width: 720px) 720px"><figcaption>Trust Boundaries</figcaption></figure><p>Notice how there is overlap. The same &quot;box&quot; could be part of multiple trust boundaries, and you can reason about them separately. Here are some potential trust boundaries in the diagram above:</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th>Trust Boundary</th>
<th>Potential Trust policies</th>
</tr>
</thead>
<tbody>
<tr>
<td>AWS Account</td>
<td>Each AWS Account could have console access protected via IAM users or AWS SSO</td>
</tr>
<tr>
<td>Subnet</td>
<td>Each Subnet could have its own security group to limit outside access, but resources within it can talk to each other</td>
</tr>
<tr>
<td>VPC</td>
<td>The VPC might have network ACL rules to control outbound internet access for all subnets but might allow all Subnet &lt;-&gt; Subnet communication</td>
</tr>
<tr>
<td>AWS organization</td>
<td>Resource policies could limit resource access to accounts in the org</td>
</tr>
</tbody>
</table>
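To make the layered checks concrete, here is a small, purely illustrative TypeScript sketch. The boundary names echo the table above, but the check logic and the principal naming scheme are invented for this example:

```typescript
// Purely illustrative: each trust boundary applies its own checks,
// and a principal must satisfy every boundary it crosses.
type Check = (principal: string) => boolean;

interface TrustBoundary {
  name: string;
  checks: Check[];
}

// Invented example policies, outermost boundary first
const boundaries: TrustBoundary[] = [
  { name: "aws-organization", checks: [(p) => p.startsWith("org/")] },
  { name: "vpc", checks: [(p) => !p.includes("quarantined")] },
  { name: "subnet", checks: [(p) => p.endsWith("/admin")] },
];

// A request is allowed only if every layer's checks pass
function canReach(principal: string, path: TrustBoundary[]): boolean {
  return path.every((b) => b.checks.every((check) => check(principal)));
}
```

The point is only that each boundary enforces its own policy and a request must pass every boundary it crosses; in practice the checks are IAM policies, security group rules, and network ACLs rather than string predicates.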
<!--kg-card-end: markdown--><p>Define trust boundaries early. Knowing up front which checks a request, data, or user must pass to cross each trust boundary greatly increases your chances of building a secure system. My recommendation is to define these boundaries during the system architecture/design phase.</p><h2 id="two-phases-start-top-down-end-bottom-up">Two Phases: Start top-down, end bottom-up</h2><p>I recommend going through the threat modeling exercise twice, especially for larger systems. </p><figure class="kg-card kg-image-card"><img src="https://rohanc.me/content/images/2022/09/defense_in_depth-1-.png" class="kg-image" alt="Threat Modeling (for beginners)" loading="lazy" width="1834" height="744" srcset="https://rohanc.me/content/images/size/w600/2022/09/defense_in_depth-1-.png 600w, https://rohanc.me/content/images/size/w1000/2022/09/defense_in_depth-1-.png 1000w, https://rohanc.me/content/images/size/w1600/2022/09/defense_in_depth-1-.png 1600w, https://rohanc.me/content/images/2022/09/defense_in_depth-1-.png 1834w" sizes="(min-width: 720px) 720px"></figure><p>The first pass is a broader, top-down approach. From a timeline perspective, this is around the time you will be finalizing your system architecture. Think about threats to your service overall, ignoring low-level component details. This helps to identify major threats introduced by your approach before you spend valuable development cycles. It is a lot cheaper to re-architect a system <em><u>before</u></em> building it.</p><p>The second pass is a more specific, bottom-up approach. Think about threats introduced by specific components you are using: Are there documented anti-patterns for your components? Are you using trusted sources for open-source libraries? Does the SaaS service you are using have robust auth mechanisms in place to protect your customer data? 
</p><p>If your organization invests in low-level design documents (documents with details about specific components you will use: libraries, cloud resources, endpoints, auth mechanisms), you should complete your second pass after writing those documents. Otherwise, perform the second pass towards the end of your development lifecycle.</p><h2 id="defense-in-depth">Defense in Depth</h2><p>Defense in depth (or the <a href="https://en.wikipedia.org/wiki/Swiss_cheese_model">&quot;Swiss Cheese&quot; model</a>) is the strategy of implementing multiple (and sometimes redundant) layers of protection.</p><figure class="kg-card kg-image-card"><img src="https://rohanc.me/content/images/2022/09/Swiss-Cheese-model.jpg" class="kg-image" alt="Threat Modeling (for beginners)" loading="lazy" width="1024" height="597" srcset="https://rohanc.me/content/images/size/w600/2022/09/Swiss-Cheese-model.jpg 600w, https://rohanc.me/content/images/size/w1000/2022/09/Swiss-Cheese-model.jpg 1000w, https://rohanc.me/content/images/2022/09/Swiss-Cheese-model.jpg 1024w" sizes="(min-width: 720px) 720px"></figure><p>It borrows its name from a military strategy, and has the same goal: use preventative measures to slow down an attack and give yourself time to detect and react to it. In addition to slowing down (or discouraging) attackers, it also ensures your systems don&apos;t have a single point of failure. Ideally, a single bad commit or config change shouldn&apos;t bring down your application or open it up to malicious actors. </p><figure class="kg-card kg-image-card"><img src="https://rohanc.me/content/images/2022/09/tis_but_a_scratch.gif" class="kg-image" alt="Threat Modeling (for beginners)" loading="lazy" width="268" height="250"></figure><p>Utilize this strategy while defining mitigations for the threats you have identified. 
For example, a public-facing endpoint can add multiple layers of security, including:</p><ol><li>Use Web Application Firewall (WAF) rules to block known bot IPs</li><li>Add rate limiting rules</li><li>Require a valid auth token</li><li>Validate the caller has access to the resources they are retrieving</li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://rohanc.me/content/images/2022/09/defense_in_depth.png" class="kg-image" alt="Threat Modeling (for beginners)" loading="lazy" width="1394" height="970" srcset="https://rohanc.me/content/images/size/w600/2022/09/defense_in_depth.png 600w, https://rohanc.me/content/images/size/w1000/2022/09/defense_in_depth.png 1000w, https://rohanc.me/content/images/2022/09/defense_in_depth.png 1394w" sizes="(min-width: 720px) 720px"><figcaption>Defense in Depth for an API</figcaption></figure><h2 id="threat-identification-model">Threat Identification Model</h2><p>There are <a href="https://en.wikipedia.org/wiki/Threat_model">many models</a> you can use to identify your threats. I find <a href="https://en.wikipedia.org/wiki/STRIDE_(security)">STRIDE</a> to be thorough and easy to reason about for applications deployed in a cloud environment, but feel free to use another model - they are all just structured ways to identify threats.</p><h2 id="you-are-more-qualified-than-you-think">You are more qualified than you think</h2><p>Threat modeling identifies the security risks introduced by new versions of your application. It requires an intricate knowledge of your architecture, system design and application logic. This actually makes <em><u>you</u></em> one of the most qualified people to build the threat model for your application. 
&#xA0;You understand it better than any outside party!</p><p>Start with the guidance in this post, but also use your familiarity with the systems to identify creative ways in which attackers could compromise your application.</p><hr><p>I&apos;ll publish a post soon with a detailed STRIDE worksheet to identify threats. As always, you can reach me <a href="https://twitter.com/rohchak/">@rohchak</a></p>]]></content:encoded></item><item><title><![CDATA[AWS VPC Subnet Groups]]></title><description><![CDATA[The L2 VPC cdk construct accepts a list of Subnet Groups. This post explains how subnet groups work, and the reason for some defaults.]]></description><link>https://rohanc.me/aws-cdk-vpc-subnet-groups/</link><guid isPermaLink="false">62a6ac3d64859a000170fdf3</guid><category><![CDATA[vpc]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Mon, 13 Jun 2022 03:37:07 GMT</pubDate><media:content url="https://rohanc.me/content/images/2022/06/vpc_feature2.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://rohanc.me/content/images/2022/06/vpc_feature2.jpg" alt="AWS VPC Subnet Groups"><p>The <a href="https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.Vpc.html">L2 VPC cdk construct</a> accepts a list of Subnet Groups in the <code>subnetConfiguration</code> property. 
Subnet Groups only seem to be documented in the context of an <a href="https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/SubnetGroups.html">Elasticache cluster</a>, so I&apos;ll provide a quick breakdown of how they work in the context of VPCs:</p><ol><li><a href="#subnets-are-assigned-cidr-blocks-in-the-order-they-are-defined">Subnets are assigned CIDR blocks in the order they are defined</a></li><li><a href="#subnet-groups-are-deployed-to-every-az">Subnet groups are deployed to every AZ</a></li><li><a href="#reserved-subnet-blocks">You can &quot;Reserve&quot; subnet blocks without deploying a subnet resource</a></li></ol><p>For reference, a Subnet Group (configured using a <a href="https://docs.aws.amazon.com/cdk/api/v1/python/aws_cdk.aws_ec2/SubnetConfiguration.html">SubnetConfiguration</a> instance) has the following configurable properties:</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th>Property</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>name</td>
<td>string; the logical name for the subnet group</td>
</tr>
<tr>
<td>subnet_type</td>
    <td>valid values: <code>PRIVATE_ISOLATED</code>, <code>PRIVATE_WITH_NAT</code>, <code>PUBLIC</code></td>
</tr>
<tr>
<td>cidr_mask</td>
    <td>valid values: <code>16-28</code></td>
</tr>
<tr>
<td>map_public_ip_on_launch</td>
    <td><code>true</code> by default for public subnets</td>
</tr>
<tr>
<td>reserved</td>
    <td><code>false</code> by default. more details below</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><h3 id="subnets-are-assigned-cidr-blocks-in-the-order-they-are-defined">Subnets are assigned CIDR blocks in the order they are defined</h3><p>Every <a href="https://docs.aws.amazon.com/cdk/api/v1/python/aws_cdk.aws_ec2/SubnetConfiguration.html">Subnet Group entry</a> in the <code>subnetConfiguration</code> list is assigned a CIDR block based on the VPC <code>cidr</code> property and the subnet group <code>cidrMask</code> property. </p><p>Let&apos;s say you have a VPC (<code>cidr:10.0.0.0/16</code>) with a <strong><u>single AZ</u></strong> with 3 entries (<code>cidrMask:24</code>) in the <code>subnetConfiguration</code> list:</p><pre><code class="language-typescript">const vpc = new Vpc(this, &apos;lambda-vpc&apos;, {
    &apos;cidr&apos;: &quot;10.0.0.0/16&quot;,
    &apos;maxAzs&apos;: 1,
    &apos;subnetConfiguration&apos;: [{
        cidrMask: 24,
        name: &apos;the-shy-one&apos;,
        subnetType: SubnetType.PRIVATE_ISOLATED,
    },
    {
        cidrMask: 24,
        name: &apos;the-cute-one&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT
    },
    {
        cidrMask: 24,
        name: &apos;the-rebel&apos;,
        subnetType: SubnetType.PUBLIC
    }
    ],
    &apos;vpcName&apos;: &apos;generic-boy-band&apos;
})</code></pre><p>This VPC will have 3 subnets with the following blocks: <code>10.0.0.0/24</code>, <code>10.0.1.0/24</code> and <code>10.0.2.0/24</code>. </p><p>If the VPC has 2 AZs instead, there will be 2 blocks per subnet group - <code>10.0.0.0/24</code> and <code>10.0.1.0/24</code> for the first subnet group, <code>10.0.2.0/24</code> and <code>10.0.3.0/24</code> for the second subnet group and so on. </p><p>Trying to add a new subnet group entry in the middle of the configuration list after the initial deployment will not work. For example, if we modify the example above:</p><figure class="kg-card kg-code-card"><pre><code class="language-typescript">const vpc = new Vpc(this, &apos;lambda-vpc&apos;, {
    &apos;cidr&apos;: &quot;10.0.0.0/16&quot;,
    &apos;maxAzs&apos;: 1,
    &apos;subnetConfiguration&apos;: [{
        cidrMask: 24,
        name: &apos;the-shy-one&apos;,
        subnetType: SubnetType.PRIVATE_ISOLATED,
    },
    {
        cidrMask: 24,
        name: &apos;the-cute-one&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT
    },
    // NEW ENTRY
    {
        cidrMask: 24,
        name: &apos;the-copy-cat&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT
    },
    {
        cidrMask: 24,
        name: &apos;the-rebel&apos;,
        subnetType: SubnetType.PUBLIC
    }
    ],
    &apos;vpcName&apos;: &apos;generic-boy-band&apos;
})</code></pre><figcaption>updated VPC config</figcaption></figure><p>It <strong><em>will fail </em></strong>with the error:</p><div class="kg-card kg-callout-card kg-callout-card-grey"><div class="kg-callout-emoji">&#x2757;</div><div class="kg-callout-text">Resource handler returned message: &quot;The CIDR [..] conflicts with another subnet&quot;</div></div><h3 id="subnet-groups-are-deployed-to-every-az">Subnet groups are deployed to every AZ</h3><p>Every <a href="https://docs.aws.amazon.com/cdk/api/v1/python/aws_cdk.aws_ec2/SubnetConfiguration.html">Subnet Group entry</a> in the <code>subnetConfiguration</code> list creates a subnet <em>per AZ </em>in the VPC<em>. </em>There is no way to specify different AZs for different subnet groups, nor can you limit a subnet to a single AZ. For example, say your VPC is configured with 2 AZs. You can&apos;t have <code>SUBNET-GROUP-1</code> subnets deployed in <code>us-west-2a</code> and <code>us-west-2b</code>, and <code>SUBNET-GROUP-2</code> deployed in <code>us-west-2b</code> only, at least with the L2 VPC construct. </p><p>This seemed odd to me, so I posted <a href="https://www.reddit.com/r/aws/comments/v89xms/is_it_possible_to_provide_different_subnet/">a question about it on the AWS subreddit</a>. See the discussion in that thread for more details, but here&apos;s why I am convinced this is a good default: </p><p>AZ-aware AWS resources allow specifying the number of AZs to deploy resources into. For example, you can configure an EC2 auto-scaling group to only deploy 2 instances, even if you have 3 AZs available. If I&apos;m creating a VPC with multiple AZs, it means I anticipate needing higher availability guarantees, even if I don&apos;t need it immediately for all my applications. 
Thus, the default subnet group configuration ensures I have that reserved address space whenever I choose to start using the additional AZs (and it does not cost anything).</p><p>This also ensures <em>all subnets in a subnet group form one contiguous block of CIDR address ranges</em>. This simplifies rules for similar subnets. <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-rules-reference.html">This page</a> has some examples that show how contiguous blocks can be useful.</p><h3 id="reserved-subnet-blocks">&quot;Reserved&quot; subnet blocks</h3><p>Subnet Group configurations also provide a <code>reserved</code> boolean property. Read a detailed description <a href="https://github.com/aws/aws-cdk/issues/2087">here</a>, but this property essentially allows you to reserve certain CIDR blocks without actually creating the subnet resource. </p><p><strong><u>Example:</u></strong> I have a VPC configured with a single AZ. There are 2 kinds of subnets in this VPC, &quot;application&quot; and &quot;database&quot; subnets. All resources in the &quot;application&quot; subnet will have similar access, which will differ from the access granted to resources in the &quot;database&quot; subnet. I expect to eventually need 5 &quot;application&quot; subnets and 2 &quot;database&quot; subnets, but I only need 2 &quot;application&quot; subnets and a single &quot;database&quot; subnet today.</p><p><em>Note: Remember CIDR blocks are allocated <a href="#subnets-are-assigned-cidr-blocks-in-the-order-they-are-defined">in the order you define subnet groups</a>, and you cannot add new groups in the middle of the configuration list.</em></p><p><u><strong>Option 1:</strong></u> Define 3 subnet groups corresponding to the 3 subnets I need today. Add new subnet group entries as needed.</p><pre><code class="language-typescript">const vpc = new Vpc(this, &apos;lambda-vpc&apos;, {
    &apos;cidr&apos;: &quot;10.0.0.0/16&quot;,
    &apos;maxAzs&apos;: 1,
    &apos;subnetConfiguration&apos;: [{
        cidrMask: 24,
        name: &apos;application-1&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT,
    },
    {
        cidrMask: 24,
        name: &apos;application-2&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT
    },
    {
        cidrMask: 24,
        name: &apos;database-1&apos;,
        subnetType: SubnetType.PRIVATE_ISOLATED
    }],
    &apos;vpcName&apos;: &apos;fake-org&apos;
})</code></pre><p><em>Behavior: </em>Each new subnet group will block a new CIDR block. &quot;Application&quot; and &quot;database&quot; subnets will be interspersed, making <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-rules-reference.html">rules</a> and configurations based on IP address ranges difficult to manage.</p><p><strong><u>Option 2:</u></strong> Define 5 &quot;application&quot; subnet groups and 2 &quot;database&quot; subnet groups in order today. The 3 subnet groups I need today will have the <code>reserved</code> property set to false (default). The others will have it set to true.</p><pre><code class="language-typescript">const vpc = new Vpc(this, &apos;lambda-vpc&apos;, {
    &apos;cidr&apos;: &quot;10.0.0.0/16&quot;,
    &apos;maxAzs&apos;: 1,
    &apos;subnetConfiguration&apos;: [{
        cidrMask: 24,
        name: &apos;application-1&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT,
    },
    {
        cidrMask: 24,
        name: &apos;application-2&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT
    },
    {
        cidrMask: 24,
        name: &apos;application-3&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT,
        reserved: true
    },
    {
        cidrMask: 24,
        name: &apos;application-4&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT,
        reserved: true
    },
    {
        cidrMask: 24,
        name: &apos;application-5&apos;,
        subnetType: SubnetType.PRIVATE_WITH_NAT,
        reserved: true
    },
    {
        cidrMask: 24,
        name: &apos;database-1&apos;,
        subnetType: SubnetType.PRIVATE_ISOLATED
    },
    {
        cidrMask: 24,
        name: &apos;database-2&apos;,
        subnetType: SubnetType.PRIVATE_ISOLATED,
        reserved: true
    }],
    &apos;vpcName&apos;: &apos;fake-org&apos;
})</code></pre><p><em>Behavior</em>: &quot;Application&quot; and &quot;database&quot; subnets will have their own IP address ranges. There are 4 subnet CIDR ranges that have been reserved, but have no associated resources. Over time, the 4 reserved subnets can be deployed as subnet resources.</p><hr><p>That&apos;s it for VPC subnet groups. Look for a post soon on how to deploy a VPC with security best practices using CDK.</p><p>As always, you can reach me <a href="https://twitter.com/rohchak/">@rohchak</a></p>]]></content:encoded></item><item><title><![CDATA[Lets Encrypt + Haproxy]]></title><description><![CDATA[I recently found this great docker image that encapsulates haproxy and cert renewal into a single container]]></description><link>https://rohanc.me/letsencrypt-haproxy/</link><guid isPermaLink="false">62a05b4678647100013290a4</guid><category><![CDATA[haproxy]]></category><category><![CDATA[letsencrypt]]></category><category><![CDATA[ssl]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Wed, 08 Jun 2022 07:53:26 GMT</pubDate><media:content url="https://rohanc.me/content/images/2022/06/ssl3-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://rohanc.me/content/images/2022/06/ssl3-1.png" alt="Lets Encrypt + Haproxy"><p>I&apos;ve used a few different approaches for renewing the Let&apos;s Encrypt certs for my domain over the years, but I recently found <a href="https://github.com/tomdess/docker-haproxy-certbot">this great docker image</a> that encapsulates everything into a single container.</p><h2 id="steps">Steps</h2><ol><li><a href="#create-an-haproxy-cfg-file">Create an haproxy.cfg file</a></li><li><a href="#run-the-docker-container">Run the docker container</a></li></ol><h4 id="create-an-haproxy-cfg-file">Create an haproxy cfg file</h4><p>Here&apos;s my Haproxy config file, slightly modified from the one provided in the repo since I&apos;m serving my site on port 8080 locally. 
I&apos;ve stored it in <code>/etc/haproxy/haproxy.cfg</code></p><pre><code>global
    maxconn 20480
    ############# IMPORTANT #################################
    ## DO NOT SET CHROOT OTHERWISE YOU HAVE TO CHANGE THE  ##
    ## acme-http01-webroot.lua file                        ##
    # chroot /jail                                         ##
    #########################################################
    lua-load /etc/haproxy/acme-http01-webroot.lua
    #
    # SSL options
    ssl-default-bind-ciphers AES256+EECDH:AES256+EDH:!aNULL;
    tune.ssl.default-dh-param 4096

    # workaround for bug #14 (Cert renewal blocks HAProxy indefinitely with Websocket connections)
    hard-stop-after 3s

# DNS run-time resolution on backend hosts
resolvers docker
    nameserver dns &quot;127.0.0.11:53&quot;

defaults
    log global
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    option forwardfor
    option httplog

    # never fail on address resolution
    default-server init-addr last,libc,none

frontend http
    bind *:80
    mode http
    acl url_acme_http01 path_beg /.well-known/acme-challenge/
    http-request use-service lua.acme-http01 if METH_GET url_acme_http01
    redirect scheme https code 301 if !{ ssl_fc }

frontend https
    bind *:443 ssl crt /etc/haproxy/certs/ no-sslv3 no-tls-tickets no-tlsv10 no-tlsv11
    http-response set-header Strict-Transport-Security &quot;max-age=16000000; includeSubDomains; preload;&quot;
    default_backend www

backend www
    server ghost localhost:8080 check
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
</code></pre><h4 id="run-the-docker-container">Run the docker container</h4><p>Once you have the haproxy file set up, run the following command:</p><pre><code>DOMAINS=&quot;[YOUR COMMA SEPARATED LIST OF DOMAINS]&quot;
EMAIL=&quot;[YOUR EMAIL]&quot;
docker run --name haproxy -d \
    --net=&quot;host&quot; \
    -e CERTS=$DOMAINS \
    -e EMAIL=$EMAIL \
    -e STAGING=false \
    --restart=always \
    -v /home/ubuntu/haproxy:/etc/haproxy \
    -p 80:80 -p 443:443 \
    ghcr.io/tomdess/docker-haproxy-certbot:master
</code></pre><p>That&apos;s it! The container runs a cron job that checks your cert weekly and updates it if required</p>]]></content:encoded></item><item><title><![CDATA[AWS Subscription Required: The AWS Access Key Id needs a subscription for the service]]></title><description><![CDATA[I hit an odd error while bootstrapping a new account through cdk bootstrap: SubscriptionRequiredException]]></description><link>https://rohanc.me/aws-subscription-required/</link><guid isPermaLink="false">62a05b4678647100013290a3</guid><category><![CDATA[CDK]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Wed, 08 Jun 2022 07:52:14 GMT</pubDate><media:content url="https://rohanc.me/content/images/2022/06/error.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://rohanc.me/content/images/2022/06/error.jpg" alt="AWS Subscription Required: The AWS Access Key Id needs a subscription for the service"><p>I hit an odd error while bootstrapping a new account through <code>cdk bootstrap</code>:</p><pre><code>SubscriptionRequiredException: The AWS Access Key Id needs a subscription for the service
</code></pre><p>I found a few different reasons for this error, but they all essentially boil down to trying to use a feature that is not enabled in your account. I tried running some other commands but I soon realized <em><strong>all resource creation was disabled in my account</strong></em>.</p><p>I&apos;d created this account through my AWS Organization so I initially thought it might just be a delay in account setup. However, I kept seeing this error even after waiting for an hour.</p><h4 id="solution">Solution</h4><p>I started poking around my accounts, and eventually realized I had an <strong>overdue bill</strong> in the main organization account due to a cancelled credit card. <em><strong>Paying the bill immediately resolved this error!</strong></em></p><h4 id="other-potential-reasons">Other Potential Reasons</h4><p>Here are other links that cover other reasons for this error:</p><ul><li>AWS China regions don&apos;t support WAF: <a href="https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1579">https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1579</a></li><li>Services launched after account creation might not be enabled for your account: <a href="https://aws.amazon.com/premiumsupport/knowledge-center/error-access-service/">https://aws.amazon.com/premiumsupport/knowledge-center/error-access-service/</a></li></ul>]]></content:encoded></item><item><title><![CDATA[Moving Towards High-Value Health Care in the US (Introduction)]]></title><description><![CDATA[The US spends more on health care than any other country in the world but has the same (or worse) health outcomes. 
What led to this?]]></description><link>https://rohanc.me/moving-high-value-healthcare-us/</link><guid isPermaLink="false">62a05b4678647100013290a2</guid><category><![CDATA[healthcare]]></category><category><![CDATA[insurance]]></category><category><![CDATA[high-value healthcare]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Mon, 04 Jan 2021 03:02:00 GMT</pubDate><media:content url="https://github.com/rchakra3/static-assets/raw/master/high-value-healthcare-intro/stethoscope_on_white.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://github.com/rchakra3/static-assets/raw/master/high-value-healthcare-intro/stethoscope_on_white.jpg" alt="Moving Towards High-Value Health Care in the US (Introduction)"><p>It&apos;s taken me way too long to do an in-depth study of the current state of healthcare in the US considering building software solutions to provide quality care is now my full time job!</p>
<p>I think we all know something is wrong with the healthcare system in the US. The US spends more on health care than any other country in the world. 1/3rd of all the funds raised via GoFundMe are for medical expenses. Despite this, health outcomes are not any better than those in other developed countries. The US is actually worse in some common health metrics like life expectancy, infant mortality, and unmanaged diabetes.</p>
<p>I&apos;ve been reading through Dave Chase&apos;s <a href="https://healthrosetta.org/ceoguide/">&quot;The CEO&apos;s guide to restoring the American Dream&quot;: How to Deliver World Class Health Care to your Employees at Half the Cost</a> and working my way through the <a href="https://www.udacity.com/course/health-informatics-in-the-cloud--ud809">Health Informatics in the Cloud</a> course, so I decided to summarize what I&apos;m learning in a series of blog posts. The structure of the posts will largely follow the book, though the content is also sourced from other papers and articles (which I&apos;ve referenced), and conclusions I&apos;ve drawn from said content. I&apos;m still relatively new to this space, so feel free to point out any flawed conclusions.</p>
<p>With this first post, I&apos;m going to provide some background and talk about the status quo in the US healthcare system (as I understand it).</p>
<p><strong>Disclaimer:</strong> All opinions are my own</p>
<h4 id="terminology">Terminology</h4>
<ol>
<li><strong>Medicare:</strong> Federal Health Insurance that covers people over 65 (80+% of beneficiaries), and younger people with certain disabilities and chronic conditions. Part A (Hospital Insurance) covers in-patient treatments and Part B (Medical Insurance) covers outpatient care, medical supplies and preventative care</li>
<li><strong>Acute Conditions</strong>: These are time-sensitive conditions that generally respond to treatment and which can reach a resolution. Conditions that require urgent, emergency or critical care fall into this category. This covers conditions ranging from a broken ankle to heart attacks.</li>
<li><strong>Chronic Conditions</strong>: By definition a Chronic condition is <em><strong>not curable</strong></em>. The goal while treating a chronic condition is disease management to improve the patient&apos;s quality of life. With improvements in medicine, many previously &quot;Terminal&quot; conditions are now Chronic. Examples include Diabetes, Hypertension and Cancer.</li>
</ol>
<h4 id="somecrazynumbers">Some crazy numbers</h4>
<ol>
<li>The Health Care Industry spent <em><strong>$1.2 Billion lobbying to influence the Affordable Care Act in 2009</strong></em>. That seems like a lot, but <em><strong>annual healthcare spending in the US was $3.8 Trillion in 2019</strong></em>. The lobbying money was a drop in the bucket.</li>
<li>There are no laws that hold healthcare organizations responsible for misdiagnosis. Meanwhile, <em><strong>medical errors are the 3rd leading cause of death in the US</strong></em>, 5% of all diagnoses are incorrect, and &gt;20% of incorrect diagnoses cause life-altering or life-threatening consequences</li>
<li>FDA approvals don&apos;t mean as much as you&apos;d expect. <em><strong>57% of cancer medications</strong></em> approved by the FDA between 2008 and 2012 had unknown effects on overall survival or failed to show gains in survival rates</li>
<li>Almost <em><strong>50% of the adult population in the US has at least 1 chronic condition</strong></em>, and 27% has 2 or more.</li>
<li>More than <em><strong>80% of adults over the age of 65 have at least 1 chronic disease</strong></em>
<ul>
<li>This number is &gt;60% for 2 or more and &gt;20% for 5 or more!</li>
<li>Patients with 5 or more chronic conditions account for 67% of Medicare expenditure</li>
</ul>
</li>
</ol>
<h4 id="history">History</h4>
<p>Surprisingly, the employer-provided health insurance model that is prevalent today can be traced back to a WW2 policy to prevent hyperinflation!<br>
Up until the late 1930s, individuals paid most health care costs out of pocket and relied on individual health insurance plans to offset any unforeseen large expenses. During WW2, the US government introduced price and wage controls in an attempt to prevent hyperinflation. However, in a concession to placate labor groups, employer-sponsored health benefits were excluded from this wage cap. This resulted in employers offering increasingly elaborate health benefits as a means to attract employees. The IRS subsequently made all such health benefits tax exempt for both employers and employees.</p>
<p>This resulted in the current status quo, with employer-provided health insurance being the norm for the following reasons:</p>
<ol>
<li>Employers were now against any kind of reform that resulted in health benefits being taxed, since this meant that payroll taxes would go up</li>
<li>Since health benefits now covered more than just unforeseen large expenses, employees were more likely to visit doctors and hospitals. In theory, this seems ideal, but in practice this incentivizes hospital systems to increase prices since the employee often doesn&apos;t see the actual cost. Thus, Hospitals were now also incentivized to oppose any such reform.</li>
<li>Insurance Providers had a much larger pool of covered individuals, and had customers (employers) who were willing to pay higher premiums over time</li>
<li>Buying individual health insurance became more expensive than opting for an employer-provided plan</li>
</ol>
<h4 id="currentstate">Current State</h4>
<p>Over time, this move towards employer-sponsored health care led to the arguably broken health care system we see today.</p>
<h6 id="annualhealthcarepremiumincreases">Annual Healthcare Premium Increases</h6>
<p>Over time, annual increases in healthcare premiums became the norm. Most businesses expect an <em><strong>11-14% annual increase</strong></em> in costs, and insurance brokers take advantage of this knowledge to bump up per-employee premiums annually. In contrast, median middle class wages <em><strong>increased by ~1% annually</strong></em> from 2010-2016. <strong>The majority of an employer&apos;s per-employee payroll cost increase never reaches the employee!</strong> This doesn&apos;t even account for the fact that annual out-of-pocket healthcare expenses continue to increase for the employee anyway.</p>
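<p>To make the gap concrete, here is a small illustration (the numbers are hypothetical: a 12% annual premium increase, roughly the midpoint of the 11-14% range above, compounded against 1% wage growth, both starting from an index of 100):</p>

```python
# Hypothetical illustration: compound an assumed 12% annual premium increase
# against ~1% annual wage growth over a decade (both indexed to 100 at year 0).
premium, wage = 100.0, 100.0
for _ in range(10):
    premium *= 1.12  # premiums grow 12% per year (assumed midpoint of 11-14%)
    wage *= 1.01     # wages grow ~1% per year

print(round(premium, 1), round(wage, 1))
# prints: 310.6 110.5
```

<p>After a decade, premiums have roughly tripled while wages have grown about 10%, which is why the per-employee cost increase rarely shows up in paychecks.</p>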
<h6 id="morespecialtycarelessprimarycare">More Specialty Care, Less Primary Care</h6>
<p>Employee demands for benefits increased, resulting in offerings like High Deductible Health Plans (HDHPs). HDHPs are intended to incentivize consumer-driven healthcare, giving access to a huge number of providers and specialists. However, this results in patients choosing specialists over Primary Care Physicians. Specialists dealing with acute conditions tend to order far more expensive tests than a PCP, leading to increased costs.<br>
In addition, patients with Chronic health conditions see multiple specialists within the span of a year, with very little communication between them. <strong>A patient with 5 or more chronic conditions will, on average, <em>see 14 providers and fill 50 prescriptions every year</em>, for the rest of their lives</strong>. The lack of communication can lead to multiple adverse consequences, including, but not limited to, bad medication interactions and unchecked compounding effects leading to serious, acute-care episodes.<br>
This kind of single-condition, specialist-based model, along with a lack of emphasis on preventative care, is why you will often read that the US healthcare system rewards acute care over other forms of care. If the emphasis was on Primary Care, we would have a lot more preventative care, leading to fewer instances of chronic conditions caused by lifestyle factors and lower overall medical care costs.</p>
<h6 id="obfuscatedpricing">Obfuscated Pricing</h6>
<p>The current system has created a cycle of bad incentives. Hospitals charge higher prices on paper for procedures. The insurance providers &quot;negotiate&quot; down the prices, supposedly on behalf of the customer paying the insurance premium. They use the price drop as a way to prove their value to customers. However, the following is happening under the hood:</p>
<ul>
<li>Hospitals know they are never going to see the entire on-paper price of a service from insurance, so they continue to hike the prices</li>
<li>Insurance Providers don&apos;t care about the price quoted. They only care about the actual charge, which is significantly lower after they &quot;negotiate&quot; the price down</li>
<li>The larger the drop in prices, the more the insurance provider has &quot;saved&quot; their customers</li>
<li>No matter what the final charge ends up being, the insurance provider gets a cut of that charge</li>
<li>As a result, the insurance provider is making money both with higher premiums and via the cut from the payment to the hospital</li>
</ul>
<p>This is why if patients don&apos;t share their insurance information and instead ask for cash-only prices, they often see ridiculous drops in prices. The book talks about an extreme example where an MRI cost $3500 via insurance, and $475 with cash.</p>
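<p>A quick sketch of the incentive math: the $3,500 list price and $475 cash price come from the book&apos;s MRI example, but the negotiated amount is an assumption for illustration only.</p>

```python
# Incentive math behind "negotiated" prices. The list and cash prices come
# from the book's MRI example; the negotiated figure is an assumed value.
list_price = 3500   # hospital's on-paper price
negotiated = 1400   # what the insurer actually pays (assumed)
cash_price = 475    # cash-only price

insurer_claimed_savings = list_price - negotiated  # the "value" the insurer shows the employer
cash_discount = negotiated - cash_price            # how much insurance still overpays vs cash

print(insurer_claimed_savings, cash_discount)
# prints: 2100 925
```

<p>The higher the list price, the bigger the &quot;savings&quot; the insurer can claim, even though the cash payer still comes out far ahead.</p>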
<h6 id="impactonsmallbusinesslowerincomeemployees">Impact on Small Business &amp; Lower-Income Employees</h6>
<p>Offering healthcare benefits is now relatively expensive for smaller businesses, and not mandatory by law. As a result, <strong>only about 30% of businesses with &lt;50 employees offer health benefits</strong>. This means the employees with the lowest average income also end up having to pay out of pocket for health insurance.</p>
<p>Even when low-income employees (making &lt;$25k annually) end up getting company provided insurance, their premiums are comparable to those paid by high income employees OR they get fewer benefits. They are also the least likely to see any kind of tax benefits from healthcare benefits exemptions.</p>
<h4 id="regulationsorlackthereof">Regulations (or lack thereof)</h4>
<p>From my (basically outsider) perspective, there is shockingly little regulation of some of the core contributors to the current state of health care. I&apos;m sure the health care industry&apos;s massive lobbying power has nothing to do with it.</p>
<h6 id="qualityofcare">Quality of Care</h6>
<p>Health care providers are not accountable for the quality of care provided to their patients. Indeed, with the focus on short, acute-care episodes, there was no standard way to measure quality. Some states, such as Ohio, have recently started down the path of assessing quality based on pre-defined episode-based care.</p>
<p>Studies have shown that in patient visits preceding hospitalizations, 20% of diagnoses were incorrect. Overall, 5% of all diagnoses are incorrect. This leads to money spent on treatments and medication that has no benefit (and could have harm), over-treatment and, in some cases, death. A Johns Hopkins study showed that medical errors are the 3rd leading cause of death in the US. Yet, there are no regulations that hold providers accountable for misdiagnosis.</p>
<p>It is important to note here that the providers themselves are under enormous pressure to see more patients in less time, and carry a heavy administrative burden due to hospital policies. A study showed that for every hour a provider spends seeing patients, they spend another two on administrative tasks. The System is broken.</p>
<h6 id="lackoftransparency">Lack of Transparency</h6>
<p>Insurance carriers have no legal obligation to share claims data with the employer paying the premiums. In many cases carriers refuse to share claims data, and when they do access is often provided to only a small subset of all claims. This lack of transparency prevents employers (the actual customers of this service) from performing any kind of basic discrepancy analysis. This also allows hospitals and care providers with lower quality care outcomes to charge disproportionately higher costs without any oversight.</p>
<h6 id="blatantconflictsofinterests">Blatant Conflicts of Interests</h6>
<p>The same insurance carrier can administer the health plan for an employer, as well as for the hospital through which care is being provided. Hospitals are often huge employers, which means a massive, guaranteed annual income stream for the insurance carrier. Along with the lack of transparency mentioned above, this means carriers will often not try to negotiate the price of services with these hospitals (to keep them on as clients), leading to higher premiums for other employers serviced by that carrier. This is a clear conflict of interest, since insurance carriers should technically be working in the interests of all their clients individually. In addition, there is no regulation preventing a hospital from owning an insurance carrier!</p>
<p>Insurance brokers are supposed to work on behalf of their clients to try and find the most attractive plan and insurance carrier for them. However, insurance carriers give brokers a year-end bonus based on their client retention rate. Brokers are not obligated to disclose this to their clients. Another case of a clear conflict of interest - the Broker is incentivized to get their clients to renew their plan, regardless of actual value.</p>
<p>Similarly, Benefits Managers can hire benefits &quot;Consultants&quot; paid for by insurance companies and brokers. The consultant&apos;s salary is being paid for by one of the manager&apos;s prospective health insurance options. Why would the consultant ever suggest an alternative?</p>
<p>Somehow, none of these things are required disclosures for hospitals, insurance providers or brokers.</p>
<h4 id="summary">Summary</h4>
<p>This post is already way longer than I&apos;d expected, so I&apos;m going to stop here. There are a lot of avenues for improvement, and there are multiple case studies proving that there are approaches to health care (even with the employer-paid model) that result in better overall health while significantly reducing costs. Technology can definitely play an important role in cutting costs and improving efficiency, but meaningful change requires both employers and patients to make deliberate choices informed by data and case studies. The next couple of posts will aim to summarize some of the proposed alternatives to the current state.</p>
<h4 id="references">References:</h4>
<p>[1] <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1497638/pdf/15158105.pdf">&quot;The Growing Burden of Chronic Disease in America&quot;</a></p>
<p>[2] <a href="https://www.cdc.gov/pcd/issues/2020/20_0130.htm">&quot;Prevalence of Multiple Chronic Conditions Among US Adults, 2018&quot;</a></p>
<p>[3] <a href="https://www.cms.gov/mmrr/downloads/mmrr2013_003_02_b02.pdf">&quot;Medicare Payments: How Much Do Chronic Conditions Matter?&quot;</a></p>
<p>[4] <a href="https://publicintegrity.org/health/lobbyists-swarm-capitol-to-influence-health-reform/">&quot;Lobbyists swarm Capitol to influence health reform&quot;</a></p>
<p>[5] <a href="https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2463590">&quot;Cancer Drugs Approved on the Basis of a Surrogate End Point and Subsequent Overall Survival&quot;</a></p>
<p>[6] <a href="https://www.rand.org/content/dam/rand/pubs/research_briefs/2011/RAND_RB9605.pdf">&quot;How Does Growth in Health Care Costs Affect the  American Family?&quot;</a></p>
<p>[7] <a href="https://www.kff.org/other/state-indicator/firms-offering-coverage-by-size/?currentTimeframe=0&amp;selectedDistributions=firms-with-fewer-than-50-employees&amp;selectedRows=%7B%22wrapups%22:%7B%22united-states%22:%7B%7D%7D%7D&amp;sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D">&quot;Private Firms offering Health Care by size&quot;</a></p>
<p>[8] <a href="https://medicaid.ohio.gov/provider/PaymentInnovation/episodes">&quot;Ohio&apos;s Episode-Based Care&quot;</a></p>
<p>[9] <a href="https://www.vox.com/policy-and-politics/2019/9/30/20891305/health-care-employer-sponsored-premiums-cost-voxcare">&quot;Health care is getting more and more expensive, and low-wage workers are bearing more of the cost&quot;</a></p>
<p>[10] <a href="https://www.investopedia.com/financial-edge/0912/which-income-class-are-you.aspx">&quot;Which income class are you?&quot;</a></p>
<p>[11] <a href="https://qualitysafety.bmj.com/content/23/9/727">&quot;The frequency of diagnostic errors in outpatient care: estimations from three large observational studies involving US adult populations&quot;</a></p>
<p>[12] <a href="https://pubmed.ncbi.nlm.nih.gov/27595430/">&quot;Allocation of Physician Time in Ambulatory Practice: A Time and Motion Study in 4 Specialties&quot;</a></p>
<p>[13] <a href="https://www.axios.com/gofundme-medical-expenses-health-care-costs-d643ed66-2a0a-464c-aad7-5f904288d60d.html">&quot;GoFundMe&apos;s place in the health care system&quot;</a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Interesting announcements at Ignite 2018]]></title><description><![CDATA[Interesting (Azure-related) announcements from Microsoft Ignite 2018 about IoT, Ops, Edge, Containers, Kubernetes and Serverless]]></description><link>https://rohanc.me/interesting-announcements-at-ignite-2018/</link><guid isPermaLink="false">62a05b4678647100013290a1</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[azure]]></category><category><![CDATA[CI/CD]]></category><category><![CDATA[containers]]></category><category><![CDATA[ignite2018]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Mon, 05 Nov 2018 18:19:09 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>I thought I&apos;d share some of the announcements at Microsoft Ignite 2018 that I found really interesting. Obviously this is a subset of announcements, related only to products I&apos;ve used or am planning to use in the future.</p>
<h5 id="iothubedge">IoT Hub/Edge:</h5>
<ol>
<li>There&#x2019;s a <strong>Jenkins plugin</strong> for <strong>edge modules</strong> that enables builds and deployments to the devices: <a href="https://azure.microsoft.com/en-us/blog/developer-tooling-improvements-for-azure-iot-edge/">https://azure.microsoft.com/en-us/blog/developer-tooling-improvements-for-azure-iot-edge/</a></li>
</ol>
<ul>
<li><strong>IoT Edge Module Marketplace</strong>. A place for hardware manufacturers/3rd party software vendors to publish hardware specific modules: <a href="https://azure.microsoft.com/en-us/blog/publish-your-azure-iot-edge-modules-in-azure-marketplace/">https://azure.microsoft.com/en-us/blog/publish-your-azure-iot-edge-modules-in-azure-marketplace/</a></li>
<li><strong>Edge devices</strong> can now <strong>restart</strong> and establish connections to the edge hub <strong>even if the hub isn&#x2019;t connected to the internet</strong> (Still requires a one-time sync): <a href="https://azure.microsoft.com/en-us/blog/extended-offline-operation-with-azure-iot-edge/">https://azure.microsoft.com/en-us/blog/extended-offline-operation-with-azure-iot-edge/</a></li>
<li><strong>Device Twin property based routing</strong> in IoT Hub: <a href="https://azure.microsoft.com/en-us/blog/a-powerful-and-intuitive-way-to-route-device-messages-in-azure-iot-hub/">https://azure.microsoft.com/en-us/blog/a-powerful-and-intuitive-way-to-route-device-messages-in-azure-iot-hub/</a></li>
<li><strong>Digital Twins</strong> in public preview: <a href="https://azure.microsoft.com/en-us/blog/announcing-the-public-preview-of-azure-digital-twins/">https://azure.microsoft.com/en-us/blog/announcing-the-public-preview-of-azure-digital-twins/</a></li>
<li>The <strong>Device Provisioning service</strong> has much higher limits now</li>
<li><strong>Edge support for Blobs:</strong> <a href="https://docs.microsoft.com/en-us/azure/iot-edge/how-to-store-data-blob">https://docs.microsoft.com/en-us/azure/iot-edge/how-to-store-data-blob</a></li>
</ul>
<h5 id="cosmosdb">Cosmos DB:</h5>
<ol>
<li>Support for <strong>multiple masters</strong> (which allows scaling writes across regions): <a href="https://azure.microsoft.com/en-us/blog/azure-cosmos-db-database-for-intelligent-cloud-intelligent-edge-era/">https://azure.microsoft.com/en-us/blog/azure-cosmos-db-database-for-intelligent-cloud-intelligent-edge-era/</a></li>
</ol>
<ul>
<li><strong>Reserved Capacity</strong> (subscription plan vs pay-as-you-go): <a href="https://azure.microsoft.com/en-us/blog/announcing-general-availability-of-azure-cosmos-db-reserved-capacity/">https://azure.microsoft.com/en-us/blog/announcing-general-availability-of-azure-cosmos-db-reserved-capacity/</a></li>
</ul>
<h5 id="containerscontainerregistryk8s">Containers/Container Registry/K8s:</h5>
<ol>
<li>The Registry now has support for <strong>Helm Chart Repos</strong>, <strong>Docker Content Trust</strong> and <strong>ACR tasks</strong>: <a href="https://azure.microsoft.com/en-us/blog/azure-container-registry-public-preview-of-helm-chart-repositories-and-more/">https://azure.microsoft.com/en-us/blog/azure-container-registry-public-preview-of-helm-chart-repositories-and-more/</a></li>
</ol>
<ul>
<li>ACR tasks are pretty cool &#x2013; if you have multiple images dependent on a certain base image, you can trigger updated builds for all of them when you update the base image.<br>
It also seems to overlap with much of what Azure Pipelines can do.</li>
<li><strong>Azure Container Instances (ACI)</strong> can be deployed into existing VNETs: <a href="https://azure.microsoft.com/en-us/updates/aci-vnet/">https://azure.microsoft.com/en-us/updates/aci-vnet/</a></li>
<li><strong>K8S</strong> is available on <strong>Stack</strong> in preview</li>
</ul>
<h5 id="ops">Ops:</h5>
<ol>
<li>As you may have already heard me say elsewhere, <strong>Azure Pipelines is AMAZING:</strong> <a href="https://azure.microsoft.com/en-us/blog/azure-pipelines-is-the-ci-cd-solution-for-any-language-any-platform-any-cloud/">https://azure.microsoft.com/en-us/blog/azure-pipelines-is-the-ci-cd-solution-for-any-language-any-platform-any-cloud/</a></li>
</ol>
<ul>
<li>Configuring a build is really clean. (I&apos;ll create a build using a public repo soon!)</li>
<li>Lets you define &#x201C;service connections&#x201D; to external services like a Docker registry to pull images from, Kubernetes clusters, or Jenkins</li>
<li>Build steps can run on either the VM or within (one or more) containers</li>
<li>Allows defining multiple &#x201C;jobs&#x201D; that run in parallel</li>
<li>Extensions for deploying to AWS, Azure, K8S clusters</li>
<li><strong>Resource specific alerts</strong> (configurable alerts for platform issues. This might be useful for alerting systems): <a href="https://azure.microsoft.com/en-us/blog/get-notified-when-your-azure-resources-become-unavailable/">https://azure.microsoft.com/en-us/blog/get-notified-when-your-azure-resources-become-unavailable/</a></li>
<li><strong>Deployment Manager</strong> (for multi-stage/multi-region deployments): <a href="https://azure.microsoft.com/en-us/blog/azure-deployment-manager-now-in-public-preview/">https://azure.microsoft.com/en-us/blog/azure-deployment-manager-now-in-public-preview/</a></li>
</ul>
<h5 id="misc">Misc:</h5>
<ol>
<li><strong>Recommendation system for models based on your dataset</strong>. I&#x2019;m assuming this will work well for generic problems. Definitely something interesting to play around with!: <a href="https://azure.microsoft.com/en-us/blog/announcing-automated-ml-capability-in-azure-machine-learning/">https://azure.microsoft.com/en-us/blog/announcing-automated-ml-capability-in-azure-machine-learning/</a></li>
<li><strong>HDInsight supports Hadoop 3.0</strong>:<br>
<a href="https://azure.microsoft.com/en-us/blog/azure-hdinsight-brings-next-generation-hadoop-3-0-and-enterprise-security-to-the-cloud/">https://azure.microsoft.com/en-us/blog/azure-hdinsight-brings-next-generation-hadoop-3-0-and-enterprise-security-to-the-cloud/</a><br>
<a href="https://azure.microsoft.com/en-us/blog/deep-dive-into-azure-hdinsight-4-0/">https://azure.microsoft.com/en-us/blog/deep-dive-into-azure-hdinsight-4-0/</a></li>
<li>Azure CDN is GA: <a href="https://azure.microsoft.com/en-us/blog/microsoft-s-content-delivery-network-is-now-generally-available/">https://azure.microsoft.com/en-us/blog/microsoft-s-content-delivery-network-is-now-generally-available/</a></li>
<li><strong>Functions V2 is GA</strong>: <a href="https://azure.microsoft.com/en-us/blog/introducing-azure-functions-2-0/">https://azure.microsoft.com/en-us/blog/introducing-azure-functions-2-0/</a></li>
</ol>
<ul>
<li>Support for Java and Python is still in preview</li>
<li>Consumption plan for Linux is in preview</li>
<li><strong>Event Hubs</strong> is available on <strong>Azure Stack</strong></li>
<li><strong>Service Fabric</strong> is available on <strong>Azure Stack</strong></li>
</ul>
<p><em><strong>As always, feel free to reach out to me <a href="https://twitter.com/rohchak">@rohchak</a> if you have any questions!</strong></em></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Generate valid signed X509 client certificates with pyopenssl]]></title><description><![CDATA[Use the pyopenssl library to generate valid signed X509 certs. Includes steps to debug invalid certs!]]></description><link>https://rohanc.me/valid-x509-certs-pyopenssl/</link><guid isPermaLink="false">62a05b4678647100013290a0</guid><category><![CDATA[pyopenssl]]></category><category><![CDATA[invalid]]></category><category><![CDATA[error]]></category><category><![CDATA[X.509]]></category><category><![CDATA[certificate]]></category><category><![CDATA[openssl]]></category><category><![CDATA[windows]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Fri, 05 Oct 2018 19:18:56 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>I spent a while today trying to figure out why a certificate deemed valid by the openssl <code>verify</code> command was invalid on my Windows machine.</p>
<p>Errors such as:</p>
<ul>
<li>&quot;This certificate has an invalid digital signature.&quot;</li>
<li>&quot;The integrity of this certificate cannot be guaranteed. The certificate may have been corrupted or may have been altered&quot;</li>
</ul>
<p>I used the <a href="https://github.com/pyca/pyopenssl">pyopenssl</a> library to generate my CA cert as well as the client certificate. You might already have an intermediate certificate and won&apos;t need to generate the CA cert. I&apos;ll add a link to working code at the end of this post. Feel free to scroll down if that&apos;s what you&apos;re interested in.</p>
<p>There are a couple of reasons why you might be seeing these errors:</p>
<ol>
<li>Public Key Length is &lt;1024 bits. See <a href="https://security.stackexchange.com/questions/65618/this-certificate-has-an-invalid-digital-signature#65626">this Security Stack Exchange post</a> and <a href="https://morgansimonsen.com/2013/05/30/what-does-the-this-certificate-has-an-invalid-digital-signature-message-actually-mean/">this blog post</a> for more details</li>
<li>Your Certificate Revocation List(CRL) Endpoints have been misconfigured or aren&apos;t reachable</li>
<li>You used the pyopenssl library and added the <code>&quot;subjectKeyIdentifier&quot; X509Extension</code> before setting the public key to use.</li>
</ol>
<p>I&apos;ll let you guess which one I hit :)</p>
<p>Here&apos;s what happened. My signed client cert generation script looked something like this:</p>
<pre><code>client_cert.add_extensions([
        crypto.X509Extension(b&quot;authorityKeyIdentifier&quot;, False, b&quot;keyid&quot;, issuer=root_ca_cert),
    ])

client_cert.add_extensions([
        crypto.X509Extension(b&quot;subjectKeyIdentifier&quot;, False, b&quot;hash&quot;, subject=client_cert),
    ])

client_cert.set_issuer(root_ca_subj)
client_cert.set_pubkey(client_key)
</code></pre>
<p>This took me a while to identify, but the client certificate generated had extensions looking like this:</p>
<pre><code>X509v3 Authority Key Identifier: 
    keyid:DA:39:A3:EE:5E:6B:4B:0D:32:55:BF:EF:95:60:18:90:AF:D8:07:09
X509v3 Subject Key Identifier: 
    DA:39:A3:EE:5E:6B:4B:0D:32:55:BF:EF:95:60:18:90:AF:D8:07:09
</code></pre>
<p>Notice how the Subject Key and Authority Key are identical? The only time that should happen is in the case of a self-signed cert.</p>
<p>Digging in further, this key was actually the Subject Key of my CA certificate (the one I was trying to use to sign this client cert).</p>
<p>There&apos;s more to it, but at a high level, if a CA (with <strong>Subject Key ID = CA_ski</strong>) is signing a client cert, the client cert should have:</p>
<p><em>Subject Key ID = [something unique]</em><br>
<em>Authority Key ID = <strong>CA_ski</strong></em></p>
<p>Alright. So how is this subject key id generated? Let&apos;s go look at the (theoretical) source of truth :). From <a href="https://datatracker.ietf.org/doc/rfc5280/?include_text=1">RFC 5280</a>:</p>
<pre><code>For end entity certificates, subject key identifiers SHOULD be derived from the public key
</code></pre>
<p>And that&apos;s when I found my bug. If you look at my code, I&apos;m setting the public key <em><strong>after</strong></em> adding the subjectKeyIdentifier extension. As a result the library seems to default to using the CA&apos;s key to generate the subjectKeyIdentifier.</p>
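<p>A fun way to confirm this: subject key identifiers are commonly computed as the SHA-1 hash of the DER-encoded public key, and the keyid in the broken cert above happens to be exactly the SHA-1 digest of <em>empty</em> input. Here&apos;s a small stdlib-only sketch (<code>key_id</code> is a hypothetical helper, not part of pyopenssl):</p>

```python
import hashlib

def key_id(der_public_key: bytes) -> str:
    """SHA-1 of the key bytes, formatted like openssl's keyid output."""
    digest = hashlib.sha1(der_public_key).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

# Hashing empty input reproduces the keyid from the broken cert above,
# consistent with the hash being computed before any key material was attached:
print(key_id(b""))
# DA:39:A3:EE:5E:6B:4B:0D:32:55:BF:EF:95:60:18:90:AF:D8:07:09
```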
<p>The fix is to set the public key on the client cert <em><strong>before</strong></em> adding the subjectKeyIdentifier. So the code should look more like this:</p>
<pre><code>client_cert.set_issuer(root_ca_subj)
client_cert.set_pubkey(client_key)

client_cert.add_extensions([
        crypto.X509Extension(b&quot;authorityKeyIdentifier&quot;, False, b&quot;keyid&quot;, issuer=root_ca_cert),
    ])

client_cert.add_extensions([
        crypto.X509Extension(b&quot;subjectKeyIdentifier&quot;, False, b&quot;hash&quot;, subject=client_cert),
    ])
</code></pre>
<p>The client cert generated by this will have the right authorityKeyIdentifier and a unique subjectKeyIdentifier correctly derived from the client public key!</p>
<p><em><strong>Feel free to reach out to me <a href="https://twitter.com/rohchak">@rohchak</a> if you have any questions!</strong></em></p>
<p>Here&apos;s the code:</p>
<p>CA Cert gen:</p>
<script src="https://gist.github.com/rchakra3/d56456249b78208638029cad1837e192.js"></script>
<br>
<p>Signed Client Cert Gen:</p>
<script src="https://gist.github.com/rchakra3/2fd6d29e632175633f8f506c88bccbc8.js"></script>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[SSSD with Active Directory on Ubuntu]]></title><description><![CDATA[Configure SSSD to allow SSH access authenticated against an AD instance. This was required to enable advanced security features on our Ambari Hadoop cluster]]></description><link>https://rohanc.me/sssd-active-directory-ubuntu/</link><guid isPermaLink="false">62a05b46786471000132909e</guid><category><![CDATA[Active Directory]]></category><category><![CDATA[SSSD]]></category><category><![CDATA[Ubuntu]]></category><category><![CDATA[Ambari]]></category><category><![CDATA[Hadoop]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Mon, 16 Jul 2018 00:05:19 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>We&apos;re in the middle of deploying multiple Hadoop clusters with different flavors. Since many of Azure&apos;s larger customers use an on-prem Active Directory forest for authentication, extending those identities and permissions to their Hadoop clusters was an important requirement.</p>
<p>Once the Hadoop cluster&apos;s been Kerberized, various security/identity features, including user group mappings, require SSSD. (There are other methods, but none of them seemed as secure; LDAP, for example, requires saving credentials in a file somewhere on disk.)</p>
<p>I found many different install guides for getting SSSD with Active Directory working on CentOS hosts, but something always seemed to be broken when following the same steps on Ubuntu. I&apos;ve included links to some of the resources I used, but none of them worked exactly as advertised on Ubuntu.</p>
<p>The following steps will get you a <strong>domain-joined</strong>, <strong>Ubuntu 16.04</strong> machine that allows <strong>SSH access using Active Directory credentials</strong>.</p>
<p>This guide does not include the steps to get a Kerberos Realm and KDC setup. There are many guides that go through that initial process. I&apos;ve included some of those links at the end of my post.</p>
<p>Here&apos;s a description of the variables we&apos;ll use (Pay attention to the <strong>casing</strong> in the examples):</p>
<ul>
<li>AD_DOMAIN: <strong>mydomain.local</strong></li>
<li>AD_REALM: <strong>MYDOMAIN.LOCAL</strong></li>
<li>WORKGROUP: <strong>MYDOMAIN</strong></li>
</ul>
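<p>The casing matters because Kerberos realm names are conventionally the DNS domain upper-cased, and the NetBIOS workgroup is the first label of the realm. As a quick sanity check (just a sketch; substitute your own domain), you can derive the other two values from the domain:</p>

```shell
AD_DOMAIN="mydomain.local"
# Kerberos realm: the DNS domain, upper-cased
AD_REALM=$(printf '%s' "$AD_DOMAIN" | tr '[:lower:]' '[:upper:]')
# NetBIOS workgroup: the first label of the realm
WORKGROUP=$(printf '%s' "$AD_REALM" | cut -d. -f1)
echo "$AD_REALM $WORKGROUP"   # MYDOMAIN.LOCAL MYDOMAIN
```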
<h6 id="installtherelevantcomponents">Install the relevant components</h6>
<pre><code>apt install -y krb5-user samba sssd chrony
</code></pre>
<h6 id="configuresambafornetbios">Configure Samba for Netbios</h6>
<p>vim <code>/etc/samba/smb.conf</code></p>
<pre><code># Delete the workgroup line and add these:
   workgroup = WORKGROUP
   client signing = yes
   client use spnego = yes
   kerberos method = secrets and keytab
   realm = AD_REALM
   security = ads
</code></pre>
<h6 id="createthesssdconffile">Create the sssd conf file</h6>
<p>vim <code>/etc/sssd/sssd.conf</code></p>
<pre><code>[sssd]
services = nss, pam, ssh, autofs, pac
config_file_version = 2
domains = AD_DOMAIN
override_space = _

[domain/AD_DOMAIN]
id_provider = ad
auth_provider = ad
chpass_provider = ad
access_provider = ad
enumerate = False
krb5_realm = AD_REALM
ldap_schema = ad
ldap_id_mapping = True
cache_credentials = True
ldap_access_order = expire
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
fallback_homedir = /home/%d/%u
default_shell = /bin/false
ldap_referrals = true
use_fully_qualified_names = False

[nss]
memcache_timeout = 3600
override_shell = /bin/bash
</code></pre>
<h6 id="setsssdconfpermissions">Set sssd conf permissions</h6>
<pre><code>chown root:root /etc/sssd/sssd.conf
chmod 600 /etc/sssd/sssd.conf
</code></pre>
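<p>SSSD refuses to start if <code>sssd.conf</code> is not owned by root or is group/world accessible, so it&apos;s worth double-checking the mode after the chmod. A tiny helper (hypothetical name, not part of sssd) for that check:</p>

```shell
# print a file's octal permission bits, e.g. "600"
file_mode() { stat -c '%a' "$1"; }

# after the chown/chmod above, this should print 600:
# file_mode /etc/sssd/sssd.conf
```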
<h6 id="jointhemachinetothedomain">Join the machine to the domain</h6>
<p><em><strong>You need a valid kerberos ticket for an Active Directory user with Domain Join privileges for this step</strong></em></p>
<pre><code>kinit domain_join_user@AD_REALM
net ads join -k
</code></pre>
<h6 id="ensurepamcreatesanewusershomedirectoryonsuccessfullogin">Ensure pam creates a new user&apos;s home directory on successful login</h6>
<p>vim <code>/etc/pam.d/common-session</code></p>
<pre><code># Add this line to the end
session optional                        pam_mkhomedir.so
</code></pre>
<h6 id="restartalltherelevantservices">Restart all the relevant services.</h6>
<pre><code>systemctl restart smbd.service nmbd.service
systemctl restart sssd.service
</code></pre>
<h6 id="testyourconfig">Test your config:</h6>
<pre><code>getent passwd ad_user@AD_REALM
sudo su - ad_user@AD_REALM
</code></pre>
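<p>If you want to script that check for a list of accounts, a small wrapper around <code>getent</code> (a hypothetical helper, not part of sssd) does the trick:</p>

```shell
# prints OK if NSS (and therefore sssd) can resolve the account, MISSING otherwise
check_user() {
    if getent passwd "$1" >/dev/null; then echo OK; else echo MISSING; fi
}

for u in ad_user@AD_REALM root; do
    printf '%s: %s\n' "$u" "$(check_user "$u")"
done
```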
<p>If that was successful, you&apos;re good to go! You should be able to SSH into this machine with your Active Directory credentials.</p>
<h6 id="troubleshooting">Troubleshooting:</h6>
<ul>
<li>
<p><strong>SSSD conf typo:</strong></p>
<p>If you&apos;ve been unlucky and had a typo in your sssd.conf, you may have to reboot your VM in recovery mode and delete the sssd.conf file before continuing with boot.</p>
</li>
</ul>
<ul>
<li>
<p><strong>Glitchy install:</strong></p>
<p>I&apos;ve had some machines where the install simply freezes and there&apos;s no way to successfully continue. In those cases, I recommend completely purging the installed components and starting over:</p>
</li>
</ul>
<pre><code>  apt remove --purge -y samba sssd chrony
  apt-get autoremove -y 
  apt-get purge -y samba samba-common
</code></pre>
<ul>
<li>
<p><strong>Debugging SSSD:</strong></p>
<p>Add the <code>debug_level = [1..9]</code> statement under each section in <code>sssd.conf</code> you want to debug.</p>
</li>
</ul>
<p>Here are some of the links that I&apos;ve used as a reference:</p>
<ul>
<li><a href="http://web.mit.edu/kerberos/krb5-devel/doc/admin/install_kdc.html">http://web.mit.edu/kerberos/krb5-devel/doc/admin/install_kdc.html</a></li>
<li><a href="https://github.com/HortonworksUniversity/Security_Labs/blob/master/HDP-2.6-AD.md#lab-4">https://github.com/HortonworksUniversity/Security_Labs/blob/master/HDP-2.6-AD.md#lab-4</a></li>
<li><a href="https://help.ubuntu.com/lts/serverguide/sssd-ad.html">https://help.ubuntu.com/lts/serverguide/sssd-ad.html</a></li>
</ul>
<p><em><strong>Feel free to reach out to me <a href="https://twitter.com/rohchak">@rohchak</a> if you have any questions! - chances are I&apos;ve worked my way through it :)</strong></em></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Deploying an AWS EMR cluster with on-prem/cross-cloud Active Directory Authentication]]></title><description><![CDATA[Deploy an AWS Elastic Map Reduce cluster with Kerberos based Active Directory Authentication]]></description><link>https://rohanc.me/aws-emr-cluster-ad-auth/</link><guid isPermaLink="false">62a05b46786471000132909d</guid><category><![CDATA[AWS]]></category><category><![CDATA[EMR]]></category><category><![CDATA[VPN]]></category><category><![CDATA[Active Directory]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Mon, 09 Jul 2018 20:48:30 GMT</pubDate><media:content url="https://rohanc.me/content/images/2022/06/hadoop-3.png" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://rohanc.me/content/images/2022/06/hadoop-3.png" alt="Deploying an AWS EMR cluster with on-prem/cross-cloud Active Directory Authentication"><p>If you&apos;re ever in the <em>enviable</em> position of having to get your AWS Elastic Map Reduce (EMR) cluster authenticating against an on-prem/cross-cloud Active Directory instance this post is for you!</p>
<p>Let&apos;s break this down into the separate pieces we&apos;re going to need:</p>
<ol>
<li>
<p><strong>A VPN/Direct-Connect connection</strong> to the on-prem/cross-cloud Active Directory network</p>
</li>
<li>
<p><strong>Kerberos Authentication</strong></p>
</li>
</ol>
<p>AWS actually has all of this pretty well documented, so I&apos;m not going to list individual steps. However, I&apos;ll list a couple of gotchas that ended up taking us a couple of days to work through.</p>
<p>First, the resources:</p>
<ol>
<li>
<p><strong>Setting up a VPN connection to your AD network</strong>: <a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/SetUpVPNConnections.html">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/SetUpVPNConnections.html</a><br>
Instead of deploying a Windows Server EC2 instance, use a machine on your internal network (or your router) and work through the steps.</p>
</li>
<li>
<p><strong>Deploying a Kerberized EMR cluster with a cross-realm AD trust</strong>: <a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-kerberos-cross-realm.html">https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-kerberos-cross-realm.html</a></p>
</li>
</ol>
<h4 id="thingstolookoutfor">Things to look out for:</h4>
<h6 id="vpn">VPN</h6>
<p>You will need to initiate a connection from your external network to the AWS VPC to activate the VPN after you configure it. The easiest way to do this is to allow incoming ICMP packets to an existing EC2 instance in your VPC and ping it.</p>
<h6 id="crossrealmtrust">Cross-Realm Trust</h6>
<ul>
<li>When creating the DHCP option set that specifies your AD DC as a DNS server, there is a line that reads <code>xx.xx.xx.xx,AmazonProvidedDNS</code>. You literally have to enter the string <code>AmazonProvidedDNS</code> after the IP of your DC. I&apos;d never used a DHCP option set before, so that tripped me up for a bit.</li>
<li>Follow the casing exactly as specified in the post for the realms, domains and servers</li>
<li>You <strong>do not</strong> have to add individual users. The EMR deployment handles PAM and sssd configs. If your cluster has been set up correctly, you should be able to ssh into the cluster with <em>AD_username</em>@<em>yourdomain.com</em> and the user&apos;s AD password. The first time you login as a user the user&apos;s home directory is automatically created, and a kerberos ticket is requested.</li>
<li>You can absolutely configure the trust to be transitive. I&apos;m not sure why their documentation specifies a non-transitive one (this can be changed after the initial deployment, so it&apos;s not a big deal).</li>
</ul>
<h6 id="slowkerberosauthtickets">Slow Kerberos auth/tickets?</h6>
<ul>
<li>Kerberos tries UDP before TCP by default, and switching to TCP significantly sped things up for us. Add the following line to the <code>[libdefaults]</code> section of your krb5.conf to prioritize TCP:<br>
<code>udp_preference_limit = 1</code></li>
</ul>
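<p>For reference, the relevant section of <code>/etc/krb5.conf</code> ends up looking roughly like this (the realm is a placeholder; your other [libdefaults] settings stay as they are):</p>

```
[libdefaults]
    default_realm = AD_REALM
    # payloads larger than 1 byte skip UDP entirely, i.e. always use TCP
    udp_preference_limit = 1
```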
<p><em><strong>Feel free to reach out to me <a href="https://twitter.com/rohchak">@rohchak</a> if you have any questions!</strong></em></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Monitoring Kubernetes with Prometheus 2.1+, Grafana 5.1+ and Helm]]></title><description><![CDATA[Monitoring my Kubernetes cluster and pods with a couple of helm charts and zero manual config!]]></description><link>https://rohanc.me/monitoring-kubernetes-prometheus-grafana/</link><guid isPermaLink="false">62a05b46786471000132909c</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[prometheus]]></category><category><![CDATA[grafana]]></category><category><![CDATA[helm]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Tue, 26 Jun 2018 23:05:16 GMT</pubDate><media:content url="https://rohanc.me/content/images/2022/06/Grafana-Prometheus-1.png" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://rohanc.me/content/images/2022/06/Grafana-Prometheus-1.png" alt="Monitoring Kubernetes with Prometheus 2.1+, Grafana 5.1+ and Helm"><p>I recently deployed a new Kubernetes cluster and needed to get my usual Prometheus + Grafana monitoring set up. For my last few deployments I&apos;ve used a Helm chart that is at least 6 months old, so I thought I&apos;d go with the latest and greatest this time around - and I&apos;m so glad I did!</p>
<p>There have been some really nifty improvements. My favorite is being able to specify datasources and dashboards while installing the Grafana chart. In my case this meant I could install Prometheus and configure my Grafana dashboards to use it as a datasource during install - with absolutely zero manual configuration.</p>
<p>I was planning to create a single Helm chart that installs both these charts, but Helm doesn&apos;t currently surface the extremely helpful notes from the Prometheus and Grafana charts. There&apos;s an open <a href="https://github.com/kubernetes/helm/issues/2751">Helm Issue</a> with a PR, so hopefully that gets resolved soon! Until then:</p>
<h6 id="prereqs">Prereqs</h6>
<ul>
<li>
<p>If your cluster is RBAC enabled, make sure you&apos;ve created a service account for tiller and have it bound to an appropriate role. I&apos;m generally the only one using these smaller clusters, so I just take the lazy way out and bind it to the cluster-admin role. <a href="https://docs.bitnami.com/kubernetes/how-to/configure-rbac-in-your-kubernetes-cluster/">This Bitnami post</a> is a great resource if you want to limit what your tiller deployment can do.</p>
</li>
<li>
<p>If your cluster <em>is not</em> RBAC enabled, be sure to disable RBAC for both the Grafana and Prometheus charts.</p>
</li>
<li>
<p>Ensure you&apos;ve updated your helm repo. This threw me off for a bit (alright, 2 hours) because the values in stable weren&apos;t what I was seeing in the GitHub charts repo (that&apos;s what I get for not using helm for a couple of months :/)</p>
</li>
</ul>
<h6 id="quickinstall">Quick install:</h6>
<p>First Prometheus, so we have a working datasource:<br>
<code>helm install stable/prometheus --version 6.7.4 --name my-prometheus</code></p>
<p>Next, we&apos;re going to deploy Grafana with some dashboards configured to pull data from our Prometheus instance. I&apos;ve included both the official dashboard from Prometheus as well as one that provides cluster and pod-level information:<br>
<code>helm install --name my-grafana stable/grafana --version 1.11.6 -f values.yml</code></p>
<p>Here&apos;s the values.yml file I used:</p>
<script src="https://gist.github.com/rchakra3/430c36b3ed22873e2244530a66a63e4a.js"></script>
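<p>In case the embedded gist doesn&apos;t load in your reader, the key part of that file is the datasource provisioning block. This sketch assumes the <code>my-prometheus</code> release name from above, the <code>default</code> namespace, and the Prometheus chart&apos;s default <code>-server</code> service suffix:</p>

```yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        isDefault: true
        url: http://my-prometheus-server.default.svc.cluster.local
```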
<p>Follow the instructions from the Grafana chart notes and when you login you should see your dashboards already pulling data!</p>
<p><img src="https://github.com/rchakra3/static-assets/raw/master/grafana-prometheus/Grafana-Prometheus.png" alt="Monitoring Kubernetes with Prometheus 2.1+, Grafana 5.1+ and Helm" loading="lazy"></p>
<p><img src="https://github.com/rchakra3/static-assets/raw/master/grafana-prometheus/Prometheus_stats.png" alt="Monitoring Kubernetes with Prometheus 2.1+, Grafana 5.1+ and Helm" loading="lazy"></p>
<p><em><strong>Feel free to reach out to me <a href="https://twitter.com/rohchak">@rohchak</a> if you have any questions!</strong></em></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Kubernetes for Everything! Part 2 - On demand Jenkins build agents]]></title><description><![CDATA[This is part 2 in a series, in which I explore spinning up on-demand build pods that run the builds, upload the artifacts and are then destroyed.]]></description><link>https://rohanc.me/kubernetes-on-demand-jenkins-build-agents/</link><guid isPermaLink="false">62a05b46786471000132909b</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[azure]]></category><category><![CDATA[CI/CD]]></category><category><![CDATA[containers]]></category><category><![CDATA[windows-containers]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Mon, 31 Jul 2017 05:00:05 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>When I found out Kubernetes had support for Windows containers, I was pretty excited. I work with applications running on both Operating Systems so this opens up a lot of opportunities.</p>
<p>I plan to explore building a CI/CD pipeline that can scale based on load, set up monitoring (both cluster and application logs) and deploy both&#xA0;.NET apps in Windows containers and other apps in Linux containers&#x200A;&#x2014;&#x200A;all on Kubernetes.</p>
<p>This is part 2 in a series, in which I explore spinning up on-demand build pods that run the builds, publish the artifacts to Azure blob storage and are then destroyed. This has a couple of advantages:</p>
<ol>
<li>
<p>A clean build environment:<br>
We own a lot of .NET projects, some of which have been around for a while and use different versions of the framework. That can mean our build machines have multiple versions of NuGet, MSBuild and .NET, which has tripped up our builds more than once. This setup allows us to define multiple Docker images, each with its own version of the framework and associated tools. As you&apos;ll see, the base image stays the same - the only difference is the version of .NET we install.</p>
</li>
<li>
<p>Less resource wastage:<br>
There is also the added advantage of not having Jenkins build agents just idling away, using cluster resources when there are no builds.</p>
</li>
</ol>
<h2 id="part2jenkinswithondemandagents">Part 2: Jenkins with on-demand agents</h2>
<p>This post assumes you have a Jenkins master pod deployed on your cluster already. If not, <a href="https://rohanc.me/kubernetes-for-everything-with-windows-and-linux-containers-on-azure-part-1">Part 1</a> goes through that initial setup. Let&apos;s get started!</p>
<h3 id="settingupthekubernetesplugin">Setting up the Kubernetes plugin</h3>
<p>Before you set up builds, you&apos;ll need to configure the plugin so it can talk to your cluster.</p>
<ul>
<li>Install <a href="https://wiki.jenkins.io/display/JENKINS/Kubernetes+Plugin">the plugin</a></li>
<li>Configure the plugin in global settings. The important fields here are:
<ul>
<li>Jenkins URL: the internal kubernetes service URL assigned to your Jenkins master service</li>
<li>Container Cleanup Timeout: the amount of time after which the plugin destroys a build pod. This is particularly important for larger Windows Server images, since they can take a while to pull, initialize and run the build. For me, 15 minutes worked well even for some of our larger projects, but this is something you can tune.</li>
</ul>
</li>
</ul>
<h3 id="jenkinsfileforubuntubasedbuilds">Jenkinsfile for Ubuntu based builds</h3>
<p>Once you&apos;ve set up the plugin and configured a multistage pipeline for a repository, the plugin allows for some really cool use cases. For example, it lets you define multiple build containers in a single build pod, perform container-specific actions within those containers, and pass the output to another container in the same pod. Under the hood, it achieves this by using shared volumes.</p>
<p>So if you decide you want to build a docker image from your latest commit and then deploy your code on a Kubernetes cluster, only if your branch is <code>master</code>, this is a valid Jenkinsfile:</p>
<script src="https://gist.github.com/rchakra3/8192c2cb58c019f2c4b121d6eb232b51.js"></script>
<p>Notice how we can run certain commands in the context of specific containers. Another important point is that the plugin uses the <a href="https://hub.docker.com/r/jenkinsci/jnlp-slave/">default jnlp image</a> if you don&apos;t specify a <code>containerTemplate</code> with its name set to <code>jnlp</code>. This becomes important when we move to Windows builds.</p>
<h3 id="baseimageforwindowsbasedbuilds">Base image for Windows based builds</h3>
<p>Kubernetes currently only supports one Windows container per pod. Unfortunately, that means we can&apos;t take advantage of specialized containers within the Jenkinsfile like we did with the Ubuntu builds. Instead, I built a base windowsservercore image and then added specific packages to specialize it. I used the <a href="https://gist.github.com/rchakra3/ac55b33f01020b0a129460d1422ac940">windows image here</a> from my previous post as the base, but with chocolatey and git installed. Chocolatey is a package manager for Windows that allows us to run headless package installations. Add these lines to your Dockerfile to install chocolatey:</p>
<pre><code># Install git through chocolatey and add git to the path
ENV chocolateyUseWindowsCompression false
RUN iex ((new-object net.webclient).DownloadString(&apos;https://chocolatey.org/install.ps1&apos;)); \
    choco install -v -y git
</code></pre>
<p>Using chocolatey, we can install almost any package that we&apos;d need for builds. Here&apos;s a snippet for .NET 4.5.2:</p>
<pre><code># -y skips the confirmation prompt, which would otherwise hang a docker build
RUN choco install -y netfx-4.5.2-devpack
</code></pre>
<p>MSBuild for VS2017 also has a standalone package that comes without the entire VS2017 package:</p>
<pre><code># Install msbuild (vs2017) and add to PATH
RUN Invoke-WebRequest &quot;https://aka.ms/vs/15/release/vs_BuildTools.exe&quot; -OutFile vs_BuildTools.exe -UseBasicParsing ; \
        Start-Process -FilePath &apos;vs_BuildTools.exe&apos; -ArgumentList &apos;--quiet&apos;, &apos;--norestart&apos;, &apos;--locale en-US&apos; -Wait ; \
        Remove-Item .\vs_BuildTools.exe ; \
        Remove-Item -Force -Recurse &apos;C:\Program Files (x86)\Microsoft Visual Studio\Installer&apos;
RUN setx /M PATH $($Env:PATH + &apos;;&apos; + ${Env:ProgramFiles(x86)} + &apos;\Microsoft Visual Studio\2017\BuildTools\MSBuild\15.0\Bin&apos;)
</code></pre>
<p>You can see that once we have that base image set up, everything else is as simple as adding a couple of extra packages for different build environments.</p>
<p><em><strong>Note: There seems to be a bug of some kind while mapping volumes in Kubernetes with Windows. If you set C:\Jenkins as your build folder, you&apos;ll see an error along the lines of \ContainerVolumes .. is not valid. The workaround is to mount the folder as a separate drive, and use it for your builds:</strong></em></p>
<pre><code># For some reason just using C:\Jenkins does not work - it tries to map to \ContainerVolumes in k8s. The workaround is to mount the folder as a drive and use it as the working directory for builds
RUN set-itemproperty -path &apos;HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\DOS Devices&apos; -Name &apos;G:&apos; -Value &apos;\??\C:\Jenkins&apos; -Type String
</code></pre>
<h3 id="jenkinsfileforwindowsbasedbuilds">Jenkinsfile for Windows based builds</h3>
<p>Here&apos;s an example of a Jenkinsfile that I&apos;ve used to build one of our .NET projects:</p>
<script src="https://gist.github.com/rchakra3/d940a7425578c64524332c25096a5c46.js"></script>
<p>After a successful build, it uploads the build artifact to Azure blob storage using a <a href="https://github.com/rchakra3/blogs/tree/master/k8s-windows-linux/code">small script I wrote</a>. Run the script with the <code>--help</code> flag for all the options.</p>
<h3 id="finalthoughts">Final thoughts</h3>
<p>Having never worked with headless installations in Windows before, discovering and using <a href="https://chocolatey.org">Chocolatey</a> was amazingly helpful. Although the packages come with no guarantees for production environments, I&apos;ve not had a problem with any of them so far. Kicking off builds requiring a version of the .NET framework not on our build machine was a tedious process, and this setup definitely makes that process much easier.</p>
<p>There are a lot of good examples of what you can do with the Jenkins Kubernetes plugin on their <a href="https://github.com/jenkinsci/kubernetes-plugin">Github page</a>. They&apos;re written specifically with respect to the Ubuntu jnlp image though.</p>
<p>There was a <a href="https://github.com/Azure/acs-engine/issues/959">brief bug</a> in the ACS-engine deployment of Kubernetes 1.6.6 which resulted in our windows containers not having any internet connectivity. That was frustrating, but very quickly fixed. 1.7 now has added support for managed disks on Azure, which should be interesting to play around with as well!</p>
<p><em>Next up, Monitoring!</em></p>
<p><em><strong>Feel free to reach out to me <a href="https://twitter.com/rohchak">@rohchak</a> if you have any questions!</strong></em></p>
<p>[Update 2018/06/26: <a href="https://rohanc.me/monitoring-kubernetes-prometheus-grafana/">Monitoring</a>, the post I was supposed to write a year ago]</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Kubernetes for Everything! (With Windows and Linux on Azure) Part 1 - Jenkins]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>When I found out Kubernetes had support for Windows containers, I was pretty excited. I work with applications running on both Operating Systems so this opens up a lot of opportunities.</p>
<p>I plan to explore building a CI/CD pipeline that can scale based on load, set up monitoring (both</p>]]></description><link>https://rohanc.me/kubernetes-for-everything-with-windows-and-linux-containers-on-azure-part-1/</link><guid isPermaLink="false">62a05b46786471000132909a</guid><category><![CDATA[azure]]></category><category><![CDATA[kubernetes]]></category><category><![CDATA[CI/CD]]></category><dc:creator><![CDATA[Rohan Chakravarthy]]></dc:creator><pubDate>Wed, 19 Jul 2017 09:36:10 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>When I found out Kubernetes had support for Windows containers, I was pretty excited. I work with applications running on both Operating Systems so this opens up a lot of opportunities.</p>
<p>I plan to explore building a CI/CD pipeline that can scale based on load, set up monitoring (both cluster and application logs) and deploy both&#xA0;.NET apps in Windows containers and other apps in Linux containers&#x200A;&#x2014;&#x200A;all on Kubernetes.</p>
<p>I hope to share what I&apos;ve learnt through these posts, starting with employing our favorite butler!</p>
<h2 id="part1jenkinsonahybridwindowslinuxkubernetescluster">Part 1: Jenkins on a hybrid Windows/Linux Kubernetes cluster</h2>
<p>In this post, I&apos;ll explain how to get a traditional Jenkins cluster with one Ubuntu and one Windows agent working. In the <a href="https://rohanc.me/kubernetes-on-demand-jenkins-build-agents/">next one</a>, I&apos;ll talk about on-demand dynamic agents that only spin up for a build, save the artifact and are then shut down - clean build environments and no wasted resources!</p>
<h3 id="deployingtheclusteronazure">Deploying the cluster on Azure</h3>
<p>Deploying a hybrid Windows/Linux cluster isn&apos;t supported directly through the Azure Container Service (ACS) command-line tools or the portal, so we need to generate a custom ARM template. The open source acs-engine codebase makes that really easy.</p>
<p>Just follow the instructions <a href="https://github.com/Azure/acs-engine/blob/master/docs/acsengine.md#development-in-docker">here</a> to run and build acs-engine inside a container and then <a href="https://github.com/Azure/acs-engine/blob/master/docs/acsengine.md#generating-a-template">generate the Kubernetes ARM template</a> in the <code>_output</code> folder.<br>
For reference, <a href="https://gist.github.com/rchakra3/8e080ebbcb0f11f429efe8853befb6aa">this is what my kubernetes.json file looks like</a></p>
<p><strong>Note:</strong> if/when you want to update the cluster (modify an existing agent pool, add a new pool, etc.), use the generated <code>apimodel.json</code> file instead of <code>kubernetes.json</code> so you keep all the same cert info, etc.</p>
<h3 id="deploythejenkinsmasterpodonalinuxnode">Deploy the Jenkins master pod on a Linux node</h3>
<p>I&apos;ve used the <a href="https://hub.docker.com/_/jenkins/">official Jenkins Dockerhub image</a> for master.</p>
<p>All we need to do here is create the deployment and service in Kubernetes and optionally add a persistent volume. I also like to create storage classes to differentiate between using SSDs and HDDs.</p>
<p>(Run <code>kubectl apply -f [filename]</code> for all the YAML files)</p>
<h4 id="createthestorageclassesoptional">Create the storage classes (optional)</h4>
<script src="https://gist.github.com/rchakra3/a1816d23abf6791d123d2272e84f958b.js"></script>
<h4 id="createthepersistentstoragevolume">Create the persistent storage volume</h4>
<script src="https://gist.github.com/rchakra3/9406469cfacc994a2ef02417a171de2b.js"></script>
<h4 id="createtheserviceanddeployment">Create the service and deployment</h4>
<script src="https://gist.github.com/rchakra3/7e695cb6175b724ae3d0d4b9404cdb71.js"></script>
<script src="https://gist.github.com/rchakra3/7c09111161c0d647b91c0cceaf10c35e.js"></script>
<p>Note the <code>nodeSelector</code> block.</p>
<h3 id="configurejenkinsmaster">Configure Jenkins master</h3>
<ul>
<li>
<p>Figure out the name of the pod running Jenkins:<br>
<code>kubectl get pods</code></p>
</li>
<li>
<p>Get the password:</p>
</li>
</ul>
<p><code>kubectl exec [POD_NAME] cat /var/jenkins_home/secrets/initialAdminPassword</code></p>
<ul>
<li>
<p>Navigate to the IP reserved for the Jenkins-master service and enter the password.</p>
</li>
<li>
<p>Click through the rest of the setup and you&apos;re done!</p>
</li>
</ul>
<h3 id="buildthelinuxagent">Build the Linux agent</h3>
<p>Add a new Jenkins agent through the UI</p>
<div class="image-div">
<img src="https://cdn.rawgit.com/rchakra3/static-assets/5a2d8018/jenkins-kubernetes/permanent-agents/Jenkins-ubuntu-agent.png" alt="New Agent" loading="lazy">
</div>
<p>Once you create the agent you&apos;ll see a screen with details on setting up an agent. We&apos;re only interested in the secret.</p>
<p>Pass the secret in as an argument to the docker build:</p>
<script src="https://gist.github.com/rchakra3/caf270ab6ae36b43821eb224e0062898.js"></script>
<br>
<h3 id="buildthewindowsagent">Build the Windows agent</h3>
<p>Use the same steps to set up a new Jenkins agent and get a new secret to pass to the windows container build:</p>
<script src="https://gist.github.com/rchakra3/ac55b33f01020b0a129460d1422ac940.js"></script>
<p><em>This will just get your cluster running. <a href="https://gist.github.com/rchakra3/30f0db04f31f381309cfc436044ba5fb">Here&apos;s a link</a> to one that installs some basic .NET build tools.</em></p>
<h3 id="createtheserviceanddeploymentforboth">Create the service and deployment for both</h3>
<p>Again, note the <code>nodeSelector</code> to ensure it gets scheduled on the right nodes (based on OS)</p>
<p>Ubuntu:</p>
<script src="https://gist.github.com/rchakra3/4bd7128669199e8d940efc720e9bc560.js"></script>
<p>Windows:</p>
<script src="https://gist.github.com/rchakra3/346418455a6099b77751fb916c55e018.js"></script>
<br>
<h3 id="finalthoughts">Final thoughts</h3>
<p>This initial setup was really not too difficult - the only issue I hit was that the Windows agent connection kept timing out when I set <code>JENKINS_JNLP_URL</code> to the public IP. As soon as I set it to the internal Kubernetes service IP, things started running smoothly.</p>
<p>I&apos;m excited to see how everything else works out!</p>
<p><em>Next up - dynamic, on-demand Windows/Linux agents using the Jenkins Kubernetes plugin!</em></p>
<p><em><strong>Feel free to reach out to me <a href="https://twitter.com/rohchak">@rohchak</a> if you have any questions!</strong></em></p>
<p>Update: <a href="https://rohanc.me/kubernetes-on-demand-jenkins-build-agents/">Here&apos;s a link to Part 2</a>!</p>
<!--kg-card-end: markdown-->]]></content:encoded></item></channel></rss>