I've spent the last three years helping engineering organizations build internal tooling infrastructure. In that time, I've watched the same pattern repeat itself at companies of every size: developers spend more time figuring out how to ship than actually shipping. They ping Slack channels asking where the Kubernetes YAML templates live. They open ten browser tabs trying to remember which CI pipeline belongs to which service. They wait two weeks for a cloud account to be provisioned because nobody knows exactly who owns that process.
This is not a people problem. It's an infrastructure problem: specifically, the absence of a well-designed Internal Developer Portal (IDP). In 2026, platform engineering has moved from a niche practice to a mainstream engineering discipline, and the IDP is its most visible artifact. This post is a deep dive into what it actually takes to build one that scales: the architecture, the implementation details, the political challenges, and the honest trade-offs.
Why Developer Cognitive Load Is the Real Enemy
Before we talk about Backstage or any specific tooling, we need to establish why internal developer portals exist. The answer is deceptively simple: cognitive load kills velocity.
John Sweller's cognitive load theory, originally developed in educational psychology, maps directly onto software engineering. Working memory is finite. When a developer has to context-switch between Confluence, Jira, GitHub, three different Kubernetes dashboards, a legacy Ansible playbook repository, and a Slack thread just to deploy a new service, they're spending cognitive capital on coordination instead of creation.
The numbers back this up. According to the 2025 DORA State of DevOps report, teams with well-established internal tooling and platform engineering practices deploy 4.2x more frequently than those without. More tellingly, their change failure rate is 60% lower — not because they're better developers, but because the environment reduces the surface area for mistakes.
Here's a concrete example of cognitive tax. A developer joins a new team and needs to create a new microservice. Without an IDP, their journey looks something like this:
- Ask in Slack which repo to use as a template (Day 1)
- Clone the template, realize it uses an outdated Node version, ask again (Day 1)
- Figure out which Terraform module provisions the RDS instance (Day 2)
- Submit a request to the cloud team for IAM permissions (Day 2, response in 3 days)
- Set up the CI/CD pipeline by copying YAML from a colleague's repo (Day 5)
- Figure out where to register the service in the service mesh (Day 6)
- Discover that the monitoring dashboards aren't automatically provisioned (Week 2)
With a mature IDP, that entire journey compresses to: pick a template, fill in a form, click Create. The portal handles provisioning, CI/CD wiring, service registration, and observability setup. The developer is writing actual product code by the afternoon of Day 1.
Callout — The Hidden Cost of Missing Portals: Most organizations dramatically undercount the cost of fragmented tooling. Time lost to tool discovery, waiting for manual provisioning, and debugging misconfigured pipelines often totals 30–40% of developer time. At that scale it is no longer a productivity nuisance; it is a strategic constraint.
Backstage Architecture: What You're Actually Building
Spotify open-sourced Backstage in 2020, and in 2026 it remains the dominant open-source IDP framework, with adoption at organizations including American Airlines, Expedia, Zalando, and Netflix. But Backstage is frequently misunderstood as a product you "install." It's not. It's a framework — closer to React than to Jira. You're building an application, not deploying one.
Understanding this distinction is critical before you start. The Backstage monorepo contains a frontend application (built on React and Material UI), a backend service (Node.js), and a plugin ecosystem that handles specific integrations. The core architecture revolves around three foundational concepts.
The Software Catalog
The Software Catalog is Backstage's backbone. It's a centralized registry of all your software components — services, APIs, libraries, websites, pipelines, resources. Every entity in the catalog is described by a YAML file (called a catalog-info.yaml) that lives in the component's repository. This file declares the component's kind, owner, lifecycle stage, dependencies, and any links to external tools.
The catalog stores these entities in a database (PostgreSQL in production) and keeps them synchronized with your source repositories through configurable entity providers and processors. When a developer visits the portal and looks up a service, they see its owner, its current build status, its API documentation, its deployment history, and its runbooks, all surfaced from the catalog.
The Plugin System
Everything in Backstage is a plugin. The catalog is a plugin. TechDocs is a plugin. The Kubernetes integration is a plugin. This means the framework is extensible by design — you can build internal plugins specific to your organization, and there's a community marketplace with hundreds of pre-built integrations for tools like PagerDuty, Datadog, GitHub Actions, ArgoCD, and Vault.
Plugins have a frontend component (what appears in the Backstage UI) and, optionally, a backend component (an Express router that proxies to external APIs or processes data server-side). The frontend and backend parts of a plugin are separate npm packages, which means you can mix and match. A particularly useful pattern is an "aggregating" backend plugin that pulls data from three different internal APIs and exposes a clean, unified endpoint for the frontend.
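The aggregation logic itself can be sketched independently of Backstage's plugin APIs. The TypeScript below is a minimal, framework-agnostic sketch. The three upstream sources (ownership, cost, and incident data) and their response shapes are hypothetical. In a real backend plugin, the function would sit behind a route on the plugin's Express router. Using Promise.allSettled means one slow or failing internal API degrades the response instead of failing it outright.

```typescript
// Each upstream is modeled as an async function; the three sources and their
// response shapes are hypothetical stand-ins for your internal APIs.
type Fetcher<T> = (serviceName: string) => Promise<T>;

interface ServiceSummary {
  serviceName: string;
  owner?: string;
  monthlyCostUsd?: number;
  openIncidents?: number;
  // Upstreams that failed, so the frontend can flag partial data.
  errors: string[];
}

async function aggregateServiceSummary(
  serviceName: string,
  sources: {
    ownership: Fetcher<{ owner: string }>;
    cost: Fetcher<{ monthlyCostUsd: number }>;
    incidents: Fetcher<{ openIncidents: number }>;
  },
): Promise<ServiceSummary> {
  // Call all upstreams concurrently; allSettled never rejects, so one
  // failing API cannot take down the whole response.
  const [ownership, cost, incidents] = await Promise.allSettled([
    sources.ownership(serviceName),
    sources.cost(serviceName),
    sources.incidents(serviceName),
  ]);

  const summary: ServiceSummary = { serviceName, errors: [] };
  if (ownership.status === 'fulfilled') {
    summary.owner = ownership.value.owner;
  } else {
    summary.errors.push('ownership');
  }
  if (cost.status === 'fulfilled') {
    summary.monthlyCostUsd = cost.value.monthlyCostUsd;
  } else {
    summary.errors.push('cost');
  }
  if (incidents.status === 'fulfilled') {
    summary.openIncidents = incidents.value.openIncidents;
  } else {
    summary.errors.push('incidents');
  }
  return summary;
}
```

The frontend can then render whichever fields are present and show a warning for each entry in errors.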
TechDocs
TechDocs is Backstage's built-in documentation system, powered by MkDocs under the hood. Documentation lives as Markdown files in each component's repository alongside the code. In the recommended production setup, a CI pipeline runs whenever a developer pushes changes. It builds the docs site with @techdocs/cli and publishes the static assets to an object storage bucket (S3, GCS, or Azure Blob Storage). The TechDocs plugin then renders those assets inside Backstage.
The genius of this approach is docs-as-code with zero context switching. Documentation lives where the code lives, reviewed in the same pull request, and surfaced in the same portal where developers already spend their time. Teams that adopt TechDocs consistently report better documentation coverage because the barrier to writing and reading docs drops dramatically.
Installing and Customizing Backstage: A Practical Walkthrough
Let me walk through what a real Backstage installation looks like beyond the "getting started" tutorial. The official CLI scaffolds a new app in under five minutes, but what you'll actually be doing for the next several months is customization, integration, and organizational change management.
Start with the scaffold:
```bash
npx @backstage/create-app@latest   # prompts for an app name
cd my-backstage-app
yarn start                         # older app templates call this script "yarn dev"
```
This gives you a working local instance with the catalog, scaffolder, TechDocs, and search plugins pre-installed. The directory structure worth understanding immediately:
- packages/app/: the React frontend application
- packages/backend/: the Node.js backend server
- plugins/: where your custom internal plugins will live
- app-config.yaml: the main configuration file
- app-config.production.yaml: production overrides
The most important configuration decisions you'll make early on:
Authentication provider. Backstage supports GitHub OAuth, Google, Okta, Microsoft Azure AD, and others out of the box. For enterprises already standardized on Microsoft 365, Azure AD (now Microsoft Entra ID) is usually the natural choice. The configuration in app-config.yaml is relatively straightforward. However, you'll need to register an application in Azure AD and make sure its redirect (callback) URL points at your Backstage backend.
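Catalog discovery. By default, you register catalog entities manually. In practice, you'll want to configure catalog discovery so Backstage automatically finds catalog-info.yaml files across your GitHub organization. The GitHub discovery provider scans repositories on a schedule and ingests any entity files it finds. It requires the GitHub catalog backend module (@backstage/plugin-catalog-backend-module-github) and a GitHub integration configured under integrations.github in app-config.yaml: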
```yaml
catalog:
  providers:
    github:
      myOrg:
        organization: 'my-github-org'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main'
          repository: '.*'
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }
```
Database. The default SQLite backend is fine for local development, but you'll need PostgreSQL for anything production. Configure it early; migrating later is painful because the catalog entity data model is non-trivial.
Structuring Your Software Catalog for Real Organizations
The Software Catalog is where Backstage delivers its clearest value, but it only works if your entity definitions are thoughtful and consistently maintained. Here's how I recommend structuring them.
Backstage defines several entity kinds. The ones you'll use most:
- Component — a deployable piece of software (service, library, website, data pipeline)
- API — an interface exposed by a component (OpenAPI, AsyncAPI, gRPC, GraphQL)
- System — a logical grouping of components that serve a common purpose
- Domain — a business domain grouping related systems
- Resource — infrastructure a component depends on (databases, S3 buckets, SQS queues)
- Group — a team or organizational unit
- User — an individual (usually auto-populated from your identity provider)
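Relationships between these entities are expressed as string references of the form [kind:][namespace/]name, such as group:payments-team. The namespace defaults to default. The sketch below parses that shorthand to show how the pieces fit together. It is a simplified illustration that skips Backstage's validation rules; in real code you would use parseEntityRef from @backstage/catalog-model instead. The defaultKind parameter models fields where the kind may be omitted.

```typescript
// Simplified parser for the shorthand [kind:][namespace/]name. It skips
// Backstage's validation rules; real code should use parseEntityRef from
// @backstage/catalog-model.
interface EntityRef {
  kind: string;
  namespace: string;
  name: string;
}

function parseRef(ref: string, defaultKind?: string): EntityRef {
  const colon = ref.indexOf(':');
  const kind = colon >= 0 ? ref.slice(0, colon) : defaultKind;
  const rest = colon >= 0 ? ref.slice(colon + 1) : ref;
  if (!kind) {
    throw new Error(`Entity ref "${ref}" has no kind and no default kind was given`);
  }
  const slash = rest.indexOf('/');
  const namespace = slash >= 0 ? rest.slice(0, slash) : 'default';
  const name = slash >= 0 ? rest.slice(slash + 1) : rest;
  if (!name) {
    throw new Error(`Entity ref "${ref}" has an empty name`);
  }
  // This sketch lowercases kinds so that "Group" and "group" compare equal.
  return { kind: kind.toLowerCase(), namespace, name };
}

// Renders the fully qualified form, e.g. "group:default/payments-team".
function refToString(ref: EntityRef): string {
  return `${ref.kind}:${ref.namespace}/${ref.name}`;
}
```

For example, parseRef('resource:billing/payments-db') yields kind resource, namespace billing, and name payments-db.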
A well-structured catalog-info.yaml for a backend service looks like this:
```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-service
  description: Handles all payment processing for checkout flows
  tags:
    - java
    - payments
    - critical
  annotations:
    github.com/project-slug: myorg/payments-service
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/service-id: P1234ABC
    datadoghq.com/dashboard-url: 'https://app.datadoghq.com/dashboard/xyz'
spec:
  type: service
  lifecycle: production
  owner: group:payments-team
  system: checkout
  dependsOn:
    - component:order-service
    - resource:payments-db
    - resource:stripe-api
  providesApis:
    - payments-api
```
The annotations field is where third-party plugins hook in. The PagerDuty annotation tells the PagerDuty plugin which service to pull on-call information from. The Datadog annotation surfaces the dashboard directly on the Backstage component page. Each plugin's documentation lists the annotation keys it reads.
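Each plugin keys off specific annotation keys, so a small check can tell a team which integrations a component has not wired up yet. This is a hypothetical helper, not part of any Backstage package. The key-to-plugin table is built from the four annotations in the example above and would need extending for your own plugin set.

```typescript
// Hypothetical lookup table built from the annotations in the example above;
// extend it with the keys your own plugins read.
const PLUGIN_ANNOTATIONS: Record<string, string> = {
  'github.com/project-slug': 'GitHub',
  'backstage.io/techdocs-ref': 'TechDocs',
  'pagerduty.com/service-id': 'PagerDuty',
  'datadoghq.com/dashboard-url': 'Datadog',
};

interface EntityMetadata {
  name: string;
  annotations?: Record<string, string>;
}

// Splits known integrations into those the entity has configured and those it lacks.
function integrationStatus(metadata: EntityMetadata): {
  configured: string[];
  missing: string[];
} {
  const annotations = metadata.annotations ?? {};
  const configured: string[] = [];
  const missing: string[] = [];
  for (const [key, plugin] of Object.entries(PLUGIN_ANNOTATIONS)) {
    const value = annotations[key];
    if (value !== undefined && value.trim() !== '') {
      configured.push(plugin);
    } else {
      missing.push(plugin);
    }
  }
  return { configured, missing };
}
```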
For large organizations, the hardest catalog challenge is ownership hygiene. Catalog entries go stale when teams restructure. I recommend:
- A quarterly automated audit that flags components with no code commits in 180 days
- A lint CI step that validates the catalog-info.yaml schema on every PR
- A "catalog champion" within each team who's responsible for keeping entries current
- Automated Group and User entities synced from your identity provider (not manually maintained)
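The audit and lint bullets above can share one check. The sketch below shows the core of a hypothetical audit job. It assumes you have already gathered each component's name, owner, and last commit date, for example from the catalog API and your Git host. It flags invalid names, missing owners, and components idle for more than 180 days. The name rule encodes Backstage's documented constraint on metadata.name as we understand it: 1 to 63 characters, made of alphanumeric runs separated by single -, _, or . characters. Check the current catalog-model documentation before relying on it.

```typescript
// Inputs a hypothetical audit job has already collected per component.
interface CatalogComponent {
  name: string;
  owner?: string;
  lastCommitAt: Date;
}

const STALE_AFTER_DAYS = 180;
const MS_PER_DAY = 24 * 60 * 60 * 1000;
// Assumed metadata.name rule: alphanumeric runs separated by single -, _ or .
const NAME_PATTERN = /^[a-zA-Z0-9]+(?:[-_.][a-zA-Z0-9]+)*$/;

// Returns a list of finding codes; an empty list means the entry looks healthy.
function auditComponent(component: CatalogComponent, auditDate: Date): string[] {
  const findings: string[] = [];
  if (component.name.length > 63 || !NAME_PATTERN.test(component.name)) {
    findings.push('invalid-name');
  }
  if (!component.owner) {
    findings.push('missing-owner');
  }
  const idleDays = (auditDate.getTime() - component.lastCommitAt.getTime()) / MS_PER_DAY;
  if (idleDays > STALE_AFTER_DAYS) {
    findings.push('stale');
  }
  return findings;
}
```

Running this in CI for the lint step and on a schedule for the quarterly audit keeps both checks consistent.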
Golden Path Templates: The Productivity Multiplier
The term "golden path" comes from the idea that your platform team has designed the optimal route for common development workflows. A golden path template encodes that route as a repeatable, self-service action. This is where platform engineering stops being about tooling and starts being about organizational design.
In Backstage, golden paths are implemented as Software Templates using the Scaffolder plugin. A template is a YAML file that defines:
- A form with input parameters (service name, owner team, cloud region, etc.)
- A series of steps that execute when the form is submitted
- Output links to the created resources (GitHub repo, CI pipeline, deployment dashboard)
Here's a simplified template for creating a new Java Spring Boot microservice:
```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: spring-boot-service
  title: Spring Boot Microservice
  description: Creates a new Spring Boot service with CI/CD and monitoring
  tags:
    - java
    - spring-boot
    - recommended
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Details
      required:
        - name
        - owner
      properties:
        name:
          title: Service Name
          type: string
          pattern: '^[a-z][a-z0-9-]*$'
        owner:
          title: Owner Team
          type: string
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
        description:
          title: Service Description
          type: string
  steps:
    - id: fetch-template
      name: Fetch Base Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          description: ${{ parameters.description }}
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=myorg
        defaultBranch: main
    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['publish'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
  output:
    links:
      - title: Repository
        url: ${{ steps['publish'].output.remoteUrl }}
      - title: Open in Catalog
        entityRef: ${{ steps['register'].output.entityRef }}
```
The skeleton directory referenced in the template contains the actual file templates, using Nunjucks syntax for variable substitution. Your skeleton for a Spring Boot service would include the pom.xml, the application class, the GitHub Actions workflow, the Dockerfile, the catalog-info.yaml, and the docs/ directory for TechDocs.
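Inside skeleton files, fetch:template exposes the values passed in the template's values block as ${{ values.name }}, ${{ values.owner }}, and so on. The sketch below is a deliberately simplified stand-in for that rendering step, not the Scaffolder's implementation. It handles only plain ${{ values.key }} placeholders, with no Nunjucks filters, loops, or conditionals. It also throws on a missing value, a stricter choice than rendering an empty string. Finally, it mirrors the template's name pattern so you can unit-test naming rules outside Backstage.

```typescript
// Simplified stand-in for skeleton rendering: only `${{ values.<key> }}`
// placeholders are supported, with no Nunjucks filters, loops, or conditionals.
const PLACEHOLDER = /\$\{\{\s*values\.([a-zA-Z0-9_]+)\s*\}\}/g;

function renderSkeletonFile(source: string, values: Record<string, string>): string {
  return source.replace(PLACEHOLDER, (_match: string, key: string) => {
    const value = values[key];
    if (value === undefined) {
      // Failing loudly beats silently shipping an empty value into a new repo.
      throw new Error(`Template value "${key}" was not provided`);
    }
    return value;
  });
}

// Mirrors the `pattern` constraint on the name parameter in the template above.
const SERVICE_NAME = /^[a-z][a-z0-9-]*$/;

function isValidServiceName(candidate: string): boolean {
  return SERVICE_NAME.test(candidate);
}
```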
Callout — Template Versioning Discipline: Treat your Scaffolder templates like production software. Pin dependency versions in skeleton files, run automated tests against templates in CI (the @backstage/plugin-scaffolder-backend provides dry-run capability), and maintain a changelog. A broken template silently blocks every new service creation until someone notices.
Scaffolder Actions: Going Beyond the Defaults
The built-in Scaffolder actions cover GitHub, GitLab, Bitbucket, and a handful of other integrations. But the real power comes from building custom actions for your organization's internal systems.
Custom actions are Node.js functions registered with the scaffolder backend. They receive input parameters and can call any API — your internal IAM service, a Terraform Cloud workspace, a ServiceNow ticket system, your internal DNS provisioning API. The scaffolder coordinates them in sequence, with each step receiving the output of previous steps.
```typescript
// createTemplateAction is exported from @backstage/plugin-scaffolder-node in
// current releases; older releases exported it from @backstage/plugin-scaffolder-backend.
import { createTemplateAction } from '@backstage/plugin-scaffolder-node';

export const createProvisionCloudAccountAction = () => {
  return createTemplateAction<{
    serviceName: string;
    environment: string;
    teamId: string;
  }>({
    id: 'internal:provision-cloud-account',
    description: 'Provisions AWS account via internal IAM API',
    schema: {
      input: {
        required: ['serviceName', 'environment', 'teamId'],
        type: 'object',
        properties: {
          serviceName: { type: 'string' },
          environment: { type: 'string', enum: ['dev', 'staging', 'prod'] },
          teamId: { type: 'string' },
        },
      },
      output: {
        type: 'object',
        properties: {
          accountId: { type: 'string' },
          roleArn: { type: 'string' },
        },
      },
    },
    async handler(ctx) {
      const { serviceName, environment, teamId } = ctx.input;
      ctx.logger.info(`Provisioning ${environment} account for ${serviceName}`);

      // iam-api.internal stands in for your internal IAM service.
      const response = await fetch('https://iam-api.internal/accounts', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ serviceName, environment, teamId }),
      });
      // Fail the scaffolder step explicitly instead of emitting undefined outputs.
      if (!response.ok) {
        throw new Error(
          `IAM API returned ${response.status} while provisioning ${serviceName}`,
        );
      }

      const { accountId, roleArn } = (await response.json()) as {
        accountId: string;
        roleArn: string;
      };
      ctx.output('accountId', accountId);
      ctx.output('roleArn', roleArn);
    },
  });
};
```
Once registered, this action becomes available to any template in your organization. A "create new service" template can now include cloud account provisioning, GitHub repo creation, Kubernetes namespace setup, and PagerDuty service registration — all in a single developer workflow that takes under two minutes and requires no tickets.
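The Scaffolder engine handles this orchestration for you. In templates, you reach earlier results with expressions like ${{ steps['publish'].output.remoteUrl }}. The toy model below only illustrates that data flow: steps run sequentially, and each step reads the outputs of earlier steps by id. Every step id and output name in it is hypothetical.

```typescript
// Toy model of a template run; not the Scaffolder's engine. Each step returns
// string outputs that later steps can read by step id.
type StepOutputs = Record<string, Record<string, string>>;

interface Step {
  id: string;
  run: (previous: StepOutputs) => Promise<Record<string, string>>;
}

async function runSteps(steps: Step[]): Promise<StepOutputs> {
  const outputs: StepOutputs = {};
  for (const step of steps) {
    // Awaiting inside the loop enforces strict ordering; if a step throws,
    // the run stops and later steps never execute.
    outputs[step.id] = await step.run(outputs);
  }
  return outputs;
}
```

Where the toy reads previous['provision'].accountId, a real template would write ${{ steps['provision'].output.accountId }}.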
Kubernetes and CI/CD Integration
The Backstage Kubernetes plugin is one of the most compelling features for organizations running container workloads. It surfaces Kubernetes deployment status, pod health, and rollout history directly on the catalog component page — no kubectl, no switching to Lens or Argo, no asking the platform team.
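Setting up the Kubernetes plugin requires configuring cluster credentials in app-config.yaml and connecting catalog entities to their workloads. By default, the plugin queries each cluster for objects labeled backstage.io/kubernetes-id=<value>. The value comes from the entity's backstage.io/kubernetes-id annotation. Alternatively, a backstage.io/kubernetes-label-selector annotation can supply a custom selector.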
```yaml
kubernetes:
  serviceLocatorMethod:
    type: 'multiTenant'
  clusterLocatorMethods:
    - type: 'config'
      clusters:
        - url: https://my-k8s-cluster.internal
          name: production
          authProvider: 'serviceAccount'
          skipTLSVerify: false
          serviceAccountToken: ${K8S_SERVICE_ACCOUNT_TOKEN}
        - url: https://staging-k8s-cluster.internal
          name: staging
          authProvider: 'serviceAccount'
          serviceAccountToken: ${K8S_STAGING_TOKEN}
```
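In the component's catalog-info.yaml, add a backstage.io/kubernetes-id: payments-service annotation under metadata.annotations. Then label your Kubernetes resources with the same value. Include the pod template, since pods are matched by their own labels:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-service
  labels:
    backstage.io/kubernetes-id: payments-service
spec:
  selector:
    matchLabels:
      app: payments-service
  template:
    metadata:
      labels:
        app: payments-service
        backstage.io/kubernetes-id: payments-service
    # spec (containers, etc.) omitted for brevity
```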
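The lookup the plugin performs can be approximated as follows. This sketch reflects our reading of the plugin's default behavior rather than its source. An explicit backstage.io/kubernetes-label-selector annotation wins. Otherwise the selector is backstage.io/kubernetes-id=<value>, with the value taken from the entity's backstage.io/kubernetes-id annotation and, as a fallback we believe the plugin applies, from the entity name. The matcher handles only comma-separated key=value equality clauses, a small subset of Kubernetes label-selector syntax.

```typescript
// Approximation of how an entity maps to a Kubernetes label selector;
// an assumption about plugin behavior, not its actual code.
interface EntityMeta {
  name: string;
  annotations?: Record<string, string>;
}

function kubernetesLabelSelector(meta: EntityMeta): string {
  const annotations = meta.annotations ?? {};
  const explicit = annotations['backstage.io/kubernetes-label-selector'];
  if (explicit) return explicit;
  const id = annotations['backstage.io/kubernetes-id'] ?? meta.name;
  return `backstage.io/kubernetes-id=${id}`;
}

// Checks labels against an equality-only selector such as "a=b,c=d".
function matchesSelector(selector: string, labels: Record<string, string>): boolean {
  return selector.split(',').every((clause) => {
    const [key, value] = clause.split('=').map((part) => part.trim());
    return labels[key] === value;
  });
}
```

This is also why a missing label on the pod template shows up as a Deployment with no pods in the Backstage UI.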
For CI/CD integration, the GitHub Actions plugin (and equivalents for Jenkins, CircleCI, and GitLab CI) pulls pipeline run history and surfaces it alongside the component. Developers can see the last five build statuses, click into failed runs, and understand the deployment pipeline — all without leaving Backstage.
The most powerful pattern I've seen is combining the Kubernetes plugin with ArgoCD: Backstage shows the live deployment state from ArgoCD alongside the triggering CI run from GitHub Actions, giving developers a unified view of "what code is where and how it got there" on a single page.
The Developer Adoption Problem: It's Always Political
I've seen technically excellent developer portals fail because nobody used them. Building it is the easy part. Getting developers to adopt it — and sustain that adoption — is a product management and change management challenge.
The fundamental mistake platform teams make is assuming that "if you build it, they will come." They won't. Developers are time-constrained and tool-fatigued. Adding another portal to their workflow is friction, not relief, unless the portal is genuinely better than the alternative.
What actually drives adoption:
Make the portal the fastest path for high-frequency tasks. Identify the three things developers do most often — provisioning environments, checking deployment status, reading API documentation — and make the portal the fastest way to do those things. If developers naturally reach for Backstage because it saves time, adoption follows.
Mandate catalog registration for new services, not old ones. Don't try to retrofit the catalog onto all existing services at once. That's a boil-the-ocean project that stalls out. Instead, make catalog-info.yaml registration a required step for any new service deployment. The catalog grows organically as teams create new things.
Kill competing tools, don't just add Backstage. If the internal wiki, the service registry spreadsheet, and the runbook Confluence space still exist, developers will keep using them. The portal only wins if it's the single source of truth. This requires political will to deprecate old systems.
Treat it as a product with real customers. Hold monthly "platform office hours." Run NPS surveys specifically on the portal. Build a public roadmap for the portal that developers can influence. Hire or designate someone as the portal's product owner — not just a maintainer.
Embed early adopters in the design process. The teams who adopt first become advocates. Give them early access to new features. Highlight their success publicly. Platform engineering has a marketing component that most engineers underestimate.
The Honest Reality of Backstage Maintenance Costs
Let me be direct about something most Backstage blog posts gloss over: Backstage is expensive to maintain. Not in licensing costs — it's open source — but in engineering time.
Backstage follows a monthly release cadence, and releases regularly include breaking changes. The plugin ecosystem is maintained by a mix of Spotify, tool vendors, and community contributors, with varying levels of consistency. When a new Backstage release lands, your custom plugins and configurations often need updates. If you've built a lot of internal customization, this can mean one week of platform engineering time per quarter just on dependency upgrades.
The realistic maintenance picture:
- Minimum viable team: 1 dedicated engineer plus 0.5 FTE for an organization of 100–300 developers
- Healthy team: 2–3 dedicated engineers for 300–1000 developers
- Upgrade cycle: Major version upgrades 2–3 times per year, minor updates monthly
- Plugin maintenance: Budget one sprint per quarter for plugin updates and compatibility fixes
- Catalog hygiene: Ongoing, roughly 10% of one engineer's time
If you don't have the staffing for this, a managed or commercial alternative may make more economic sense.
Backstage Alternatives: Port, Cortex, and OpsLevel
Backstage's dominance doesn't mean it's the right choice for every organization. Three commercial alternatives have matured significantly and deserve serious evaluation.
Port is the most flexible of the commercial options. Its data model is schema-driven and highly configurable: you define your own entity types (called "blueprints") and the relationships between them. For organizations with unconventional architectures, this is more powerful than Backstage's fixed entity kinds. Port includes a built-in action runner, a scorecards system for tracking engineering standards, and a built-in RBAC model. By contrast, core open-source Backstage offers a permission framework whose policies you implement in code. The trade-off is that building non-standard use cases requires learning Port's configuration model deeply, and you're dependent on a vendor's roadmap.
Cortex focuses on service quality and ownership rather than being a full-featured portal. Its killer feature is the Scorecard system: you define standards (every production service must have an on-call rotation, SLO definitions, runbook documentation, and a designated owner), and Cortex scores every service against those standards and sends weekly email reports to team leads. For organizations where the primary problem is ownership clarity and engineering standards, Cortex delivers this with minimal setup time compared to Backstage.
OpsLevel occupies a similar space to Cortex with stronger workflow automation features. OpsLevel's service maturity campaigns let you define an improvement program — "all services must migrate to gRPC by Q3" — and track progress automatically using Git and CI integration. The self-service actions feature is closer to Backstage's scaffolder than Cortex's more reporting-focused approach.
Comparison: Backstage vs Port vs Cortex
| Dimension | Backstage | Port | Cortex |
|---|---|---|---|
| Pricing model | Open source (self-hosted) | SaaS, per-seat | SaaS, per-service |
| Time to first value | Weeks to months | Days to weeks | Days |
| Customizability | Unlimited (build anything) | High (config-driven) | Medium (within product model) |
| Scaffolder / self-service | Excellent (full workflow engine) | Good (action runner) | Good (service actions) |
| Scorecards / standards | Plugin required | Built-in | Excellent (core feature) |
| Maintenance burden | High (self-managed) | Low (vendor-managed) | Low (vendor-managed) |
| Plugin ecosystem | 100+ community plugins | Native integrations | Native integrations |
| Best fit | Large orgs, full control | Mid-size, flexibility needed | Mid-size, standards focus |
| Data residency | Your infrastructure | Vendor cloud (EU option) | Vendor cloud |
My rule of thumb: if you have 3+ engineers available to build and maintain the portal, and you want full control over the implementation, Backstage is the right choice. If you want to be running in production within a month and your primary concern is service ownership visibility, start with Cortex. If you need Backstage-level flexibility but can't afford the maintenance overhead, Port is the middle ground.
Callout — Hybrid Approaches Work: Some organizations run Backstage for the Software Catalog and self-service templates, while using Cortex specifically for the scorecard and engineering standards tracking. These tools aren't mutually exclusive, and the Cortex Backstage integration is mature enough to surface scorecard data inside Backstage entity pages.
Measuring IDP Success: Metrics That Actually Matter
One of the hardest parts of justifying IDP investment to leadership is demonstrating ROI. The benefits are real but diffuse — reduced cognitive load doesn't show up cleanly in a spreadsheet. Here are the metrics I've found most useful for making the case.
Time to first deployment for new services. Measure the calendar days from "decision to create a new service" to "first deployment to production." This is the clearest demonstration of golden path value. Baseline it before the portal, measure it after. A well-functioning IDP typically reduces this from 2–3 weeks to 1–2 days.
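Computing the baseline is simple arithmetic once you have two timestamps per service. The sketch below assumes a hypothetical record holding the decision date and the first production deployment date, sourced from, say, ticket creation and your deployment tool. It reports the median, which is less distorted than the mean by the occasional service that stalls for months.

```typescript
// Hypothetical per-service record; where the timestamps come from is up to you.
interface ServiceLaunch {
  service: string;
  decidedAt: Date;
  firstProdDeployAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Median calendar days from "decision to create" to "first production deploy".
function medianDaysToFirstDeploy(launches: ServiceLaunch[]): number {
  if (launches.length === 0) throw new Error('No launches to measure');
  const days = launches
    .map((l) => (l.firstProdDeployAt.getTime() - l.decidedAt.getTime()) / DAY_MS)
    .sort((a, b) => a - b);
  const mid = Math.floor(days.length / 2);
  // Odd count: middle value; even count: mean of the two middle values.
  return days.length % 2 === 1 ? days[mid] : (days[mid - 1] + days[mid]) / 2;
}
```

Run it once on services launched before the portal and again on those created through golden path templates.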
MTTR for incidents. When an incident occurs, how long does it take the on-call engineer to identify the affected services, find the runbook, and understand the deployment history? Portal adoption directly affects this number.
Catalog coverage. What percentage of production services have a catalog-info.yaml with a verified owner, API documentation, and observability links? This is your data quality metric, and it should trend upward over time.
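Coverage reduces to a ratio over the three criteria just listed. In the sketch below, the per-service boolean fields are hypothetical and would be filled in by an audit job. The input should be your full production service inventory, not just catalog entries; otherwise unregistered services quietly inflate the score.

```typescript
// Hypothetical per-service facts collected by an audit job.
interface CoverageInput {
  name: string;
  hasVerifiedOwner: boolean;
  hasApiDocs: boolean;
  hasObservabilityLinks: boolean;
}

// Percentage (0-100, one decimal place) of services meeting all three criteria.
function catalogCoveragePercent(services: CoverageInput[]): number {
  if (services.length === 0) return 0;
  const covered = services.filter(
    (s) => s.hasVerifiedOwner && s.hasApiDocs && s.hasObservabilityLinks,
  ).length;
  return Math.round((covered / services.length) * 1000) / 10;
}
```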
Scaffolder usage. How many new services are created via golden path templates versus ad hoc? This metric tells you whether the portal's self-service function is working.
Developer satisfaction score. A quarterly survey specifically about the portal, separate from general developer experience surveys, lets you track qualitative sentiment and identify friction points before they become adoption problems.
Key Takeaways
- Developer cognitive load is the core problem an IDP solves. Before evaluating any tool, map out exactly how much time your developers spend on coordination, tool-switching, and waiting for manual provisioning. That number is your baseline ROI case.
- Backstage is a framework, not a product. You're building an application, which means you're taking on the responsibility of a software product: it needs a product owner, a roadmap, and dedicated engineering time. Treat it accordingly from Day 1.
- The Software Catalog is the foundation. Before building flashy features, invest heavily in catalog quality. Stale or incomplete catalog data undermines trust in the portal faster than any missing feature.
- Golden path templates deliver the fastest developer ROI. Even a simple template that reduces service creation from two weeks to two hours will generate more goodwill than months of UI improvements. Start there.
- Adoption is a product and political challenge, not a technical one. Identify the friction, eliminate competing tools, and make portal use the path of least resistance for the tasks developers perform daily.
- Honestly assess your maintenance capacity before choosing Backstage. For organizations that can't dedicate at least 1–2 engineers to portal maintenance, Port or Cortex will deliver better outcomes with significantly lower overhead.