Sampling processor

The sampling processor implements probabilistic sampling to reduce data volume while preserving signal. Use it to keep all errors and slow requests while aggressively sampling routine success cases, reducing costs without losing diagnostic value.

When to use sampling processor

The sampling processor supports different capabilities depending on your telemetry data type:

For Logs and Events

Logs and Events support conditional sampling with customizable rules based on severity, attributes, and other criteria:

Keep 100% of errors while sampling success cases: Preserve all diagnostic data, drop routine traffic
Sample high-volume services more aggressively: Different sampling rates by service tier or importance
Preserve slow requests while sampling fast ones: Keep performance outliers for analysis
Apply different sampling rates per environment or service: Production at 10%, staging at 50%, test at 100%

For Traces

Traces support only global rate-based sampling. Reduce overall trace volume with a uniform sampling rate.

For Metrics

Metrics sampling is not currently supported by the sampling processor. Use the filter processor to drop unwanted metrics instead.

How sampling works

The sampling processor uses probabilistic sampling with conditional rules:

Default sampling percentage: Default rate applied to all data that doesn't match conditional rules.
Rules: Override the default rate when specific conditions match.
Source of randomness: Consistent field (like trace_id) ensures related data is sampled together.

Evaluation order: Rules are evaluated in the order defined. The first matching rule determines the sampling rate. If no rules match, the default sampling percentage applies.

Configuration

Add a sampling processor to your pipeline:

probabilistic_sampler/Logs:
        description: Probabilistic sampling for all logs
        config:
          default_sampling_percentage: 100
          rules:
            - name: sample the log records for ruby test service
              description: sample the log records for ruby test service with 70%
              sampling_percentage: 70
              source_of_randomness: trace.id
              conditions:
                - resource.attributes["service.name"] == "ruby-test-service"

Config fields:

default_sampling_percentage: Default sampling rate (0-100) for data that doesn't match rules.
rules: Array of rules (evaluated in order) - only supported for logs and events.
- name: Rule identifier.
- description: Human-readable description.
- sampling_percentage: Sampling rate for matched data (0-100).
- source_of_randomness: Field to use for sampling decision (typically trace_id).
- conditions: List of OTTL expressions to match telemetry.

Sampling strategies

Keep valuable data, drop routine traffic

The most common pattern for logs and events: preserve all diagnostic data (errors, slow requests), aggressively sample routine success cases.

probabilistic_sampler/Logs:
  description: "Intelligent log sampling"
  config:
    default_sampling_percentage: 5  # Sample 5% of everything else
    rules:
      - name: "preserve-errors"
        description: "Keep all errors and fatals"
        sampling_percentage: 100
        source_of_randomness: "trace.id"
        conditions: 
          - 'severity_text == "ERROR" or severity_text == "FATAL"'

      - name: "preserve-warnings"
        description: "Keep most warnings"
        sampling_percentage: 50
        source_of_randomness: "trace.id"
        conditions: 
          - 'severity_text == "WARN"'

Result: 100% of errors + 50% of warnings + 5% of everything else

Sample by service tier

Different sampling rates for different service importance:

probabilistic_sampler/Logs:
  description: "Service tier sampling"
  config:
    default_sampling_percentage: 10
    rules:
      - name: "critical-services"
        description: "Keep most traces from critical services"
        sampling_percentage: 80
        source_of_randomness: "trace.id"
        conditions: 
          - 'resource.attributes["service.name"] == "checkout" or resource.attributes["service.name"] == "payment"'

      - name: "standard-services"
        description: "Medium sampling for standard services"
        sampling_percentage: 30
        source_of_randomness: "trace.id"
        conditions: 
          - 'resource.attributes["service.tier"] == "standard"'

Sample by environment

Higher sampling in test environments, lower in production:

probabilistic_sampler/Logs:
  description: "Environment-based sampling"
  config:
    default_sampling_percentage: 10  # Production default
    rules:
      - name: "test-environment"
        description: "Keep all test data"
        sampling_percentage: 100
        source_of_randomness: "trace.id"
        conditions: 
          - 'resource.attributes["environment"] == "test"'

      - name: "staging-environment"
        description: "Keep half of staging data"
        sampling_percentage: 50
        source_of_randomness: "trace.id"
        conditions: 
          - 'resource.attributes["environment"] == "staging"'

Preserve slow requests

Keep performance outliers for analysis:

probabilistic_sampler/Logs:
  description: "Preserve important logs"
  config:
    default_sampling_percentage: 1  # Sample 1% of routine logs
    rules:
      - name: "critical-logs"
        description: "Keep all error and fatal logs"
        sampling_percentage: 100
        source_of_randomness: "trace.id"
        conditions: 
          - 'severity_text == "ERROR" or severity_text == "FATAL"'

      - name: "warning-logs"
        description: "Keep half of warning logs"
        sampling_percentage: 50
        source_of_randomness: "trace.id"
        conditions: 
          - 'severity_text == "WARN"'
      
      - name: "traced-logs"
        description: "Keep logs with trace context"
        sampling_percentage: 50
        source_of_randomness: "trace.id"
        conditions: 
          - 'trace_id != nil and trace_id.string != "00000000000000000000000000000000"'

Note: Duration is in nanoseconds (1 second = 1,000,000,000 ns).

Complete examples

Example 1: Intelligent trace sampling for distributed tracing

For traces, you can only configure the default sampling percentage. This percentage applies to all traces uniformly, including error traces and slow traces:

probabilistic_sampler/Traces:
  description: Probabilistic sampling for traces
  config:
    default_sampling_percentage: 55

Example 2: Log volume reduction

Dramatically reduce log volume while keeping diagnostic data:

probabilistic_sampler/Logs:
  description: "Aggressive log sampling, preserve errors"
  config:
    default_sampling_percentage: 2  # Keep 2% of routine logs
    rules:
      - name: "keep-errors-fatals"
        description: "Keep all errors and fatals"
        sampling_percentage: 100
        source_of_randomness: "trace.id"
        conditions: 
          - 'severity_number >= 17'  # ERROR and above

      - name: "keep-some-warnings"
        description: "Keep 25% of warnings"
        sampling_percentage: 25
        source_of_randomness: "trace.id"
        conditions: 
          - 'severity_number >= 13 and severity_number < 17'  # WARN

Example 3: Sample by HTTP status code

Sample all failures (100%) and sample a fraction of successes (5%):

probabilistic_sampler/Logs:
  description: "Sample by HTTP response status"
  config:
    default_sampling_percentage: 5  # 5% of successes
    rules:
      - name: "keep-server-errors"
        description: "Keep all 5xx errors"
        sampling_percentage: 100
        source_of_randomness: "trace.id"
        conditions: 
          - 'attributes["http.status_code"] >= 500'

      - name: "keep-client-errors"
        description: "Keep all 4xx errors"
        sampling_percentage: 100
        source_of_randomness: "trace.id"
        conditions: 
          - 'attributes["http.status_code"] >= 400 and attributes["http.status_code"] < 500'

Example 4: Multi-tier service sampling

Different rates for different importance levels:

probabilistic_sampler/Logs:
  description: "Business criticality sampling"
  config:
    default_sampling_percentage: 1
    rules:
      # Critical business services: keep 80%
      - name: "critical-services"
        description: "High sampling for critical services"
        sampling_percentage: 80
        source_of_randomness: "trace.id"
        conditions: 
          - 'attributes["business_criticality"] == "critical"'

      # Important services: keep 40%
      - name: "important-services"
        description: "Medium sampling for important services"
        sampling_percentage: 40
        source_of_randomness: "trace.id"
        conditions: 
          - 'attributes["business_criticality"] == "important"'

      # Standard services: keep 10%
      - name: "standard-services"
        description: "Low sampling for standard services"
        sampling_percentage: 10
        source_of_randomness: "trace.id"
        conditions: 
          - 'attributes["business_criticality"] == "standard"'

Example 5: Time-based sampling (off-peak reduction)

Higher sampling during business hours (requires external attribute tagging):

probabilistic_sampler/Logs:
  description: "Time-based sampling (requires time attribute)"
  config:
    default_sampling_percentage: 5  # Off-peak default
    rules:
      - name: "business-hours"
        description: "Higher sampling during business hours"
        sampling_percentage: 50
        source_of_randomness: "trace.id"
        conditions: 
          - 'attributes["is_business_hours"] == true'

Example 6: Sample by endpoint pattern

Keep all admin endpoints, sample public API aggressively:

probabilistic_sampler/Logs:
  description: "Endpoint-based sampling"
  config:
    default_sampling_percentage: 10
    rules:
      - name: "admin-endpoints"
        description: "Keep all admin traffic"
        sampling_percentage: 100
        source_of_randomness: "trace.id"
        conditions: 
          - 'IsMatch(attributes["http.path"], "^/admin/.*")'

      - name: "api-endpoints"
        description: "Sample public API"
        sampling_percentage: 5
        source_of_randomness: "trace.id"
        conditions: 
          - 'IsMatch(attributes["http.path"], "^/api/.*")'

Source of randomness

The source_of_randomness field determines which attribute is used to make consistent sampling decisions.

Common values:

trace_id: For distributed traces (ensures all spans in a trace are sampled together)
span_id: For individual span sampling (not recommended for distributed tracing)
Custom attribute: Any attribute that provides randomness

Why it matters: Using trace_id ensures that when you sample a trace, you get ALL spans from that trace, not just random individual spans. This is critical for understanding distributed transactions.

Performance considerations

Order rules by frequency: Put the most frequently matched conditions first to reduce evaluation time
Source of randomness performance: Using trace_id is very efficient as it's already available
Sampling happens after other processors: Place sampling near the end of your pipeline to avoid wasting CPU on data that will be dropped

Efficient pipeline ordering:

steps:
      # ... receive steps...
      probabilistic_sampler/Logs:
        description: Probabilistic sampling for all logs
        output:
          - filter/Logs
        config:
          rules:
            - name: sample the log records for ruby test service
              description: sample the log records for ruby test service with 70%
              sampling_percentage: 70
              source_of_randomness: trace.id
              conditions:
                - resource.attributes["service.name"] == "ruby-test-service"
          default_sampling_percentage: 100
      probabilistic_sampler/Traces:
        description: Probabilistic sampling for traces
        output:
          - filter/Traces
        config:
          default_sampling_percentage: 100
      filter/Logs:
        description: Apply drop rules and data processing for logs
        output:
          - transform/Logs
        config:
          error_mode: ignore
          rules:
            - name: drop the log records
              description: drop all records which has severity text INFO
              conditions:
                - log.severity_text == "INFO"
              context: log
      # ... filter steps ...
      # ... transdormer steps ...

Cost impact examples

Example: 1TB/day → 100GB/day

Before sampling:

1TB of logs per day
90% are INFO level routine operations
8% are WARN
2% are ERROR/FATAL

With intelligent sampling:

probabilistic_sampler/Logs:
  description: "Sample logs by severity level"
  config:
    default_sampling_percentage: 2  # Sample 2% of INFO and below
    rules:
      - name: "errors"
        description: "Keep all error logs"
        sampling_percentage: 100  # Keep 100% of errors
        source_of_randomness: "trace.id"
        conditions: 
          - 'severity_number >= 17'
      
      - name: "warnings"
        description: "Keep quarter of warning logs"
        sampling_percentage: 25  # Keep 25% of warnings
        source_of_randomness: "trace.id"
        conditions: 
          - 'severity_number >= 13 and severity_number < 17'

After sampling:

INFO: 900GB × 2% = 18GB
WARN: 80GB × 25% = 20GB
ERROR/FATAL: 20GB × 100% = 20GB
Total: ~58GB/day (94% reduction)
All errors preserved for troubleshooting

OpenTelemetry resources

Next steps

Learn about Transform processor for data enrichment before sampling
See Filter processor for dropping unwanted data
Review YAML configuration reference for complete syntax

When to use sampling processor.css-21sua1{background:none;border:none;width:0;padding:0;}