Monitor self-hosted Kafka with OpenTelemetry

Monitor your self-hosted Apache Kafka cluster by installing the OpenTelemetry Collector directly on a Linux host.

Architecture

The following diagram shows the monitoring architecture and how data flows to New Relic.

Self-hosted Kafka monitoring architecture with OpenTelemetry

Installation steps

Follow these steps to set up comprehensive Kafka monitoring: install the OpenTelemetry Java agent on your brokers, then deploy a collector that gathers the metrics and sends them to New Relic.

Before you begin

Make sure you have:

A New Relic account
Network access from the collector to the Kafka bootstrap server port (typically 9092)

Download the OpenTelemetry Java agent

The OpenTelemetry Java agent runs attached to each Kafka broker JVM. It collects Kafka and JMX metrics and sends them to the collector over OTLP.

bash

# Create a directory for OpenTelemetry components
mkdir -p ~/opentelemetry

# Download the OpenTelemetry Java agent
curl -L -o ~/opentelemetry/opentelemetry-javaagent.jar \
  https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

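Create the custom JMX configuration

Create a JMX configuration file for the OpenTelemetry Java agent so that it collects Kafka metrics from JMX MBeans.

Create the file ~/opentelemetry/jmx-custom-config.yaml with the following configuration: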

---
rules:
  # Per-topic custom metrics using custom MBean commands
  - bean: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*
    metricAttribute:
      topic: param(topic)
    mapping:
      Count:
        metric: kafka.prod.msg.count
        type: counter
        desc: The number of messages per topic
        unit: "{message}"

  - bean: kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=*
    metricAttribute:
      topic: param(topic)
      direction: const(in)
    mapping:
      Count:
        metric: kafka.topic.io
        type: counter
        desc: The bytes received or sent per topic
        unit: By

  - bean: kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=*
    metricAttribute:
      topic: param(topic)
      direction: const(out)
    mapping:
      Count:
        metric: kafka.topic.io
        type: counter
        desc: The bytes received or sent per topic
        unit: By

  # Cluster-level metrics using controller-based MBeans
  - bean: kafka.controller:type=KafkaController,name=GlobalTopicCount
    mapping:
      Value:
        metric: kafka.cluster.topic.count
        type: gauge
        desc: The total number of global topics in the cluster
        unit: "{topic}"

  - bean: kafka.controller:type=KafkaController,name=GlobalPartitionCount
    mapping:
      Value:
        metric: kafka.cluster.partition.count
        type: gauge
        desc: The total number of global partitions in the cluster
        unit: "{partition}"

  - bean: kafka.controller:type=KafkaController,name=FencedBrokerCount
    mapping:
      Value:
        metric: kafka.broker.fenced.count
        type: gauge
        desc: The number of fenced brokers in the cluster
        unit: "{broker}"

  - bean: kafka.controller:type=KafkaController,name=PreferredReplicaImbalanceCount
    mapping:
      Value:
        metric: kafka.partition.non_preferred_leader
        type: gauge
        desc: The count of topic partitions for which the leader is not the preferred leader
        unit: "{partition}"

  # Broker-level metrics using ReplicaManager MBeans
  - bean: kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount
    mapping:
      Value:
        metric: kafka.partition.under_min_isr
        type: gauge
        desc: The number of partitions where the number of in-sync replicas is less than the minimum
        unit: "{partition}"

  # Broker uptime metric using JVM Runtime
  - bean: java.lang:type=Runtime
    mapping:
      Uptime:
        metric: kafka.broker.uptime
        type: gauge
        desc: Broker uptime in milliseconds
        unit: ms

  # Leader count per broker
  - bean: kafka.server:type=ReplicaManager,name=LeaderCount
    mapping:
      Value:
        metric: kafka.broker.leader.count
        type: gauge
        desc: Number of partitions for which this broker is the leader
        unit: "{partition}"

  # JVM metrics
  - bean: java.lang:type=GarbageCollector,name=*
    mapping:
      CollectionCount:
        metric: jvm.gc.collections.count
        type: counter
        unit: "{collection}"
        desc: total number of collections that have occurred
        metricAttribute:
          name: param(name)
      CollectionTime:
        metric: jvm.gc.collections.elapsed
        type: counter
        unit: ms
        desc: the approximate accumulated collection elapsed time in milliseconds
        metricAttribute:
          name: param(name)

  - bean: java.lang:type=Memory
    unit: By
    prefix: jvm.memory.
    dropNegativeValues: true
    mapping:
      HeapMemoryUsage.committed:
        metric: heap.committed
        desc: heap memory committed for the JVM to use
        type: gauge
      HeapMemoryUsage.max:
        metric: heap.max
        desc: maximum heap memory available to the JVM
        type: gauge
      HeapMemoryUsage.used:
        metric: heap.used
        desc: current heap usage
        type: gauge

  - bean: java.lang:type=Threading
    mapping:
      ThreadCount:
        metric: jvm.thread.count
        type: gauge
        unit: "{thread}"
        desc: Total thread count (Kafka typical range 100-300 threads)

  - bean: java.lang:type=OperatingSystem
    prefix: jvm.
    dropNegativeValues: true
    mapping:
      SystemLoadAverage:
        metric: system.cpu.load_1m
        type: gauge
        unit: "{run_queue_item}"
        desc: System load average (1 minute) - alert if > CPU count
      AvailableProcessors:
        metric: cpu.count
        type: gauge
        unit: "{cpu}"
        desc: Number of processors available
      ProcessCpuLoad:
        metric: cpu.recent_utilization
        type: gauge
        unit: '1'
        desc: Recent CPU utilization for JVM process (0.0 to 1.0)
      SystemCpuLoad:
        metric: system.cpu.utilization
        type: gauge
        unit: '1'
        desc: Recent CPU utilization for whole system (0.0 to 1.0)
      OpenFileDescriptorCount:
        metric: file_descriptor.count
        type: gauge
        unit: "{file_descriptor}"
        desc: Number of open file descriptors - alert if > 80% of ulimit

  - bean: java.lang:type=ClassLoading
    mapping:
      LoadedClassCount:
        metric: jvm.class.count
        type: gauge
        unit: "{class}"
        desc: Currently loaded class count

  - bean: java.lang:type=MemoryPool,name=*
    type: gauge
    unit: By
    metricAttribute:
      name: param(name)
    mapping:
      Usage.used:
        metric: jvm.memory.pool.used
        desc: Memory pool usage by generation (G1 Old Gen, Eden, Survivor)
      Usage.max:
        metric: jvm.memory.pool.max
        desc: Maximum memory pool size
      CollectionUsage.used:
        metric: jvm.memory.pool.used_after_last_gc
        desc: Memory used after last GC (shows retained memory baseline)
  
  - bean: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
    mapping:
      Count:
        metric: kafka.message.count
        type: counter
        desc: The number of messages received by the broker
        unit: "{message}"

  - bean: kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec
    metricAttribute:
      type: const(fetch)
    mapping:
      Count:
        metric: &metric kafka.request.count
        type: &type counter
        desc: &desc The number of requests received by the broker
        unit: &unit "{request}"

  - bean: kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec
    metricAttribute:
      type: const(produce)
    mapping:
      Count:
        metric: *metric
        type: *type
        desc: *desc
        unit: *unit

  - bean: kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec
    metricAttribute:
      type: const(fetch)
    mapping:
      Count:
        metric: &metric kafka.request.failed
        type: &type counter
        desc: &desc The number of requests to the broker resulting in a failure
        unit: &unit "{request}"

  - bean: kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec
    metricAttribute:
      type: const(produce)
    mapping:
      Count:
        metric: *metric
        type: *type
        desc: *desc
        unit: *unit

  - beans:
      - kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce
      - kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer
      - kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower
    metricAttribute:
      type: param(request)
    unit: ms
    mapping:
      Count:
        metric: kafka.request.time.total
        type: counter
        desc: The total time the broker has taken to service requests
      50thPercentile:
        metric: kafka.request.time.50p
        type: gauge
        desc: The 50th percentile time the broker has taken to service requests
      99thPercentile:
        metric: kafka.request.time.99p
        type: gauge
        desc: The 99th percentile time the broker has taken to service requests
      Mean:
        metric: kafka.request.time.avg
        type: gauge
        desc: The average time the broker has taken to service requests

  - bean: kafka.network:type=RequestChannel,name=RequestQueueSize
    mapping:
      Value:
        metric: kafka.request.queue
        type: gauge
        desc: Size of the request queue
        unit: "{request}"

  - bean: kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
    metricAttribute:
      direction: const(in)
    mapping:
      Count:
        metric: &metric kafka.network.io
        type: &type counter
        desc: &desc The bytes received or sent by the broker
        unit: &unit By

  - bean: kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
    metricAttribute:
      direction: const(out)
    mapping:
      Count:
        metric: *metric
        type: *type
        desc: *desc
        unit: *unit

  - beans:
      - kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce
      - kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch
    metricAttribute:
      type: param(delayedOperation)
    mapping:
      Value:
        metric: kafka.purgatory.size
        type: gauge
        desc: The number of requests waiting in purgatory
        unit: "{request}"

  - bean: kafka.server:type=ReplicaManager,name=PartitionCount
    mapping:
      Value:
        metric: kafka.partition.count
        type: gauge
        desc: The number of partitions on the broker
        unit: "{partition}"

  - bean: kafka.controller:type=KafkaController,name=OfflinePartitionsCount
    mapping:
      Value:
        metric: kafka.partition.offline
        type: gauge
        desc: The number of partitions offline
        unit: "{partition}"

  - bean: kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
    mapping:
      Value:
        metric: kafka.partition.under_replicated
        type: gauge
        desc: The number of under replicated partitions
        unit: "{partition}"

  - bean: kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
    metricAttribute:
      operation: const(shrink)
    mapping:
      Count:
        metric: kafka.isr.operation.count
        type: counter
        desc: The number of in-sync replica shrink and expand operations
        unit: "{operation}"

  - bean: kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
    metricAttribute:
      operation: const(expand)
    mapping:
      Count:
        metric: kafka.isr.operation.count
        type: counter
        desc: The number of in-sync replica shrink and expand operations
        unit: "{operation}"

  - bean: kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica
    mapping:
      Value:
        metric: kafka.max.lag
        type: gauge
        desc: The max lag in messages between follower and leader replicas
        unit: "{message}"

  - bean: kafka.controller:type=KafkaController,name=ActiveControllerCount
    mapping:
      Value:
        metric: kafka.controller.active.count
        type: gauge
        desc: For KRaft mode, the number of active controllers in the cluster. For ZooKeeper, indicates whether the broker is the controller broker.
        unit: "{controller}"

  - bean: kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
    mapping:
      Count:
        metric: kafka.leader.election.rate
        type: counter
        desc: The leader election count
        unit: "{election}"

  - bean: kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
    mapping:
      Count:
        metric: kafka.unclean.election.rate
        type: counter
        desc: Unclean leader election count - increasing indicates broker failures
        unit: "{election}"

  - bean: kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
    unit: ms
    type: gauge
    prefix: kafka.logs.flush.
    mapping:
      Count:
        metric: count
        unit: '{flush}'
        type: counter
        desc: Log flush count
      50thPercentile:
        metric: time.50p
        desc: Log flush time - 50th percentile
      99thPercentile:
        metric: time.99p
        desc: Log flush time - 99th percentile


Configure the Kafka broker

Before starting Kafka, set the KAFKA_OPTS environment variable to attach the OpenTelemetry Java agent to the Kafka broker.

Single-broker example:

bash

OTEL_AGENT="$HOME/opentelemetry/opentelemetry-javaagent.jar"
JMX_CONFIG="$HOME/opentelemetry/jmx-custom-config.yaml"

nohup env KAFKA_OPTS="-javaagent:$OTEL_AGENT \
    -Dotel.jmx.enabled=true \
    -Dotel.jmx.config=$JMX_CONFIG \
    -Dotel.resource.attributes=broker.id=1,kafka.cluster.name=my-kafka-cluster \
    -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
    -Dotel.exporter.otlp.protocol=grpc \
    -Dotel.metrics.exporter=otlp \
    -Dotel.metric.export.interval=30000" \
    bin/kafka-server-start.sh config/server.properties &

Important

Multi-broker clusters: For multiple brokers, use the same configuration on every broker, but give each one a unique broker.id value in its -Dotel.resource.attributes parameter (for example, broker.id=1, broker.id=2, broker.id=3).

Configuration parameters

nohup: Keeps the Kafka broker running after the shell session ends
-javaagent: Attaches the OpenTelemetry Java agent to the Kafka broker JVM
-Dotel.jmx.enabled=true: Enables JMX metrics collection
-Dotel.jmx.config: Specifies the custom JMX metrics configuration file
-Dotel.resource.attributes: Adds metadata: a unique broker.id and the kafka.cluster.name
-Dotel.exporter.otlp.endpoint: Points to the OpenTelemetry Collector (default: localhost:4317)
-Dotel.exporter.otlp.protocol=grpc: Uses the gRPC protocol for OTLP
-Dotel.metrics.exporter=otlp: Sends metrics over OTLP
-Dotel.metric.export.interval=30000: Exports metrics every 30 seconds
&: Runs the command in the background

For a remote collector (on a different host):
bash
```
-Dotel.exporter.otlp.endpoint=http://collector-host:4317
```
For the full list of configuration options, see the Java agent configuration guide.
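
For multi-broker clusters, one way to avoid copy-paste mistakes is to generate the KAFKA_OPTS value from the broker ID. The following is a minimal sketch, not an official script. The function name build_kafka_opts is hypothetical, and it assumes the ~/opentelemetry file layout from the earlier steps and a home directory path without spaces.

```shell
#!/bin/sh
# build_kafka_opts BROKER_ID CLUSTER_NAME [COLLECTOR_ENDPOINT]
# Hypothetical helper: prints the same KAFKA_OPTS value as the single-broker
# example, with broker.id and kafka.cluster.name filled in per broker.
# COLLECTOR_ENDPOINT defaults to http://localhost:4317.
build_kafka_opts() {
  broker_id=$1
  cluster_name=$2
  endpoint=${3:-http://localhost:4317}
  agent="$HOME/opentelemetry/opentelemetry-javaagent.jar"
  jmx_config="$HOME/opentelemetry/jmx-custom-config.yaml"
  # Print every flag followed by a space, then the last flag with a newline.
  printf '%s ' \
    "-javaagent:$agent" \
    "-Dotel.jmx.enabled=true" \
    "-Dotel.jmx.config=$jmx_config" \
    "-Dotel.resource.attributes=broker.id=$broker_id,kafka.cluster.name=$cluster_name" \
    "-Dotel.exporter.otlp.endpoint=$endpoint" \
    "-Dotel.exporter.otlp.protocol=grpc" \
    "-Dotel.metrics.exporter=otlp"
  printf '%s\n' "-Dotel.metric.export.interval=30000"
}
```

On the second broker, for example, you could start Kafka with `KAFKA_OPTS="$(build_kafka_opts 2 my-kafka-cluster)" bin/kafka-server-start.sh config/server.properties`.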

Create the collector configuration

Create the main OpenTelemetry Collector configuration at ~/opentelemetry/kafka-config.yaml.

receivers:
  # OTLP receiver for Kafka and JMX metrics from Java agents and application telemetry
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

  # Kafka metrics receiver for cluster-level metrics
  kafkametrics:
    brokers: ${env:KAFKA_BOOTSTRAP_BROKER_ADDRESSES}
    protocol_version: 2.0.0
    scrapers:
      - brokers
      - topics
      - consumers
    collection_interval: 30s
    topic_match: ".*"
    metrics:
      kafka.topic.min_insync_replicas:
        enabled: true
      kafka.topic.replication_factor:
        enabled: true
      kafka.partition.replicas:
        enabled: false
      kafka.partition.oldest_offset:
        enabled: false
      kafka.partition.current_offset:
        enabled: false

processors:
  batch/aggregation:
    send_batch_size: 1024
    timeout: 30s

  resourcedetection:
    detectors: [env, ec2, system]
    system:
      resource_attributes:
        host.name:
          enabled: true
        host.id:
          enabled: true

  resource:
    attributes:
      - action: insert
        key: kafka.cluster.name
        value: ${env:KAFKA_CLUSTER_NAME}

  transform/remove_broker_id:
    metric_statements:
      # Remove broker.id from resource attributes for cluster-level metrics
      - context: resource
        statements:
          - delete_key(attributes, "broker.id")

  transform/remove_extra_attributes:
    metric_statements:
      - context: resource
        statements:
          # Delete all attributes starting with "process."
          - delete_matching_keys(attributes, "^process\\..*")
          # Delete all attributes starting with "telemetry."
          - delete_matching_keys(attributes, "^telemetry\\..*")
          - delete_key(attributes, "host.arch")
          - delete_key(attributes, "os.description")

  filter/include_cluster_metrics:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - "kafka\\.partition\\.offline"
          - "kafka\\.(leader|unclean)\\.election\\.rate"
          - "kafka\\.partition\\.non_preferred_leader"
          - "kafka\\.broker\\.fenced\\.count"
          - "kafka\\.cluster\\.partition\\.count"
          - "kafka\\.cluster\\.topic\\.count"

  filter/exclude_cluster_metrics:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - "kafka\\.partition\\.offline"
          - "kafka\\.(leader|unclean)\\.election\\.rate"
          - "kafka\\.partition\\.non_preferred_leader"
          - "kafka\\.broker\\.fenced\\.count"
          - "kafka\\.cluster\\.partition\\.count"
          - "kafka\\.cluster\\.topic\\.count"

  transform/des_units:
    metric_statements:
      - context: metric
        statements:
          - set(description, "") where description != ""
          - set(unit, "") where unit != ""

  cumulativetodelta:

  metricstransform/kafka_topic_sum_aggregation:
    transforms:
      - include: kafka.partition.replicas_in_sync
        action: insert
        new_name: kafka.partition.replicas_in_sync.total
        operations:
          - action: aggregate_labels
            label_set: [topic]
            aggregation_type: sum
      
      - include: kafka.partition.replicas
        action: insert
        new_name: kafka.partition.replicas.total
        operations:
          - action: aggregate_labels
            label_set: [topic]
            aggregation_type: sum

  filter/remove_partition_level_replicas:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - kafka.partition.replicas_in_sync

exporters:
  otlp/newrelic:
    endpoint: ${env:NEW_RELIC_OTLP_ENDPOINT}
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}
    compression: gzip
    timeout: 30s

service:
  pipelines:
    # Broker metrics pipeline (excludes cluster-level metrics)
    metrics/broker:
      receivers: [otlp, kafkametrics]
      processors: [resourcedetection, resource, filter/exclude_cluster_metrics, transform/remove_extra_attributes, transform/des_units, cumulativetodelta, metricstransform/kafka_topic_sum_aggregation, filter/remove_partition_level_replicas, batch/aggregation]
      exporters: [otlp/newrelic]

    # Cluster metrics pipeline (only cluster-level metrics, no broker.id)
    metrics/cluster:
      receivers: [otlp]
      processors: [resourcedetection, resource, filter/include_cluster_metrics, transform/remove_broker_id, transform/remove_extra_attributes, transform/des_units, cumulativetodelta, batch/aggregation]
      exporters: [otlp/newrelic]

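Architecture highlights:

OTLP receiver: Receives Kafka and JMX metrics over gRPC on port 4317 from the OpenTelemetry Java agents running on the Kafka brokers.
Two-pipeline approach: Broker metrics keep their broker.id, while cluster-level metrics are sent without broker.id so that they map to the cluster entity.
Metric filtering: Separates broker-specific metrics from cluster-level metrics to avoid duplication.
Aggregation: Automatically aggregates partition-level metrics by topic.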
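
To see how the two filters split the metric stream, the sketch below applies the same name patterns as filter/include_cluster_metrics to a single metric name. It is a hypothetical helper for reasoning about the configuration, not something the collector runs. The sketch anchors the patterns with ^ and $, which is an assumption made here for clarity.

```shell
#!/bin/sh
# pipeline_for_metric NAME
# Hypothetical helper: prints "cluster" when NAME matches one of the patterns
# listed in filter/include_cluster_metrics, and "broker" otherwise.
pipeline_for_metric() {
  cluster_regex='^(kafka\.partition\.offline|kafka\.(leader|unclean)\.election\.rate|kafka\.partition\.non_preferred_leader|kafka\.broker\.fenced\.count|kafka\.cluster\.partition\.count|kafka\.cluster\.topic\.count)$'
  if printf '%s\n' "$1" | grep -Eq "$cluster_regex"; then
    echo cluster
  else
    echo broker
  fi
}
```

For example, `pipeline_for_metric kafka.partition.offline` prints cluster, while `pipeline_for_metric kafka.partition.count` prints broker, because only the first name is in the cluster-level list.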

Set environment variables

Before installing the collector, set the required environment variables.

bash

export NEW_RELIC_LICENSE_KEY="YOUR_LICENSE_KEY"
export KAFKA_CLUSTER_NAME="my-kafka-cluster"
export KAFKA_BOOTSTRAP_BROKER_ADDRESSES="localhost:9092"
export NEW_RELIC_OTLP_ENDPOINT="https://otlp.nr-data.net:4317" # US region

Replace:

YOUR_LICENSE_KEY with your New Relic license key
my-kafka-cluster with a unique name for your Kafka cluster
localhost:9092 with your Kafka bootstrap broker addresses. For multiple brokers, use a comma-separated list: broker1:9092,broker2:9092,broker3:9092
OTLP endpoint: Use https://otlp.nr-data.net:4317 (US region) or https://otlp.eu01.nr-data.net:4317 (EU region). For other endpoint configurations, see "Configure your OTLP endpoint".
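
If you script this step for more than one environment, a small lookup keeps the region choice explicit. This is a minimal sketch: the function name nr_otlp_endpoint is hypothetical, and it only knows the two endpoints listed above.

```shell
#!/bin/sh
# nr_otlp_endpoint REGION
# Hypothetical helper: prints the New Relic OTLP endpoint for "us" or "eu",
# using the two URLs from this guide. Returns 1 for any other value.
nr_otlp_endpoint() {
  case "$1" in
    us|US) echo "https://otlp.nr-data.net:4317" ;;
    eu|EU) echo "https://otlp.eu01.nr-data.net:4317" ;;
    *) echo "unknown region: $1 (expected us or eu)" >&2; return 1 ;;
  esac
}
```

For example, `export NEW_RELIC_OTLP_ENDPOINT="$(nr_otlp_endpoint eu)"` sets the EU endpoint.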

Install and start the collector

Choose either the NRDOT Collector (New Relic's distribution) or the OpenTelemetry Collector.

Tip

The NRDOT Collector is New Relic's distribution of the OpenTelemetry Collector and is supported by New Relic.

NRDOT Collector

Download and install the binary

Download and install the NRDOT Collector binary for your host operating system. The example below is for the linux_amd64 architecture.

bash

# Set version and architecture
NRDOT_VERSION="1.9.0"
ARCH="amd64"  # or arm64

# Download and extract
curl "https://github.com/newrelic/nrdot-collector-releases/releases/download/${NRDOT_VERSION}/nrdot-collector_${NRDOT_VERSION}_linux_${ARCH}.tar.gz" \
  --location --output collector.tar.gz
tar -xzf collector.tar.gz

# Move to a location in PATH (optional)
sudo mv nrdot-collector /usr/local/bin/

# Verify installation
nrdot-collector --version

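Important

For other operating systems and architectures, go to the NRDOT Collector releases page and download the appropriate binary for your system.

Start the collector

Run the collector with your configuration file to start monitoring.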

bash

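nrdot-collector --config ~/opentelemetry/kafka-config.yaml

The collector starts sending Kafka metrics to New Relic within a few minutes.

OpenTelemetry Collector

Download and install the binary

Download and install the OpenTelemetry Collector Contrib binary for your host operating system. The example below is for the linux_amd64 architecture.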

bash

# Set version and architecture
# Check https://github.com/open-telemetry/opentelemetry-collector-releases/releases/latest for the latest version
OTEL_VERSION="<collector_version>"
ARCH="amd64"

# Download the collector
curl -L -o otelcol-contrib.tar.gz \
  "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OTEL_VERSION}/otelcol-contrib_${OTEL_VERSION}_linux_${ARCH}.tar.gz"

# Extract the binary
tar -xzf otelcol-contrib.tar.gz

# Move to a location in PATH (optional)
sudo mv otelcol-contrib /usr/local/bin/

# Verify installation
otelcol-contrib --version

For other operating systems, see the OpenTelemetry Collector releases page.

Start the collector

Run the collector with your configuration file to start monitoring.

bash

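otelcol-contrib --config ~/opentelemetry/kafka-config.yaml

The collector starts sending Kafka metrics to New Relic within a few minutes.

(Optional) Instrument producer or consumer applications

Important

Language support: Kafka client instrumentation with the OpenTelemetry Java agent currently supports Java applications only.

To collect application-level telemetry from your Kafka producer and consumer applications, use the OpenTelemetry Java agent that you downloaded in the "Download the OpenTelemetry Java agent" step.

Start your application with the agent: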

bash

java \
  -javaagent:$HOME/opentelemetry/opentelemetry-javaagent.jar \
  -Dotel.service.name="order-process-service" \
  -Dotel.resource.attributes="kafka.cluster.name=my-kafka-cluster" \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -Dotel.exporter.otlp.protocol="grpc" \
  -Dotel.metrics.exporter="otlp" \
  -Dotel.traces.exporter="otlp" \
  -Dotel.logs.exporter="otlp" \
  -Dotel.instrumentation.kafka.experimental-span-attributes="true" \
  -Dotel.instrumentation.messaging.experimental.receive-telemetry.enabled="true" \
  -Dotel.instrumentation.kafka.producer-propagation.enabled="true" \
  -Dotel.instrumentation.kafka.enabled="true" \
  -jar your-kafka-application.jar

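Replace:

order-process-service with a unique name for your producer or consumer application
my-kafka-cluster with the same cluster name used in your collector configuration

Tip

The configuration above sends telemetry to an OpenTelemetry Collector running at localhost:4317.

Sample collector configuration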

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

exporters:
  otlp/newrelic:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}
    compression: gzip
    timeout: 30s

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/newrelic]
    metrics:
      receivers: [otlp]
      exporters: [otlp/newrelic]
    logs:
      receivers: [otlp]
      exporters: [otlp/newrelic]

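Sending telemetry through a collector lets you customize processing, add filters, or route to multiple backends. For other endpoint configurations, see "Configure your OTLP endpoint".

The Java agent provides out-of-the-box Kafka instrumentation with no code changes, capturing:

Request latency
Throughput metrics
Error rates
Distributed traces

For advanced configuration, see the Kafka instrumentation documentation.

(Optional) Forward Kafka broker logs

To collect Kafka broker logs and send them to New Relic, configure the filelog receiver in your OpenTelemetry Collector.

Configure log collection

Update the collector configuration at ~/opentelemetry/kafka-config.yaml to add the filelog receiver.

Add to the receivers section: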

receivers:
  # ... existing receivers (otlp, kafkametrics) ...
  
  # File log receiver for Kafka broker logs
  filelog/kafka_broker_1:
    include:
      - /path/to/kafka/logs/server.log
    start_at: end
    multiline:
      line_start_pattern: '^\['
    resource:
      broker.id: "1"
      kafka.cluster.name: ${env:KAFKA_CLUSTER_NAME}

Add a logs pipeline to the service section:

service:
  pipelines:
    # ... existing pipelines (metrics/broker, metrics/cluster) ...
    
    # Logs pipeline for Kafka broker logs
    logs:
      receivers: [filelog/kafka_broker_1]
      processors: [batch/aggregation, resourcedetection]
      exporters: [otlp/newrelic]

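Configuration notes:

Update /path/to/kafka/logs/server.log to your actual Kafka log file path (for example, ~/kafka/logs/server.log).
The broker.id resource attribute associates the logs with that broker's metrics and entity.
For multiple brokers, create a separate filelog receiver for each one with its own broker ID (for example, filelog/kafka_broker_2, filelog/kafka_broker_3).
The multiline pattern assumes that each log entry starts with [. Adjust it if your log format differs.
Consider log volume and ingest costs before enabling log forwarding.
For the full list of configuration options, see the filelog receiver documentation.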
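
With several brokers, the per-broker receiver stanzas differ only in the broker ID and the log path, so they can be generated. The sketch below is a hypothetical helper that prints one stanza mirroring filelog/kafka_broker_1. The log path in the usage example is an illustrative placeholder, and you still need to paste the output under receivers: and add each receiver to the logs pipeline.

```shell
#!/bin/sh
# filelog_receiver BROKER_ID LOG_PATH
# Hypothetical helper: prints a filelog receiver stanza, indented for the
# receivers: section, that mirrors filelog/kafka_broker_1 for another broker.
filelog_receiver() {
  # \${env:...} is escaped so that the collector, not the shell, expands it.
  cat <<EOF
  filelog/kafka_broker_$1:
    include:
      - $2
    start_at: end
    multiline:
      line_start_pattern: '^\['
    resource:
      broker.id: "$1"
      kafka.cluster.name: \${env:KAFKA_CLUSTER_NAME}
EOF
}
```

For example, `filelog_receiver 2 /var/log/kafka/server.log` prints the stanza for a second broker.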

After updating the configuration, restart the collector:

bash

# If running in the foreground, stop with Ctrl+C and restart
nrdot-collector --config ~/opentelemetry/kafka-config.yaml
# Or for the OpenTelemetry Collector
otelcol-contrib --config ~/opentelemetry/kafka-config.yaml

Find your logs in New Relic

Kafka broker logs appear in two places:

Broker entity: Go to the Kafka broker entity in New Relic to view the logs associated with that specific broker.
Logs UI: Use the Logs UI with a filter such as kafka.cluster.name = 'my-kafka-cluster' to query all Kafka logs.
You can also query logs with NRQL:
```
FROM Log SELECT * WHERE kafka.cluster.name = 'my-kafka-cluster'
```

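Advanced: Customize metrics collection

You can add more Kafka metrics by extending the rules in jmx-custom-config.yaml.

Learn the OpenTelemetry JMX metrics configuration syntax
Find the available MBean names in the Kafka monitoring documentation

This lets you collect any JMX metric exposed by the Kafka brokers, based on your specific monitoring needs.

Find your data

After a few minutes, your Kafka metrics should appear in New Relic. For detailed instructions on exploring Kafka metrics in the different views of the New Relic UI, see "Find your data".

You can also query your data with NRQL: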

FROM Metric SELECT * WHERE kafka.cluster.name = 'my-kafka-cluster'

Troubleshooting

Initial system checks

Run these commands first to verify your setup. Use the results to identify which troubleshooting section to follow.

Check whether the collector is running:

bash

# Check if port 4317 is listening (best indicator that the collector is running)
ss -tlnp | grep 4317

# Search for the collector process (the bracket trick excludes grep itself)
ps aux | grep "[k]afka-config.yaml"

# Or search for common collector names
ps aux | grep -E "[n]rdot-collector|[o]telcol"

If no results appear, the collector is not running. Start it by following the "Install and start the collector" step.

Check whether the Java agent is attached to the Kafka broker:

bash

# Search for Kafka processes with the Java agent attached
ps aux | grep "[o]pentelemetry-javaagent"

Note: This command shows the full Java process, including the classpath, which can be very long (three or more lines per broker). This is expected. Look for -javaagent:/path/to/opentelemetry-javaagent.jar in the output.

Test port connectivity:

bash

# Test the Kafka bootstrap port (9092)
timeout 5 bash -c "</dev/tcp/localhost/9092" 2>/dev/null && echo "Port 9092 open" || echo "Port 9092 closed"

# Test the OTLP collector port (4317)
timeout 5 bash -c "</dev/tcp/localhost/4317" 2>/dev/null && echo "Port 4317 open" || echo "Port 4317 closed"

Check the collector logs (the log commands in this section assume that you redirected the collector's output to ~/logs/collector.log):

bash

# View recent collector output
tail -n 50 ~/logs/collector.log

Check the environment variables:

bash

echo "$NEW_RELIC_LICENSE_KEY"
echo "$KAFKA_CLUSTER_NAME"
echo "$KAFKA_BOOTSTRAP_BROKER_ADDRESSES"
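
Echoing the license key prints a secret to the terminal and to any captured session log. As an alternative, the sketch below reports only whether each variable is set. The function name check_env_vars is hypothetical, and because it uses eval, it should only be called with trusted variable names such as the ones in this guide.

```shell
#!/bin/sh
# check_env_vars NAME...
# Hypothetical helper: reports whether each named variable is set and
# non-empty without printing its value. Returns 1 if any variable is missing.
check_env_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name is set"
    else
      echo "$name is NOT set"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_env_vars NEW_RELIC_LICENSE_KEY KAFKA_CLUSTER_NAME KAFKA_BOOTSTRAP_BROKER_ADDRESSES NEW_RELIC_OTLP_ENDPOINT`.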

Enable debug logging

Enable collector debug logging: Add verbose logging to troubleshoot configuration issues.

Add to your collector configuration:

service:
  telemetry:
    logs:
      level: "debug"  # Enable detailed collector internal logs

Add the debug exporter: View metrics in the collector logs before they are sent to New Relic.

exporters:
  debug:
    verbosity: detailed
    sampling_initial: 5        # Log first 5 metrics
    sampling_thereafter: 200   # Then log every 200th metric

  otlp/newrelic:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}
    compression: gzip
    timeout: 30s

service:
  pipelines:
    metrics/broker:
      receivers: [otlp, kafkametrics]
      processors: [resourcedetection, resource, filter/exclude_cluster_metrics, transform/remove_extra_attributes, transform/des_units, cumulativetodelta, metricstransform/kafka_topic_sum_aggregation, filter/remove_partition_level_replicas, batch/aggregation]
      exporters: [debug, otlp/newrelic]  # Add debug exporter

    metrics/cluster:
      receivers: [otlp]
      processors: [resourcedetection, resource, filter/include_cluster_metrics, transform/remove_broker_id, transform/remove_extra_attributes, transform/des_units, cumulativetodelta, batch/aggregation]
      exporters: [debug, otlp/newrelic]  # Add debug exporter

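Then restart the collector and check the logs:

bash

# Follow the collector output log and look for metric output
tail -f ~/logs/collector.log

Important: Remove the debug exporter in production to avoid flooding your logs.

No data appears in New Relic

First, run the initial system checks to confirm that the collector and the Java agent are running.

Check the collector logs for errors. Look for authentication or connection failures: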

bash

# Look for errors in the collector output
tail -n 100 ~/logs/collector.log | grep -i "error\|fail\|refuse"

# Check for OTLP receiver activity
tail -n 100 ~/logs/collector.log | grep -i "otlp\|metric"

JMX metrics missing from Kafka brokers

First, run the initial system checks to confirm that the Java agent is attached to the Kafka process.

Check the broker logs for Java agent initialization:

bash

# Find the Kafka log directory (common locations)
find ~ -name "server.log" -path "*/kafka/logs/*" 2>/dev/null

# Check the log file for OpenTelemetry messages
# Replace with your actual Kafka log path
tail -n 100 ~/kafka/logs/server.log 2>/dev/null | grep -i "otel\|jmx" || echo "Log file not found or no OTel messages"

# Check the directory where you started Kafka for nohup.out
ls -lh nohup.out 2>/dev/null && tail -n 100 nohup.out | grep -i "otel\|jmx" || echo "No nohup.out file found or no OTel messages"

Verify the Java agent configuration: Confirm that the startup command matches the "Configure the Kafka broker" step.

bash

# Check whether the broker was started with the correct Java agent parameters
ps aux | grep "[o]pentelemetry-javaagent" | grep -o "Dotel\.[^ ]*"

This should list all of the -Dotel.* parameters. Verify that these are present:

-Dotel.jmx.enabled=true
-Dotel.jmx.config=<path>
-Dotel.exporter.otlp.endpoint=http://localhost:4317
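
The same check can be scripted so that it reports only what is missing. The sketch below is a hypothetical helper. It looks for the three flags listed above in a command-line string and does not validate their values.

```shell
#!/bin/sh
# missing_otel_flags CMDLINE
# Hypothetical helper: prints each required -Dotel.* flag that does not appear
# in CMDLINE (a process command line). Returns 1 if any flag is missing.
missing_otel_flags() {
  status=0
  for flag in \
    "-Dotel.jmx.enabled=true" \
    "-Dotel.jmx.config=" \
    "-Dotel.exporter.otlp.endpoint="; do
    case "$1" in
      *"$flag"*) ;;  # flag present, nothing to report
      *) echo "missing: $flag"; status=1 ;;
    esac
  done
  return "$status"
}
```

For example: `missing_otel_flags "$(ps aux | grep '[o]pentelemetry-javaagent')"`. No output means all three flags were found.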

Check the collector logs for incoming JMX metrics:

bash

# Look for metrics coming from brokers
tail -n 100 ~/logs/collector.log | grep -i "broker.id\|kafka\|jmx"

OTLP connection errors

First, run the initial system checks to confirm that port 4317 is listening and reachable.

Check the collector logs for specific OTLP errors:

bash

# Look for connection refusals or timeouts
tail -n 100 ~/logs/collector.log | grep -i "connection refused\|context deadline exceeded\|failed to connect"

Verify the OTLP receiver configuration: Confirm that the collector is listening on 0.0.0.0:4317 (not 127.0.0.1):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"  # Accepts connections from any interface

Test the remote connection (if the collector and Kafka are on different hosts):

bash

# From the Kafka broker machine, test the connection to the collector
timeout 5 bash -c "</dev/tcp/COLLECTOR_HOST/4317" 2>/dev/null && echo "Can reach collector" || echo "Cannot reach collector"

High memory usage

1. Monitor the collector's memory usage:

bash

# Check memory usage of the collector process
ps aux | grep -E "[n]rdot-collector|[o]telcol" | awk '{print $1, $2, $4, $11}'

# Watch memory usage over time (refresh every 2 seconds)
watch -n 2 'ps aux | grep -E "[n]rdot-collector|[o]telcol" | awk "{print \$1, \$2, \$4, \$11}"'

# Check overall system memory
free -h

2. Reduce the monitored topics: Limit collection to important topics only.

receivers:
  kafkametrics:
    brokers: ${env:KAFKA_BOOTSTRAP_BROKER_ADDRESSES}
    collection_interval: 30s
    scrapers:
      - brokers
      - topics  # Consider removing if not needed
      - consumers  # Consider removing if not needed
    topic_match: "^(important-topic-1|important-topic-2)$"  # Filter specific topics

3. Reduce collection frequency: Increase the collection interval so that metrics are collected less often.

For the collector's Kafka metrics receiver:

receivers:
  kafkametrics:
    collection_interval: 60s  # Increase from 30s to 60s

For JMX metrics from the Java agent, update the broker startup command:

bash

-Dotel.metric.export.interval=60000  # Increase from 30000ms to 60000ms

4. Optimize batching: Reduce the in-memory batch size.

processors:
  batch/aggregation:
    send_batch_size: 512  # Reduce from 1024
    timeout: 60s

5. Add a memory limiter: Prevent the collector from exceeding a memory threshold.

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512       # Hard limit in MiB
    spike_limit_mib: 128 # Expected spike between checks; the soft limit is limit_mib - spike_limit_mib

  batch/aggregation:
    send_batch_size: 512
    timeout: 60s

service:
  pipelines:
    metrics/broker:
      receivers: [otlp, kafkametrics]
      processors: [memory_limiter, resourcedetection, resource, filter/exclude_cluster_metrics, transform/remove_extra_attributes, transform/des_units, cumulativetodelta, metricstransform/kafka_topic_sum_aggregation, filter/remove_partition_level_replicas, batch/aggregation]
      exporters: [otlp/newrelic]
    metrics/cluster:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, resource, filter/include_cluster_metrics, transform/remove_broker_id, transform/remove_extra_attributes, transform/des_units, cumulativetodelta, batch/aggregation]
      exporters: [otlp/newrelic]
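
When choosing values, it helps to know where the limiter starts acting. As documented for the memory_limiter processor, the soft limit is limit_mib minus spike_limit_mib, and the processor begins refusing data once usage passes that soft limit. The function below is a hypothetical helper that only does that arithmetic.

```shell
#!/bin/sh
# memory_limiter_soft_limit LIMIT_MIB SPIKE_LIMIT_MIB
# Hypothetical helper: prints the soft limit in MiB, computed as
# limit_mib - spike_limit_mib.
memory_limiter_soft_limit() {
  echo $(( $1 - $2 ))
}

memory_limiter_soft_limit 512 128   # prints 384
```

With the values above, the collector starts refusing data at about 384 MiB, so size limit_mib with that headroom in mind.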

6. Restart the collector after making changes:

bash

# Stop the collector process that was started with this configuration file
pkill -f "kafka-config.yaml"

# Restart the NRDOT Collector
nrdot-collector --config ~/opentelemetry/kafka-config.yaml

# Or restart the OpenTelemetry Collector
otelcol-contrib --config ~/opentelemetry/kafka-config.yaml

Next steps

Explore Kafka metrics: See the complete metrics reference
Create custom dashboards: Build visualizations for your Kafka data
Set up alerts: Monitor critical metrics such as consumer lag and under-replicated partitions
