Message Queues vs. Streaming Systems: Key Differences and Use Cases

Oleksii K., DevOps Engineer


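In the world of data processing and messaging systems, the terms "queue" and "streaming" come up often. They may sound similar, but they serve different purposes and can significantly affect how a system handles data. Let's break down the differences.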

What Is a Message Queue?

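Imagine a coffee shop that takes orders online and in person. When an order is ready, the customer is notified to pick it up. In this analogy, orders act like messages in a queue: the barista processes them one at a time and removes each completed order from the queue. This is the basic way a message queue works.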

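Each message represents a discrete task that is processed independently. Messages in a queue are consumed in order, and consumption is typically destructive: once a message has been processed, it is removed from the queue.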

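Key characteristics of message queues:

Asynchronous communication: Producers can send messages even when consumers are not ready at the same moment. As with ordering a coffee, you do not have to stand by while it is being made.
First in, first out (FIFO): Messages are processed in the order they are received. This matters for operations that depend on strict ordering, such as bank transactions. Some queues can be configured to allow non-FIFO processing.
Durability: Messages are stored reliably until a consumer processes them, so that they are not lost if the system fails.
Exclusive delivery: Each message is consumed by only one consumer instance, which avoids duplicate processing. A message is deleted once the consumer acknowledges it.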
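
The characteristics above can be sketched with a toy in-memory queue. This is a minimal illustration rather than any real broker's API: the class `SimpleQueue` and its methods `send`, `receive`, `ack`, and `pending` are hypothetical names chosen for this example. It shows FIFO order, delivery of each message to a single caller, and deletion on acknowledgment.

```python
from collections import deque


class SimpleQueue:
    """Toy in-memory queue (hypothetical): FIFO order, delete-on-acknowledge."""

    def __init__(self):
        self._ready = deque()      # messages waiting for a consumer
        self._in_flight = {}       # delivered but not yet acknowledged
        self._next_id = 0

    def send(self, body):
        """Producer side: enqueue without waiting for any consumer."""
        self._ready.append((self._next_id, body))
        self._next_id += 1

    def receive(self):
        """Hand the oldest message to exactly one caller, or None if empty."""
        if not self._ready:
            return None
        msg_id, body = self._ready.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        """Acknowledge a message, which removes it for good."""
        del self._in_flight[msg_id]

    def pending(self):
        """Messages that are still waiting or not yet acknowledged."""
        return len(self._ready) + len(self._in_flight)


orders = SimpleQueue()
for drink in ["latte", "espresso", "mocha"]:
    orders.send(drink)

served = []
while (msg := orders.receive()) is not None:
    msg_id, drink = msg
    served.append(drink)       # the barista prepares the order
    orders.ack(msg_id)         # the completed order leaves the queue
```

After the loop, `served` holds the drinks in the order they were sent, and the queue is empty. A real queue would also persist messages to disk for durability, which this sketch leaves out.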

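Key use cases for queues:

Message queues are well suited to scenarios that require parallel processing and scalability. Examples include:

Inventory management: Tracking and updating stock levels in real time.
Healthcare systems: Managing patient flow and appointment scheduling.
Restaurant operations: Handling customer orders and reservations.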

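What Is Message Streaming?

Now imagine a live concert, where the music plays in real time and the audience experiences it as it happens. Message streaming focuses on a continuous flow of data and real-time processing.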

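Key characteristics of message streaming:

Real-time processing: Streamed messages are consumed as soon as they are produced, much like listening to music on a streaming service.
Event-driven architecture: Data is delivered to consumers as soon as it becomes available, enabling immediate reactions. For example, a social media feed updates dynamically with new posts, likes, and comments.
Scalability: Streaming systems can handle large volumes of data, which makes them suitable for real-time analytics, monitoring, and machine learning.
Message retention: Messages are stored for a specified period and can be replayed for batch processing or error recovery. Retention is based on time (for example, 7 days) or size (for example, 1 GB per partition).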
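
To make the retention and replay behavior concrete, here is a minimal sketch of an append-only log with time-based retention. The class `RetainedLog` and its methods are hypothetical and do not correspond to any specific streaming system; timestamps are passed in explicitly as seconds so the example stays deterministic. The 7-day window mirrors the example above.

```python
class RetainedLog:
    """Toy append-only log (hypothetical): reads do not delete, retention does."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []          # list of (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, now):
        """Store a value with the next offset and the given timestamp."""
        self._records.append((self._next_offset, now, value))
        self._next_offset += 1

    def read_from(self, offset):
        """Return every retained value at or after the given offset."""
        return [v for (o, _, v) in self._records if o >= offset]

    def enforce_retention(self, now):
        """Drop records older than the retention window."""
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]


DAY = 24 * 3600
log = RetainedLog(retention_seconds=7 * DAY)   # 7 days, as in the example above
log.append("price=101", now=0)
log.append("price=102", now=3 * DAY)
log.append("price=103", now=6 * DAY)

first_read = log.read_from(0)
replay = log.read_from(0)              # reading again returns the same data

log.enforce_retention(now=8 * DAY)     # on day 8 the day-0 record expires
after_expiry = log.read_from(0)
```

Unlike the queue model, reading here never removes anything: `first_read` and `replay` are identical, and data only disappears when the retention window passes.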

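Key use cases for streaming:

Streaming is an essential part of modern life and powers applications such as:

Stock price monitoring: Providing real-time updates to traders.
Fraud detection: Identifying suspicious activity immediately.
Customer service analytics: Tracking interactions and sentiment in real time.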

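Why Use Queues in Apache Kafka?

At Confluent, the goal is to make Apache Kafka a universal solution for diverse data workloads. Traditional messaging systems often force users to choose between ordering and speed. By introducing queue support, Kafka bridges this gap and offers the flexibility to process messages either sequentially or in parallel.

This addition makes Kafka more flexible: it can support both streaming and queue-based workflows and therefore serve a broader range of use cases.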

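How Are Queues Supported in Apache Kafka?

Kafka uses a log-based architecture in which each message is assigned an offset that is unique within its partition. Consumers read messages in order and track their position by offset, which supports fault tolerance and makes message replay possible. In the new hybrid model, Kafka combines the benefits of traditional queues with its log-based design.

Parallel processing: Messages can be consumed by multiple consumers concurrently.
Replay capability: Messages can be replayed for recovery or reprocessing.
High throughput: Kafka allows out-of-order processing where needed while maintaining scalability and reliability.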

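Consumer Groups and Share Groups in Kafka

In Kafka, consumer groups manage how data is consumed from a topic. Each consumer group consists of multiple consumers that cooperate to read data from the topic's partitions. Within a group, each partition is assigned to exactly one consumer at a time, although a single consumer may own several partitions. As a result, scaling becomes inefficient once the number of consumers exceeds the number of partitions, because the extra consumers sit idle.

Share groups offer a more flexible approach, particularly for workloads that resemble traditional queue systems. In a share group, multiple consumers can read data from the same partition, which allows finer control over how data is shared and processed.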
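
The scaling limit of consumer groups can be illustrated with a small sketch. The function `assign_partitions` below is a hypothetical, simplified round-robin assignment written for this example; it is not Kafka's actual assignor. It only demonstrates the rule that each partition belongs to exactly one consumer in the group.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin (simplified)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2]

# Fewer consumers than partitions: one consumer owns two partitions.
two = assign_partitions(partitions, ["c1", "c2"])

# More consumers than partitions: the extra consumers receive nothing.
five = assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"])
idle = [c for c, owned in five.items() if not owned]
```

With three partitions and five consumers, `idle` contains `c4` and `c5`: adding consumers beyond the partition count does not add throughput in this model.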

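Key features of share groups:

Concurrent reads: Multiple consumers in a share group can read from the same partition.
Dynamic scaling: More consumers can be added to handle peak load without repartitioning the topic.
Individual acknowledgment: Messages are acknowledged one by one, so unprocessed messages can be redelivered while delivery is still optimized through batching.
Independent consumption: Consumers in different share groups can access the same topic without interfering with one another.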
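
As a rough sketch of concurrent reads and individual acknowledgment, the toy model below tracks a state per record in a single partition. The class `SharePartition`, its methods, and the state names used here are assumptions made for illustration; this is not the Kafka client API, and real share groups also involve locks with timeouts and delivery counts that are omitted here.

```python
class SharePartition:
    """Toy model (hypothetical) of one partition read by a share group."""

    def __init__(self, records):
        self._records = list(records)
        self._state = ["available"] * len(self._records)   # per-record state
        self._owner = [None] * len(self._records)

    def fetch(self, consumer, max_records):
        """Lend up to max_records available records to one consumer."""
        batch = []
        for offset, state in enumerate(self._state):
            if state == "available" and len(batch) < max_records:
                self._state[offset] = "acquired"
                self._owner[offset] = consumer
                batch.append((offset, self._records[offset]))
        return batch

    def acknowledge(self, offset):
        """Mark a single record as done; it will not be delivered again."""
        self._state[offset] = "acknowledged"

    def release(self, consumer):
        """Return a consumer's unacknowledged records for redelivery."""
        for offset, state in enumerate(self._state):
            if state == "acquired" and self._owner[offset] == consumer:
                self._state[offset] = "available"
                self._owner[offset] = None


partition = SharePartition(["o0", "o1", "o2", "o3"])
batch_a = partition.fetch("A", max_records=2)   # consumer A gets offsets 0, 1
batch_b = partition.fetch("B", max_records=2)   # consumer B gets offsets 2, 3

partition.acknowledge(0)      # A finishes o0 ...
partition.release("A")        # ... then fails before finishing o1
for offset, _ in batch_b:
    partition.acknowledge(offset)

redelivered = partition.fetch("B", max_records=2)
```

Two consumers read from the same partition at once, and only the record that A never acknowledged (`o1`) comes back for redelivery.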

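Do Share Groups Guarantee Ordering?

Not entirely. Within a batch, records are in offset order, but ordering is not guaranteed across batches. For example, if a consumer crashes partway through a batch, another consumer may process later messages first, resulting in out-of-order delivery across batches.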
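
The crash scenario can be traced step by step with a small deterministic simulation. It is only a sketch under simplifying assumptions: batches are plain Python lists, the two consumers are simulated sequentially, and the batch size of three is arbitrary.

```python
from collections import deque

records = list(range(6))                 # offsets 0..5 in one partition
batches = deque([records[0:3], records[3:6]])

processed = []                           # offsets in the order they were handled

batch_a = batches.popleft()              # consumer A takes offsets 0, 1, 2
batch_b = batches.popleft()              # consumer B takes offsets 3, 4, 5

processed.append(batch_a[0])             # A handles offset 0, then crashes
unfinished = batch_a[1:]                 # offsets 1, 2 were never acknowledged

processed.extend(batch_b)                # B handles its own batch in order

batches.append(unfinished)               # unacknowledged records are redelivered
processed.extend(batches.popleft())      # B picks them up afterwards
```

Every record is processed exactly once in this run, but the overall order is 0, 3, 4, 5, 1, 2: each batch is internally ordered by offset, while the batches themselves interleave.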

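A Real-World Example: A Retail Sales Event

Consider a retailer running a large sales event. The checkout system has to handle a surge of orders efficiently. With share groups:

Parallel processing: Orders are distributed across multiple workers and processed concurrently.
Dynamic resource allocation: The system can add consumers at peak times and scale down when demand drops.
Efficient processing: Orders are processed quickly without requiring strict ordering.

This flexibility lets the system adapt to fluctuating workloads, which helps with both customer satisfaction and resource utilization.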
