Message Queues vs. Streaming Systems: Key Differences and Use Cases

Oleksii K., DevOps Engineer

In data processing and messaging systems, the terms "queue" and "streaming" come up often. They sound similar, but they serve different purposes and can significantly affect how a system handles data. Let's walk through the differences in simple terms.

What Are Message Queues?

Imagine a café where customers place orders online or in person. Once an order has been prepared, the customer is notified that it is ready for pickup. In this analogy, orders are the messages in a queue: the barista works through them one at a time and removes each order from the queue once it is finished. That is essentially how a message queue works.

Each message represents a self-contained task that is processed independently. Messages are consumed in order, and consumption is typically destructive: once a message has been processed, it is deleted from the queue.

Key characteristics of message queues:

Asynchronous communication: Producers can send messages without consumers having to be ready at the same moment. Just as with a coffee order, you don't have to stand at the counter while it is being made.
First In, First Out (FIFO): Messages are processed in the order in which they are received, which matters for operations that depend on strict ordering, such as bank transactions. Depending on configuration, some queues also allow non-FIFO processing.
Durability: Messages are stored reliably until a consumer has processed them, so that messages are not lost even if parts of the system fail.
Exclusive delivery: Each message is consumed by only one consumer instance, which avoids duplicate processing in normal operation. Messages are deleted as soon as the consumer acknowledges them.
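
To make the delete-on-acknowledge behaviour concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the API of any real broker: the class `SimpleQueue` and its methods `send`, `receive`, `ack`, and `nack` are hypothetical names chosen for illustration, and durability is not modelled at all.

```python
from collections import deque


class SimpleQueue:
    """Toy in-memory queue: FIFO delivery, delete-on-acknowledge."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._in_flight = {}    # delivery_id -> message, awaiting ack
        self._next_id = 0

    def send(self, message):
        self._ready.append(message)

    def receive(self):
        """Hand the oldest waiting message to exactly one consumer."""
        if not self._ready:
            return None
        message = self._ready.popleft()
        delivery_id = self._next_id
        self._next_id += 1
        self._in_flight[delivery_id] = message
        return delivery_id, message

    def ack(self, delivery_id):
        """Consumer finished: the message is removed for good."""
        del self._in_flight[delivery_id]

    def nack(self, delivery_id):
        """Consumer failed: put the message back at the front."""
        self._ready.appendleft(self._in_flight.pop(delivery_id))

    def __len__(self):
        return len(self._ready) + len(self._in_flight)


orders = SimpleQueue()
for drink in ["espresso", "latte", "mocha"]:
    orders.send(drink)

first_id, first = orders.receive()    # barista A takes "espresso"
second_id, second = orders.receive()  # barista B takes "latte"
orders.ack(first_id)                  # espresso done, removed from the queue
orders.nack(second_id)                # latte failed, goes back in line
retry_id, retry = orders.receive()    # the next barista gets "latte" again
```

While a message is in flight, no other consumer can receive it; only an explicit `nack` makes it visible again. Real systems usually add a timeout for that case, which this sketch leaves out.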

Common use cases for queues:

Message queues are a good fit for scenarios that require parallel processing and scalability. Examples include:

Inventory management: Tracking and updating stock levels in real time.
Healthcare systems: Managing patient flow and appointment scheduling.
Restaurant operations: Handling customer orders and reservations.

What Is Streaming Messaging?

Now imagine a live concert where the music flows continuously and the audience experiences it in real time. Streaming messaging is built around a continuous flow of data and real-time processing.

Key characteristics of streaming messaging:

Real-time processing: Streamed messages are consumed as soon as they are produced, much like listening to music on a streaming service.
Event-driven architecture: Data is delivered to consumers as soon as it becomes available, which allows immediate reactions. Social media feeds, for example, update dynamically with new posts, likes, and comments.
Scalability: Streaming systems can handle very large volumes of data, which makes them suitable for real-time analytics, monitoring, and machine learning.
Message retention: Messages are stored for a defined period and can be replayed for batch processing or troubleshooting. Retention is based on time (e.g., 7 days) or size (e.g., 1 GB per partition).
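
The difference from a queue is easiest to see in code: reading from a stream does not delete anything, and only the retention policy removes old records. The following Python sketch models that with a hypothetical `RetainedLog` class. The names and the purely time-based expiry are assumptions made for illustration, not the behaviour of a specific product.

```python
class RetainedLog:
    """Toy append-only log with time-based retention and replay."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []      # list of (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, timestamp):
        offset = self._next_offset
        self._records.append((offset, timestamp, value))
        self._next_offset += 1
        return offset

    def expire(self, now):
        """Drop records that are older than the retention window."""
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]

    def read_from(self, offset):
        """Reading never deletes, so any reader can replay old records."""
        return [(o, v) for o, _, v in self._records if o >= offset]


DAY = 24 * 3600
log = RetainedLog(retention_seconds=7 * DAY)  # 7 days, as in the example above
log.append("price=101", timestamp=0)
log.append("price=102", timestamp=3 * DAY)
log.append("price=103", timestamp=6 * DAY)

live = log.read_from(2)          # a live reader positioned at the newest record
replay = log.read_from(0)        # a second reader replays from the start
log.expire(now=8 * DAY)          # on day 8 the day-0 record leaves the window
after_expiry = log.read_from(0)  # offsets 1 and 2 are still readable
```

Offsets are never reused after expiry, so a reader that remembers "I was at offset 2" can resume at the same place regardless of what has been trimmed behind it.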

Common use cases for streaming:

Streaming is an integral part of modern life and powers applications such as:

Stock price monitoring: Delivering real-time updates to traders.
Fraud detection: Detecting suspicious activity immediately.
Customer service analytics: Monitoring interactions and sentiment in real time.

Why Use Queues in Apache Kafka?

At Confluent, we want to make Apache Kafka a universal solution for diverse data workloads and remove dependencies on proprietary systems. Traditional messaging systems often force users to choose between ordering and speed. Kafka now bridges that gap by introducing queue support, which gives users the flexibility to process messages either sequentially or concurrently.

This addition increases Kafka's versatility: it supports both streaming and queue-based workflows and therefore covers a broader range of use cases.

How Are Queues Supported in Apache Kafka?

Kafka uses a log-based architecture in which every message is assigned a unique offset within its partition. Consumers read messages in order, which supports fault tolerance and makes it possible to replay messages. With the new hybrid model, Kafka combines the advantages of traditional queues with its log-based design:

Parallel processing: Messages can be consumed by multiple consumers at the same time.
Replayability: Messages can be replayed for recovery or reprocessing.
High throughput: Kafka retains its scalability and reliability while allowing non-sequential processing where needed.

Consumer Groups vs. Share Groups in Kafka

In Kafka, consumer groups govern how data is consumed from topics. A consumer group consists of one or more consumers that work together to read the partitions of a topic. Within a group, each partition is assigned to exactly one consumer, although one consumer may own several partitions. As a result, scaling stops paying off once the number of consumers exceeds the number of partitions: the extra consumers sit idle.
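
The partition limit can be illustrated with a few lines of Python. The function `assign_partitions` below is a simplified round-robin scheme written for this article; it is an assumption for illustration and not the code of Kafka's actual partition assignors, which are more involved.

```python
def assign_partitions(partitions, consumers):
    """Simplified round-robin: each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Two consumers share three partitions: one of them owns two.
two = assign_partitions(partitions, ["c1", "c2"])

# Five consumers, still three partitions: two consumers get nothing.
five = assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"])
idle = [consumer for consumer, owned in five.items() if not owned]
```

With three partitions, adding a fourth or fifth consumer to the group adds no throughput. In this model, the only way to use them is to increase the partition count of the topic.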

Share groups offer a more flexible approach, especially for workloads that resemble traditional queuing systems. They allow multiple consumers to read from the same partitions, which gives finer control over how data is shared and processed.

Key features of share groups include:

Concurrent reads: Multiple consumers in a share group can read from the same partition.
Dynamic scaling: More consumers can be added to handle peak load without repartitioning topics.
Individual acknowledgements: Messages are acknowledged individually, which streamlines batch processing and allows unprocessed messages to be redelivered.
Independent consumption: Consumers in different share groups can read the same topics without interfering with one another.
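
The following Python sketch models concurrent reads and per-record acknowledgement on a single partition. It is a toy state machine under simplifying assumptions: the class `SharePartition`, its methods, and the three state names are hypothetical, and mechanisms such as lock timeouts or delivery-attempt limits are left out. It does not use the Kafka client API.

```python
AVAILABLE, ACQUIRED, ACKNOWLEDGED = "available", "acquired", "acknowledged"


class SharePartition:
    """Toy model of one partition read by several share-group consumers."""

    def __init__(self, values):
        self.values = list(values)
        self.state = [AVAILABLE] * len(self.values)
        self.owner = [None] * len(self.values)

    def fetch(self, consumer, max_records):
        """Acquire up to max_records available records for one consumer."""
        batch = []
        for offset in range(len(self.values)):
            if self.state[offset] == AVAILABLE and len(batch) < max_records:
                self.state[offset] = ACQUIRED
                self.owner[offset] = consumer
                batch.append((offset, self.values[offset]))
        return batch

    def acknowledge(self, consumer, offset):
        """Mark a single record as done; it will not be delivered again."""
        assert self.owner[offset] == consumer
        self.state[offset] = ACKNOWLEDGED

    def release(self, consumer):
        """Return a consumer's unacknowledged records for redelivery."""
        for offset in range(len(self.values)):
            if self.state[offset] == ACQUIRED and self.owner[offset] == consumer:
                self.state[offset] = AVAILABLE
                self.owner[offset] = None


partition = SharePartition(["o0", "o1", "o2", "o3"])
batch_a = partition.fetch("A", max_records=2)      # A acquires offsets 0 and 1
batch_b = partition.fetch("B", max_records=2)      # B acquires offsets 2 and 3
partition.acknowledge("A", 0)                      # A finishes offset 0 only
partition.release("A")                             # A stops; offset 1 is released
partition.acknowledge("B", 2)
partition.acknowledge("B", 3)
redelivered = partition.fetch("B", max_records=2)  # B picks up offset 1
```

Both consumers read from the same partition at the same time, and because acknowledgement is per record, only the one record A did not finish is redelivered.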

Does a Share Group Guarantee Ordering?

Not entirely. Within a batch, records are in offset order, but ordering across batches is not guaranteed. For example, if a consumer crashes partway through a batch, another consumer may process later messages first, which results in non-sequential delivery across batches.
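
The crash scenario can be traced step by step. The short Python script below is a deterministic walk-through with made-up offsets and two imaginary consumers A and B; it only records the order in which offsets finish and is not meant to reproduce Kafka's internal behaviour.

```python
from collections import deque

available = deque(range(6))   # offsets 0..5 on one partition
processed = []                # global order in which offsets finish


def acquire(batch_size):
    """Take the next batch of available offsets, lowest first."""
    count = min(batch_size, len(available))
    return [available.popleft() for _ in range(count)]


batch_a = acquire(3)          # consumer A holds offsets 0, 1, 2
batch_b = acquire(3)          # consumer B holds offsets 3, 4, 5

processed.append(batch_a[0])  # A finishes offset 0 ...
crashed = batch_a[1:]         # ... then crashes with 1 and 2 unprocessed

processed.extend(batch_b)     # B works through its own batch in offset order

available.extendleft(reversed(crashed))  # offsets 1 and 2 become available again
processed.extend(acquire(3))             # B picks them up afterwards

print(processed)
```

Each batch is internally ordered, but offsets 1 and 2 finish after 3, 4, and 5. Workloads that need strict per-key ordering are therefore usually better served by a regular consumer group.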

Real-World Example: A Major Retail Sales Event

Imagine a retailer running a massive sales event. The checkout system has to handle a rush of orders efficiently. With share groups:

Parallel processing: Orders are distributed across multiple workers for concurrent processing.
Dynamic resource allocation: The system can add consumers during peaks and scale back during quiet periods.
Efficient processing: Orders are handled quickly without requiring strict ordering.

This flexibility lets the system adapt smoothly to fluctuating workloads, improving both customer satisfaction and resource utilization.
