Message Queues vs. Streaming Systems: Key Differences and Use Cases

Oleksii K., DevOps Engineer

In data processing and messaging systems, the terms "queue" and "streaming" come up constantly. They can look similar, but they play distinct roles and strongly influence how a system handles data. Let's break down the differences in simple terms.

What is a message queue?

Picture a coffee shop where customers place orders online or in person. Once an order has been prepared, the customer is notified to pick it up. In this analogy, orders behave like messages in a queue: the barista handles them one at a time and removes each order from the queue once it is done. That is essentially how a message queue works.

Each message represents a distinct task to be handled independently. Messages are consumed in order, and consumption is generally destructive: once a message has been processed, it is deleted from the queue.

Key characteristics of message queues:

Asynchronous communication: Producers can send messages without consumers having to be ready at the same moment. As with ordering a coffee, you do not have to stand and wait while it is being made.
First in, first out (FIFO): Messages are processed in the order they are received, which matters for operations that depend on strict sequencing, such as banking transactions. Some queues allow non-FIFO processing, depending on configuration.
Durability: Messages are stored reliably until a consumer processes them, which protects against message loss even if part of the system fails.
Exclusive delivery: Each message is handed to a single consumer instance, so that in normal operation it is not processed twice. Messages are deleted once the consumer acknowledges them.
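
The characteristics above can be sketched with a toy in-memory queue. This is only an illustration of FIFO order, exclusive delivery and destructive acknowledgment; the class name SimpleQueue and its methods are hypothetical and do not correspond to any real broker's API.

```python
from collections import deque


class SimpleQueue:
    """Toy in-memory FIFO queue with destructive, acknowledged consumption."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._in_flight = {}    # delivery_id -> message awaiting acknowledgment
        self._next_id = 0

    def send(self, message):
        # Producers only append; no consumer needs to be present.
        self._ready.append(message)

    def receive(self):
        """Hand the oldest message to exactly one caller, or None if empty."""
        if not self._ready:
            return None
        message = self._ready.popleft()
        delivery_id = self._next_id
        self._next_id += 1
        self._in_flight[delivery_id] = message
        return delivery_id, message

    def ack(self, delivery_id):
        # Acknowledged messages are removed for good.
        del self._in_flight[delivery_id]

    def nack(self, delivery_id):
        # Processing failed: put the message back at the front of the queue.
        self._ready.appendleft(self._in_flight.pop(delivery_id))

    def pending(self):
        return len(self._ready) + len(self._in_flight)


orders = SimpleQueue()
for drink in ["espresso", "latte", "mocha"]:
    orders.send(drink)

served = []
while (delivery := orders.receive()) is not None:
    delivery_id, drink = delivery
    served.append(drink)
    orders.ack(delivery_id)
```

After the loop, served holds the drinks in the order they were sent and the queue is empty, which is the destructive consumption described above.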

Common use cases for queues:

Message queues are a good fit for scenarios that need parallel processing and scalability. Examples:

Inventory management: Tracking and updating stock levels in real time.
Healthcare systems: Managing patient flow and appointment scheduling.
Restaurant operations: Handling customer orders and reservations.

What is message streaming?

Now picture a live concert, where the music flows continuously and the audience experiences it as it happens. Message streaming is about a continuous flow of data and real-time processing.

Key characteristics of message streaming:

Real-time processing: Streamed messages are consumed as soon as they are produced, much like listening to music on a streaming service.
Event-driven architecture: Data is made available to consumers as soon as it is produced, enabling near-instant reactions. For example, social media feeds update dynamically with new posts, likes and comments.
Scalability: Streaming systems can handle massive data volumes, which makes them suitable for real-time analytics, monitoring and machine learning.
Message retention: Messages are stored for a specified period and can be replayed for batch processing or error recovery. Retention is based on time (for example, 7 days) or size (for example, 1 GB per partition).
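
To make retention and replay concrete, here is a minimal sketch of an append-only log with time-based retention. The class RetainedLog, its method names and the sample price records are assumptions made for illustration; timestamps are passed in explicitly so the example stays deterministic.

```python
class RetainedLog:
    """Toy append-only log: reading never deletes, only retention does."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []       # list of (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, now):
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        return offset

    def read_from(self, offset):
        # Any reader can start from any retained offset, as often as it likes.
        return [(o, v) for o, _, v in self._records if o >= offset]

    def enforce_retention(self, now):
        # Drop records older than the retention window.
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]


DAY = 24 * 3600
log = RetainedLog(retention_seconds=7 * DAY)
log.append("price=100", now=0)
log.append("price=101", now=3600)
log.append("price=99", now=6 * DAY)

first_read = log.read_from(0)
replay = log.read_from(0)                 # same records again: reads are non-destructive
log.enforce_retention(now=7 * DAY + 1800)
after_retention = log.read_from(0)        # the record written at time 0 has aged out
```

Unlike the queue model, reading leaves the data in place; records disappear only when the retention policy removes them.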

Common use cases for streaming:

Streaming powers many everyday applications, such as:

Stock price monitoring: Delivering real-time updates to traders.
Fraud detection: Identifying suspicious activity as it happens.
Customer service analytics: Tracking interactions and sentiment in real time.

Why use queues in Apache Kafka?

At Confluent, our goal is to make Apache Kafka a universal solution for a wide range of data processing needs, removing the dependency on proprietary systems. Traditional messaging systems often force users to choose between ordering and speed. Kafka now closes that gap by introducing queue support, giving users the flexibility to process messages either sequentially or concurrently.

This addition makes Kafka more versatile: it can support both streaming workloads and queue-based workloads, and therefore a broader range of use cases.

How are queues supported in Apache Kafka?

Kafka uses a log-based architecture in which each message is assigned a unique identifier (its offset within a partition). Consumers read messages sequentially, which provides fault tolerance and allows messages to be replayed. With the new hybrid model, Kafka combines the benefits of traditional queues with its log-based design:

Parallel processing: Messages can be consumed by several consumers at the same time.
Replay capability: Messages can be replayed for recovery or reprocessing.
High throughput: Kafka keeps its scalability and reliability while allowing out-of-order processing where needed.

Consumer groups vs. share groups in Kafka

In Kafka, consumer groups manage how data is consumed from topics. Each consumer group is made up of one or more consumers working together to read the partitions of a topic. Within a group, each partition is assigned to exactly one consumer at a time, although a single consumer may own several partitions. Scaling therefore stops paying off once the number of consumers exceeds the number of partitions: the extra consumers sit idle.
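
The partition limit can be illustrated with a simplified round-robin assignment. This is only a sketch: the function assign_partitions and the consumer names are hypothetical, and Kafka's real assignors are more sophisticated, but the constraint of one owner per partition is the same.

```python
def assign_partitions(partitions, consumers):
    """Give each partition exactly one owner, cycling through the consumers."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

two_consumers = assign_partitions(partitions, ["c1", "c2"])
four_consumers = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])

# Consumers that received no partition have nothing to read.
idle = [consumer for consumer, owned in four_consumers.items() if not owned]
```

With three partitions and four consumers, the fourth consumer ends up in idle: adding it brings no extra throughput.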

Share groups offer a more flexible approach, particularly for workloads that resemble traditional queueing systems. They allow several consumers to read from the same partitions, giving finer control over how data is shared and processed.

Key characteristics of share groups:

Concurrent reads: Several consumers in a share group can read from the same partition.
Dynamic scaling: More consumers can be added to absorb load spikes without repartitioning topics.
Individual acknowledgments: Records are acknowledged one by one, so they can still be fetched in batches while any record that was not processed can be redelivered.
Independent consumption: Consumers in different share groups can read the same topics without interfering with each other.
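
The behaviour listed above can be modelled with a toy partition that tracks a state per record. The class SharePartition, its methods and the state names are assumptions chosen for this sketch; it is not the Kafka client API, only an illustration of concurrent reads, per-record acknowledgment and redelivery.

```python
class SharePartition:
    """Toy model of one partition read by several consumers of a share group."""

    def __init__(self, records):
        self._records = list(records)
        self._state = ["available"] * len(self._records)  # one state per offset

    def acquire(self, max_records):
        """Lend up to max_records available records to the calling consumer."""
        batch = []
        for offset, state in enumerate(self._state):
            if state == "available" and len(batch) < max_records:
                self._state[offset] = "acquired"
                batch.append((offset, self._records[offset]))
        return batch

    def acknowledge(self, offset):
        # The record was processed successfully and will not be delivered again.
        self._state[offset] = "acknowledged"

    def release(self, offset):
        # The record was not processed; make it available for redelivery.
        self._state[offset] = "available"

    def states(self):
        return list(self._state)


partition = SharePartition(["order-0", "order-1", "order-2", "order-3"])

batch_a = partition.acquire(max_records=2)   # consumer A receives offsets 0 and 1
batch_b = partition.acquire(max_records=2)   # consumer B receives offsets 2 and 3

partition.acknowledge(0)                     # A finishes offset 0
partition.release(1)                         # A gives up on offset 1
for offset, _ in batch_b:                    # B finishes its whole batch
    partition.acknowledge(offset)

redelivered = partition.acquire(max_records=2)  # offset 1 goes to whoever asks next
```

Both consumers read from the same partition at once, and only the single released record comes back for another attempt.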

Do share groups guarantee ordering?

Not entirely. Within a batch, records are ordered by offset, but ordering across batches is not guaranteed. For example, if a consumer stops partway through a batch, another consumer may process later messages first, which results in out-of-order delivery across batches.
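
The scenario in this paragraph can be traced step by step. The following is a hand-written simulation with made-up offsets and two hypothetical consumers, A and B, rather than a run against a real cluster.

```python
records = list(range(6))                  # offsets 0..5 in one partition
batch_a, batch_b = records[:3], records[3:]

processed = []
processed.append(batch_a[0])              # consumer A handles offset 0, then stops
unfinished = batch_a[1:]                  # offsets 1 and 2 go back to the partition
processed.extend(batch_b)                 # consumer B works through its own batch
processed.extend(unfinished)              # the returned records are redelivered later

# Each delivered batch is internally in offset order...
in_order_within_batches = batch_b == sorted(batch_b) and unfinished == sorted(unfinished)
# ...but the overall processing order is not.
in_order_overall = processed == sorted(processed)
```

Here offsets 3 to 5 are processed before offsets 1 and 2, so any logic that depends on strict per-partition order would need a consumer group instead.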

Real-world example: a retail sales event

Consider a retailer running a large sales event. The checkout system has to absorb a surge of orders efficiently. With share groups:

Parallel processing: Orders are distributed across several workers for concurrent processing.
Dynamic resource allocation: The system can add consumers during peak periods and scale back down during quiet ones.
Efficient processing: Orders are handled quickly without requiring strict sequencing.

This flexibility lets the system adapt smoothly to fluctuating workloads, keeping customers satisfied and resources well used.
