Message Queues vs. Streaming Systems: Key Differences and Use Cases

Oleksii K., DevOps Engineer


In data processing and messaging systems, the terms "queue" and "stream" come up constantly. They sound similar, but they serve different purposes, and the choice between them significantly affects how a system handles data. Let's break down the differences in simple terms.

What are message queues?

Imagine a coffee shop where customers place orders online or in person. Once an order has been prepared, the customer is notified to pick it up. In this analogy, orders are the messages in a queue: the barista works through them one at a time and removes each order from the queue once it is complete. That is essentially how a message queue operates.

Each message represents a discrete task that is handled independently. Messages are consumed in order, and consumption is typically destructive: once a message has been processed, it is removed from the queue.

Key characteristics of message queues:

Asynchronous communication: Producers can send messages without requiring consumers to be ready at the same moment. As with ordering a coffee, you don't have to stand and wait while it is prepared.
First in, first out (FIFO): Messages are processed in the order they are received, which is crucial for operations that depend on a strict sequence, such as bank transactions. Some queues allow non-FIFO processing, depending on configuration.
Durability: Messages are stored reliably until a consumer processes them, so they are not lost even if part of the system fails.
Exclusive delivery: Each message is consumed by only one consumer instance, so the same work is not done twice. A message is removed once the consumer acknowledges it; a message that is never acknowledged is typically redelivered.
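The characteristics above can be sketched with a small in-memory model. This is a minimal illustration, not any real broker's API: the class `SimpleQueue` and its methods `send`, `receive`, `ack`, and `nack` are hypothetical names chosen for this example.

```python
from collections import deque


class SimpleQueue:
    """Toy FIFO queue with destructive, acknowledged consumption (illustrative only)."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer, oldest first
        self._in_flight = {}    # delivery_id -> message handed out but not yet acked
        self._next_id = 0

    def send(self, message):
        # Producers enqueue without waiting for any consumer (asynchronous).
        self._ready.append(message)

    def receive(self):
        """Hand the oldest message to exactly one caller, or return None if empty."""
        if not self._ready:
            return None
        message = self._ready.popleft()
        delivery_id = self._next_id
        self._next_id += 1
        self._in_flight[delivery_id] = message
        return delivery_id, message

    def ack(self, delivery_id):
        # Acknowledged messages are gone for good (destructive consumption).
        del self._in_flight[delivery_id]

    def nack(self, delivery_id):
        # An unprocessed message goes back to the front so it is redelivered next.
        self._ready.appendleft(self._in_flight.pop(delivery_id))

    def pending(self):
        return len(self._ready) + len(self._in_flight)
```

In coffee-shop terms, `send` is placing an order, `receive` is the barista picking up the next ticket, and `ack` is handing over the finished drink. A `nack` models a barista who could not finish the order and puts the ticket back at the front of the line.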

Common use cases for queues:

Message queues are ideal for scenarios that need parallel processing and scalability. Examples include:

Inventory management: Tracking and updating stock levels in real time.
Healthcare systems: Managing patient flow and appointment scheduling.
Restaurant operations: Handling customer orders and reservations.

What is message streaming?

Now imagine a live concert where the music flows continuously and the audience experiences it in real time. Message streaming centers on a continuous flow of data and on processing that data in real time.

Key characteristics of message streaming:

Real-time processing: Messages are consumed as soon as they are produced, much like listening to music on a streaming service.
Event-driven architecture: Data is delivered to consumers as soon as it is available, enabling near-instant reactions. For example, social media feeds update dynamically with new posts, likes, and comments.
Scalability: Streaming systems can process massive volumes of data, which makes them well suited to real-time analytics, monitoring, and machine learning.
Message retention: Messages are stored for a set period and can be replayed for batch processing or error recovery. Retention is based on time (for example, 7 days) or size (for example, 1 GB per partition).
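The retention behavior described above can be sketched as a toy append-only log. This is a simplified illustration under stated assumptions: the class `RetainedLog` is hypothetical, timestamps are passed in explicitly so the example is deterministic, and the size limit is expressed as a record count rather than bytes.

```python
class RetainedLog:
    """Toy append-only log with time- and size-based retention (illustrative only)."""

    def __init__(self, retention_seconds, max_records):
        self.retention_seconds = retention_seconds
        self.max_records = max_records   # stand-in for a byte limit such as 1 GB
        self._records = []               # list of (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, now):
        """Store a record, then drop anything that falls outside the retention limits."""
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        self._enforce_retention(now)
        return offset

    def _enforce_retention(self, now):
        cutoff = now - self.retention_seconds
        kept = [r for r in self._records if r[1] >= cutoff]       # time-based limit
        self._records = kept[max(0, len(kept) - self.max_records):]  # size-based limit

    def read_from(self, offset):
        """Non-destructive read: the same records can be returned again later."""
        return [(o, v) for (o, _, v) in self._records if o >= offset]
```

Unlike a queue, reading does not remove anything. Records disappear only when they age out or the log grows past its size limit, which is what makes replay possible within the retention window.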

Common use cases for streaming:

Streaming is integral to modern life, powering applications such as:

Stock price monitoring: Delivering real-time updates to traders.
Fraud detection: Identifying suspicious activity as it happens.
Customer service analytics: Tracking interactions and sentiment in real time.

Why use queues in Apache Kafka?

At Confluent, our goal is to make Apache Kafka a universal solution for diverse data workloads, removing the dependence on proprietary systems. Traditional messaging systems often force users to choose between ordering and speed. Kafka now closes this gap by introducing queue support, giving users the flexibility to process messages sequentially or concurrently.

This addition makes Kafka more versatile: it can support both streaming-based and queue-based workflows, and therefore a wider range of use cases.

How are queues supported in Apache Kafka?

Kafka uses a log-based architecture in which each message is assigned a unique offset within its partition. Consumers read messages sequentially by offset, which provides fault tolerance and makes it possible to replay messages. With the new hybrid model, Kafka combines the benefits of traditional queues with its log-based design:

Parallel processing: Messages can be consumed by multiple consumers simultaneously.
Replay capability: Messages can be replayed for recovery or reprocessing.
High performance: Kafka keeps its scalability and reliability while allowing out-of-order processing when needed.
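The log-and-offset design described above can be sketched in a few lines. This is a conceptual model, not the Kafka client API: `PartitionLog` and `OffsetConsumer` are hypothetical names, and the log lives in memory.

```python
class PartitionLog:
    """Toy append-only log: a record's offset is its position in the list."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset assigned to this record

    def fetch(self, offset, max_records):
        """Return up to max_records (offset, value) pairs starting at offset."""
        end = min(offset + max_records, len(self._records))
        return [(o, self._records[o]) for o in range(offset, end)]


class OffsetConsumer:
    """Hypothetical consumer that remembers the next offset it should read."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_records=2):
        batch = self.log.fetch(self.position, max_records)
        if batch:
            self.position = batch[-1][0] + 1  # advance past the last record read
        return batch

    def seek(self, offset):
        self.position = offset  # rewind (or skip ahead) to replay records
```

Because the log itself is never modified by reading, each consumer only needs to remember a single number. Replaying is a matter of calling `seek` with an earlier offset, and a second consumer can read the same records without affecting the first.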

Consumer groups vs. share groups in Kafka

In Kafka, consumer groups govern how data is consumed from topics. A consumer group consists of multiple consumers working together to read from a topic's partitions. Within a group, each partition is assigned to exactly one consumer, although a single consumer may be assigned several partitions. As a result, scaling stops paying off once the number of consumers exceeds the number of partitions: the extra consumers receive no partitions and sit idle.
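The scaling limit of consumer groups can be seen in a small assignment sketch. The function `assign_partitions` is a hypothetical helper using plain round-robin; Kafka's actual assignment strategies are configurable and more involved, so treat this only as an illustration of the one-consumer-per-partition rule.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin (illustrative only)."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Fewer consumers than partitions: every consumer has work.
balanced = assign_partitions([0, 1, 2, 3], ["c1", "c2"])

# More consumers than partitions: the surplus consumers get nothing.
oversized = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4", "c5"])
idle = [c for c, owned in oversized.items() if not owned]
```

With three partitions and five consumers, two consumers end up with an empty assignment, so adding them brings no extra throughput.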

Share groups offer a more flexible approach, especially for workloads that resemble traditional queueing systems. They allow multiple consumers to read from the same partitions, enabling finer-grained control over how data is shared and processed.

Key characteristics of share groups include:

Concurrent reading: Multiple consumers in a share group can read from the same partition.
Dynamic scaling: More consumers can be added to absorb load spikes without having to redistribute partitions.
Individual acknowledgments: Messages are acknowledged one by one, so records can still be fetched in batches while any unprocessed message is redelivered.
Independent consumption: Consumers in different share groups can read the same topics without interfering with each other.
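The share-group characteristics above can be modeled with a toy partition that tracks the state of each record. This is a conceptual sketch only: `SharePartition` and its methods `poll`, `acknowledge`, and `release` are hypothetical names for this example and are not the Kafka client API.

```python
from collections import deque


class SharePartition:
    """Toy model of one partition read by several consumers in a share group."""

    def __init__(self, records):
        self._records = list(records)
        self._available = deque(range(len(self._records)))  # offsets ready for delivery
        self._acquired = {}                                  # offset -> consumer holding it
        self.done = set()                                    # offsets fully processed

    def poll(self, consumer, max_records):
        """Hand the next available records to this consumer as one batch."""
        batch = []
        while self._available and len(batch) < max_records:
            offset = self._available.popleft()
            self._acquired[offset] = consumer
            batch.append((offset, self._records[offset]))
        return batch

    def _check_owner(self, consumer, offset):
        if self._acquired.get(offset) != consumer:
            raise ValueError(f"offset {offset} is not held by {consumer}")

    def acknowledge(self, consumer, offset):
        """Per-record acknowledgment: only this one record is marked as done."""
        self._check_owner(consumer, offset)
        del self._acquired[offset]
        self.done.add(offset)

    def release(self, consumer, offset):
        """Give a record back so that any consumer can pick it up again."""
        self._check_owner(consumer, offset)
        del self._acquired[offset]
        self._available = deque(sorted([*self._available, offset]))
```

Two consumers can hold different records from the same partition at the same time, and a record that one consumer releases is simply offered again on a later `poll`, possibly to a different consumer.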

Does a share group guarantee ordering?

Not completely. Within a batch, records are in offset order, but ordering between batches is not guaranteed. For example, if a consumer fails partway through a batch, another consumer may process later messages first, and the redelivered records then arrive out of order relative to the other batches.
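The failure scenario above can be walked through with a small deterministic sketch. The function `simulate_completion_order` is a hypothetical helper that hard-codes one particular interleaving (consumer A takes the first batch, consumer B the second, and A fails partway through); real timing will vary, so this shows one possible outcome rather than a guaranteed one.

```python
def simulate_completion_order(num_records, batch_size, crash_after):
    """Return offsets in the order they finish processing for one failure scenario.

    Consumer A takes the first batch and fails after finishing `crash_after`
    records. Consumer B processes the second batch in the meantime. A's
    unfinished records are redelivered afterwards, followed by the rest.
    """
    offsets = list(range(num_records))
    batch_a = offsets[:batch_size]
    batch_b = offsets[batch_size:2 * batch_size]

    completed = batch_a[:crash_after]           # A finishes part of its batch, then fails
    completed += batch_b                        # B works through its batch in offset order
    completed += batch_a[crash_after:]          # A's unfinished records are redelivered
    completed += offsets[2 * batch_size:]       # remaining records follow as usual
    return completed


order_with_failure = simulate_completion_order(num_records=6, batch_size=3, crash_after=1)
order_without_failure = simulate_completion_order(num_records=6, batch_size=3, crash_after=3)
```

In the failure case, offsets 3 to 5 complete before offsets 1 and 2, even though each batch was internally in offset order. Workloads that need strict sequencing across all records should account for this.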

Real-world example: A retail sales event

Consider a retailer running a major sales event. The checkout system must handle a flood of orders efficiently. With share groups:

Parallel processing: Orders are distributed across multiple workers for concurrent processing.
Dynamic resource allocation: The system can add consumers during peaks and remove them during quiet periods.
Efficient processing: Orders are processed quickly without requiring strict sequencing.

This flexibility lets the system adapt smoothly to fluctuating workloads, keeping customers satisfied and resource use efficient.
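The retail scenario can be approximated locally with worker threads pulling from a shared queue. This is an analogy built on Python's standard library rather than on Kafka: the function `process_orders` and the order values are assumptions made for this example, and the worker count stands in for the number of consumers in a share group.

```python
import queue
import threading


def process_orders(orders, num_workers):
    """Distribute orders across worker threads; return {worker_id: [orders handled]}."""
    work = queue.Queue()
    for order in orders:
        work.put(order)

    handled = {worker_id: [] for worker_id in range(num_workers)}

    def worker(worker_id):
        while True:
            try:
                order = work.get_nowait()   # each order goes to exactly one worker
            except queue.Empty:
                return                      # nothing left to do
            handled[worker_id].append(order)  # each worker writes only to its own list

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


# Scale up for the sales peak, scale down afterwards.
peak = process_orders(list(range(100)), num_workers=8)
quiet = process_orders(list(range(10)), num_workers=2)
```

Which worker handles which order depends on thread scheduling and is not deterministic, which mirrors the point above: throughput scales with the number of workers, but strict sequencing across orders is not preserved.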
