Message Queues vs. Streaming Systems: Key Differences and Use Cases

Oleksii K., DevOps Engineer


In data processing and messaging systems, the terms "queue" and "streaming" come up often. They may sound similar, but they serve distinct purposes and can significantly affect how a system handles data. This article explains the differences in simple terms.

What Are Message Queues?

Imagine a coffee shop where customers place orders online or in person. Once an order is ready, the customer is notified to pick it up. In this analogy, orders act as messages in a queue: the barista processes them one at a time and removes each order from the queue once it is done. That is essentially how a message queue operates.

Each message represents a discrete task that is handled independently. Messages are consumed in order, and consumption is typically destructive: once a message has been processed, it is deleted from the queue.
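The coffee-shop behavior described above can be sketched in a few lines of Python. This is only an illustration using an in-memory `deque`; the order names are made up, and a real message broker would add networking, persistence, and acknowledgments.

```python
from collections import deque

# Hypothetical order queue; the names are illustrative only.
orders = deque()

# Producer side: three orders arrive.
for order in ["latte", "espresso", "mocha"]:
    orders.append(order)

processed = []
while orders:
    # popleft() returns the oldest message and removes it from the queue,
    # which models in-order, destructive consumption.
    processed.append(orders.popleft())

print(processed)    # orders in arrival order
print(len(orders))  # the queue is empty, nothing can be read a second time
```

Once the loop finishes there is nothing left to re-read, which is the main contrast with the log-based streaming model discussed later.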

Key Characteristics of Message Queues:

Asynchronous Communication: Producers can send messages without requiring consumers to be ready at the same time. As with ordering coffee, you do not have to stand and wait while it is being made.
First In, First Out (FIFO): Messages are processed in the order they are received, which is crucial for operations that depend on strict sequencing, such as banking transactions. Some queues allow non-FIFO processing, depending on configuration.
Durability: Messages are stored reliably until a consumer processes them, so that messages are not lost even if part of the system fails.
Exclusive Delivery: Each message is consumed by only one consumer instance, so competing consumers do not process the same message twice. A message is deleted once the consumer acknowledges it.
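To make exclusive delivery and acknowledgment-based deletion concrete, here is a minimal in-memory sketch. The `AckQueue` class and its method names are hypothetical and do not correspond to any particular broker's API; durability is not modeled, since everything lives in process memory.

```python
from collections import deque


class AckQueue:
    """Toy queue: a message is deleted only after it is acknowledged."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._in_flight = {}    # message id -> body, delivered but not yet acked
        self._next_id = 0

    def send(self, body):
        self._ready.append((self._next_id, body))
        self._next_id += 1

    def receive(self):
        """Hand the oldest ready message to exactly one caller."""
        if not self._ready:
            return None
        msg_id, body = self._ready.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        """Consumer confirmed processing: delete the message for good."""
        del self._in_flight[msg_id]

    def nack(self, msg_id):
        """Consumer failed: put the message back at the front for redelivery."""
        self._ready.appendleft((msg_id, self._in_flight.pop(msg_id)))

    def pending(self):
        return len(self._ready) + len(self._in_flight)


q = AckQueue()
q.send("order-1")
q.send("order-2")

first = q.receive()    # consumer A gets order-1
second = q.receive()   # consumer B gets order-2, never order-1
q.ack(first[0])        # A succeeds, order-1 is deleted
q.nack(second[0])      # B fails, order-2 becomes available again
retry = q.receive()    # order-2 is redelivered to whoever asks next
```

Separating "delivered" from "deleted" is what lets a queue hand each message to a single consumer while still recovering when that consumer fails.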

Common Use Cases for Queues:

Message queues are well suited to scenarios that require parallel processing and scalability. Examples include:

Inventory Management: Tracking and updating stock levels in real time.
Healthcare Systems: Managing patient flow and appointment scheduling.
Restaurant Operations: Managing customer orders and reservations.

What Is Streaming Messaging?

Now imagine a live concert, where the music flows continuously and the audience experiences it in real time. Streaming messaging focuses on a continuous flow of data and on processing it in real time.

Key Characteristics of Streaming Messaging:

Real-Time Processing: Streaming messages are consumed as soon as they are produced, much like listening to music on a streaming service.
Event-Driven Architecture: Data is delivered to consumers as soon as it is available, enabling near-instant reactions. For example, social media feeds update dynamically with new posts, likes, and comments.
Scalability: Streaming systems can process large volumes of data, which makes them suitable for real-time analytics, monitoring, and machine learning.
Message Retention: Messages are stored for a specified period and can be replayed for batch processing or error recovery. Retention is based on time (for example, 7 days) or size (for example, 1 GB per partition).
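Retention and replay are easier to see in code. The sketch below models a single partition as an append-only list with offsets. The `Log` class is hypothetical, and `max_records` is a simplified stand-in for a size-based retention limit (a real system would count bytes or elapsed time).

```python
class Log:
    """Toy append-only log for one partition; not a real streaming API."""

    def __init__(self, max_records):
        self.max_records = max_records  # stand-in for a size-based retention limit
        self.base_offset = 0            # offset of the oldest retained record
        self.records = []

    def append(self, value):
        offset = self.base_offset + len(self.records)
        self.records.append(value)
        # Retention: drop the oldest records once the limit is exceeded.
        while len(self.records) > self.max_records:
            self.records.pop(0)
            self.base_offset += 1
        return offset

    def read_from(self, offset):
        """Return (offset, value) pairs from `offset` onward; reading deletes nothing."""
        start = max(offset, self.base_offset) - self.base_offset
        return [
            (self.base_offset + i, value)
            for i, value in enumerate(self.records)
            if i >= start
        ]


log = Log(max_records=3)
for price in [101, 102, 103, 104]:
    log.append(price)

live = log.read_from(3)    # a caught-up consumer sees only the newest record
replay = log.read_from(0)  # a replay starts at the oldest record still retained
```

Here the record at offset 0 has already aged out, so the replay begins at offset 1. Reading never removes data; only the retention policy does.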

Common Use Cases for Streaming:

Streaming is integral to modern life, powering applications such as:

Stock Price Monitoring: Delivering real-time updates to traders.
Fraud Detection: Identifying suspicious activity as it happens.
Customer Service Analytics: Monitoring interactions and sentiment in real time.

Why Use Queues in Apache Kafka?

At Confluent, our goal is to make Apache Kafka a universal solution for diverse data workloads, eliminating dependence on proprietary systems. Traditional messaging systems often force users to choose between ordering and speed. Kafka now bridges that gap by introducing queue support, giving users the flexibility to process messages either sequentially or concurrently.

This addition increases Kafka's versatility: it can support both streaming-based and queue-based workflows, and so serve a broader range of use cases.

How Are Queues Supported in Apache Kafka?

Kafka uses a log-based architecture in which each message in a partition is assigned a unique offset. Consumers read messages sequentially, which provides fault tolerance and allows messages to be replayed. With the new hybrid model, Kafka combines the benefits of traditional queues with its log-based design:

Parallel Processing: Messages can be consumed by multiple consumers simultaneously.
Replayability: Messages can be replayed for recovery or reprocessing.
High Throughput: Kafka retains its scalability and reliability while allowing out-of-order processing when needed.

Consumer Groups vs. Share Groups in Kafka

In Kafka, consumer groups control how data is consumed from topics. Each consumer group contains multiple consumers that work together to read a topic's partitions. Within a group, each partition is assigned to exactly one consumer at a time, although a single consumer may be assigned several partitions. As a result, scaling becomes inefficient when the number of consumers exceeds the number of partitions, because the extra consumers sit idle.
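The partition limit can be illustrated with a simplified round-robin assignment. The `assign` function below is a hypothetical sketch, not Kafka's actual assignor logic, which is more involved; it only shows why consumers beyond the partition count receive no work.

```python
def assign(partitions, consumers):
    """Round-robin sketch: each partition goes to exactly one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Fewer consumers than partitions: one consumer owns two partitions.
two = assign(partitions, ["c1", "c2"])

# More consumers than partitions: the extras get nothing to read.
five = assign(partitions, ["c1", "c2", "c3", "c4", "c5"])
idle = [consumer for consumer, owned in five.items() if not owned]

print(two)
print(idle)
```

With three partitions, adding a fourth or fifth consumer to the group adds no throughput.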

Share groups offer a more flexible approach, especially for workloads that resemble traditional queue systems. They allow multiple consumers to read from the same partitions, giving finer-grained control over how data is shared and processed.

Key Characteristics of Share Groups:

Concurrent Reading: Multiple consumers in a share group can read from the same partition.
Dynamic Scalability: More consumers can be added to handle load spikes without repartitioning topics.
Individual Acknowledgments: Messages are acknowledged one by one, so records can still be fetched in batches while any unprocessed messages are redelivered.
Independent Consumption: Consumers in different share groups can access the same topics without interfering with each other.
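The following toy model shows how concurrent reading and per-record acknowledgment could fit together. `SharePartition`, its methods, and the three state names are assumptions made for illustration; this is not the Kafka client API.

```python
class SharePartition:
    """Toy model of one partition read through a share group."""

    def __init__(self, records):
        self.records = list(records)
        # Per-record state, indexed by offset: "available", "acquired" or "acked".
        self.state = ["available"] * len(self.records)

    def acquire(self, max_records):
        """Give the caller up to max_records available records, lowest offsets first."""
        batch = []
        for offset, status in enumerate(self.state):
            if status == "available" and len(batch) < max_records:
                self.state[offset] = "acquired"
                batch.append((offset, self.records[offset]))
        return batch

    def ack(self, offset):
        """Mark a single record as successfully processed."""
        self.state[offset] = "acked"

    def release(self, offset):
        """Give an unprocessed record back so another consumer can take it."""
        self.state[offset] = "available"


partition = SharePartition(["o0", "o1", "o2", "o3"])
batch_a = partition.acquire(2)  # consumer A: offsets 0 and 1
batch_b = partition.acquire(2)  # consumer B: offsets 2 and 3, same partition
partition.ack(0)                # A confirms offset 0 only
partition.release(1)            # A could not process offset 1
for offset, _ in batch_b:
    partition.ack(offset)
retry = partition.acquire(2)    # offset 1 is delivered again
```

Because state is tracked per record rather than as a single committed offset, two consumers can work on the same partition and only the failed record comes back.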

Does a Share Group Guarantee Ordering?

Not always. Within a batch, records are ordered by offset, but ordering across batches is not guaranteed. For example, if a consumer fails partway through a batch, another consumer may process later messages first, which results in out-of-order delivery across batches.
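A small simulation makes the failure scenario concrete. The function below is a hypothetical sketch: it assumes the failed consumer finishes some records before crashing and that the remainder of its batch is redelivered only after the other batches complete.

```python
def completion_order(batches, failed_batch, done_before_crash):
    """Return offsets in the order they finish processing.

    batches: lists of offsets handed to consumers, in delivery order.
    failed_batch: index of the batch whose consumer crashes.
    done_before_crash: how many records of that batch finished before the crash.
    """
    done, retry = [], []
    for i, batch in enumerate(batches):
        if i == failed_batch:
            done.extend(batch[:done_before_crash])
            retry.extend(batch[done_before_crash:])  # redelivered later
        else:
            done.extend(batch)
    return done + retry


order = completion_order([[0, 1, 2], [3, 4, 5]], failed_batch=0, done_before_crash=1)
print(order)
```

Offsets 1 and 2 finish after offsets 3 to 5, so downstream logic that relies on strict per-partition ordering is a better fit for a regular consumer group.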

Real-World Example: Retail Sales Event

Consider a retailer running a major sales event. The checkout system must handle a surge of orders efficiently. With share groups:

Parallel Processing: Orders are distributed across multiple workers for concurrent processing.
Dynamic Resource Allocation: The system can add consumers during peaks and scale down during quiet periods.
Efficient Processing: Orders are processed quickly without requiring strict sequencing.

This flexibility lets the system adapt smoothly to fluctuating workloads, supporting both customer satisfaction and efficient use of resources.
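As a rough local analogy for the checkout scenario, the sketch below uses Python threads and an in-process `queue.Queue` in place of Kafka consumers and a topic. The `process_orders` function, the worker names, and the order IDs are all hypothetical.

```python
import queue
import threading


def process_orders(orders, worker_count):
    """Distribute orders across worker_count threads; returns {worker: [orders]}."""
    work = queue.Queue()
    for order in orders:
        work.put(order)

    handled = {f"worker-{i}": [] for i in range(worker_count)}

    def worker(name):
        while True:
            try:
                order = work.get_nowait()
            except queue.Empty:
                return  # no orders left, this worker stops
            handled[name].append(order)  # stand-in for payment, stock update, etc.
            work.task_done()

    threads = [threading.Thread(target=worker, args=(name,)) for name in handled]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


orders = [f"order-{n}" for n in range(100)]
quiet = process_orders(orders[:10], worker_count=2)  # quiet period, few workers
peak = process_orders(orders, worker_count=8)        # sales peak, more workers
```

Each order is handled exactly once, but which worker handles it and in what overall order is not fixed, which matches the trade-off described above.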
