KAFKA

1

A Kafka producer application wants to send log messages to a topic without specifying any key. Which properties are mandatory in the producer configuration? (select three)

key

partition

key.serializer

value

value.serializer

bootstrap.servers

key.serializer , value.serializer , bootstrap.servers


Explanation
bootstrap.servers, key.serializer and value.serializer are all mandatory, even when records are sent without a key.
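
A minimal sketch of such a producer configuration (the broker address, topic name and serializer choices are illustrative assumptions):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                                        // mandatory
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");   // mandatory, even without keys
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // mandatory

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // No key is provided, so records are spread round-robin across partitions
            producer.send(new ProducerRecord<>("logs", "a log message without a key"));
        }
    }
}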

2

Select all that apply

min.insync.replicas is a topic setting

min.insync.replicas only matters if acks=all

acks is a producer setting

acks is a topic setting

min.insync.replicas is a producer setting

min.insync.replicas matters regardless of the values of acks

min.insync.replicas is a topic setting

min.insync.replicas only matters if acks=all

acks is a producer setting


Explanation
acks is a producer setting. min.insync.replicas is a topic (or broker) setting and is only effective when acks=all.
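
A sketch showing where each setting lives (broker address, topic name and values are assumptions): acks goes into the producer configuration, while min.insync.replicas is applied to the topic, for example at creation time through the AdminClient.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class PaymentsTopicSetup {
    public static void main(String[] args) throws Exception {
        // Producer side: acks is a producer setting
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("acks", "all"); // wait for all in-sync replicas to acknowledge

        // Topic side: min.insync.replicas is a topic (or broker) setting
        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(adminProps)) {
            NewTopic topic = new NewTopic("payments", 3, (short) 3)
                    .configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}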

3

How will you read all the messages from a topic in your KSQL query?

Use KSQL CLI to set auto.offset.reset property to earliest

KSQL reads from the beginning of a topic, by default.

KSQL reads from the end of a topic. This cannot be changed.

Use KSQL CLI to set auto.offset.reset property to earliest


Explanation
Consumers can set the auto.offset.reset property to earliest to start consuming from the beginning. For KSQL: SET 'auto.offset.reset'='earliest';

4

What isn’t a feature of the Confluent Schema Registry?

Store avro data

Store schemas

Enforce compatibility rules

Store avro data


Explanation
Data is stored on brokers.

5

Is KSQL ANSI SQL compliant?

No

Yes

No


Explanation
KSQL is not ANSI SQL compliant; for now there is no defined standard for streaming SQL languages.

6

A consumer starts and has auto.offset.reset=latest, and the topic partition currently has data for offsets going from 45 to 2311. The consumer group has committed the offset 643 for the topic before. Where will the consumer read from?

it will crash

offset 45

offset 2311

offset 643

offset 643


Explanation
The offsets are already committed for this consumer group and topic partition, so the property auto.offset.reset is ignored

7

If a topic has a replication factor of 3…

Each partition will live on 2 different brokers

Each partition will live on 4 different brokers

Each partition will live on 3 different brokers

3 replicas of the same data will live on 1 broker

Each partition will live on 3 different brokers


Explanation
Replicas are spread across the available brokers, one replica per broker, so a replication factor of 3 means each partition lives on 3 different brokers.

8

A customer has many consumer applications that process messages from a Kafka topic. Each consumer application can only process 50 MB/s. Your customer wants to achieve a target throughput of 1 GB/s. What is the minimum number of partitions you will suggest to the customer for that particular topic?

20

50

1

10

20


Explanation
Each consumer can process only 50 MB/s, so we need at least 20 consumers, each consuming one partition, so that the 50 MB/s * 20 = 1,000 MB/s (≈ 1 GB/s) target is achieved.

9

In Avro, adding a field to a record without default is a __ schema evolution

backward

forward

breaking

full

forward


Explanation
Clients with the old schema will still be able to read records written with the new schema (they simply ignore the unknown field), which is forward compatibility. Clients with the new schema cannot read old records, because the added field has no default value.

10

A Kafka topic has a replication factor of 3 and a min.insync.replicas setting of 1. What is the maximum number of brokers that can be down so that a producer with acks=all can still produce to the topic?

0

2

1

3

2


Explanation
Two brokers can go down, and one replica will still be able to receive and serve data

11

You want to sink data from a Kafka topic to S3 using Kafka Connect. There are 10 brokers in the cluster, the topic has 2 partitions with replication factor of 3. How many tasks will you configure for the S3 connector?

6

10

3

2

2


Explanation
You cannot have more sink tasks (= consumers) than the number of partitions, so 2.

12

There are 3 producers writing to a topic with 5 partitions. There are 5 consumers consuming from the topic. How many Controllers will be present in the cluster?

1

5

2

3

1


Explanation
There is only one controller in a cluster at all times.

13

Which of the following Kafka Streams operators are stateful? (select all that apply)

count

flatmap

aggregate

peek

reduce

joining

count , aggregate , reduce , joining


Explanation
count, aggregate, reduce and joins all need a state store to remember previously seen records; flatmap and peek work on one record at a time and are stateless.
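
A small Kafka Streams sketch contrasting a stateless and a stateful operator (topic names and serdes are assumptions):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class StatefulVsStateless {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("events");

        events.peek((k, v) -> System.out.println(k + " -> " + v));  // stateless: looks at one record at a time
        KTable<String, Long> counts = events.groupByKey().count();  // stateful: backed by a state store
        counts.toStream().to("event-counts", Produced.with(Serdes.String(), Serdes.Long()));
        // builder.build() would then be passed to new KafkaStreams(...) with the usual configuration
    }
}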

14

A Zookeeper ensemble contains 3 servers. Over which ports should the members of the ensemble be able to communicate, in the default configuration? (select three)

443

3888

80

2888

9092

2181

3888 , 2888 , 2181


Explanation
2181 – client port, 2888 – peer (quorum) port, 3888 – leader election port

15

If I produce to a topic that does not exist, and the broker setting auto.create.topics.enable=true, what will happen?

Kafka will automatically create the topic with the indicated producer settings num.partitions and default.replication.factor

Kafka will automatically create the topic with 1 partition and 1 replication factor

Kafka will automatically create the topic with num.partitions=#of brokers and replication.factor=3

Kafka will automatically create the topic with the broker settings num.partitions and default.replication.factor

Kafka will automatically create the topic with the broker settings num.partitions and default.replication.factor


Explanation
The broker settings num.partitions and default.replication.factor come into play when a topic is auto-created.

16

Select all the ways for one consumer to subscribe simultaneously to the following topics – topic.history, topic.sports, topic.politics. (select two)

consumer.subscribePrefix("topic.");

consumer.subscribe(Arrays.asList("topic.history", "topic.sports", "topic.politics"));

consumer.subscribe("topic.history"); consumer.subscribe("topic.sports"); consumer.subscribe("topic.politics");

consumer.subscribe(Pattern.compile("topic..*"));

consumer.subscribe(Arrays.asList("topic.history", "topic.sports", "topic.politics")); consumer.subscribe(Pattern.compile("topic..*"));

Explanation
Multiple topics can be passed as a list or regex pattern.
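
A sketch of both valid subscription calls, assuming consumer is an already-configured KafkaConsumer<String, String>:

import java.util.Arrays;
import java.util.regex.Pattern;

// Option 1: pass the topics as a list
consumer.subscribe(Arrays.asList("topic.history", "topic.sports", "topic.politics"));

// Option 2: pass a regex matching the three topics (the pattern from the options above)
// Note: each call to subscribe() replaces the previous subscription, so use one or the other.
consumer.subscribe(Pattern.compile("topic..*"));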

17

How will you find out all the partitions where one or more of the replicas for the partition are not in-sync with the leader?

kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions

kafka-topics.sh --broker-list localhost:9092 --describe --under-replicated-partitions

kafka-topics.sh --zookeeper localhost:2181 --describe --unavailable-partitions

kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions

kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions

18

If I want to send binary data through the REST proxy to topic "test_binary", it needs to be base64 encoded. A consumer connecting directly into the Kafka topic "test_binary" will receive

json data

binary data

base64 encoded data, it will need to decode it

avro data

binary data


Explanation
On the producer side, after receiving base64 data, the REST Proxy will convert it into bytes and then send that bytes payload to Kafka. Therefore consumers reading directly from Kafka will receive binary data.

19

You want to perform table lookups against a KTable every time a new record is received from the KStream. What is the output of a KStream-KTable join?

KTable

KStream

You choose between KStream or KTable

GlobalKTable

KStream


Explanation
Here the KStream is enriched by looking up values in the KTable, and the result is another KStream.
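
A Kafka Streams sketch of the KStream-KTable join (topic names and the join logic are assumptions for illustration):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class OrderEnricher {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");       // the event stream
        KTable<String, String> customers = builder.table("customers");   // the lookup table

        // Every incoming order triggers a lookup in the KTable; the output is another KStream
        KStream<String, String> enriched =
                orders.join(customers, (order, customer) -> order + " bought by " + customer);
        enriched.to("orders-enriched");
        // builder.build() would then be passed to new KafkaStreams(...) with the usual configuration
    }
}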

20

What is the risk of increasing max.in.flight.requests.per.connection while also enabling retries in a producer?

Reduce throughput

Less resilient

At least once delivery is not guaranteed

Message order not preserved

Message order not preserved


Explanation
Some messages may require multiple retries. If there is more than one request in flight, a retried message may be written after a later one, so messages can arrive out of order. An exception to this rule is the producer setting enable.idempotence=true, which preserves ordering even with retries.
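
A producer configuration fragment illustrating the trade-off (the numeric values are illustrative assumptions):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("retries", "5");
props.put("max.in.flight.requests.per.connection", "5"); // > 1 with retries can reorder messages...
props.put("enable.idempotence", "true");                 // ...unless idempotence is enabled, which preserves ordering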

21

There are two consumers C1 and C2 belonging to the same group G subscribed to topics T1 and T2. Each of the topics has 3 partitions. How will the partitions be assigned to consumers with PartitionAssignor being RoundRobinAssignor?

Two consumers cannot read from two topics at the same time

All consumers will read from all partitions

C1 will be assigned partitions 0 and 2 from T1 and partition 1 from T2. C2 will have partition 1 from T1 and partitions 0 and 2 from T2.

C1 will be assigned partitions 0 and 1 from T1 and T2, C2 will be assigned partition 2 from T1 and T2.

C1 will be assigned partitions 0 and 2 from T1 and partition 1 from T2. C2 will have partition 1 from T1 and partitions 0 and 2 from T2.


Explanation
RoundRobinAssignor lists all the partitions of all subscribed topics and assigns them to the consumers one by one, so the six partitions alternate between C1 and C2: C1 gets T1-0, T1-2 and T2-1, while C2 gets T1-1, T2-0 and T2-2.

22

If I want to have an extremely high confidence that leaders and replicas have my data, I should use

acks=1, replication factor=3, min.insync.replicas=2

acks=all, replication factor=3, min.insync.replicas=2

acks=all, replication factor=2, min.insync.replicas=1

acks=all, replication factor=3, min.insync.replicas=1

acks=all, replication factor=3, min.insync.replicas=2


Explanation
acks=all means the leader will wait for all in-sync replicas to acknowledge the record. Also the min in-sync replica setting specifies the minimum number of replicas that need to be in-sync for the partition to remain available for writes.

23

Which KSQL queries write to Kafka?

SHOW STREAMS and EXPLAIN statements

COUNT and JOIN

CREATE STREAM WITH and CREATE TABLE WITH

CREATE STREAM AS SELECT and CREATE TABLE AS SELECT

SELECT … FROM foo WHERE ….

COUNT and JOIN

CREATE STREAM WITH and CREATE TABLE WITH

CREATE STREAM AS SELECT and CREATE TABLE AS SELECT


Explanation
SHOW STREAMS and EXPLAIN statements run against the KSQL server that the KSQL client is connected to. They don’t communicate directly with Kafka. CREATE STREAM WITH and CREATE TABLE WITH write metadata to the KSQL command topic. Persistent queries based on CREATE STREAM AS SELECT and CREATE TABLE AS SELECT read and write to Kafka topics. Non-persistent queries based on SELECT that are stateless only read from Kafka topics, for example SELECT … FROM foo WHERE …. Non-persistent queries that are stateful read and write to Kafka, for example, COUNT and JOIN. The data in Kafka is deleted automatically when you terminate the query with CTRL-C.

24

I am producing Avro data on my Kafka cluster that is integrated with the Confluent Schema Registry. After a schema change that is incompatible, I know my data will be rejected. Which component will reject the data?

Zookeeper

The Confluent Schema Registry

The Kafka Producer itself

The Kafka Broker

The Confluent Schema Registry


Explanation
The Confluent Schema Registry is your safeguard against incompatible schema changes and is the component that ensures no breaking schema evolution is possible. Kafka brokers do not look at your payload or its schema, and therefore will not reject the data.

25

We would like to be in an at-most once consuming scenario. Which offset commit strategy would you recommend?

Commit the offsets in Kafka, before processing the data

Commit the offsets on disk, after processing the data

Commit the offsets in Kafka, after processing the data

Do not commit any offsets and read from beginning

Commit the offsets in Kafka, before processing the data


Explanation
Here, we must commit the offsets right after receiving a batch from a call to .poll() and before processing it, so that a record is never processed twice even if the consumer crashes mid-processing.
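
A consumer loop sketch for at-most-once semantics, assuming consumer is an already-configured KafkaConsumer<String, String> and process() is a hypothetical application method:

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    consumer.commitSync(); // commit first: a crash during processing loses these records but never reprocesses them
    for (ConsumerRecord<String, String> record : records) {
        process(record);   // hypothetical processing step
    }
}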

26

The exactly-once guarantee in Kafka Streams applies to which flow of data?

Kafka => External

Kafka => Kafka

External => Kafka

Kafka => Kafka


Explanation
Kafka Streams can only guarantee exactly once processing if you have a Kafka to Kafka topology.

27

You have a Kafka cluster and all the topics have a replication factor of 3. One intern at your company stopped a broker, and accidentally deleted all the data of that broker on the disk. What will happen if the broker is restarted?

The broker will start, and won’t be online until all the data it needs to have is replicated from other leaders

The broker will start, and other topics will also be deleted as the broker data on the disk got deleted

The broker will crash

The broker will start, and won’t have any data. If the broker comes leader, we have a data loss

The broker will start, and won’t be online until all the data it needs to have is replicated from other leaders


Explanation
Kafka's replication mechanism makes it resilient to scenarios where a broker loses its data on disk: the broker can recover by replicating the data back from the other brokers.

28

What information isn’t stored inside of Zookeeper? (select two)

Schema Registry schemas

Broker registration info

Consumer offset

ACL information

Controller registration

Schema Registry schemas , Consumer offset


Explanation
Consumer offsets are stored in the Kafka topic __consumer_offsets, and the Schema Registry stores schemas in the _schemas topic.

29

To import data from external databases, I should use

Kafka Connect Sink

Kafka Streams

Confluent REST Proxy

Kafka Connect Source

Kafka Connect Source


Explanation
Kafka Connect Sink is used to export data from Kafka to external databases and Kafka Connect Source is used to import from external databases into Kafka.

30

Which of the following errors are retriable from a producer perspective? (select two)

NOT_LEADER_FOR_PARTITION

MESSAGE_TOO_LARGE

TOPIC_AUTHORIZATION_FAILED

INVALID_REQUIRED_ACKS

NOT_ENOUGH_REPLICAS

NOT_LEADER_FOR_PARTITION , NOT_ENOUGH_REPLICAS


Explanation
Both of these are retriable errors; the others are non-retriable. See the Kafka protocol documentation for the full list of error codes and whether each one is retriable.

31

Where are the dynamic configurations for a topic stored?

On the Kafka broker file system

In Zookeeper

In an internal Kafka topic __topic_configurations

In server.properties

In Zookeeper


Explanation
Dynamic topic configurations are maintained in Zookeeper.

32

A consumer application is using KafkaAvroDeserializer to deserialize Avro messages. What happens if message schema is not present in AvroDeserializer local cache?

Fails silently

Fetches schema from Schema Registry

Throws DeserializationException

Throws SerializationException

Fetches schema from Schema Registry


Explanation
First, the local cache is checked for the message schema. In case of a cache miss, the schema is pulled from the Schema Registry. An exception will be thrown if the Schema Registry does not have the schema (which should never happen if it is set up properly).
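
A consumer configuration sketch for Avro consumption (the Schema Registry URL and group id are assumptions); the deserializer consults its local cache first and falls back to the schema.registry.url endpoint on a miss:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "avro-consumers");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("schema.registry.url", "http://localhost:8081"); // queried when the schema id is not in the local cache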

33

Your producer is producing at a very high rate and the batches are completely full each time. How can you improve the producer throughput? (select two)

Disable compression

Enable compression

Decrease batch.size

Decrease linger.ms

Increase batch.size

Increase linger.ms

Enable compression , Increase batch.size


Explanation
batch.size controls how many bytes of data to collect before sending messages to the Kafka broker. Set this as high as possible without exceeding available memory. Enabling compression can also help create more compact batches and increase the throughput of your producer. linger.ms will have no effect, as the batches are already full.
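
A producer configuration fragment along those lines (the exact values are illustrative assumptions):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("batch.size", "65536");          // larger batches (the default is 16384 bytes)
props.put("compression.type", "snappy");   // compressed batches carry more records per request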

34

To enhance compression, I can increase the chances of batching by using

linger.ms=20

acks=all

max.message.size=10MB

batch.size=65536

linger.ms=20


Explanation
linger.ms forces the producer to wait before sending messages, hence increasing the chance of creating batches that can be heavily compressed.

35

Which of the following setting increases the chance of batching for a Kafka Producer?

Increase linger.ms

Increase the number of producer threads

Increase message.max.bytes

Increase batch.size

Increase linger.ms


Explanation
linger.ms forces the producer to wait before sending messages, hence increasing the chance of creating batches

36

A topic receives all the orders for the products that are available on a commerce site. Two applications want to process all the messages independently – order fulfilment and monitoring. The topic has 4 partitions, how would you organise the consumers for optimal performance and resource usage?

Create four consumers in the same group, one for each partition – two for fulfilment and two for monitoring

Create two consumer groups for two applications with 4 consumers in each

Create two consumers groups for two applications with 8 consumers in each

Create 8 consumers in the same group with 4 consumers for each application

Create two consumer groups for two applications with 4 consumers in each


Explanation
Two consumer groups – one per application – so that every message is delivered to both applications, with 4 consumers in each group since the topic has 4 partitions; you cannot usefully have more consumers in a group than partitions (the extra consumers would sit idle and waste resources).
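
A configuration sketch for the two groups (group names are assumptions); each application would start 4 consumer instances with its own group.id, one consumer per partition:

Properties fulfilment = new Properties();
fulfilment.put("bootstrap.servers", "localhost:9092");
fulfilment.put("group.id", "order-fulfilment");   // this group receives every message
fulfilment.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
fulfilment.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

Properties monitoring = new Properties();
monitoring.putAll(fulfilment);
monitoring.put("group.id", "order-monitoring");   // this group independently receives every message too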

37

What isn’t an internal Kafka Connect topic?

connect-jars

connect-configs

connect-status

connect-offsets

connect-jars


Explanation
connect-configs stores connector configurations, connect-status stores the current status of connectors and tasks, and connect-offsets stores source offsets for source connectors.

38

How do Kafka brokers ensure great performance between the producers and consumers? (select two)

It leverages zero-copy optimisations to send data straight from the page-cache

It transforms the messages into a binary format

It compresses the messages as it writes to the disk

It buffers the messages on disk, and sends messages from the disk reads

It does not transform the messages

It leverages zero-copy optimisations to send data straight from the page-cache , It does not transform the messages


Explanation
Kafka transfers data with zero-copy and sends the raw bytes it receives from the producer straight to the consumer, leveraging the RAM available as page cache

39

You are using the JDBC source connector to copy data from 2 tables to two Kafka topics. There is one connector created with tasks.max equal to 2 deployed on a cluster of 3 workers. How many tasks are launched?

6

2

3

1

2


Explanation
We have two tables, so the maximum number of tasks that will be launched is 2.

40

A client connects to a broker in the cluster and sends a fetch request for a partition in a topic. It gets a NotLeaderForPartitionException in the response. How does the client handle this situation?

Get the Broker id from Zookeeper that is hosting the leader replica and send request to it

Send fetch request to each Broker in the cluster

Send metadata request to the same broker for the topic and select the broker hosting the leader replica

Send metadata request to Zookeeper for the topic and select the broker hosting the leader replica

Send metadata request to the same broker for the topic and select the broker hosting the leader replica


Explanation
If a client has stale information about the leader of a partition, it will issue a metadata request. The metadata request can be handled by any broker, so afterwards the client knows which broker is the leader for each topic partition. Produce and fetch requests can only be sent to the broker hosting the partition leader.

41

Which client protocols are supported by the Schema Registry? (select two)

HTTP

Websocket

HTTPS

SASL

JDBC

HTTP , HTTPS


Explanation
Clients can interact with the Schema Registry using its HTTP or HTTPS interface.

42

A bank uses a Kafka cluster for credit card payments. What should be the value of the property unclean.leader.election.enable?

TRUE

FALSE

FALSE


Explanation
Setting unclean.leader.election.enable to true means we allow out-of-sync replicas to become leaders; we would lose messages when this occurs, effectively losing credit card payments and making our customers very angry.

43

Which is an optional field in an Avro record?

namespace

fields

name

doc

doc


Explanation
doc is an optional, human-readable description of the record.

44

Using the Confluent Schema Registry, where are Avro schemas stored?

In the Schema Registry embedded SQL database

In the Zookeeper node /schemas

In the _schemas topic

In the message bytes themselves

In the _schemas topic


Explanation
The Schema Registry stores all the schemas in the _schemas Kafka topic

45

A consumer wants to read messages from a specific partition of a topic. How can this be achieved?

Call subscribe(String topic, int partition) passing the topic and partition number as the arguments

Call assign() passing a Collection of TopicPartitions as the argument

Call subscribe() passing TopicPartition as the argument

Call assign() passing a Collection of TopicPartitions as the argument


Explanation
assign() can be used for manual assignment of partitions to a consumer, in which case subscribe() must not be used. assign() takes a collection of TopicPartition objects as its argument.
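
A sketch of manual partition assignment, assuming consumer is an already-configured KafkaConsumer<String, String> and that we want partition 2 of a topic named orders:

import java.util.Collections;
import org.apache.kafka.common.TopicPartition;

TopicPartition partition = new TopicPartition("orders", 2);
consumer.assign(Collections.singletonList(partition));          // manual assignment, no group rebalancing
consumer.seekToBeginning(Collections.singletonList(partition)); // optional: start from the earliest offset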

46

Producing with a key allows you to…

Influence partitioning of the producer messages

Ensure per-record level security

Add more information to my message

Allow a Kafka Consumer to subscribe to a (topic,key) pair and only receive that data

Influence partitioning of the producer messages


Explanation
Keys are necessary if you require strong ordering or grouping for messages that share the same key. Attaching a key ensures that all messages with the same key always go to the same partition of a topic, and Kafka guarantees order within a partition (but not across partitions). Producing without a key results in round-robin distribution across partitions, which does not preserve such ordering.
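
A producer call sketch, assuming producer is an already-configured KafkaProducer<String, String>; the topic and key are illustrative:

import org.apache.kafka.clients.producer.ProducerRecord;

// All records with key "customer-42" land on the same partition, so their relative order is preserved
producer.send(new ProducerRecord<>("payments", "customer-42", "payment of 10 EUR"));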

47

You are running a Kafka Streams application in a Docker container managed by Kubernetes, and upon application restart, it takes a long time for the container to replicate the state and get back to processing the data. How can you dramatically improve the application restart time?

Increase the number of partitions in your inputs topic

Reduce the Streams caching property

Increase the number of Streams threads

Mount a persistent volume for your RocksDB

Mount a persistent volume for your RocksDB


Explanation
The state of a Kafka Streams application is backed up in Kafka (in changelog topics), so it can always be rebuilt, but recovering it from Kafka can take a long time and a lot of resources. To speed up restarts, it is advised to store the local RocksDB state on a persistent volume, so that only the missing part of the state needs to be recovered.

48

Where are the ACLs stored in a Kafka cluster by default?

Inside the broker’s data directory

Under Zookeeper node /kafka-acl/

Inside the Zookeeper’s data directory

In Kafka topic __kafka_acls

Under Zookeeper node /kafka-acl/


Explanation
ACLs are stored under the Zookeeper node /kafka-acl/ by default.

49

Which of the following event processing applications are stateless? (select two)

Find the minimum and maximum stock prices for each day of trading

Read events from a stream and modifies them from JSON to Avro

Read log messages from a stream and writes ERROR events into a high-priority stream and the rest of the events into a low-priority stream

Publish the top 10 stocks each day

Read events from a stream and modifies them from JSON to Avro

Read log messages from a stream and writes ERROR events into a high-priority stream and the rest of the events into a low-priority stream


Explanation
Stateless means the processing of each message depends only on that message itself, so converting events from JSON to Avro or routing log messages into streams based on their level are both stateless operations; finding daily minimum/maximum prices or the top 10 stocks requires remembering previous events and is therefore stateful.

50

To get acknowledgement of writes to only the leader partition, we need to use the config…

acks=0

acks=all

acks=1

acks=1


Explanation
Producers can set acks=1 to get acknowledgement from partition leader only.