1
A Kafka producer application wants to send log messages to a topic without including any key. Which properties are mandatory to configure for the producer? (select three)
•
key
•
partition
•
key.serializer
•
value
•
value.serializer
•
bootstrap.servers
key.serializer , value.serializer , bootstrap.servers
Explanation
key.serializer, value.serializer, and bootstrap.servers are all mandatory producer configuration properties.
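For illustration, a minimal Java producer sketch covering just the three mandatory properties (the topic name, serializer classes, and broker address are assumptions, not part of the question):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LogProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The three mandatory producer properties
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // No key is provided, so records are spread round-robin across partitions
            producer.send(new ProducerRecord<>("logs", "a log message"));
            producer.close();
        }
    }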
2
Select all that apply
•
min.insync.replicas is a topic setting
•
min.insync.replicas only matters if acks=all
•
acks is a producer setting
•
acks is a topic setting
•
min.insync.replicas is a producer setting
•
min.insync.replicas matters regardless of the values of acks
min.insync.replicas is a topic setting , min.insync.replicas only matters if acks=all , acks is a producer setting
Explanation
acks is a producer setting. min.insync.replicas is a topic or broker setting and is only effective when acks=all.
3
How will you read all the messages from a topic in your KSQL query?
•
Use KSQL CLI to set auto.offset.reset property to earliest
•
KSQL reads from the beginning of a topic, by default.
•
KSQL reads from the end of a topic. This cannot be changed.
Use KSQL CLI to set auto.offset.reset property to earliest
Explanation
Consumers can set the auto.offset.reset property to earliest to start consuming from the beginning. In the KSQL CLI: SET 'auto.offset.reset'='earliest';
4
What isn’t a feature of the Confluent schema registry?
•
Store avro data
•
Store schemas
•
Enforce compatibility rules
Store avro data
Explanation
The Schema Registry stores schemas and enforces compatibility rules; the Avro data itself is stored on the Kafka brokers.
5
Is KSQL ANSI SQL compliant?
•
No
•
Yes
No
Explanation
KSQL is not ANSI SQL compliant; for now, there is no defined standard for streaming SQL languages.
6
A consumer starts and has auto.offset.reset=latest, and the topic partition currently has data for offsets going from 45 to 2311. The consumer group has committed the offset 643 for the topic before. Where will the consumer read from?
•
it will crash
•
offset 45
•
offset 2311
•
offset 643
offset 643
Explanation
The offsets are already committed for this consumer group and topic partition, so the property auto.offset.reset is ignored
7
If a topic has a replication factor of 3…
•
Each partition will live on 2 different brokers
•
Each partition will live on 4 different brokers
•
Each partition will live on 3 different brokers
•
3 replicas of the same data will live on 1 broker
Each partition will live on 3 different brokers
Explanation
Replicas are spread across the available brokers, with one replica per broker; a replication factor of 3 means each partition lives on 3 different brokers.
8
A customer has many consumer applications that process messages from a Kafka topic. Each consumer application can only process 50 MB/s. Your customer wants to achieve a target throughput of 1 GB/s. What is the minimum number of partitions you would suggest to the customer for that particular topic?
•
20
•
50
•
1
•
10
20
Explanation
Each consumer can process only 50 MB/s, so we need at least 20 consumers, each consuming one partition, so that the 20 * 50 MB/s = 1000 MB/s (1 GB/s) target is reached; that requires at least 20 partitions.
9
In Avro, adding a field to a record without default is a __ schema evolution
•
backward
•
forward
•
breaking
•
full
forward
Explanation
Clients using the old schema will still be able to read records written with the new schema, which is what forward compatibility means.
10
A Kafka topic has a replication factor of 3 and a min.insync.replicas setting of 1. What is the maximum number of brokers that can be down so that a producer with acks=all can still produce to the topic?
•
0
•
2
•
1
•
3
2
Explanation
Two brokers can go down, and one replica will still be able to receive and serve data
11
You want to sink data from a Kafka topic to S3 using Kafka Connect. There are 10 brokers in the cluster, the topic has 2 partitions with replication factor of 3. How many tasks will you configure for the S3 connector?
•
6
•
10
•
3
•
2
2
Explanation
You cannot have more sink tasks (= consumers) than the number of partitions, so 2.
12
There are 3 producers writing to a topic with 5 partitions. There are 5 consumers consuming from the topic. How many Controllers will be present in the cluster?
•
1
•
5
•
2
•
3
1
Explanation
There is only one controller in a cluster at all times.
13
Which of the following Kafka Streams operators are stateful? (select all that apply)
•
count
•
flatmap
•
aggregate
•
peek
•
reduce
•
joining
count , aggregate , reduce , joining
Explanation
count, aggregate, reduce, and joins are stateful operations because they require a state store; flatmap and peek are stateless.
14
A Zookeeper ensemble contains 3 servers. Over which ports should the members of the ensemble be able to communicate in the default configuration? (select three)
•
443
•
3888
•
80
•
2888
•
9092
•
2181
3888 , 2888 , 2181
Explanation
2181 – client port, 2888 – peer (quorum) port, 3888 – leader election port
15
If I produce to a topic that does not exist and the broker setting auto.create.topics.enable=true, what will happen?
•
Kafka will automatically create the topic with the indicated producer settings num.partitions and default.replication.factor
•
Kafka will automatically create the topic with 1 partition and 1 replication factor
•
Kafka will automatically create the topic with num.partitions=#of brokers and replication.factor=3
•
Kafka will automatically create the topic with the broker settings num.partitions and default.replication.factor
Kafka will automatically create the topic with the broker settings num.partitions and default.replication.factor
Explanation
The broker-level settings num.partitions and default.replication.factor come into play when a topic is auto-created.
16
Select all the ways for one consumer to subscribe simultaneously to the following topics: topic.history, topic.sports, and topic.politics. (select two)
•
consumer.subscribePrefix("topic.");
•
consumer.subscribe(Arrays.asList("topic.history", "topic.sports", "topic.politics"));
•
consumer.subscribe("topic.history"); consumer.subscribe("topic.sports"); consumer.subscribe("topic.politics");
•
consumer.subscribe(Pattern.compile("topic..*"));
consumer.subscribe(Arrays.asList("topic.history", "topic.sports", "topic.politics")); consumer.subscribe(Pattern.compile("topic..*"));
Explanation
Multiple topics can be passed as a list or regex pattern.
17
How will you find out all the partitions where one or more of the replicas for the partition are not in-sync with the leader?
•
kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions
•
kafka-topics.sh --broker-list localhost:9092 --describe --under-replicated-partitions
•
kafka-topics.sh --zookeeper localhost:2181 --describe --unavailable-partitions
•
kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions
kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions
18
If I want to send binary data through the REST proxy to topic “test_binary”, it needs to be base64 encoded. A consumer connecting directly into the Kafka topic “test_binary” will receive
•
json data
•
binary data
•
base64 encoded data, it will need to decode it
•
avro data
binary data
Explanation
On the producer side, after receiving base64 data, the REST Proxy will convert it into bytes and then send that byte payload to Kafka. Therefore consumers reading directly from Kafka will receive binary data.
19
You want to perform table lookups against a KTable everytime a new record is received from the KStream. What is the output of KStream-KTable join?
•
KTable
•
KStream
•
You choose between KStream or KTable
•
GlobalKTable
KStream
Explanation
A KStream-KTable join outputs a KStream: each incoming stream record is enriched by a table lookup and emitted as a new stream record.
20
What is the risk of increasing max.in.flight.requests.per.connection while also enabling retries in a producer?
•
Reduce throughput
•
Less resilient
•
At least once delivery is not guaranteed
•
Message order not preserved
Message order not preserved
Explanation
Some messages may require multiple retries. If there is more than one request in flight, retries can result in messages being written out of order. An exception to this rule is the producer setting enable.idempotence=true, which preserves ordering on its own.
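As a sketch (the property values are illustrative), a producer configured to keep ordering despite retries by enabling idempotence:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // Idempotence preserves ordering even with retries and several in-flight requests
    props.put("enable.idempotence", "true");
    props.put("acks", "all");
    props.put("max.in.flight.requests.per.connection", "5");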
21
There are two consumers C1 and C2 belonging to the same group G subscribed to topics T1 and T2. Each of the topics has 3 partitions. How will the partitions be assigned to consumers with PartitionAssignor being RoundRobinAssignor?
•
Two consumers cannot read from two topics at the same time
•
All consumers will read from all partitions
•
C1 will be assigned partitions 0 and 2 from T1 and partition 1 from T2. C2 will have partition 1 from T1 and partitions 0 and 2 from T2.
•
C1 will be assigned partitions 0 and 1 from T1 and T2, C2 will be assigned partition 2 from T1 and T2.
C1 will be assigned partitions 0 and 2 from T1 and partition 1 from T2. C2 will have partition 1 from T1 and partitions 0 and 2 from T2.
Explanation
RoundRobinAssignor lists all partitions across the subscribed topics (T1-0, T1-1, T1-2, T2-0, T2-1, T2-2) and assigns them one by one, alternating between the consumers, so C1 gets T1-0, T1-2, T2-1 and C2 gets T1-1, T2-0, T2-2; each consumer ends up with an equal number of partitions spread across both topics.
22
If I want to have an extremely high confidence that leaders and replicas have my data, I should use
•
acks=1, replication factor=3, min.insync.replicas=2
•
acks=all, replication factor=3, min.insync.replicas=2
•
acks=all, replication factor=2, min.insync.replicas=1
•
acks=all, replication factor=3, min.insync.replicas=1
acks=all, replication factor=3, min.insync.replicas=2
Explanation
acks=all means the leader will wait for all in-sync replicas to acknowledge the record. Additionally, min.insync.replicas specifies the minimum number of replicas that need to be in sync for the partition to remain available for writes.
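A possible sketch of that combination (the topic name "payments" is invented): create the topic with replication factor 3 and min.insync.replicas=2, and have the producer use acks=all.

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateDurableTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions, replication factor 3, at least 2 replicas in sync per write
                NewTopic topic = new NewTopic("payments", 3, (short) 3)
                        .configs(Map.of("min.insync.replicas", "2"));
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
            // The producer then sets acks=all so every write waits for all in-sync replicas
        }
    }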
23
Which KSQL queries write to Kafka?
•
SHOW STREAMS and EXPLAIN statements
•
COUNT and JOIN
•
CREATE STREAM WITH and CREATE TABLE WITH
•
CREATE STREAM AS SELECT and CREATE TABLE AS SELECT
•
SELECT … FROM foo WHERE ….
COUNT and JOIN , CREATE STREAM WITH and CREATE TABLE WITH , CREATE STREAM AS SELECT and CREATE TABLE AS SELECT
Explanation
SHOW STREAMS and EXPLAIN statements run against the KSQL server that the KSQL client is connected to. They don’t communicate directly with Kafka. CREATE STREAM WITH and CREATE TABLE WITH write metadata to the KSQL command topic. Persistent queries based on CREATE STREAM AS SELECT and CREATE TABLE AS SELECT read and write to Kafka topics. Non-persistent queries based on SELECT that are stateless only read from Kafka topics, for example SELECT … FROM foo WHERE …. Non-persistent queries that are stateful read and write to Kafka, for example, COUNT and JOIN. The data in Kafka is deleted automatically when you terminate the query with CTRL-C.
24
I am producing Avro data on my Kafka cluster that is integrated with the Confluent Schema Registry. After a schema change that is incompatible, I know my data will be rejected. Which component will reject the data?
•
Zookeeper
•
The Confluent Schema Registry
•
The Kafka Producer itself
•
The Kafka Broker
The Confluent Schema Registry
Explanation
The Confluent Schema Registry is your safeguard against incompatible schema changes and will be the component that ensures no breaking schema evolution is possible. Kafka brokers do not look at your payload or your payload schema, and therefore will not reject the data.
25
We would like to be in an at-most once consuming scenario. Which offset commit strategy would you recommend?
•
Commit the offsets in Kafka, before processing the data
•
Commit the offsets on disk, after processing the data
•
Commit the offsets in Kafka, after processing the data
•
Do not commit any offsets and read from beginning
Commit the offsets in Kafka, before processing the data
Explanation
Here, we must commit the offsets right after receiving a batch from a call to .poll() and before processing the data.
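A rough sketch of an at-most-once consume loop (the topic name, group id, and processing step are made up): offsets are committed immediately after poll(), before any processing.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AtMostOnceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "my-group");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("enable.auto.commit", "false");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                // At-most-once: commit the offsets first ...
                consumer.commitSync();
                // ... then process; a crash here loses these records but never re-delivers them
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }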
26
The exactly once guarantee in the Kafka Streams is for which flow of data?
•
Kafka => External
•
Kafka => Kafka
•
External => Kafka
Kafka => Kafka
Explanation
Kafka Streams can only guarantee exactly once processing if you have a Kafka to Kafka topology.
27
You have a Kafka cluster and all the topics have a replication factor of 3. One intern at your company stopped a broker, and accidentally deleted all the data of that broker on the disk. What will happen if the broker is restarted?
•
The broker will start, and won’t be online until all the data it needs to have is replicated from other leaders
•
The broker will start, and other topics will also be deleted as the broker data on the disk got deleted
•
The broker will crash
•
The broker will start, and won’t have any data. If the broker becomes leader, we have a data loss
The broker will start, and won’t be online until all the data it needs to have is replicated from other leaders
Explanation
Kafka's replication mechanism makes it resilient to scenarios where a broker loses its data on disk, because the broker can recover by replicating the data from the other brokers. This makes Kafka amazing!
28
What information isn’t stored inside of Zookeeper? (select two)
•
Schema Registry schemas
•
Broker registration info
•
Consumer offset
•
ACL information
•
Controller registration
Schema Registry schemas , Consumer offset
Explanation
Consumer offsets are stored in the Kafka topic __consumer_offsets, and the Schema Registry stores schemas in the _schemas topic.
29
To import data from external databases, I should use
•
Kafka Connect Sink
•
Kafka Streams
•
Confluent REST Proxy
•
Kafka Connect Source
Kafka Connect Source
Explanation
Kafka Connect Sink is used to export data from Kafka to external databases and Kafka Connect Source is used to import from external databases into Kafka.
30
Which of the following errors are retriable from a producer perspective? (select two)
•
NOT_LEADER_FOR_PARTITION
•
MESSAGE_TOO_LARGE
•
TOPIC_AUTHORIZATION_FAILED
•
INVALID_REQUIRED_ACKS
•
NOT_ENOUGH_REPLICAS
NOT_LEADER_FOR_PARTITION , NOT_ENOUGH_REPLICAS
Explanation
Both of these are retriable errors; the others are non-retriable. See the Kafka protocol documentation for the full list of error codes and whether each one is retriable.
31
Where are the dynamic configurations for a topic stored?
•
On the Kafka broker file system
•
In Zookeeper
•
In an internal Kafka topic __topic_configurations
•
In server.properties
In Zookeeper
Explanation
Dynamic topic configurations are maintained in Zookeeper.
32
A consumer application is using KafkaAvroDeserializer to deserialize Avro messages. What happens if message schema is not present in AvroDeserializer local cache?
•
Fails silently
•
Fetches schema from Schema Registry
•
Throws DeserializationException
•
Throws SerializationException
Fetches schema from Schema Registry
Explanation
The local cache is checked first for the message schema. On a cache miss, the schema is pulled from the Schema Registry. An exception will be thrown if the Schema Registry does not have the schema (which should never happen if it is set up properly).
33
Your producer is producing at a very high rate and the batches are completely full each time. How can you improve the producer throughput? (select two)
•
Disable compression
•
Enable compression
•
Decrease batch.size
•
Decrease linger.ms
•
Increase batch.size
•
Increase linger.ms
Enable compression , Increase batch.size
Explanation
batch.size controls how many bytes of data to collect before sending messages to the Kafka broker. Set this as high as possible without exceeding available memory. Enabling compression can also help create more compact batches and increase producer throughput. linger.ms will have no effect, since the batches are already full.
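A hedged sketch of the two changes (the values and compression codec are illustrative, not prescriptive):

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // Bigger batches: more data per request (the default batch.size is 16384 bytes)
    props.put("batch.size", "65536");
    // Compression shrinks each batch on the wire and on disk
    props.put("compression.type", "snappy");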
34
To enhance compression, I can increase the chances of batching by using
•
linger.ms=20
•
acks=all
•
max.message.size=10MB
•
batch.size=65536
linger.ms=20
Explanation
linger.ms forces the producer to wait before sending messages, hence increasing the chance of creating batches that can be heavily compressed.
35
Which of the following setting increases the chance of batching for a Kafka Producer?
•
Increase linger.ms
•
Increase the number of producer threads
•
Increase message.max.bytes
•
Increase batch.size
Increase linger.ms
Explanation
linger.ms forces the producer to wait to send messages, hence increasing the chance of creating batches
36
A topic receives all the orders for the products that are available on a commerce site. Two applications want to process all the messages independently – order fulfilment and monitoring. The topic has 4 partitions, how would you organise the consumers for optimal performance and resource usage?
•
Create four consumers in the same group, one for each partition – two for fulfilment and two for monitoring
•
Create two consumer groups for two applications with 4 consumers in each
•
Create two consumers groups for two applications with 8 consumers in each
•
Create 8 consumers in the same group with 4 consumers for each application
Create two consumer groups for two applications with 4 consumers in each
Explanation
Two consumer groups, one for each application, so that all messages are delivered to both applications; 4 consumers in each group, since the topic has 4 partitions and you cannot have more active consumers in a group than the number of partitions (the extra consumers would be idle and waste resources).
37
What isn’t an internal Kafka Connect topic?
•
connect-jars
•
connect-configs
•
connect-status
•
connect-offsets
connect-jars
Explanation
connect-configs stores connector configurations, connect-status stores the current status of connectors and tasks, and connect-offsets stores source offsets for source connectors.
38
How do Kafka brokers ensure great performance between the producers and consumers? (select two)
•
It leverages zero-copy optimisations to send data straight from the page-cache
•
It transforms the messages into a binary format
•
It compresses the messages as it writes to the disk
•
It buffers the messages on disk, and sends messages from the disk reads
•
It does not transform the messages
It leverages zero-copy optimisations to send data straight from the page-cache , It does not transform the messages
Explanation
Kafka transfers data with zero-copy and sends the raw bytes it receives from the producer straight to the consumer, leveraging the RAM available as page cache
39
You are using JDBC source connector to copy data from 2 tables to two Kafka topics. There is one connector created with max.tasks equal to 2 deployed on a cluster of 3 workers. How many tasks are launched?
•
6
•
2
•
3
•
1
2
Explanation
There are two tables, so the effective maximum number of tasks is 2; with max.tasks=2, two tasks are launched.
40
A client connects to a broker in the cluster and sends a fetch request for a partition in a topic. It gets an exception NotLeaderForPartitionException in the response. How does client handle this situation?
•
Get the Broker id from Zookeeper that is hosting the leader replica and send request to it
•
Send fetch request to each Broker in the cluster
•
Send metadata request to the same broker for the topic and select the broker hosting the leader replica
•
Send metadata request to Zookeeper for the topic and select the broker hosting the leader replica
Send metadata request to the same broker for the topic and select the broker hosting the leader replica
Explanation
If the consumer has stale information about the leader of a partition, it issues a metadata request. A metadata request can be handled by any broker, so afterwards the client knows which brokers are the designated leaders for the topic partitions. Produce and fetch requests can only be sent to the broker hosting the partition leader.
41
What client protocol is supported for the schema registry? (select two)
•
HTTP
•
Websocket
•
HTTPS
•
SASL
•
JDBC
HTTP , HTTPS
Explanation
Clients can interact with the Schema Registry using its HTTP or HTTPS interface.
42
A bank uses a Kafka cluster for credit card payments. What should be the value of the property unclean.leader.election.enable?
•
TRUE
•
FALSE
FALSE
Explanation
Setting unclean.leader.election.enable to true means we allow out-of-sync replicas to become leaders; when this occurs we will lose messages, effectively losing credit card payments and making our customers very angry.
43
Which is an optional field in an Avro record?
•
namespace
•
fields
•
name
•
doc
doc
Explanation
doc is an optional field that provides a human-readable description of the record.
44
Using the Confluent Schema Registry, where are Avro schema stored?
•
In the Schema Registry embedded SQL database
•
In the Zookeeper node /schemas
•
In the _schemas topic
•
In the message bytes themselves
In the _schemas topic
Explanation
The Schema Registry stores all the schemas in the _schemas Kafka topic
45
A consumer wants to read messages from a specific partition of a topic. How can this be achieved?
•
Call subscribe(String topic, int partition) passing the topic and partition number as the arguments
•
Call assign() passing a Collection of TopicPartitions as the argument
•
Call subscribe() passing TopicPartition as the argument
Call assign() passing a Collection of TopicPartitions as the argument
Explanation
assign() is used to manually assign partitions to a consumer, in which case subscribe() must not be used. assign() takes a collection of TopicPartition objects as its argument.
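For example (assuming consumer is an already configured KafkaConsumer, and the topic name and partition numbers are made up):

    // Manual assignment: read only partitions 0 and 1 of "orders".
    // No consumer-group rebalancing is involved, and subscribe() must not also be called.
    consumer.assign(Arrays.asList(
            new TopicPartition("orders", 0),
            new TopicPartition("orders", 1)));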
46
Producing with a key allows you to…
•
Influence partitioning of the producer messages
•
Ensure per-record level security
•
Add more information to my message
•
Allow a Kafka Consumer to subscribe to a (topic,key) pair and only receive that data
Influence partitioning of the producer messages
Explanation
Keys are necessary if you require strong ordering or grouping for messages that share the same key. If you require that messages with the same key are always seen in the correct order, attaching a key ensures that messages with the same key always go to the same partition of a topic. Kafka guarantees order within a partition but not across partitions, so not providing a key (which results in round-robin distribution across partitions) will not maintain such order.
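As a small illustration (the topic, key, and values are invented; producer is assumed to be an already configured KafkaProducer<String, String>):

    // Both records share the key "user-42", so they go to the same partition
    // and are therefore consumed in the order they were produced
    producer.send(new ProducerRecord<>("orders", "user-42", "order created"));
    producer.send(new ProducerRecord<>("orders", "user-42", "order paid"));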
47
You are running a Kafka Streams application in a Docker container managed by Kubernetes, and upon application restart, it takes a long time for the docker container to replicate the state and get back to processing the data. How can you improve dramatically the application restart?
•
Increase the number of partitions in your inputs topic
•
Reduce the Streams caching property
•
Increase the number of Streams threads
•
Mount a persistent volume for your RocksDB
Mount a persistent volume for your RocksDB
Explanation
Although the Kafka Streams state is backed up in Kafka (through changelog topics), recovering the full state from Kafka can take a while and consume a lot of resources. To speed up recovery, it is advised to store the Kafka Streams state on a persistent volume, so that only the missing part of the state needs to be recovered.
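A possible sketch, assuming the pod mounts a persistent volume at /data/kafka-streams (the path and application id are hypothetical): point the Streams state directory at the mounted volume.

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // RocksDB state is kept under state.dir; on a persistent volume it survives
    // container restarts, so only the missing part of the state is restored from Kafka
    props.put(StreamsConfig.STATE_DIR_CONFIG, "/data/kafka-streams");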
48
Where are the ACLs stored in a Kafka cluster by default?
•
Inside the broker’s data directory
•
Under Zookeeper node /kafka-acl/
•
Inside the Zookeeper’s data directory
•
In Kafka topic __kafka_acls
Under Zookeeper node /kafka-acl/
Explanation
ACLs are stored in Zookeeper node /kafka-acls/ by default.
49
Which of the following event processing application is stateless? (select two)
•
Find the minimum and maximum stock prices for each day of trading
•
Read events from a stream and modifies them from JSON to Avro
•
Read log messages from a stream and writes ERROR events into a high-priority stream and the rest of the events into a low-priority stream
•
Publish the top 10 stocks each day
Read events from a stream and modifies them from JSON to Avro , Read log messages from a stream and writes ERROR events into a high-priority stream and the rest of the events into a low-priority stream
Explanation
Stateless means the processing of each message depends only on that message itself, so converting messages from JSON to Avro or routing them to different streams based on their content are both stateless operations.
50
To get acknowledgement of writes to only the leader partition, we need to use the config…
•
acks=0
•
acks=all
•
acks=1
acks=1
Explanation
Producers can set acks=1 to get acknowledgement from partition leader only.