Katia 0.1.0 documentation

Katia is a Python project meant to be used as a package to create assistants wherever they are needed.
It is built on widely used services: it uses the OPENAI API to understand what you are saying, AWS Polly to speak with you, and the Google recognizer to hear what you are telling it.
You can check the code on GitHub or follow the official documentation.
First steps
Well, so you want a new assistant that is really “real”. You want to speak fluently with it, and you want it to understand you better. This is your package.
Katia will be your friend
Katia's real technical name is Kat.I.A.
This name comes from a very old joke its creator made a long time ago (early '10s).
One day he said: “I will help to create a cat that will be an AI (in Spanish, IA) and will help people.”
Cat AI -> Kat IA -> Katia

Katia is meant to help people, not to hurt them. It accepts a high level of customization, so each person will have a different experience with Katia.
Katia's core is the ChatGPT family of models: since it uses the OpenAI API, it is as powerful as ChatGPT itself.
This also means that, depending on the initial prompt you configure or on the things you say to it, it can behave in many different ways.
The prompt can be configured via the KATIA_EXTRA_DESCRIPTION parameter. For example, this prompt:
KATIA_EXTRA_DESCRIPTION = "You will try to convince everyone that you are a real person"
will behave differently from this one:
KATIA_EXTRA_DESCRIPTION = "You will try to convince everyone that you are a frog"
Installation guide
Supported Python versions
Katia requires Python 3.8+. It will be continuously updated to support the latest Python versions.
Requirements
There are a few requirements to run Katia without problems.
First of all, Katia uses Kafka, so you need the URL of a Kafka broker where the topics Katia needs can be created.
Katia also uses speech recognition, so your system needs the corresponding dependencies:
ffmpeg
python3-pyaudio
libasound-dev
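If you want to confirm that these dependencies are visible before installing Katia, the following minimal sketch can help. check_audio_dependencies is a hypothetical helper, not part of Katia: it checks that the ffmpeg executable is on the PATH and that the pyaudio module provided by python3-pyaudio is importable. libasound-dev is a system library with no simple Python-level check, so it is not covered.

```python
"""Hypothetical pre-flight check for Katia's audio dependencies (not part of Katia)."""
import importlib.util
import shutil


def check_audio_dependencies() -> dict:
    """Return a mapping of dependency name -> whether it was found."""
    return {
        # ffmpeg is looked up as an executable on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # python3-pyaudio provides the importable "pyaudio" module.
        "pyaudio": importlib.util.find_spec("pyaudio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_audio_dependencies().items():
        print(f"{name}: {'OK' if found else 'MISSING'}")
```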
Katia has only been tested on Linux, so it is not known whether it works on Windows or macOS.
If you are using one of those operating systems and run into problems, feel free to open an issue and try to create a PR that solves it.
You can check how to install PyAudio on different operating systems here.
Installing Katia
You can install Katia and its dependencies from PyPI using pip:
pip install katia
We strongly recommend that you install Katia in a dedicated virtualenv, to avoid conflicting with your system packages.
Using a virtual environment (recommended)
TL;DR: We recommend installing Katia inside a virtual environment on all platforms.
Python packages can be installed either globally (a.k.a. system-wide) or in user space. We do not recommend installing Katia system-wide.
Instead, we recommend that you install Katia within a so-called “virtual environment” (venv).
Virtual environments let you avoid conflicts with already-installed Python system packages (which could break some of your system tools and scripts), while still installing packages normally with pip (without sudo and the like).
See Virtual Environments and Packages on how to create your virtual environment.
Once you have created a virtual environment, you can install Katia inside it with pip, just like any other Python package.
Katia Tutorial
In this tutorial, we’ll assume that Katia and its requirements are already installed on your system. If that’s not the case, see Installation guide.
OPENAI setup
First of all, you need an OPENAI account, since the interpreter is based on it. The value you need is an API key (token), which you can obtain once you have created your account.
You can sign up on the official OPENAI platform page.
Once you are logged in, you can create a new secret key in the API Keys section.
Remember that the OPENAI API is not free, so configure usage limits that suit you here.
AWS setup
Katia uses the AWS Polly service to talk, so you need an AWS account with Polly configured and the required permissions.
You can follow the AWS guide.
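Katia takes the AWS profile name from AWS_PROFILE_NAME (see the configuration below). As a quick check that the profile exists on your machine, here is a minimal sketch. aws_profile_exists is a hypothetical helper, and it assumes the standard AWS shared files (~/.aws/credentials and ~/.aws/config, or the paths set in AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE).

```python
"""Hypothetical helper: check that an AWS named profile exists locally (not part of Katia)."""
import configparser
import os
from pathlib import Path


def aws_profile_exists(profile: str, credentials_file=None, config_file=None) -> bool:
    """Return True if `profile` appears in the AWS shared credentials or config file."""
    credentials_file = Path(credentials_file or os.environ.get(
        "AWS_SHARED_CREDENTIALS_FILE", Path.home() / ".aws" / "credentials"))
    config_file = Path(config_file or os.environ.get(
        "AWS_CONFIG_FILE", Path.home() / ".aws" / "config"))

    credentials = configparser.ConfigParser()
    credentials.read(credentials_file)  # missing files are silently skipped
    if credentials.has_section(profile):
        return True

    config = configparser.ConfigParser()
    config.read(config_file)
    # In ~/.aws/config, named profiles use "[profile name]"; "default" has no prefix.
    section = profile if profile == "default" else f"profile {profile}"
    return config.has_section(section)
```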
Basic configuration
We are going to create a simple assistant with a few settings. You can also get the basic config directly from the GitHub files.
First of all, we need to create a .env file, because Katia is configured through environment variables.
Here is an example of the .env file you can create:
# KAFKA CONFIGURATION
KAFKA_BROKER_URL=<CHANGE_ME>
KAFKA_BROKER_PORT=<CHANGE_ME>
# General configuration
KATIA_LANGUAGE=es-ES
KATIA_MAIN_NAME=Katia
KATIA_VALID_NAMES="['katia', 'catia', 'catya', 'katya', 'cati', 'katy', 'caty', 'kati']"
KATIA_ADJECTIVES="[]"
KATIA_EXTRA_DESCRIPTION="'You will always try to be very very concise.'"
# Recognizer configuration
RECOGNIZER_ENERGY_THRESHOLD=1
RECOGNIZER_DYNAMIC_ENERGY_THRESHOLD=False
RECOGNIZER_PAUSE_THRESHOLD=0.4
RECOGNIZER_PHRASE_THRESHOLD=0.8
RECOGNIZER_NON_SPEAKING_DURATION=0.2
RECOGNIZER_STOPPER_EXTRA_WORDS="[]"
RECOGNIZER_STOPPER_SENTENCES="['stop', 'now', 'talking', 'please']"
# Interpreter configuration
OPENAI_KEY=<CHANGE_ME>
OPENAI_MODEL=gpt-4
# Speaker configuration
AWS_PROFILE_NAME=adminuser
AWS_VOICE_NAME=Lucia
AWS_ENGINE=neural
You must change OPENAI_KEY. Changing AWS_PROFILE_NAME is not mandatory, but in this example it is set to adminuser, so if that is not your profile, change it too.
Also, remember to set KAFKA_BROKER_URL and KAFKA_BROKER_PORT to the values of your Kafka broker.
You can explore the different configuration values in the configuration documentation.
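Since everything is read from environment variables, a small check can catch a forgotten <CHANGE_ME> before starting Katia. The sketch below is a hypothetical helper (check_config is not part of Katia). It assumes the list-valued settings are Python list literals, as their format in the example suggests, and parses them with ast.literal_eval.

```python
"""Hypothetical pre-flight check for Katia's .env configuration (not part of Katia)."""
import ast
import os

MANDATORY = ("KAFKA_BROKER_URL", "KAFKA_BROKER_PORT", "OPENAI_KEY")
LIST_SETTINGS = (
    "KATIA_VALID_NAMES",
    "KATIA_ADJECTIVES",
    "RECOGNIZER_STOPPER_EXTRA_WORDS",
    "RECOGNIZER_STOPPER_SENTENCES",
)


def check_config(env=os.environ) -> list:
    """Return a list of problems found in `env`; an empty list means every check passed."""
    problems = []
    for key in MANDATORY:
        if env.get(key, "") in ("", "<CHANGE_ME>"):
            problems.append(f"{key} is not set")
    port = env.get("KAFKA_BROKER_PORT", "")
    if port not in ("", "<CHANGE_ME>") and not port.isdigit():
        problems.append(f"KAFKA_BROKER_PORT should be a number, got {port!r}")
    for key in LIST_SETTINGS:
        raw = env.get(key)
        if raw is None:
            continue  # optional setting not present
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            problems.append(f"{key} is not a valid Python literal: {raw!r}")
            continue
        if not isinstance(value, list):
            problems.append(f"{key} should be a list, got {type(value).__name__}")
    return problems
```

For example, call load_dotenv() first and then print(check_config() or "Configuration looks OK").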
Run Kafka locally
This step is only needed if you want Katia to use a local Kafka. If not, you can go directly to the Run section.
First of all, add these values to the .env file:
# KAFKA ENV
ZOOKEEPER_CLIENT_PORT=2181
ZOOKEEPER_TICK_TIME=2000
KAFKA_ZOOKEEPER_CONNECT='zookeeper:2181'
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR=1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=3
Now you can use this docker-compose.yml to run Kafka with Docker:
version: '3.4'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.0.1
    container_name: zookeeper
    hostname: zookeeper
    ports:
      - "2181:2181"
    env_file:
      - .env
    healthcheck:
      test: nc -z localhost 2181 || exit -1
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s
  broker:
    image: confluentinc/cp-kafka:7.0.1
    container_name: broker
    hostname: broker
    depends_on:
      zookeeper:
        condition: service_healthy
    links:
      - zookeeper
    ports:
      - "29092:29092"
      - "9092:9092"
      - "9101:9101"
    env_file:
      - .env
Then just run the following:
docker compose build
docker compose up --force-recreate -d
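Once the containers are up, you can check that the broker port on your machine accepts connections. This minimal sketch (port_is_open is a hypothetical helper) only tests TCP connectivity, for example to localhost:9092, the PLAINTEXT_HOST listener advertised in the .env above. It does not confirm that Kafka itself is healthy.

```python
"""Minimal sketch: check that a Kafka broker port accepts TCP connections."""
import socket


def port_is_open(host: str = "localhost", port: int = 9092, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For example, port_is_open("localhost", 9092) should return True once the broker container has started.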
Run
Now we can create a simple main.py script to run Katia locally:
from dotenv import load_dotenv
from katia.katia import Katia
from katia.logger_manager.logger import setup_logger
from katia.owner import Owner

if __name__ == "__main__":
    load_dotenv()  # load the configuration from the .env file
    setup_logger()
    owner = Owner(name="Katia User")
    Katia(owner=owner)
And voila! Katia will start talking to you!

Architecture
OK, so now you want to know how Katia really works.
As mentioned in the introduction, Katia is based on three services: a recognizer that listens to your voice and transcribes the audio to text, an interpreter that interprets that message and returns a text response, and a speaker that reads the generated text aloud.
But how do they communicate with each other? How does the speaker know that it can talk, or the interpreter that it can interpret? They exchange messages through Kafka topics.
This way the modules are completely decoupled, so if they grow in complexity and functionality, they can easily be split into separate services.
For example, you can run the recognizer and the speaker on the client side and the interpreter on a dedicated server.
As long as they are all connected to the same Kafka, this will work.
There is one minor limitation: for best results, run the recognizer and the speaker on the same host, because Katia checks whether she is already talking (using the mixer).
If you separate them, it will still work, but she may end up interacting with herself.
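The decoupling described above can be illustrated with a toy version of the flow, where in-memory queues stand in for Kafka topics. This is only an illustration: the topic names and function bodies are hypothetical, and the real modules publish to and consume from Kafka.

```python
"""Illustrative sketch of Katia's message flow, with in-memory queues standing in for Kafka topics.

Topic names and handler logic are hypothetical; the real modules communicate through Kafka.
"""
import queue

# One queue per (hypothetical) topic.
topics = {"recognized_text": queue.Queue(), "interpreted_text": queue.Queue()}


def recognizer(heard: str) -> None:
    """Publish the transcribed text; the recognizer knows nothing about the other modules."""
    topics["recognized_text"].put(heard)


def interpreter() -> None:
    """Consume recognized text and publish a response (a stand-in for the OpenAI call)."""
    text = topics["recognized_text"].get()
    topics["interpreted_text"].put(f"You said: {text}")


def speaker() -> str:
    """Consume the response; the real speaker would synthesize it with Polly."""
    return topics["interpreted_text"].get()


recognizer("hello Katia")
interpreter()
print(speaker())  # You said: hello Katia
```

Because each function only reads from and writes to a topic, any of them could be moved to another process or host without changing the others, which is the property the Kafka-based design relies on.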
Schema
The main architecture is as follows:

As you can see, the different modules have no dependencies between them, so they can be easily decoupled.
Configuration
Katia Configuration
You can configure Katia with different parameters that will allow you to create different assistants with different behaviours.
Docker configuration
If you are running Kafka locally using the associated tutorial, you can customize some aspects of the Kafka deployment.
Zookeeper configuration
ZOOKEEPER_CLIENT_PORT
:Instructs ZooKeeper where to listen for connections by clients such as Apache Kafka®.
ZOOKEEPER_TICK_TIME
:The length of a single tick, the basic unit of time used by ZooKeeper, in milliseconds (the ZooKeeper tickTime setting). It regulates heartbeats and timeouts; for example, the minimum session timeout is two ticks.
Kafka configuration
KAFKA_ZOOKEEPER_CONNECT
:Instructs Kafka how to get in touch with ZooKeeper.
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
:Defines key/value pairs for the security protocol to use, per listener name. This is equivalent to the listener.security.protocol.map configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).
KAFKA_ADVERTISED_LISTENERS
:A comma-separated list of listeners with their host/IP and port. This is the metadata that is passed back to clients. This is equivalent to the advertised.listeners configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR
:The replication factor of the topic used to store consumer offsets. In the local setup above it is set to 1, so the consumer offsets are stored on a single node. If that node goes down, consumers will lose track of where they are, since they can't update their offsets.
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR
:The minimum number of in-sync replicas (ISR) for the transaction state log topic. This is equivalent to the transaction.state.log.min.isr configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR
:The replication factor for the transaction state log topic (set it higher to ensure availability). Internal topic creation will fail until the cluster size meets this replication factor requirement. This is equivalent to the transaction.state.log.replication.factor configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS
:The amount of time the group coordinator will wait for more consumers to join a new group before performing the first rebalance. A longer delay means potentially fewer rebalances, but increases the time until processing begins. This is equivalent to the group.initial.rebalance.delay.ms configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).
Katia service configuration
Kafka configuration
Both of these configuration fields are mandatory: Katia will not work unless she can connect to a Kafka broker, since her architecture is centered on Kafka communication between services.
KAFKA_BROKER_URL
: MANDATORY. This is your Kafka URL.
KAFKA_BROKER_PORT
: MANDATORY. This is your Kafka port.
General configuration
KATIA_LANGUAGE
:Language for Katia. Use a standard language tag that combines an ISO 639-1 language code with an ISO 3166-1 country code, for example:
en-US en-GB es-ES ...
KATIA_MAIN_NAME
:The name of the interpreter. It is used to build the interpreter's personality, since it is the name the assistant uses to refer to itself. We like Katia, but you can configure whatever name you want for your assistant 🙂
KATIA_VALID_NAMES
:Sometimes the recognizer does not pick up the assistant's name correctly, so here you can add the different valid pronunciations/names for your assistant. For example, for Katia this list should also be valid: "['katia', 'catia', 'catya', 'katya', 'cati', 'katy', 'caty', 'kati']"
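As an illustration of how such a list can be used, here is a hypothetical wake-word check (mentions_assistant is not Katia's actual function) that accepts a transcript if any of its words matches one of the valid names.

```python
"""Hypothetical wake-word check using a list like KATIA_VALID_NAMES (not Katia's actual code)."""
import ast
import re

# Parsed the same way as the .env value, assuming it is a Python list literal.
VALID_NAMES = ast.literal_eval(
    "['katia', 'catia', 'catya', 'katya', 'cati', 'katy', 'caty', 'kati']")


def mentions_assistant(transcript: str, valid_names=VALID_NAMES) -> bool:
    """Return True if any word in the transcript matches one of the valid names."""
    words = re.findall(r"\w+", transcript.lower())
    return any(word in valid_names for word in words)
```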
Extra prompt
By default, the prompt for Katia is very simple: You are a assistant called {name}.
But you can add extra things to this prompt, such as adjectives for the assistant or extra text.
For that purpose you can use the following configurations:
KATIA_ADJECTIVES
:A list of adjectives that will be placed before the word "assistant" in the initial prompt. For example, "['funny', 'helpful', 'kind']" will produce the following prompt: You are a funny, helpful and kind assistant called {name}.
KATIA_EXTRA_DESCRIPTION
:This is the parameter to use if you want to add extra behaviours to your assistant. It is free text, and you can add things like “You will always be very concise.”, which will produce the following prompt: You are a assistant called {name}. You will always be very concise.
You can add very complex (and funny) things, for example “You will always speak in verse with assonant rhyme.” or “You're always rapping like you're from a fucked up neighborhood, saying a lot of swear words.”
We do not recommend setting up this last prompt if you are going to show Katia to your granny, unless your granny is a very tough granny. 👵
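The examples above can be reproduced with a small function. This is a hypothetical reconstruction based only on the documented outputs (build_prompt is not Katia's actual code), but it shows how KATIA_MAIN_NAME, KATIA_ADJECTIVES and KATIA_EXTRA_DESCRIPTION combine into the initial prompt.

```python
"""Hypothetical reconstruction of the initial prompt, based on the documented examples."""


def build_prompt(name: str, adjectives=(), extra_description: str = "") -> str:
    """Compose 'You are a <adjectives> assistant called <name>.' plus optional extra text."""
    adjectives = list(adjectives)
    if len(adjectives) > 1:
        # "funny, helpful and kind": commas between the first ones, "and" before the last.
        adjective_text = ", ".join(adjectives[:-1]) + " and " + adjectives[-1]
    else:
        adjective_text = "".join(adjectives)  # zero or one adjective
    words = ["You are a"]
    if adjective_text:
        words.append(adjective_text)
    words.append(f"assistant called {name}.")
    prompt = " ".join(words)
    if extra_description:
        prompt += " " + extra_description
    return prompt
```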
Recognizer configuration
RECOGNIZER_ENERGY_THRESHOLD
:The minimum audio energy to consider for recording. Under ‘ideal’ conditions (such as in a quiet room), values between 0 and 100 are considered silent or ambient, and values from 300 to about 3500 are considered speech.
RECOGNIZER_DYNAMIC_ENERGY_THRESHOLD
:With RECOGNIZER_DYNAMIC_ENERGY_THRESHOLD set to 'True', Katia will continuously try to re-adjust the energy threshold to match the environment, based on the ambient noise level at that time.
RECOGNIZER_PAUSE_THRESHOLD
:Seconds of non-speaking audio before a phrase is considered complete for Katia.
RECOGNIZER_PHRASE_THRESHOLD
:Minimum seconds of speaking audio before we consider the speaking audio a phrase - values below this are ignored (for filtering out clicks and pops).
RECOGNIZER_NON_SPEAKING_DURATION
:Seconds of non-speaking audio to keep on both sides of the recording.
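These descriptions match attributes of the Recognizer class in the speech_recognition library. Assuming Katia forwards the values to such a recognizer (an assumption, not confirmed here), the mapping can be sketched as follows. configure_recognizer is a hypothetical helper, and any object with the same attributes works with it.

```python
"""Sketch: apply the RECOGNIZER_* values to a speech_recognition-style recognizer.

Assumption: these settings map onto speech_recognition.Recognizer attributes;
Katia may wire them differently.
"""
import os


def parse_bool(value: str) -> bool:
    """Interpret 'True'/'true' as True and anything else as False."""
    return value.strip().lower() == "true"


# Environment variable -> (recognizer attribute, parser)
SETTINGS = {
    "RECOGNIZER_ENERGY_THRESHOLD": ("energy_threshold", float),
    "RECOGNIZER_DYNAMIC_ENERGY_THRESHOLD": ("dynamic_energy_threshold", parse_bool),
    "RECOGNIZER_PAUSE_THRESHOLD": ("pause_threshold", float),
    "RECOGNIZER_PHRASE_THRESHOLD": ("phrase_threshold", float),
    "RECOGNIZER_NON_SPEAKING_DURATION": ("non_speaking_duration", float),
}


def configure_recognizer(recognizer, env=os.environ):
    """Set each attribute whose environment variable is present and return the recognizer."""
    for key, (attribute, parse) in SETTINGS.items():
        if key in env:
            setattr(recognizer, attribute, parse(env[key]))
    # speech_recognition expects pause_threshold >= non_speaking_duration when listening.
    if recognizer.pause_threshold < recognizer.non_speaking_duration:
        raise ValueError(
            "RECOGNIZER_PAUSE_THRESHOLD must be >= RECOGNIZER_NON_SPEAKING_DURATION")
    return recognizer
```

With the library installed, you would call configure_recognizer(speech_recognition.Recognizer()).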
RECOGNIZER_STOPPER_EXTRA_WORDS
:List of extra words that may accompany the stopper sentences. This field is complementary to RECOGNIZER_STOPPER_SENTENCES and is meant to work together with it.
RECOGNIZER_STOPPER_SENTENCES
:List of sentences that make Katia stop talking. This field is complementary to RECOGNIZER_STOPPER_EXTRA_WORDS. For example, you can use these values:
RECOGNIZER_STOPPER_EXTRA_WORDS="['hey', 'please']"
RECOGNIZER_STOPPER_SENTENCES="['stop talking', 'shut up']"
If you say something like “Hey Katia, stop talking please”, she will stop. But something like “Hey catia, can you shut up now please?” will not work; for that, you should add can, you and now to RECOGNIZER_STOPPER_EXTRA_WORDS.
RECOGNIZER_CONTINUE_CONVERSATION_DELAY_IN_SECONDS
:Seconds during which the conversation stays open; once they have passed, you have to call the assistant by name again. The count starts when Katia stops talking, not when she starts.
RECOGNIZER_GAP_CONTINUE_CONVERSATION_IN_SECONDS
:Gap between the moment Katia stops talking and your next interaction with her. You can set it to 0, but then the recognizer may pick up Katia's response as one of your answers, and Katia will start talking to herself. In the future this setting should become unnecessary if voice recognition (identifying who is speaking) is used instead of speech recognition.
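The stopper examples above can be reproduced by dropping the assistant's valid names and the extra words from the transcript and comparing what is left with the stopper sentences. The sketch below is a hypothetical reconstruction that matches the documented examples, not Katia's actual implementation; is_stop_request is an illustrative name.

```python
"""Hypothetical stop-phrase check that reproduces the documented examples (not Katia's code)."""
import re


def is_stop_request(transcript, stopper_sentences, extra_words, valid_names) -> bool:
    """Return True if, after dropping names and extra words, a stopper sentence remains."""
    ignored = {w.lower() for w in extra_words} | {n.lower() for n in valid_names}
    # Lowercase, split into words (ignoring punctuation), then drop the ignored words.
    words = [w for w in re.findall(r"[\w']+", transcript.lower()) if w not in ignored]
    return " ".join(words) in {s.lower() for s in stopper_sentences}
```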
Interpreter configuration
OPENAI_KEY
: MANDATORY. This is your OPENAI API key. You can follow the OPENAI setup tutorial to learn how to get one.
If the API key is not valid, or if it is not set at all, you will see an exception.
OPENAI_MODEL
:The ChatGPT model you want to use. You can check which models are available in the official OPENAI documentation.
Speaker configuration
AWS_PROFILE_NAME
:Your AWS profile for the speaker. You can follow the AWS setup tutorial to configure it correctly.
AWS_VOICE_NAME
:The AWS Polly voice you want Katia to use. You can pick any of the voices available in AWS Polly (the Name/ID value). Remember that some voices do not support the neural engine, so make sure this value is consistent with AWS_ENGINE.
AWS_ENGINE
:The type of engine you want to use for AWS Polly. The two accepted values are neural and standard. You can check the differences here.
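To see how the speaker settings fit together, the following sketch builds the arguments for the synthesize_speech call of the boto3 Polly client and validates AWS_ENGINE against the two accepted values. polly_request and synthesize_to_file are hypothetical helpers; the mp3 output format and the fallback defaults are assumptions, and the actual call requires boto3 plus an AWS profile with a region configured.

```python
"""Sketch: build an AWS Polly request from the speaker settings (not Katia's actual code)."""
import os

VALID_ENGINES = {"neural", "standard"}  # the values accepted for AWS_ENGINE


def polly_request(text, env=os.environ) -> dict:
    """Return keyword arguments for the boto3 Polly client's synthesize_speech()."""
    engine = env.get("AWS_ENGINE", "standard")  # assumed fallback
    if engine not in VALID_ENGINES:
        raise ValueError(f"AWS_ENGINE must be one of {sorted(VALID_ENGINES)}, got {engine!r}")
    return {
        "Text": text,
        "OutputFormat": "mp3",  # assumption: any Polly output format would work here
        "VoiceId": env.get("AWS_VOICE_NAME", "Lucia"),
        "Engine": engine,
    }


def synthesize_to_file(text, path="katia.mp3", env=os.environ) -> None:
    """Call Polly and save the audio (requires boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so polly_request works without boto3 installed

    session = boto3.Session(profile_name=env.get("AWS_PROFILE_NAME"))
    response = session.client("polly").synthesize_speech(**polly_request(text, env))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```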