Katia 0.1.0 documentation


Katia is a Python project meant to be used as a package to create assistants wherever they are needed.

It is built on some of the leading technologies on the market. It uses the OpenAI API to understand what you are telling it, AWS Polly to speak to you, and the Google recognizer to hear what you are saying.

You can check the code on GitHub or follow the official documentation.

First steps

So you want a new assistant that feels really “real”. You want to speak fluently with it, and you want it to understand you better. This is your package.

Katia will be your friend

Katia's real technical name is Kat.I.A.

This name comes from a very old joke that its creator made a long time ago (in the early 2010s).

One day he said: “I will help to create a cat that will be an AI (IA in Spanish) and will help people.”

Cat AI -> Kat IA -> Katia


Katia is meant to help people, not to hurt them. It accepts a high level of customization, so each person will have a different experience with Katia.

The core of Katia is the ChatGPT models, so it is as powerful as ChatGPT, since it uses its API.

This also means that, depending on the initial prompt you configure or on the things you say to it, it can act in many different ways.

The prompt can be configured via the KATIA_EXTRA_DESCRIPTION parameter. So, for example, this prompt:

KATIA_EXTRA_DESCRIPTION = "You will try to convince everyone that you are a real person"

will act differently from this one:

KATIA_EXTRA_DESCRIPTION = "You will try to convince everyone that you are a frog"

Installation guide

Supported Python versions

Katia requires Python 3.8+. It will be continuously updated to support the latest versions of Python.

Requirements

There are a few requirements to run Katia without problems.

First of all, Katia uses Kafka, so you need the URL of a Kafka broker where the topics it needs can be created.

It also uses voice recognition, so you need all the system dependencies for that. This means you need the following:

  • ffmpeg

  • python3-pyaudio

  • libasound-dev

Katia has only been tested on Linux, so it is not certain that it will work on Windows or macOS.

If you are using one of those operating systems and run into problems, feel free to open an issue and try to create a PR that solves it.

You can check how to install PyAudio on different operating systems here.
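Before installing, it can help to check which of these dependencies are already present. The snippet below is a minimal sketch, not part of Katia: the function name `missing_requirements` is hypothetical, and it only checks for the `ffmpeg` executable and the `pyaudio` Python module. It does not check for libasound-dev, which is a system library.

```python
import importlib.util
import shutil
from typing import List


def missing_requirements() -> List[str]:
    """Return the names of the checked requirements that were not found."""
    missing = []
    # ffmpeg is a command-line tool, so look for it on the PATH.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # PyAudio is imported in Python as the "pyaudio" module.
    if importlib.util.find_spec("pyaudio") is None:
        missing.append("pyaudio")
    return missing
```

Calling `missing_requirements()` returns an empty list when both checks pass.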

Installing Katia

You can install Katia and its dependencies from PyPI using pip:

pip install katia

We strongly recommend that you install Katia in a dedicated virtualenv, to avoid conflicting with your system packages.

Katia Tutorial

In this tutorial, we’ll assume that Katia and its requirements are already installed on your system. If that’s not the case, see Installation guide.

OPENAI setup

First of all you need an OpenAI account, as the interpreter is based on it. The value you need is a token, which you can obtain once you have created your account.

You can sign up on the official OpenAI platform page.

Once you are logged in, you can get a new secret key from the API Keys section.

Remember that the OpenAI API is not free, so try to configure the usage limits that suit you here.

AWS setup

Katia requires the AWS Polly service to talk, so you need an AWS account with Polly configured and the required permissions.

You can follow the AWS guide.

Basic configuration

We are going to create a simple assistant with a couple of configurations. You can get the basic config directly from the GitHub files.

First of all we need to create a .env file, because Katia's configuration is meant to be set through environment variables.

Here is an example of the .env file that you can create:

# KAFKA CONFIGURATION
KAFKA_BROKER_URL=<CHANGE_ME>
KAFKA_BROKER_PORT=<CHANGE_ME>

# General configuration
KATIA_LANGUAGE=es-ES
KATIA_MAIN_NAME=Katia
KATIA_VALID_NAMES="['katia', 'catia', 'catya', 'katya', 'cati', 'katy', 'caty', 'kati']"
KATIA_ADJECTIVES="[]"
KATIA_EXTRA_DESCRIPTION="'You will always try to be very very concise.'"

# Recognizer configuration
RECOGNIZER_ENERGY_THRESHOLD=1
RECOGNIZER_DYNAMIC_ENERGY_THRESHOLD=False
RECOGNIZER_PAUSE_THRESHOLD=0.4
RECOGNIZER_PHRASE_THRESHOLD=0.8
RECOGNIZER_NON_SPEAKING_DURATION=0.2
RECOGNIZER_STOPPER_EXTRA_WORDS="[]"
RECOGNIZER_STOPPER_SENTENCES="['stop', 'now', 'talking', 'please']"

# Interpreter configuration
OPENAI_KEY=<CHANGE_ME>
OPENAI_MODEL=gpt-4

# Speaker configuration
AWS_PROFILE_NAME=adminuser
AWS_VOICE_NAME=Lucia
AWS_ENGINE=neural

Here it is mandatory to change OPENAI_KEY. Changing AWS_PROFILE_NAME is not mandatory, but by default it is adminuser, so if that is not your profile, change it too.

Also, remember to change the KAFKA_BROKER_URL and KAFKA_BROKER_PORT values to point to your Kafka broker.

You can explore the different configuration values in the configuration documentation.
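Several values in the example .env file are Python-style literals stored as strings (lists such as `"['stop', 'now']"`, booleans such as `False`, and floats). The following is a minimal sketch of how such values could be parsed. It is not Katia's own parsing code, and the helper names `env_list`, `env_bool` and `env_float` are hypothetical. It assumes the surrounding double quotes have already been stripped by the .env loader.

```python
import ast
from typing import List, Mapping


def env_list(env: Mapping[str, str], key: str) -> List[str]:
    """Parse a value such as "['stop', 'now']" into a Python list."""
    value = ast.literal_eval(env.get(key, "[]"))
    if not isinstance(value, list):
        raise ValueError(f"{key} must be a list literal, got {value!r}")
    return value


def env_bool(env: Mapping[str, str], key: str, default: bool = False) -> bool:
    """Parse "True"/"False" text; bool("False") would wrongly give True."""
    raw = env.get(key)
    if raw is None:
        return default
    return raw.strip().lower() in ("true", "1", "yes")


def env_float(env: Mapping[str, str], key: str, default: float) -> float:
    """Parse a numeric value such as "0.4", falling back to a default."""
    return float(env.get(key, default))
```

After `load_dotenv()`, these helpers could be called with `os.environ` as the `env` argument, for example `env_list(os.environ, "KATIA_VALID_NAMES")`.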

Run Kafka locally

This step is only needed if you want to run Katia against a local Kafka. If not, you can go directly to the Run section.

First of all you have to add these values to the .env file:

# KAFKA ENV
ZOOKEEPER_CLIENT_PORT=2181
ZOOKEEPER_TICK_TIME=2000
KAFKA_ZOOKEEPER_CONNECT='zookeeper:2181'
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR=1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=3

Now you can use this docker-compose.yml to run Kafka with Docker:

version: '3.4'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.0.1
    container_name: zookeeper
    hostname: zookeeper
    ports:
      - "2181:2181"
    env_file:
      - .env
    healthcheck:
      test: nc -z localhost 2181 || exit -1
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s
  broker:
    image: confluentinc/cp-kafka:7.0.1
    container_name: broker
    hostname: broker
    depends_on:
      zookeeper:
        condition: service_healthy
    links:
      - zookeeper
    ports:
      - "29092:29092"
      - "9092:9092"
      - "9101:9101"
    env_file:
      - .env

Then just run the following:

docker compose build
docker compose up --force-recreate -d
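Once the containers are up, you may want to confirm that the broker port accepts connections before starting Katia. This is a minimal sketch using only the standard library. The function names are hypothetical, and a successful TCP connection only shows that the port is open, not that Kafka has finished starting.

```python
import socket
import time


def broker_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_broker(host: str, port: int, attempts: int = 10, delay: float = 1.0) -> bool:
    """Retry the connection check a few times before giving up."""
    for _ in range(attempts):
        if broker_reachable(host, port):
            return True
        time.sleep(delay)
    return False
```

With the compose file above, `wait_for_broker("localhost", 9092)` checks the PLAINTEXT_HOST listener published on port 9092.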

Run

Now we can create a simple main.py script to run Katia locally:

from dotenv import load_dotenv

from katia.katia import Katia
from katia.logger_manager.logger import setup_logger
from katia.owner import Owner

if __name__ == "__main__":
    load_dotenv()
    setup_logger()

    owner = Owner(name="Katia User")
    Katia(owner=owner)

And voilà! Katia will start talking to you!


Architecture

OK, so now you want to know how Katia really works.

As we said in the main doc, Katia is based on three services: a recognizer that recognizes your voice and transcribes the audio to text, an interpreter that interprets that message and returns a text response, and a speaker that plays back the generated text.

But how do they communicate with each other? How does the speaker know when it can talk, or the interpreter know when it can interpret? They send messages to Kafka topics.

This way we can completely decouple the different modules, so if they grow in complexity and functionality we can easily split them into separate services.

For example, you can run the recognizer and the speaker on the client side, but run the interpreter on a dedicated server.

As long as they are connected to the same Kafka, this will work.

There is one minor limitation. If you want Katia to work better, you should run the recognizer and the speaker on the same host, because one of the things it does is check whether she is already talking (using mixer).

So if you separate them it will still work, but she may try to interact with herself.
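The message flow described above can be illustrated with a small sketch. This is not Katia's code: in-memory queues stand in for Kafka topics, the topic names `recognized` and `interpreted` are hypothetical, and the interpreter simply echoes the message instead of calling a model. The point is only that each module reads from and writes to a topic, and never calls another module directly.

```python
import queue

# In-memory stand-ins for Kafka topics (hypothetical topic names).
topics = {"recognized": queue.Queue(), "interpreted": queue.Queue()}


def recognizer(heard_text: str) -> None:
    """Publish the transcribed text for the interpreter to pick up."""
    topics["recognized"].put(heard_text)


def interpreter() -> None:
    """Consume one recognized message and publish a text response."""
    message = topics["recognized"].get(timeout=1)
    topics["interpreted"].put(f"Reply to: {message}")


def speaker() -> str:
    """Consume one response; a real speaker would play it as audio."""
    return topics["interpreted"].get(timeout=1)
```

Because the three functions only share the topics, any of them could be moved to another process or host as long as it can reach the same broker.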

Schema

The main architecture is as follows:

[Figure: Katia architecture schema]

As you can see, the different modules have no dependencies between them, so they can be easily decoupled.

Configuration

Katia Configuration

You can configure Katia with different parameters that will allow you to create different assistants with different behaviours.

Docker configuration

If you are running a local Kafka by following the associated tutorial, you can customize some aspects of the Kafka deployment.

Zookeeper configuration
  • ZOOKEEPER_CLIENT_PORT:

    Instructs ZooKeeper where to listen for connections by clients such as Apache Kafka®.

  • ZOOKEEPER_TICK_TIME:

    The length of a single tick, which is the basic time unit used by ZooKeeper, measured in milliseconds. It is used to regulate heartbeats and timeouts.

Kafka configuration
  • KAFKA_ZOOKEEPER_CONNECT:

    Instructs Kafka how to get in touch with ZooKeeper.

  • KAFKA_LISTENER_SECURITY_PROTOCOL_MAP:

    Defines key/value pairs for the security protocol to use, per listener name. This is equivalent to the listener.security.protocol.map configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).

  • KAFKA_ADVERTISED_LISTENERS:

    A comma-separated list of listeners with their host/IP and port. This is the metadata that is passed back to clients. This is equivalent to the advertised.listeners configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).

  • KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR:

    This setting defines the replication factor of the topic used to store the consumer offsets. In the example .env file this is set to 1, so the consumer offsets for a particular topic will only be present on a single node. If that node goes down, consumers will lose track of where they are, since they can’t update the consumer offsets.

  • KAFKA_TRANSACTION_STATE_LOG_MIN_ISR:

    The minimum ISR for this topic. This is equivalent to the transaction.state.log.min.isr configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).

  • KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR:

    The replication factor for the transaction topic (set higher to ensure availability). Internal topic creation will fail until the cluster size meets this replication factor requirement. This is equivalent to the transaction.state.log.replication.factor configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).

  • KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS:

    The amount of time the group coordinator will wait for more consumers to join a new group before performing the first rebalance. A longer delay means potentially fewer rebalances, but increases the time until processing begins. This is equivalent to the group.initial.rebalance.delay.ms configuration parameter in the server properties file (<path-to-confluent>/etc/kafka/server.properties).

Katia service configuration

Kafka configuration

Both of these configuration fields are mandatory, and Katia will not work unless she can connect to a Kafka broker, because Katia's architecture is centered on Kafka communication between services.

  • KAFKA_BROKER_URL: MANDATORY

    This is your Kafka URL.

  • KAFKA_BROKER_PORT: MANDATORY

    This is your Kafka port.

General configuration
  • KATIA_LANGUAGE:

    Language for Katia. You can use the standard codes built from an ISO 639-1 language code and an ISO 3166-1 country code.

    en-US
    en-GB
    es-ES
    ...
    
  • KATIA_MAIN_NAME:

    The name of the interpreter. It is used to generate the interpreter's personality, as it is the name the assistant will use to refer to itself. We like Katia, but you can configure whatever name you want for your assistant 🙂

  • KATIA_VALID_NAMES:

    Sometimes the recognizer does not recognize the assistant's name well, so here you can add the different valid pronunciations/names for your assistant. For example, for Katia this list should also be valid: "['katia', 'catia', 'catya', 'katya', 'cati', 'katy', 'caty', 'kati']"
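To see why a list of valid names helps, here is a minimal sketch of how a transcript could be checked against KATIA_VALID_NAMES. It is an illustration only: the function name `mentions_assistant` is hypothetical and the word-matching rule is an assumption, not Katia's actual logic.

```python
import re
from typing import Iterable


def mentions_assistant(transcript: str, valid_names: Iterable[str]) -> bool:
    """Return True if any valid name appears as a whole word in the transcript."""
    # Lowercase and split into words so punctuation and casing do not matter.
    words = set(re.findall(r"\w+", transcript.lower()))
    return any(name.lower() in words for name in valid_names)
```

With the list above, a transcript such as "Hey Catia, what time is it?" matches even though the recognizer spelled the name with a C.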

Extra prompt

By default the prompt for Katia is very simple: You are a assistant called {name}. But you can add extra things to this prompt, like adjectives for the assistant, or extra text.

For that purpose you can use the following configurations:

  • KATIA_ADJECTIVES:

    This is a list of adjectives that will be placed before the word assistant in the initial prompt. So, for example, "['funny', 'helpful', 'kind']" will produce the following prompt: You are a funny, helpful and kind assistant called {name}.

  • KATIA_EXTRA_DESCRIPTION:

    This is the parameter to use if you want to add extra behaviours to your assistant. It is free text, and you can add things like: You will always be very concise. This will produce the following prompt: You are a assistant called {name}. You will always be very concise.

    You can add very complex (and funny) things, for example: You will always speak in verse with assonant rhyme., or you're always rapping like you're from a fucked up neighborhood, saying a lot of swear words..

    We do not encourage setting up this last prompt if you are going to show Katia to your granny, unless your granny is a very tough granny. 👵
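The way KATIA_ADJECTIVES and KATIA_EXTRA_DESCRIPTION combine into the initial prompt can be sketched as follows. This is a reconstruction from the examples in this section, not Katia's source code, and the function name `build_prompt` is hypothetical. The base sentence is copied exactly as it appears in this documentation.

```python
from typing import Sequence


def build_prompt(name: str, adjectives: Sequence[str] = (), extra_description: str = "") -> str:
    """Build the initial prompt from the name, adjectives and extra text."""
    if len(adjectives) > 1:
        # Join as "funny, helpful and kind".
        described = ", ".join(adjectives[:-1]) + " and " + adjectives[-1] + " "
    elif adjectives:
        described = adjectives[0] + " "
    else:
        described = ""
    # Base sentence reproduced verbatim from the documentation.
    prompt = f"You are a {described}assistant called {name}."
    if extra_description:
        prompt += " " + extra_description
    return prompt
```

For example, `build_prompt("Katia", ["funny", "helpful", "kind"])` gives the prompt with adjectives shown above, and passing only an extra description appends it after the base sentence.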

Recognizer configuration
  • RECOGNIZER_ENERGY_THRESHOLD:

    This is the minimum audio energy to consider for recording. Under ‘ideal’ conditions (such as in a quiet room), values between 0 and 100 are considered silent or ambient, and values 300 to about 3500 are considered speech.

  • RECOGNIZER_DYNAMIC_ENERGY_THRESHOLD:

    With RECOGNIZER_DYNAMIC_ENERGY_THRESHOLD set to 'True', Katia will continuously try to re-adjust the energy threshold to match the environment based on the ambient noise level at that time.

  • RECOGNIZER_PAUSE_THRESHOLD:

    Seconds of non-speaking audio before a phrase is considered complete for Katia.

  • RECOGNIZER_PHRASE_THRESHOLD:

    Minimum seconds of speaking audio before we consider the speaking audio a phrase - values below this are ignored (for filtering out clicks and pops).

  • RECOGNIZER_NON_SPEAKING_DURATION:

    Seconds of non-speaking audio to keep on both sides of the recording.

  • RECOGNIZER_STOPPER_EXTRA_WORDS:

    List of words that can be added to the stopper sentences. This field is complementary to RECOGNIZER_STOPPER_SENTENCES, and the two are meant to work together.

  • RECOGNIZER_STOPPER_SENTENCES:

    List of sentences that make Katia stop talking. This field is complementary to RECOGNIZER_STOPPER_EXTRA_WORDS. For example, you can use these values:

    RECOGNIZER_STOPPER_EXTRA_WORDS="['hey', 'please']"
    RECOGNIZER_STOPPER_SENTENCES="['stop talking', 'shut up']"
    

    If you say something like Hey Katia, stop talking please, it will stop. But something like Hey catia, can you shut up now please? will not work unless you add can, you and now to RECOGNIZER_STOPPER_EXTRA_WORDS.

  • RECOGNIZER_CONTINUE_CONVERSATION_DELAY_IN_SECONDS:

    Seconds the recognizer waits before ending the conversation flow (after which you have to call the assistant again). The time is counted from when Katia stopped talking, not from when she started.

  • RECOGNIZER_GAP_CONTINUE_CONVERSATION_IN_SECONDS:

    Gap between the moment Katia stops talking and your next interaction with her. You can set it to 0, but then the recognizer may pick up Katia's response as one of your answers, and Katia will start talking to herself.

    In the future this should no longer be needed if voice recognition is used instead of speech recognition.
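The interplay between RECOGNIZER_STOPPER_EXTRA_WORDS and RECOGNIZER_STOPPER_SENTENCES described above can be sketched like this. The matching rule is an assumption inferred from the two example sentences (drop the assistant's names and the extra words, then compare what is left with the stopper sentences). It is not taken from Katia's source, and the function name `is_stop_request` is hypothetical.

```python
import re
from typing import Iterable


def is_stop_request(
    transcript: str,
    valid_names: Iterable[str],
    extra_words: Iterable[str],
    stopper_sentences: Iterable[str],
) -> bool:
    """Return True if the transcript reduces to one of the stopper sentences."""
    # Words that are ignored: the assistant's names and the extra words.
    ignored = {w.lower() for w in valid_names} | {w.lower() for w in extra_words}
    words = re.findall(r"\w+", transcript.lower())
    remaining = " ".join(w for w in words if w not in ignored)
    return remaining in {s.lower() for s in stopper_sentences}
```

Under this rule, "Hey Katia, stop talking please" reduces to "stop talking" and matches, while "Hey catia, can you shut up now please?" reduces to "can you shut up now" and does not match until can, you and now are added to the extra words.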

Interpreter configuration
  • OPENAI_KEY: MANDATORY

    This is your API key from OpenAI. You can learn how to get one in the OPENAI setup tutorial.

    If the API key is not valid, or if it is not set at all, you will see an exception.

  • OPENAI_MODEL:

    This is the ChatGPT model you want to use. You can check which ones are available in the official OpenAI documentation.
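Since a missing mandatory value only shows up as an exception at runtime, it can be useful to check the environment before starting Katia. This is a minimal sketch, not part of Katia: the function name `missing_mandatory` is hypothetical, and the key list contains only the three fields marked MANDATORY on this page. It also treats the `<CHANGE_ME>` placeholder from the example .env file as missing.

```python
from typing import List, Mapping

# The fields marked MANDATORY in this configuration page.
MANDATORY_KEYS = ("KAFKA_BROKER_URL", "KAFKA_BROKER_PORT", "OPENAI_KEY")


def missing_mandatory(env: Mapping[str, str]) -> List[str]:
    """Return the mandatory keys that are unset, empty or still placeholders."""
    return [
        key
        for key in MANDATORY_KEYS
        if not env.get(key) or env[key] == "<CHANGE_ME>"
    ]
```

After `load_dotenv()`, calling `missing_mandatory(os.environ)` returns an empty list when everything required is set.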

Speaker configuration
  • AWS_PROFILE_NAME:

    This is your AWS profile for the speaker. You can follow the tutorial to get it properly configured.

  • AWS_VOICE_NAME:

    This is the Polly voice you want to use. You can pick any of the voices provided by AWS Polly (Name/ID value).

    Remember that some voices do not support the neural engine, so make sure this value is compatible with AWS_ENGINE.

  • AWS_ENGINE:

    This is the type of engine you want to use for AWS Polly. The two accepted values are neural and standard. You can check the differences here.
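To check that AWS_VOICE_NAME and AWS_ENGINE are compatible, you could look the voice up in the list returned by Polly's DescribeVoices operation. The snippet below is a sketch under assumptions: the function names are hypothetical, boto3 must be installed and the profile must have a region and Polly permissions, and only the first page of results is read (a `NextToken` in the response is ignored).

```python
from typing import Dict, List


def voice_supports_engine(voices: List[Dict], voice_name: str, engine: str) -> bool:
    """Check a voice against the SupportedEngines field of DescribeVoices entries."""
    for voice in voices:
        if voice.get("Id") == voice_name or voice.get("Name") == voice_name:
            return engine in voice.get("SupportedEngines", [])
    # Unknown voice: treat it as unsupported.
    return False


def fetch_voices(profile_name: str) -> List[Dict]:
    """Fetch the first page of Polly voices using the given AWS profile."""
    import boto3  # imported here so the check above works without boto3

    session = boto3.Session(profile_name=profile_name)
    return session.client("polly").describe_voices()["Voices"]
```

A call such as `voice_supports_engine(fetch_voices("adminuser"), "Lucia", "neural")` would then tell you whether the values from the example .env file work together for your account.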
