Skip to main content

Command Palette

Search for a command to run...

How I Leveraged ChatGPT to Save Our Organization Thousands!

Updated
11 min read
How I Leveraged ChatGPT to Save Our Organization Thousands!
P

I am a Software Developer from Canada, and I find joy in sharing what I have learned over the years to assist others on their software development journeys.

I am a Senior Full Stack developer who works in the telecommunications sector. No software is perfect, and you are bound to experience some defects. When our users are impacted by a production defect, we often create ad hoc reports. We have a dedicated DBA who generates these reports. However, when the reports require complex queries, it takes a few hours, sometimes even a few days, to generate accurate reports. Due to my current responsibilities, I do not work with our databases frequently. Therefore, when I am asked to generate reports, it takes me a little longer than it would for a DBA to generate the reports. This is mainly because I do not work with SQL databases on a day-to-day basis. Our DBA works offshore hours, so the work will not start until the next day. If the requirements are not clear, then it will take even longer to generate the reports.

One time, I was asked to generate a report. This report would be generated using the tables that I had only heard about but never fully accessed. The Technical Product Owner (TPO) and I were pair-programming to come up with the right query to generate an accurate report, but it was taking too long and the results were not accurate. I decided to use ChatGPT. I articulated the query I had in mind, and it produced the desired query. Then I would copy and paste the query, change the tables and column names, and execute it to get the desired results. I did that for a couple of hours and managed to create the right query resulting in an accurate report.

A light bulb went off in my head. I knew that generative AI tools are context-based. The more context you provide when priming your chatbot, the more accurate the results will be when asked a question. So, I went on ChatGPT and provided a very simple table schema, asking it to generate a query that would retrieve all customers with orders needing shipment. Lo and behold, it did exactly that.

Since this proof of concept (PoC) worked, I decided to create a chatbot that can generate these queries for me given a detailed context. In this context, I included the table names, column names, types, and a detailed explanation of what each column represents. Then, I asked the chatbot to generate a query for a report, and it did exactly that. For obvious reasons, I cannot share the exact solution and have created a mock example of my solution.

I opted not to engage in schema design thinking, so I tasked ChatGPT with generating both the schema and the dataset for me. Here's the prompt I provided:

I am working with LangChain. I am trying to create an Select SQL query builder. I want you to create me table schemas and sample data like mentioned below for a complex E-commerce business.

TableName, ColumnName, ColumnDataType, ForeignKey

Here is an example of the output I am expecting you to return me:
*******Users Schema*******
id, UUID, NULL, A unique identifier for each user, stored as a UUID
email, TEXT, NULL, The email address of the user, stored as text
created_at, timestamptz, NULL, The timestamp indicating when the user account was created
updated_at, timestamptz, NULL, The timestamp indicating when the user account was last updated
*******Users Example Data CSV format*******
id,username,email,password,created_at,updated_at
a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11,user1,user1@example.com-03-01 10:00:00,2024-03-01 10:00:00
b6d96b6a-b6c4-11eb-8529-0242ac130003,user2,user2@example.com,2024-03-02 09:00:00,2024-03-02 09:00:00
c6d96b6a-b6c4-11eb-8529-0242ac130003,user3,user3@example.com,2024-03-02 10:00:00,2024-03-02 10:00:00

With a little bit of finesse I was able generate the following table schemas and datasets.

This schema is for the table Users.
Description: Stores information about users registered on the e-commerce platform.
*******Users Schema*******
id, UUID, NULL, A unique identifier for each user, stored as a UUID
email, TEXT, NULL, The email address of the user, stored as text
created_at, timestamptz, NULL, The timestamp indicating when the user account was created
updated_at, timestamptz, NULL, The timestamp indicating when the user account was last updated
*******Users Example Data CSV format*******
id,username,email,password,created_at,updated_at
a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11,user1,user1@example.com,2024-03-01 10:00:00,2024-03-01 10:00:00
  b6d96b6a-b6c4-11eb-8529-0242ac130003,user2,user2@example.com,2024-03-02 09:00:00,2024-03-02 09:00:00
c6d96b6a-b6c4-11eb-8529-0242ac130003,user3,user3@example.com,2024-03-02 10:00:00,2024-03-02 10:00:00

This schema is for the table Products.
Description: This table holds details about the products available for sale on the platform.
*******Products Schema*******
id, UUID, NULL, A unique identifier for each product, stored as a UUID
name, TEXT, NULL, The name of the product
description, TEXT, NULL, A description of the product
price, int8, NULL, The price of the product
stock_quantity, int2, NULL, The quantity of the product available in stock
created_at, timestamptz, NULL, The timestamp indicating when the product was added to the system
updated_at, timestamptz, NULL, The timestamp indicating when the product information was last updated
*******Products Example Data CSV format*******
id,name,description,price,stock_quantity,created_at,updated_at
a6d96b6a-b6c4-11eb-8529-0242ac130003,Product A,Description of Product A,10,100,2024-02-28 10:00:00,2024-02-28 10:00:00
b6d96b6a-b6c4-11eb-8529-0242ac130003,Product B,Description of Product B,20,50,2024-03-01 09:00:00,2024-03-01 09:00:00
c6d96b6a-b6c4-11eb-8529-0242ac130003,Product C,Description of Product C,30,200,2024-03-02 08:00:00,2024-03-02 08:00:00

This schema is for the table Orders.
Description: This table represents orders placed by users on the platform
*******Orders Schema*******
id, UUID, NULL, A unique identifier for each order, stored as a UUID
user_id, UUID, Users(id), The identifier of the user who placed the order, referencing the id column in the Users table
order_date, timestamptz, NULL, The timestamp indicating when the order was placed
total_amount, int8, NULL, The total amount of the order, stored as an integer
payment_status, TEXT, NULL, The status of the payment for the order
created_at, timestamptz, NULL, The timestamp indicating when the order was created
updated_at, timestamptz, NULL, The timestamp indicating when the order information was last updated
*******Orders Example Data CSV format*******
id,user_id,order_date,total_amount,payment_status,created_at,updated_at
a6d96b6a-b6c4-11eb-8529-0242ac130003,a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11,2024-03-01 12:00:00,50,Paid,2024-03-01 12:00:00,2024-03-01 12:00:00
b6d96b6a-b6c4-11eb-8529-0242ac130003,b6d96b6a-b6c4-11eb-8529-0242ac130003,2024-03-02 11:00:00,60,Pending,2024-03-02 11:00:00,2024-03-02 11:00:00
c6d96b6a-b6c4-11eb-8529-0242ac130003,c6d96b6a-b6c4-11eb-8529-0242ac130003,2024-03-02 15:00:00,40,Paid,2024-03-02 15:00:00,2024-03-02 15:00:00

This schema is for the table OrderItems.
Description: This table stores details about the individual items included in each order.
*******OrderItems Schema*******
id, uuid, NULL, A unique identifier for each order item, stored as a UUID
order_id, uuid, Orders(order_id), The identifier of the order to which the order item belongs, referencing the id column in the Orders table
product_id, uuid, Products(product_id), he identifier of the product included in the order item, referencing the id column in the Products table
quantity, int4, NULL, The quantity of the product included in the order item, stored as an integer
unit_price, int8, NULL, The price per unit of the product included in the order item
created_at, timestamptz, NULL, The timestamp indicating when the order item was created
updated_at, timestamptz, NULL, The timestamp indicating when the order item information was last updated
*******OrderItems Example Data CSV format*******
id,order_id,product_id,quantity,unit_price,created_at,updated_at
a6d96b6a-b6c4-11eb-8529-0242ac130003,a6d96b6a-b6c4-11eb-8529-0242ac130003,a6d96b6a-b6c4-11eb-8529-0242ac130003,2,10,2024-03-01 12:00:00,2024-03-01 12:00:00
b6d96b6a-b6c4-11eb-8529-0242ac130003,b6d96b6a-b6c4-11eb-8529-0242ac130003,b6d96b6a-b6c4-11eb-8529-0242ac130003,3,20,2024-03-02 11:00:00,2024-03-02 11:00:00
c6d96b6a-b6c4-11eb-8529-0242ac130003,c6d96b6a-b6c4-11eb-8529-0242ac130003,c6d96b6a-b6c4-11eb-8529-0242ac130003,1,30,2024-03-02 15:00:00,2024-03-02 15:00:00

This schema is for the table Reviews.
Description: This table records reviews submitted by users for products on the platform.
*******Reviews Schema*******
id, UUID PRIMARY KEY, NULL, A unique identifier for each review, stored as an integer with auto-incrementation
user_id, INT REFERENCES Users(user_id), NULL, The identifier of the user who wrote the review, referencing the id column in the Users table
product_id, INT REFERENCES Products(product_id), NULL, The identifier of the product being reviewed, referencing the id column in the Products table
rating, INT, NULL, The rating given by the user for the product
comment, TEXT, NULL, The comment or review text provided by the user
created_at, timestamptz, NULL, The timestamp indicating when the review was created
updated_at, timestamptz, NULL, The timestamp indicating when the review information was last updated

*******Reviews Example Data CSV format*******
id,user_id,product_id,rating,comment,created_at,updated_at
1,1,1,4,"Good product, worth the price",2024-03-01 13:00:00,2024-03-01 13:00:00
2,2,2,5,"Excellent quality, highly recommend",2024-03-02 12:00:00,2024-03-02 12:00:00
3,3,3,3,"Average product, needs improvement",2024-03-02 16:00:00,2024-03-02 16:00:00

Now all I had to do is implement it in using Streamlit, LangChain and OpenAI. Before I start writing the code, I created requirements.txt. Added all the dependencies.

langchain
langchain-community
langchain_core
langchain_openai
openai
streamlit
python-dotenv
tiktoken
redis

Then I created a file called streamlit.py. Initially, I loaded the environment variables. Subsequently, I established a Redis chat message history and initialized both the memory instance and the LLM (OpenAI) instance. Next, I primed the chatbot by providing the complete database schema context, chat history, and user question utilizing SystemMessagePromptTemplate, MessagesPlaceholder, and HumanMessagePromptTemplate, ultimately constructing the ChatPromptTemplate. Finally, executed streamlit run ./streamlit.py to start the server. This is how I successfully curated the LLM Chain that can be used to generate queries.

Below is the full implementation.

import os
from dotenv import load_dotenv
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
)
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_openai import ChatOpenAI

import streamlit as st

# Load environment variables from .env file
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_ORG_ID = os.getenv("OPENAI_ORG_ID")

REDIS_URL = os.getenv("REDIS_URL")
REDIS_TTL = os.getenv("REDIS_TTL")


def main():
    sessionId = st.text_input("Session Id")
    question = st.text_input("Ask a question (query/prompt)")

    if st.button("Submit Query", type="primary"):
        # create Redis chat message history
        message_history = RedisChatMessageHistory(
            url=REDIS_URL,
            ttl=int(REDIS_TTL),
            session_id=sessionId,
        )

        # create memory instance
        memory = ConversationBufferMemory(
            memory_key="chat_history",
            chat_memory=message_history,
            input_key="question",
            return_messages=True,
        )

        # llm instance (OpenAI)
        llm = ChatOpenAI(
            temperature=0,
            model_name="gpt-3.5-turbo",
            openai_api_key=OPENAI_API_KEY,
            openai_organization=OPENAI_ORG_ID,
        )

        system_prompt_prime_assistant_template = """
            You are a senior Database Engineer who has 20+ years of experience working with SQL databases.
            Your job is to build SQL queries for PostgreSQL databases.
            Use the example data CSV format set and the comment given for the columns to generate the most accurate SQL queries.
        """

        system_prompt_table_pre_template = """
            The schema given is in csv format. Here is what each comma separated values mean.
            column_name, column_type, foreign_key, description

            For example: product_id, uuid, Products(product_id), id of the product
        """

        system_prompt_users_template = """
        This schema is for the table Users.
        Description: Stores information about users registered on the e-commerce platform.
        *******Users Schema*******
        id, UUID, NULL, A unique identifier for each user, stored as a UUID
        email, TEXT, NULL, The email address of the user, stored as text
        created_at, timestamptz, NULL, The timestamp indicating when the user account was created
        updated_at, timestamptz, NULL, The timestamp indicating when the user account was last updated
        *******Users Example Data CSV format*******
        id,username,email,password,created_at,updated_at
        a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11,user1,user1@example.com,2024-03-01 10:00:00,2024-03-01 10:00:00
         b6d96b6a-b6c4-11eb-8529-0242ac130003,user2,user2@example.com,2024-03-02 09:00:00,2024-03-02 09:00:00
        c6d96b6a-b6c4-11eb-8529-0242ac130003,user3,user3@example.com,2024-03-02 10:00:00,2024-03-02 10:00:00
        """

        system_prompt_products_template = """
        This schema is for the table Products.
        Description: This table holds details about the products available for sale on the platform.
        *******Products Schema*******
        id, UUID, NULL, A unique identifier for each product, stored as a UUID
        name, TEXT, NULL, The name of the product
        description, TEXT, NULL, A description of the product
        price, int8, NULL, The price of the product
        stock_quantity, int2, NULL, The quantity of the product available in stock
        created_at, timestamptz, NULL, The timestamp indicating when the product was added to the system
        updated_at, timestamptz, NULL, The timestamp indicating when the product information was last updated
        *******Products Example Data CSV format*******
        id,name,description,price,stock_quantity,created_at,updated_at
        a6d96b6a-b6c4-11eb-8529-0242ac130003,Product A,Description of Product A,10,100,2024-02-28 10:00:00,2024-02-28 10:00:00
        b6d96b6a-b6c4-11eb-8529-0242ac130003,Product B,Description of Product B,20,50,2024-03-01 09:00:00,2024-03-01 09:00:00
        c6d96b6a-b6c4-11eb-8529-0242ac130003,Product C,Description of Product C,30,200,2024-03-02 08:00:00,2024-03-02 08:00:00
        """

        system_prompt_order_template = """
        This schema is for the table Orders.
        Description: This table represents orders placed by users on the platform
        *******Orders Schema*******
        id, UUID, NULL, A unique identifier for each order, stored as a UUID
        user_id, UUID, Users(id), The identifier of the user who placed the order, referencing the id column in the Users table
        order_date, timestamptz, NULL, The timestamp indicating when the order was placed
        total_amount, int8, NULL, The total amount of the order, stored as an integer
        payment_status, TEXT, NULL, The status of the payment for the order
        created_at, timestamptz, NULL, The timestamp indicating when the order was created
        updated_at, timestamptz, NULL, The timestamp indicating when the order information was last updated
        *******Orders Example Data CSV format*******
        id,user_id,order_date,total_amount,payment_status,created_at,updated_at
        a6d96b6a-b6c4-11eb-8529-0242ac130003,a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11,2024-03-01 12:00:00,50,Paid,2024-03-01 12:00:00,2024-03-01 12:00:00
        b6d96b6a-b6c4-11eb-8529-0242ac130003,b6d96b6a-b6c4-11eb-8529-0242ac130003,2024-03-02 11:00:00,60,Pending,2024-03-02 11:00:00,2024-03-02 11:00:00
        c6d96b6a-b6c4-11eb-8529-0242ac130003,c6d96b6a-b6c4-11eb-8529-0242ac130003,2024-03-02 15:00:00,40,Paid,2024-03-02 15:00:00,2024-03-02 15:00:00
        """

        system_prompt_order_items_template = """
        This schema is for the table OrderItems.
        Description: This table stores details about the individual items included in each order.
        *******OrderItems Schema*******
        id, uuid, NULL, A unique identifier for each order item, stored as a UUID
        order_id, uuid, Orders(order_id), The identifier of the order to which the order item belongs, referencing the id column in the Orders table
        product_id, uuid, Products(product_id), he identifier of the product included in the order item, referencing the id column in the Products table
        quantity, int4, NULL, The quantity of the product included in the order item, stored as an integer
        unit_price, int8, NULL, The price per unit of the product included in the order item
        created_at, timestamptz, NULL, The timestamp indicating when the order item was created
        updated_at, timestamptz, NULL, The timestamp indicating when the order item information was last updated
        *******OrderItems Example Data CSV format*******
        id,order_id,product_id,quantity,unit_price,created_at,updated_at
        a6d96b6a-b6c4-11eb-8529-0242ac130003,a6d96b6a-b6c4-11eb-8529-0242ac130003,a6d96b6a-b6c4-11eb-8529-0242ac130003,2,10,2024-03-01 12:00:00,2024-03-01 12:00:00
        b6d96b6a-b6c4-11eb-8529-0242ac130003,b6d96b6a-b6c4-11eb-8529-0242ac130003,b6d96b6a-b6c4-11eb-8529-0242ac130003,3,20,2024-03-02 11:00:00,2024-03-02 11:00:00
        c6d96b6a-b6c4-11eb-8529-0242ac130003,c6d96b6a-b6c4-11eb-8529-0242ac130003,c6d96b6a-b6c4-11eb-8529-0242ac130003,1,30,2024-03-02 15:00:00,2024-03-02 15:00:00
        """

        system_prompt_reviews_template = """
        This schema is for the table Reviews.
        Description: This table records reviews submitted by users for products on the platform.
        *******Reviews Schema*******
        id, UUID PRIMARY KEY, NULL, A unique identifier for each review, stored as an integer with auto-incrementation
        user_id, INT REFERENCES Users(user_id), NULL, The identifier of the user who wrote the review, referencing the id column in the Users table
        product_id, INT REFERENCES Products(product_id), NULL, The identifier of the product being reviewed, referencing the id column in the Products table
        rating, INT, NULL, The rating given by the user for the product
        comment, TEXT, NULL, The comment or review text provided by the user
        created_at, timestamptz, NULL, The timestamp indicating when the review was created
        updated_at, timestamptz, NULL, The timestamp indicating when the review information was last updated

        *******Reviews Example Data CSV format*******
        id,user_id,product_id,rating,comment,created_at,updated_at
        1,1,1,4,"Good product, worth the price",2024-03-01 13:00:00,2024-03-01 13:00:00
        2,2,2,5,"Excellent quality, highly recommend",2024-03-02 12:00:00,2024-03-02 12:00:00
        3,3,3,3,"Average product, needs improvement",2024-03-02 16:00:00,2024-03-02 16:00:00
        """

        messages = [
            SystemMessagePromptTemplate.from_template(
                system_prompt_prime_assistant_template
            ),
            SystemMessagePromptTemplate.from_template(system_prompt_table_pre_template),
            SystemMessagePromptTemplate.from_template(system_prompt_users_template),
            SystemMessagePromptTemplate.from_template(system_prompt_products_template),
            SystemMessagePromptTemplate.from_template(system_prompt_order_template),
            SystemMessagePromptTemplate.from_template(
                system_prompt_order_items_template
            ),
            SystemMessagePromptTemplate.from_template(system_prompt_reviews_template),
            MessagesPlaceholder(variable_name="chat_history"),
            HumanMessagePromptTemplate.from_template("""{question}"""),
        ]

        prompt = ChatPromptTemplate.from_messages(messages=messages)

        # create llm chain instance
        llm_chain = LLMChain(llm=llm, prompt=prompt, memory=memory, verbose=True)

        response = llm_chain({"question": question})

        st.write({**response})


if __name__ == "__main__":
    main()

Below is the result you will see when you prompt a question.

Challenges solved:

  • Ad Hoc Reporting: Users can quickly generate SQL queries to answer ad hoc questions or investigate specific data trends or anomalies.

  • Data Analysis and Reporting: Users can generate SQL queries to retrieve specific data subsets for analysis and reporting purposes. This can be useful in business intelligence, financial analysis, marketing analytics, etc.

  • Rapid Prototyping: Developers can use this tool to quickly prototype database queries and test different data retrieval or manipulation approaches without writing code manually. This tool will do majority of the job for you in seconds.

  • Query Optimization: Developers can use this tool to generate optimized SQL queries by leveraging best practices and performance tuning techniques, thus improving application performance and scalability

In conclusion, this tool not only brings value to the developers and DBAs, it also brings value to the project stakeholders by minimizing the cost associated with query generation for reports. Instead of multiple developers spending hours or even days on report generation, now a single developer can accomplish this task within just a couple of hours. This tool directly contributes to enhanced productivity and profitability.