
Tatacoa AI: Predictive Analytics Strategy Overview

Tatacoa AI employs cutting-edge artificial intelligence and machine learning frameworks to maximize predictive analytics performance for real-world, high-value applications.

 

 

The Tatacoa AI Predictive Analytics solution is architected to execute advanced analytical methodologies by interfacing directly with both historical data repositories (e.g., data warehouses, data lakes) and real-time (streaming) data sources (e.g., event buses, IoT platforms).

 

This dual-mode capability enables the unified application of sophisticated predictive techniques:

 

 

- Longitudinal & Time-Series Analysis: By analyzing past database records (e.g., transactional data, event logs, historical sensor readings), the system models temporal patterns, establishes operational baselines, and identifies long-term trends.

- Real-Time Pattern Recognition & Anomaly Detection: The framework applies online learning and stream processing models to current data feeds—including sensor data, voice, environmental audio, video, and imagery—to detect critical deviations and identify complex events as they unfold.

- Contextual Data Fusion: The solution excels at correlating current, high-velocity inputs against past, contextual data drawn from historical databases. This process enriches real-time analytics, allowing for more accurate and nuanced predictions.

 

This methodology transforms disparate past and current data streams into actionable, intelligible predictions in line with rigorously defined criteria, albeit with elevated technical integration requirements for accessing and processing diverse enterprise data systems.
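As an illustration of this dual-mode pattern, the minimal sketch below scores a live sensor reading against a baseline derived from historical records. The column names, z-score rule, and threshold are illustrative assumptions, not details of the Tatacoa AI implementation.

```python
# Minimal sketch: fusing a historical baseline with a live reading.
# All names (columns, thresholds) are illustrative assumptions.
import numpy as np
import pandas as pd

def build_baseline(history: pd.DataFrame) -> pd.DataFrame:
    """Per-sensor mean and std computed from historical records."""
    return history.groupby("sensor_id")["value"].agg(["mean", "std"])

def score_current(reading: dict, baseline: pd.DataFrame, z_threshold: float = 3.0) -> dict:
    """Flag a live reading that deviates strongly from its historical baseline."""
    stats = baseline.loc[reading["sensor_id"]]
    z = (reading["value"] - stats["mean"]) / (stats["std"] + 1e-9)
    return {"sensor_id": reading["sensor_id"], "z_score": float(z), "anomaly": abs(z) > z_threshold}

# Usage with synthetic historical data and one current reading
history = pd.DataFrame({
    "sensor_id": ["s1"] * 100,
    "value": np.random.normal(20.0, 1.5, size=100),
})
baseline = build_baseline(history)
print(score_current({"sensor_id": "s1", "value": 27.0}, baseline))
```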

The following architectures are being rigorously evaluated for their predictive power, adaptability, and operational scalability in a dual-mode data context—processing both historical databases (past) and live data streams (current):

A. Deep Neural Networks (DNN)

- Analyzes (Past Data): Delivers state-of-the-art modeling by discovering complex, non-linear relationships and hidden patterns within vast historical datasets (e.g., past transactional records, event logs, and sensor archives).

- Requires (Past Data): Mandates substantial computational resources and extensive, high-quality labeled historical datasets to be trained effectively and realize its full predictive potential.

- Applies (Current Data): Best deployed in data-rich environments with scalable infrastructure, enabling large-scale, high-accuracy forecasting by applying its historically trained model to current, real-time data inputs.
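A minimal sketch of this pattern, assuming Keras as the framework (the actual stack is not specified here): a small feed-forward network is trained on historical feature vectors, then applied to current inputs. Layer sizes, feature width, and the synthetic data are illustrative assumptions.

```python
# Minimal sketch: train a small DNN on historical data, apply it to current inputs.
# Framework (Keras), layer sizes, and feature shapes are illustrative assumptions.
import numpy as np
from tensorflow import keras

n_features = 16  # assumed width of each historical feature vector

model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),  # single forecast value
])
model.compile(optimizer="adam", loss="mse")

# Historical (past) data: feature vectors and observed targets
X_hist = np.random.rand(1000, n_features)
y_hist = np.random.rand(1000)
model.fit(X_hist, y_hist, epochs=5, batch_size=32, verbose=0)

# Current (real-time) inputs scored with the historically trained model
X_current = np.random.rand(4, n_features)
print(model.predict(X_current, verbose=0))
```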

B. Large Language Model (LLM)

- Analyzes (Past & Current Data): Attains advanced predictive capabilities by correlating current unstructured inputs (like text or voice commands) against past structured and unstructured databases for contextual understanding.

- Excels: Handles tasks such as nuanced language understanding, generative analytics (e.g., summarizing current system status based on past events), and domain-specific reasoning by leveraging its broad pre-training.

- Applies (Hybrid): Offers significant scalability and flexibility. Can be adapted via Retrieval-Augmented Generation (RAG) to dynamically query current and past databases, grounding its predictions in real-time, proprietary information with moderate deployment complexity.
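A minimal sketch of the RAG pattern described above. The embed() and generate() functions are hypothetical stand-ins for a real embedding model and LLM endpoint, and the example records are synthetic; only the retrieve-then-ground flow is the point.

```python
# Minimal RAG sketch: ground an LLM answer in retrieved historical/current records.
# embed() and generate() are hypothetical stand-ins for real model APIs.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=128)

def generate(prompt: str) -> str:
    """Hypothetical LLM call; a real system would call a hosted or local model."""
    return f"[LLM answer grounded in]:\n{prompt}"

documents = [  # synthetic example records standing in for enterprise data
    "2023-11: irrigation pump #4 showed intermittent pressure drops.",
    "2024-02: harvest volumes dipped 8% after a week of frost.",
    "2024-05: sensor s7 recalibrated; baseline shifted upward.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def answer(question: str, k: int = 2) -> str:
    q = embed(question)
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    top = [documents[i] for i in np.argsort(scores)[::-1][:k]]
    prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {question}"
    return generate(prompt)

print(answer("What happened to harvest volumes after the frost?"))
```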

C. Reinforcement Learning (RL) with DNN Integration

- Analyzes (Current Data): Achieves superior predictive precision in adaptive, real-time, feedback-rich scenarios (e.g., dynamic decision systems) by continuously learning from the current database state and live data stream.

- Combines (Past & Current Data): Combines autonomous agent learning (reacting to current environmental feedback) with deep neural feature extraction (often trained on past data). Can use historical databases for "offline RL" to pre-train policies, enhancing sample efficiency before live deployment.

- Applies (Current Data): Ideal for optimizing decisions where actions taken now (based on current data) impact future states, supporting high-accuracy analytics from multimodal inputs.
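A minimal sketch of the offline-to-online flow described above. For brevity it uses a linear Q-function rather than a deep network, and the logged transitions and live feedback are synthetic; a production system would typically substitute a DQN-style neural approximator.

```python
# Minimal sketch of offline-to-online RL with a function approximator.
# State/action sizes, the linear Q-model, and all data are illustrative assumptions.
import numpy as np

n_state, n_action, gamma, lr = 4, 2, 0.95, 0.01
W = np.zeros((n_action, n_state))            # linear Q(s, a) = W[a] @ s

def q_update(s, a, r, s_next, done):
    """One temporal-difference update of the Q-function."""
    target = r + (0.0 if done else gamma * max(W @ s_next))
    td_error = target - W[a] @ s
    W[a] += lr * td_error * s

# 1) Offline pre-training from logged historical transitions (past data)
logged = [(np.random.rand(n_state), np.random.randint(n_action),
           np.random.rand(), np.random.rand(n_state), False) for _ in range(500)]
for s, a, r, s_next, done in logged:
    q_update(s, a, r, s_next, done)

# 2) Online fine-tuning from live feedback (current data), epsilon-greedy actions
def act(s, eps=0.1):
    return np.random.randint(n_action) if np.random.rand() < eps else int(np.argmax(W @ s))

s = np.random.rand(n_state)
for _ in range(100):
    a = act(s)
    r, s_next = np.random.rand(), np.random.rand(n_state)   # stand-in for real feedback
    q_update(s, a, r, s_next, False)
    s = s_next
print("Learned Q-weights shape:", W.shape)
```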

 

This advanced, modular approach enables Tatacoa AI to deliver measurable business outcomes through accurate and scalable predictive analytics, tailored to complex, rapidly evolving operational contexts that depend on both historical and real-time data insights.

Tatacoa is a venture formed by partners Bernardo Rincón Cerón and Juan Manuel Carretero Toro.

The company's main assets are its expertise in Systems Engineering, mathematics, and AI, combined with strong commercial relationships and project development skills. Tatacoa aims to deliver efficient, high-quality, information-based processes and can organize high-quality, cost-effective teams tailored to client needs.

 

Company Location: Bogotá, Colombia

Founders

Success Cases

Georgia Tech, USA

Scope: Analysis of survey data from victims of human trafficking.

 

Activities:

Constructed a database of the survey data.

Sourced and integrated economic data: inequality figures from a Harvard University database and income data from Kaggle.com.

Performed data preprocessing.

Built a Markov state transition matrix and graphs (a minimal sketch follows this list).

Conducted time-focused and socioeconomic analysis.
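To illustrate the Markov state transition step above, here is a minimal sketch that estimates a transition matrix from observed state sequences. The state labels and sequences are synthetic; the actual study derived its states from the victim survey records.

```python
# Minimal sketch: estimate a Markov state transition matrix from observed sequences.
# State labels and sequences are synthetic placeholders, not the study's actual states.
import numpy as np
import pandas as pd

states = ["origin", "transit", "exploitation", "exit"]
sequences = [
    ["origin", "transit", "exploitation", "exit"],
    ["origin", "exploitation", "exploitation", "exit"],
    ["origin", "transit", "transit", "exploitation"],
]

counts = pd.DataFrame(0, index=states, columns=states, dtype=float)
for seq in sequences:
    for a, b in zip(seq[:-1], seq[1:]):
        counts.loc[a, b] += 1

# Row-normalize counts into transition probabilities P(next | current)
transition = counts.div(counts.sum(axis=1).replace(0, np.nan), axis=0).fillna(0.0)
print(transition.round(2))
```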

 

Outcome:

A global map of human trafficking origins and destinations, including the number of people involved (based on the survey population).

Analysis of the total number of cases per year using exponential smoothing to detect significant changes over time (sketched at the end of this case).

A table classifying the activities victims are involved in (e.g., "Agriculture," "Domestic work," "Prostitution") by categories such as "Work force," "Sex related," and "Slavery / exploitation".

Degree of relations between nations involved

image.png
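To illustrate the exponential smoothing step referenced above, here is a minimal sketch applied to a yearly case-count series; the counts are synthetic, standing in for the survey-derived totals.

```python
# Minimal sketch: simple exponential smoothing on a yearly case-count series.
# The counts below are synthetic; the study used survey-derived totals per year.
def exponential_smoothing(series, alpha=0.3):
    """Return the smoothed series; large deviations from it flag significant changes."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

cases_per_year = [120, 135, 128, 190, 210, 205, 320]   # synthetic yearly totals
smoothed = exponential_smoothing(cases_per_year)
for year_offset, (obs, sm) in enumerate(zip(cases_per_year, smoothed)):
    print(f"year {year_offset}: observed={obs:4d}  smoothed={sm:7.1f}  deviation={obs - sm:6.1f}")
```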


Success Cases

SOP S.A.S. (Blueberry crop)

Scope: Construction of AI models to predict monthly and annual blueberry harvest volumes.

 

Activities:

Built a database using crop information and weather data collected every 5 minutes from an in-house station.

Conducted data cleaning, statistical analysis, data engineering (handling outliers, normalization), and correlation calculations (a minimal sketch follows this list).

Built eight different models and selected two (one monthly, one annual) for production.

Calculated the importance of each variable.

Used the central limit theorem to improve prediction performance.
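A minimal sketch of the data-engineering and correlation steps above, assuming a pandas workflow; the column names, percentile clipping rule, and synthetic weather/production data are illustrative and do not reflect the SOP S.A.S. schema.

```python
# Minimal sketch: outlier handling, normalization, and correlation for weather/crop features.
# Column names and synthetic data are illustrative assumptions, not the SOP S.A.S. schema.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature_c": rng.normal(18, 3, 500),
    "humidity_pct": rng.normal(75, 8, 500),
    "radiation_wm2": rng.normal(420, 60, 500),
})
df["harvest_kg"] = 2.0 * df["temperature_c"] + 0.5 * df["radiation_wm2"] + rng.normal(0, 10, 500)

# Clip outliers to the 1st-99th percentile band, then min-max normalize each column
for col in df.columns:
    lo, hi = df[col].quantile([0.01, 0.99])
    df[col] = df[col].clip(lo, hi)
    df[col] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())

# Correlation of every numerical variable with every other (cf. the correlation figure below)
print(df.corr().round(2))
```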

 

Result:

The monthly model for 2024 achieved 96% cumulative annual accuracy, with the median giving the highest result at 96%.

The annual model for 2024 achieved 94% annual accuracy on cumulative data.

Randomly selected examples of plotted values and envelope curves for model variables

image.png

Correlation indicators between all numerical variables in the model

image.png

SOP S.A.S. (Blueberry crop)

Purpose: Present an efficient, AI-based solution for predicting crop production, enabling agricultural clients to make data-driven decisions and enhance reliability in forecasting.

 

Crop Types Addressed: Covers both temporary crops (removed after harvest) and permanent crops (which remain after harvest).

Variables Considered: Includes irrigation, phenology/crop cycle (sowing, flowering, harvesting), climate, soil quality, and production metrics.

 

Analytical Approach:

- Uses at least 2 years of historical data.

- Involves recognizing data status, cleaning bad data, descriptive analytics, variable selection, and correlation analysis to minimize model bias.

- Highlights the importance of each variable in crop productivity.

 

Predictive Methodologies: Employs AI models such as random forests, deep learning (deep neural networks), and support vector machines for processing complex, multi-variable agricultural data.
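A minimal sketch of two of the model families named above (random forest and support vector machine) on synthetic multi-variable data, using scikit-learn as an assumed toolkit; the feature names, hyperparameters, and data are illustrative.

```python
# Minimal sketch: random forest and SVM on synthetic multi-variable crop data,
# plus per-variable importance. All names, data, and settings are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.random((600, 5))                                   # e.g., irrigation, phenology, climate, soil, time
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(0, 0.1, 600)    # synthetic production metric
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("random forest", RandomForestRegressor(n_estimators=200, random_state=0)),
                    ("support vector machine", SVR(kernel="rbf"))]:
    model.fit(X_train, y_train)
    print(name, "MAE:", round(mean_absolute_error(y_test, model.predict(X_test)), 3))

# Importance of each variable, as highlighted in the analytical approach above
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("feature importances:", rf.feature_importances_.round(2))
```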

 

Case Study: Blueberry farming in Zipaquirá, Colombia, with 20 lots featuring different varieties; the goal was to forecast production across various time periods using multiple environmental and operational variables (climate, lot details, time).

 

Results and Benefits: Implementation led to improved resource optimization, better decision-making, and increased predictability for tropical farmers.

 

Conclusion: Invites stakeholders to adopt the AI-powered methodology to enhance agricultural productivity and optimize operations.

Success Cases

University of Texas at Austin (Seedling Classification)

Scope: To build a model that classifies seedling species based on an image.

 

Activities:

Performed data analysis using statistical methods.

Analyzed 12 different seedling species.

Built four models using different neural network techniques.

Selected the model with the best response, which utilized transfer learning.
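A minimal sketch of the transfer-learning approach, assuming a Keras workflow with MobileNetV2 as the pretrained backbone; the backbone choice, input size, and synthetic batch are assumptions, as the case description does not specify which network was reused.

```python
# Minimal transfer-learning sketch: a pretrained backbone with a new 12-class head.
# Backbone (MobileNetV2), input size, and the synthetic batch are assumptions.
import numpy as np
from tensorflow import keras

base = keras.applications.MobileNetV2(input_shape=(224, 224, 3), include_top=False,
                                      weights="imagenet", pooling="avg")
base.trainable = False                      # freeze pretrained features

model = keras.Sequential([
    base,
    keras.layers.Dropout(0.2),
    keras.layers.Dense(12, activation="softmax"),   # 12 seedling species
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Tiny synthetic batch in place of the real seedling images
X = np.random.rand(8, 224, 224, 3).astype("float32")
y = np.random.randint(0, 12, size=8)
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:2], verbose=0).shape)   # (2, 12) class probabilities
```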

Example with a random selection of seedling images from the 12 species analyzed

image.png

Success Cases

University of Texas Hackathon Competition

Result: Placed second in 2022 and fourth in 2023.

Context: The project involved analyzing a dataset related to restaurants, with variables such as "Annual Turnover," "Cuisine," "Restaurant Zomato Rating," and "Overall Restaurant Rating".

The outputs were saved in a data frame and then exported to a ".csv" file with the appropriate "Registration Number".
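A minimal sketch of that submission step, assuming a pandas workflow; the registration numbers and predicted values are synthetic.

```python
# Minimal sketch: predictions keyed by "Registration Number", exported to a ".csv" file.
# Column names mirror the hackathon variables; the values are synthetic.
import pandas as pd

predictions = pd.DataFrame({
    "Registration Number": [101, 102, 103],
    "Annual Turnover": [1.2e6, 8.5e5, 2.3e6],   # predicted values (synthetic)
})
predictions.to_csv("submission.csv", index=False)
print(predictions)
```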

Success Cases

University of Texas at Austin (Bank Customer Retention)

Scope: To predict the retention of a bank customer (whether they "Exited") over a 6-month period.

Activities:

Analyzed customer data variables, including "CreditScore," "Age," "Tenure," "Balance," "NumOfProducts," and "IsActiveMember".

 

Performed data cleaning and normalization.

Conducted univariate, correlation, and multivariate analysis.

Built and analyzed four AI models using neural networks.

Result: The best model was selected. A confusion matrix of this model (without fine-tuning) shows it achieved an accuracy of 0.843.
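A minimal sketch of this classification-and-evaluation step, using a scikit-learn MLP as an assumed stand-in for the neural networks actually built; only the variable names come from the case description, and the data and target rule are synthetic.

```python
# Minimal sketch: a small neural-network classifier for "Exited", with its confusion
# matrix and accuracy. Architecture and all data are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "CreditScore": rng.normal(650, 60, 2000),
    "Age": rng.integers(18, 80, 2000),
    "Tenure": rng.integers(0, 10, 2000),
    "Balance": rng.normal(75000, 30000, 2000),
    "NumOfProducts": rng.integers(1, 4, 2000),
    "IsActiveMember": rng.integers(0, 2, 2000),
})
# Synthetic target loosely tied to age and activity, standing in for the real labels
df["Exited"] = ((df["Age"] > 55) & (df["IsActiveMember"] == 0)).astype(int)

X = StandardScaler().fit_transform(df.drop(columns="Exited"))
X_train, X_test, y_train, y_test = train_test_split(X, df["Exited"], random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(confusion_matrix(y_test, pred))
print("accuracy:", round(accuracy_score(y_test, pred), 3))
```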

