What would you do in two days? Let me be more precise, what would you do on a weekend? Depending on the kind of person you are, the answers may differ. Some may wanna stay in, have a good sleep, take it slow. If you are like me, you would be on the road riding a bike to that one peaceful getaway. Maybe you want to go on a date with your dearest one.
But if you had asked me the same question a couple of weeks back, you would have laughed your head off at my answer. Hold on to your heads, guys… here is what it was. Along with my friends, I wanted to speak to data. Yes, you heard me, we were planning a scheme to talk with data and databases. Don't get me wrong, I am not insane.
Let me fill you in on what was going on. My company was organizing its yearly hackathon, the Accubits Innovation Challenge, an event that brings technologists and innovators from around the world together to collaborate on developing the technologies of tomorrow. I was part of an internal team of researchers, and our idea was to develop a unified data aggregation and interpretation tool powered by natural language processing to perform data analytics. The premise of the idea is simple: let anyone perform data science tasks just by having a conversation with the tool.
We laid the groundwork by noting down solution approaches on sticky notes and did a lot of caffeine-powered brainstorming. The end result was the realization that the more we thought about it, the more complex the solutions became. We agreed upon building a minimum viable product (MVP) within the weekend. We adopted a modular approach where every section of the architecture stays isolated, because we wanted the flexibility to tweak, modify or remove these modules as development continued. An abstract breakdown of the architecture goes like this (a rough code sketch follows the list):
A chat UI for conversational inputs.
An NLP engine to make DB queries based on free-flowing conversational inputs.
A data parser to ingest raw data and create dumps into our central database.
A Machine Learning (ML) backend that ingests data, creates subsets and decides the best model to fit the data for prediction.
A data visualizer.
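To make the hand-offs concrete, here is a minimal sketch of how such isolated modules could be wired together. Every name in it is hypothetical, invented for illustration; it is not our actual code.

```python
# Hypothetical wiring of the modules; all names are illustrative only.
def handle_message(chat_ui, nlp_engine, ml_backend, visualizer):
    message = chat_ui.get_message()        # 1. conversational input from the chat UI
    query = nlp_engine.to_query(message)   # 2. free text -> structured DB query
    result = ml_backend.answer(query)      # 3. prediction or aggregation over the data
    return visualizer.render(result)       # 4. chart or textual insight back to the user
```

The data parser sits upstream of this loop: it has already dumped the raw data into the central database before any conversation starts.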
The first step to building a product is to find a good, catchy name for it. After some heated verbal exchanges and a couple of black eyes, we settled upon the name InsightsBud. Think about the name for a moment and you realize how apt it is. The core idea behind the platform is to free data scientists from redundant duties like cleaning up data and generating visual insights. InsightsBud does this job for you, equipping a layman to perform data science tasks and surface correlations in the data that yield business insights. The biggest beneficiaries of the platform are business executives, sales heads, and marketing leads, to name a few.
Potential use cases, as well as the market impact of such a product, fueled the team's enthusiasm. Everyone had their duties assigned and it was crunch time. I was tasked with developing the ML backend. The workflow was pretty straightforward: the system receives some data plus some information on which domain the data is from, e.g. healthcare, transportation, etc. Although this info may not make much difference to what happens in the backend, having something is better than nothing.
Once a dataset is received, a preprocessing step checks for data characteristics like strings, integers, booleans, etc. Headers of the data are compiled into word ontologies to make processing and mapping of data points easier. Then our rule engine fits subsets of the data to figure out the best ML model, its parameters and the characteristics of the outcome. Although not the best method for scaling, this approach keeps computational errors down and reduces resource cost.
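To give a flavour of what that preprocessing step might look like, here is a rough sketch in Python with pandas. The real rule set isn't public, so the function names and checks below are illustrative assumptions.

```python
import pandas as pd

# Illustrative only: classify columns and normalize headers so they can be
# mapped onto a word ontology. The actual checks and rules are assumptions.

def profile_columns(df: pd.DataFrame) -> dict:
    """Tag each column as boolean, numeric, or string-like."""
    profile = {}
    for col in df.columns:
        series = df[col].dropna()
        if pd.api.types.is_bool_dtype(series):     # check bool before numeric
            profile[col] = "boolean"
        elif pd.api.types.is_numeric_dtype(series):
            profile[col] = "numeric"
        else:
            profile[col] = "string"
    return profile

def normalize_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase and underscore headers so ontology lookups stay consistent."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    return out
```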
Predictive analytics, data insights, etc. are created in real time based on what the user asks. As an example, let me tell you how the system reacts when a user inputs some data and chats with the bot. Say you are someone from the healthcare department, asked to generate insights on data about patients from different parts of a state or province. The data contains info related to patient health, history of diagnosed diseases, geo-location, ethnic background, demography, etc. You can load this data into InsightsBud, answer a few questions about the uploaded data, and start asking Bud questions of your own. A question might be something like, “Hey Bud, what is the possibility of someone from X location having chronic arrhythmia?”. Based on the data already uploaded in the previous step, the model takes in the location and the type of disease and gives a probability score for the chances of a person having the disease. The model generated for the data is based on several attributes of the training data; a rule engine and a model parameter estimation technique use data subsets to evaluate the best-fit model for any given data.
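To picture that last step, here is roughly how a parsed question (a location and a disease name) could be turned into a probability score, assuming the backend has already fitted a scikit-learn-style classifier and a categorical encoder. Both objects and the function itself are hypothetical, not the actual InsightsBud backend.

```python
# Hypothetical: `model` is a fitted classifier, `encoder` a fitted categorical
# encoder (e.g. one-hot) trained on the uploaded patient data.
def disease_probability(model, encoder, location: str, disease: str) -> float:
    features = encoder.transform([[location, disease]])  # categories -> numeric features
    return float(model.predict_proba(features)[0][1])    # probability of the positive class
```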
The UI team started their work by designing the user flow and the UX. On an experimental basis, we tried out Flutter for the application development but found it too unstable for our particular use case. This forced us to switch to Angular for the frontend. Node.js was our choice for managing backend communication, whereas Python was chosen for building the data loader and preprocessing backend.
During the development of InsightsBud, we incorporated our data ingestion tool called Gulpi, which acts as the data crawler and dumping mechanism and lets us integrate data sources like Slack, Twitter, Facebook, etc. A regular CSV input is also supported. Next comes the ML backend, which was my prime focus. Given the variety of data sources, it was only sensible to come up with a somewhat generic solution that can handle different data characteristics: data types, sources, the discipline the data belongs to, and so on.
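Gulpi's internals aren't public, so the snippet below only sketches the dispatch idea: one ingestion entry point fanning out to per-source crawlers, with everything landing as a data frame. All names are invented for illustration.

```python
import pandas as pd

# Hypothetical ingestion dispatch; only the CSV path is filled in here.
CRAWLERS = {
    "csv": pd.read_csv,            # the regular CSV input
    # "slack": slack_crawler,      # each connector would dump to the central DB
    # "twitter": twitter_crawler,
}

def ingest(kind: str, location: str) -> pd.DataFrame:
    return CRAWLERS[kind](location)
```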
The first step was to design a preprocessing rulebook that takes into account certain obvious characteristics of the data, like origin, trend, data type, and size. Based on these guidelines, the crawler performs certain validation checks to ensure that the input data clears some basic criteria before it is considered a valid input to the system. Such an approach lets us filter out fluke data.
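For a flavour of what such checks could look like, here is a tiny validation pass in the spirit of that rulebook. The thresholds and the specific rules are invented for the example.

```python
# Illustrative validation rules; thresholds are made up, not the real rulebook.
MIN_ROWS = 50
MAX_NULL_FRACTION = 0.4

def is_valid_input(df) -> bool:
    if len(df) < MIN_ROWS:                          # too small to model anything
        return False
    if df.isna().mean().max() > MAX_NULL_FRACTION:  # some column is mostly empty
        return False
    if df.columns.duplicated().any():               # ambiguous headers
        return False
    return True
```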
Insights from the crawler are also used during preprocessing and data cleanup. Right after the cleanup, our backend algorithm generates models on random subsets to evaluate the best model parameters for any given data. Immediately after this, the model suggested by the backend is trained on the entire dataset. Similar steps are repeated with key attributes within the data to generate a model for every feature in the dataset as a function of every other attribute.
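A minimal stand-in for that selection step might look like the following: score a few candidate models on a random subset, then refit the winner on the full data. The candidate list, subset size and scoring below are assumptions, not our actual rules.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def select_and_train(X, y, subset_frac=0.2, seed=0):
    """Pick the best-scoring candidate on a random subset of (X, y) numpy
    arrays, then fit it on the whole dataset. Candidates are assumptions."""
    rng = np.random.default_rng(seed)
    n = min(len(X), max(30, int(len(X) * subset_frac)))
    idx = rng.choice(len(X), size=n, replace=False)     # random evaluation subset
    candidates = [LogisticRegression(max_iter=1000), RandomForestClassifier()]
    best = max(candidates,
               key=lambda m: cross_val_score(m, X[idx], y[idx], cv=3).mean())
    return best.fit(X, y)   # the suggested model, trained on the entire dataset
```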
Once we felt we had all done our part, it was time for the marriage: we had to merge everything together into a seamlessly functioning prototype. All hell broke loose when we integrated the modules. Although everyone worked together and knew what was what, cross-platform integrations often hit roadblocks the first time around.
The clock was ticking and we were just 15 hours from the product presentation. We had a sleepless night ahead. Foreseeing this, we had already stocked up on Red Bulls and snacks, which proved not so helpful six hours into the night. It was four in the morning and nobody so much as blinked. Bug fixes were underway, and by 6 AM the platform seemed to be working. It was accepting data and generating insights from conversations with the bot. We planned to take a nap for a couple of hours before the event's inaugural address.
Then someone suddenly reminded us that we had a product but no demo presentation or pitch deck. This was bad. Even though we had a viable product, an average pitch could spell disaster for the platform's future. The main purpose of the presentation is to convince a group of investors of the product's relevance, its market impact, and how it will make them money, and ultimately to get them to invest in it.
We quickly got into action, prepared the deck and rushed to the presentation. The demo went well, the investors were happy, and we had to get some sleep. We slept so deeply that we missed the moment we were declared the winners of the event. But for us, a good, satisfying sleep was worth more. You know the feeling you get when you accomplish something everyone else said you couldn't? It's priceless. And that sleep, my god, I have never had such a pleasant, satisfying sleep in my entire life. The past two days taught me a lot of lessons in mentorship, teamwork, the importance of knowledge sharing and collaboration.
I felt the story of our journey needed to reach a lot of people, because it has the potential to inspire individuals or groups to become more proactive about how they manage their time, and it carries a lesson in teamwork. For more info about our journey and the things we do, please visit us at www.accubits.com. For a more in-depth understanding of how InsightsBud works, or to try out the platform, visit www.insightsbud.com
This article was originally published on my personal blog.
Vysakh is an AI enthusiast and developer at Accubits who is actively involved in creating artificial intelligence solutions for our vast clientele. He is part of the AI R&D team at Accubits and has several noted research papers in domains like deep learning, computer vision, cybersecurity and natural language processing. He is a true believer in, and promoter of, the open-source movement. Vysakh has developed many AI systems that are freely accessible and open to developers across the globe.