Talend on Talend: How to use machine learning for your marketing database segmentation



In today’s business world, marketing segmentation is a must-have for every organisation. It helps you divide the targets in a market into multiple customer or prospect segments so you can sharpen your marketing actions. This discipline gives you a crucial advantage over your competitors because you can adapt your offer and your communication to the groups of personas you want to address.


Successful marketing segmentation focuses your marketing efforts, leads to more satisfied customers and therefore increases revenue and profitability. In practice, though, companies that define personas rarely have that segmentation ready in their database. If you already hold data on each individual, you may want to tag individuals into bundles so you can act on your database faster. That’s where Talend can help you build your customised segmentation.


You may think that we now live in a 1:1 personalised world and that marketing shouldn’t care about bucketing business leads. Amazon personalises its homepage based on your interests, right? Well, most companies start with a one-to-many approach before moving to individual personalisation. It is always better to learn to walk before you run.

First of all, you have to define your segments. There are two typical methods available to you: carrying out market research or analysing your existing database. Combined, the two techniques allow a more precise identification of the segments, better knowledge of the company’s targets and therefore better-optimised marketing campaigns and actions.


As you write down your segments, keep in mind that they should not change over time. When you plan who your customers could be, make sure people are unlikely to switch from one segment to another within a short span. Be precise, but leave enough room. Secondly, think about your products and their use cases for each segment you list. You want your segments to be relevant to your business cases.


Once your segments are listed, you can set the theory aside. Now it is time to set up your machine learning.

In Talend’s case, we had a job title for the individuals in our database. The data was poor in quality and represented 600k+ distinct values. It would have taken centuries to manually assign each of them, with decent accuracy, to the buckets we had drafted. So we started with the biggest volumes (the most frequent job titles) and assigned them to bundles, creating a base from which we could start supervising the machine’s learning. Once we had labelled at least 5% of the database with almost 98% accuracy, we took this dataset into a semantic analysis approach.
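To make the "start with the big volumes" step concrete, here is a minimal Python sketch. It assumes that "big volumes" means the most frequent job titles after light normalisation; the `normalize` and `most_frequent_titles` helpers and the sample titles are hypothetical, not the code we actually ran.

```python
from collections import Counter

def normalize(title):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(title.lower().split())

def most_frequent_titles(titles, coverage=0.05):
    """Return the most common normalized titles that together cover at least
    `coverage` of all records (illustrative threshold, an assumption)."""
    counts = Counter(normalize(t) for t in titles if t and t.strip())
    total = sum(counts.values())
    selected, covered = [], 0
    for title, n in counts.most_common():
        if covered >= coverage * total:
            break
        selected.append((title, n))
        covered += n
    return selected

# Hypothetical sample: three spellings of the same title collapse into one.
sample = ["Data Engineer", "data engineer ", "CTO", "Data  Engineer", "cto", "Head of BI"]
top = most_frequent_titles(sample, coverage=0.5)
print(top)
```

Labelling the titles in `top` by hand gives a small but clean seed set for the model to learn from.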


We considered two approaches: a syntactic one, where the distance between instances (strings) is based on character-level similarity (e.g. apple, appeal), and a semantic one, where the distance is obtained from word embeddings that capture the closeness of two semantically related words (e.g. apple, banana).
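The difference between the two kinds of distance can be illustrated with a small Python sketch. Levenshtein edit distance stands in for the character-level measure (the exact metric we used is not stated here, so this is an assumption), and the three-dimensional vectors in `toy_embeddings` are invented for illustration; real word embeddings come from a trained model.

```python
import math

def levenshtein(a, b):
    """Character-level edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors, NOT real embeddings.
toy_embeddings = {
    "apple":  [0.9, 0.8, 0.1],
    "banana": [0.8, 0.9, 0.2],
    "appeal": [0.1, 0.2, 0.9],
}

# Syntactic view: "apple" and "appeal" are only a few edits apart.
print(levenshtein("apple", "appeal"))  # 3
# Semantic view: "apple" is closer to "banana" than to "appeal".
print(cosine(toy_embeddings["apple"], toy_embeddings["banana"]))
print(cosine(toy_embeddings["apple"], toy_embeddings["appeal"]))
```

For job titles, the syntactic view helps with typos and spelling variants, while the semantic view helps group titles that share no characters but mean similar things.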


This approach gave us a dictionary of the words used in job titles, which we could use as input to the machine learning model to start making predictions. As expected, this first approach generalised poorly. For that reason, we decided to implement an “active-learning”-inspired method and include a human expert in the learning loop.


The learning loop includes someone who validates or invalidates the model’s predictions, so accuracy improves with each iteration. You validate, it predicts, and again, you validate, it predicts… We did it six times and achieved 80% accuracy on annotated data. So, in a short time, we went from 3% of the database labelled with almost total accuracy to 100% of the database segmented with 80% accuracy.
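The validate-and-predict loop can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not our production model: the classifier is a simple token-vote counter, the persona names ("IT", "Business") and all function names are hypothetical, and the human expert is simulated by a callback.

```python
from collections import Counter, defaultdict

def train(labelled):
    """Count how often each token appears under each persona."""
    token_votes = defaultdict(Counter)
    for title, persona in labelled.items():
        for token in title.lower().split():
            token_votes[token][persona] += 1
    return token_votes

def predict(model, title):
    """Return (persona, confidence); confidence is the winner's share of votes."""
    votes = Counter()
    for token in title.lower().split():
        votes.update(model.get(token, {}))
    if not votes:
        return None, 0.0
    persona, n = votes.most_common(1)[0]
    return persona, n / sum(votes.values())

def active_learning(labelled, unlabelled, ask_expert, rounds=6, per_round=2):
    """Each round: retrain, show the least confident titles to the expert,
    and add the expert's answers to the labelled set."""
    labelled, pool = dict(labelled), list(unlabelled)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labelled)
        pool.sort(key=lambda t: predict(model, t)[1])  # least confident first
        batch, pool = pool[:per_round], pool[per_round:]
        for title in batch:
            suggestion, _ = predict(model, title)
            labelled[title] = ask_expert(title, suggestion)
    return train(labelled), labelled

# Hypothetical seed labels, pool, and a simulated expert.
seed = {"data engineer": "IT", "marketing manager": "Business"}
pool = ["senior data architect", "digital marketing lead", "bi developer"]
truth = {"senior data architect": "IT", "digital marketing lead": "Business",
         "bi developer": "IT"}
model, labelled = active_learning(seed, pool, lambda title, guess: truth[title],
                                  rounds=2, per_round=2)
print(predict(model, "lead data architect"))
```

Sending the least confident predictions to the expert first is one common active-learning heuristic; it spends the human's time where the model is most likely to be wrong.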

As of now, the more data we validate, the more accurate the model gets. We are achieving almost 90% accuracy on a dataset that contains more than 600,000 unique values.


Now what? It is fantastic to have the predictions, but now you have to write the data back into your system. The idea was to take the job title information from our CRM, run the machine learning model, and push a cleaned job title plus the persona predictions back into the database. The easiest way we found to complete that task was with our own tool, Talend Pipeline Designer.


Talend Pipeline Designer is a lightweight data transformation tool that allows you to build data pipelines with a schema-ready user interface. It allows you to integrate any data — structured or unstructured — and design seamlessly in batch or streaming from a single interface. It connects easily to leading data sources such as Salesforce, Marketo, Amazon Redshift, Snowflake…


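In the Talend Pipeline Designer Summer ’19 release, the connector component for Marketo became available. We just had to use it to connect to our database and retrieve the job title. Essentially, the pipeline reads the job titles through that connector, runs the machine learning model, and writes the cleaned job title and the persona prediction back.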

You can also incorporate the active learning loop into the pipeline: add a flow that captures a sample of people so you can review the model’s predictions as an output and decide whether to write each value to the CRM you are connected to.


Once your pipeline is laid out, you have to define who goes into it. This part happens in Marketo. We chose to push the prospects or customers who had a job title and a certain amount of basic information but no persona segmentation in their details. Based on these criteria, it is quite easy to push these people into lists in the database that your Marketo connector for Pipeline Designer connects to. Then you have to set a pace and a rhythm.
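The selection criteria can be written as a simple filter. The sketch below is in plain Python rather than Marketo’s own list logic, and the field names (`job_title`, `email`, `company`, `persona`) are assumptions chosen for illustration, as is the choice of what counts as "basic information".

```python
# Hypothetical stand-in for "a certain amount of basic information".
REQUIRED_FIELDS = ("email", "company")

def needs_persona(lead):
    """True if the lead has a job title and basic details but no persona yet."""
    has_title = bool((lead.get("job_title") or "").strip())
    has_basics = all(lead.get(field) for field in REQUIRED_FIELDS)
    return has_title and has_basics and not lead.get("persona")

# Made-up sample leads.
leads = [
    {"email": "a@example.com", "company": "Acme", "job_title": "Data Engineer", "persona": None},
    {"email": "b@example.com", "company": "Acme", "job_title": "", "persona": None},
    {"email": "c@example.com", "company": "Globex", "job_title": "CMO", "persona": "Business"},
]
to_score = [lead for lead in leads if needs_persona(lead)]
print([lead["email"] for lead in to_score])  # ['a@example.com']
```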

There were two options: batch or stream. Batch mode writes data to the database on a defined schedule, while stream mode is a steady, continuous flow that writes data while the rest is still being received. We chose batch mode because the number of leads we receive varies from day to day and week to week. It lets us keep control of the model, whereas in streaming mode we could not manage how heavily we would load the system.


To ensure the best performance for your integrations, records should be grouped into as few transactions as possible when performing inserts or updates. When retrieving records from a data store for submission, always aggregate them before submitting, rather than sending a request for each individual change. We push persona prediction updates at a maximum pace of 50,000 updates per week.
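Grouping updates into few transactions under a weekly cap can be sketched as follows. The 50,000 weekly cap comes from the text above; the batch size of 300 and the function names are assumptions for illustration and do not reflect any specific connector setting.

```python
from itertools import islice

def batches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def plan_weekly_updates(records, weekly_cap=50_000, batch_size=300):
    """Keep at most `weekly_cap` records and group them into batches,
    so each batch can be submitted as a single request."""
    capped = islice(records, weekly_cap)
    return list(batches(capped, batch_size))

# Hypothetical backlog of 120,000 pending updates, represented by IDs.
plan = plan_weekly_updates(range(120_000))
print(len(plan), len(plan[-1]))  # 167 200
```

Anything beyond the cap simply stays in the backlog for the following week’s run.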

The results: I work in Marketing, and I can tell you, this is almost magical. With a dataset as large as ours, the machine learning model is smarter than you. It perceives slight differences between words better than you can, and it understands the similarities between words in job titles across different languages. The segments we drafted are now filled with homogeneous types of people. From a business perspective, it is a game-changer for our Marketing organization, as it lets us better understand our personas through the analytics we ran on the groups of business leads that fell into each bundle.


Just to give a few examples, you can look at the differences in web traffic behavior among the personas, see how the sales team converts them into customers, and spot the cross-sell or upsell opportunities we can work on. It is also a great asset for account-based marketing (ABM). You can target accounts and personas simultaneously and deliver highly personalized messaging based on the line of business and the person you are talking to.


I hope you enjoyed reading about my approach to segmentation, and I would love to hear any thoughts or comments on it!


Credits: Sebastiao Correia, Raphaël Nedellec, Tarek Benkhelif, Thibaut Gourdel, Maedeh Afshari


This article was written by William Prunier and originally appeared on: https://www.linkedin.com/pulse/marketing-database-segmentation-using-machine-learning-prunier/


The post Talend on Talend: How to use machine learning for your marketing database segmentation appeared first on Talend Real-Time Open Source Data Integration Software.
