Bilingual Sentiment Analysis of Tweets from the Philippine Presidential Elections

I mapped out tweets from the elections to figure out how we were divided as a nation, and where we met in the middle. Along the way, I worked with scrapers, virtual machines, and a few novel methods for working with bilingual data.

This is what 10,000 tweets from the Philippine elections look like.

When social media users are only able to engage with like-minded peers, increasingly extreme worldviews are reinforced inside bubbles, free from critique or dissent. People don’t know how to disagree anymore, and there’s no better example of this than politics. This is my attempt to grapple with this phenomenon, in the hopes of figuring out how we might meet in the middle again.


The goal of this project is to use statistical and machine learning techniques such as large language models, principal component analysis, and network analysis, to better comprehend political polarization on social media. Specifically, I aimed to do these three things:

  • Train large language models to classify bilingual tweets by political affiliation, paying attention to what content these models consider “extreme” or “obvious”, and what might count as “middle ground” and could thus be recommended to users on both sides of the political spectrum
  • Quantify tweets as model-encoded embeddings and visualize them, to see how tweets and their content relate to one another
  • Use network analysis to understand interactions within and across political lines
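Principal component analysis is one way to turn model-encoded embeddings into points that can be plotted. As a minimal sketch, assuming each tweet has already been encoded into a fixed-length vector and that NumPy is available, PCA can project those vectors onto two dimensions. Random vectors stand in for real embeddings here, and `pca_2d` is a hypothetical helper, not code from the repository.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project an (n_tweets, dim) embedding matrix onto its top two principal components."""
    # Center each dimension so the components describe variance around the mean tweet.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each tweet along the first two directions, ready for a scatter plot.
    return centered @ vt[:2].T


# Hypothetical stand-in for encoded tweets: 100 random 768-dimensional vectors.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 768))
coords = pca_2d(fake_embeddings)
print(coords.shape)  # (100, 2)
```

Each row of `coords` is one tweet's position on the plot; coloring the points by label would show how far apart the two camps sit.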

You can follow along with the code I used on this GitHub repository.

Scraping and Processing

The first task was to gather tweets from the time period of the Philippine elections that would represent a variety of political views. To do this, I queried:

  • General Philippine elections terms
  • The names of all six presidential candidates
  • Hashtags corresponding to the two leading opposing candidates, current president Bongbong Marcos and opposition candidate Leni Robredo

I used twint to scrape these tweets, then looked at their hashtags to label them for a sentiment analysis task.

To prepare for sentiment analysis, I constructed a labeled dataset in which each tweet was labeled either pro-Marcos or pro-opposition/anti-Marcos based on its hashtags. If a tweet contained at least one hashtag from one of the two categories below, it was placed in that category. (If a tweet contained hashtags from both categories, it was excluded from the dataset.)

  • Pro-Marcos: #BBMSaraUniTeam,  #BBMIsMyPresident2022,  #BBMSara2022, #BringBackMarcos
  • Pro-Opposition: #LeniKiko2022, #KulayRosasAngBukas, #LeniForPresident2022, #LeniKikoAllTheWay, #KayLeniTayo

This list is not exhaustive, but it provided me with a dataset of roughly 63,000 tweets for training and evaluation.
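The labeling rule can be written as a small function over a tweet's hashtags. This is a minimal sketch, assuming each tweet's hashtags arrive as a list of strings that include the leading "#", and assuming case-insensitive matching; `label_tweet` is a hypothetical name, not the repository's code.

```python
# Hashtag lists from the post; matching is case-insensitive here (an assumption).
PRO_MARCOS = {tag.lower() for tag in [
    "#BBMSaraUniTeam", "#BBMIsMyPresident2022", "#BBMSara2022", "#BringBackMarcos",
]}
PRO_OPPOSITION = {tag.lower() for tag in [
    "#LeniKiko2022", "#KulayRosasAngBukas", "#LeniForPresident2022",
    "#LeniKikoAllTheWay", "#KayLeniTayo",
]}


def label_tweet(hashtags):
    """Return "pro-marcos", "pro-opposition", or None for a tweet's list of hashtags."""
    tags = {tag.lower() for tag in hashtags}
    has_marcos = bool(tags & PRO_MARCOS)
    has_opposition = bool(tags & PRO_OPPOSITION)
    if has_marcos and has_opposition:
        return None  # hashtags from both camps: excluded from the dataset
    if has_marcos:
        return "pro-marcos"
    if has_opposition:
        return "pro-opposition"
    return None  # no labeling hashtag: not part of the labeled set


print(label_tweet(["#BBMSaraUniTeam", "#Halalan2022"]))  # pro-marcos
print(label_tweet(["#BBMSara2022", "#LeniKiko2022"]))    # None
```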


A language model is a machine learning model that takes text as input and outputs a numerical representation of that text. Most large language models are built to process text in a single language, and English models receive far more support and research than Filipino or multilingual ones, which suggests English models may perform best within their own language. The task at hand, however, did not just mix tweets in English with tweets in Tagalog: a good majority of the tweets were written in both. This is Taglish, the seamless code-switching between Tagalog and English mid-sentence that is a pervasive linguistic artifact of Philippine history.

To address these challenges, I came up with six approaches to dealing with this data.

The first three approaches each use a single model:

  • Model 1: Feed all plaintext through a Tagalog language model, and attach a classifier head. This is the least expensive method in terms of computation, both in input processing and training.
  • Model 2: Feed all plaintext through a multilingual language model, and attach a classifier head. This is the largest of the three models I tried, which makes it the most computationally expensive to train.
  • Model 3: Translate all plaintext to English with Google’s Translate API, which leaves text already detected as English unchanged. Feed the translated text through a widely used English language model, and attach a classifier head. Translation is expensive, but the model is relatively small and well-tuned. I opted to use bert-base-uncased instead of roberta-base, since most tweets do not observe proper capitalization: bert-base-uncased ignores case, while roberta-base is case-sensitive.
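Every approach ends in "attach a classifier head", but the post does not specify the head's architecture or loss. A minimal sketch, assuming a single linear unit with a sigmoid on top of the encoder's pooled output and a binary log loss; `head_probability`, the label convention (1 = pro-opposition), and the toy numbers are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def head_probability(embedding, weights, bias):
    """Hypothetical classifier head: one linear unit on the encoder's pooled output.

    Returns the predicted probability of label 1 (assumed here to mean pro-opposition).
    """
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return sigmoid(logit)


def binary_cross_entropy(p: float, label: int, eps: float = 1e-12) -> float:
    # Log loss for a single example; eps guards against log(0).
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Toy 4-dimensional "embedding"; a bert-base-uncased vector would have 768 dimensions.
emb = [0.5, -1.0, 0.25, 2.0]
w = [0.1, 0.2, -0.3, 0.4]
p = head_probability(emb, w, bias=0.0)
print(p, binary_cross_entropy(p, label=1))
```

In the "CLS only" stage, only `weights` and `bias` would be updated while the encoder stays frozen.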

The next three approaches use a combination of the Tagalog and English models:

  • Model 4: Feed all plaintext through both a Tagalog language model and English language model. Concatenate the representations, and attach a classifier head.
  • Model 5: Translate all Tagalog tweets to English, and all English tweets to Tagalog. Feed the original plaintext and complementary translation through the appropriate model. Concatenate the representations, and attach a classifier head.
  • Model 6: Translate all tweets into both Tagalog and English. Feed both translations through the appropriate models. Concatenate the representations, and attach a classifier head.
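Models 4 to 6 differ only in which text each encoder receives; all three then concatenate the two representations. A minimal sketch of that routing, assuming each tweet has a detected language code ("tl" or "en") and using a hypothetical `translate(text, target)` callable in place of a real translation API. The 768-dimensional vectors in the usage line are illustrative.

```python
def dual_inputs(text, lang, variant, translate):
    """Return the (Tagalog-model input, English-model input) pair for Models 4 to 6.

    lang is the detected language of the tweet ("tl" or "en"), and translate(text, target)
    is a hypothetical translation callable standing in for a real translation API.
    """
    if variant == 4:
        # Model 4: both encoders read the original text.
        return text, text
    if variant == 5:
        # Model 5: the original goes to its own language's encoder, the translation to the other.
        if lang == "tl":
            return text, translate(text, "en")
        return translate(text, "tl"), text
    if variant == 6:
        # Model 6: each encoder reads a translation into its own language.
        return translate(text, "tl"), translate(text, "en")
    raise ValueError(f"unknown variant: {variant}")


def concat_features(tl_embedding, en_embedding):
    """Join the two encoders' sentence vectors into one input for the classifier head."""
    return list(tl_embedding) + list(en_embedding)


def fake_translate(text, target):
    # Stand-in for a translation service, so the routing can be run offline.
    return f"[{target}] {text}"


print(dual_inputs("kain na tayo", "tl", 5, fake_translate))
# ('kain na tayo', '[en] kain na tayo')
print(len(concat_features([0.1] * 768, [0.2] * 768)))  # 1536
```

Concatenation keeps both views intact and lets the classifier head learn how much to trust each encoder.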

For all models, I removed the hashtags from the tweets before generating predictions. Since the labels came from hashtags, this keeps the model from simply memorizing which hashtags map to which side and pushes it to learn from the rest of the text.
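Hashtag removal is a one-line regular expression. A minimal sketch, assuming a hashtag is "#" followed by one or more word characters (Python 3's `\w` is Unicode-aware, so letters like "ñ" are covered); `strip_hashtags` is a hypothetical helper.

```python
import re

# Assumption: a hashtag is "#" followed by one or more word characters.
HASHTAG = re.compile(r"#\w+")


def strip_hashtags(text: str) -> str:
    """Remove every hashtag from a tweet and collapse the leftover whitespace."""
    without_tags = HASHTAG.sub("", text)
    return " ".join(without_tags.split())


print(strip_hashtags("Tara na, boto tayo! #KayLeniTayo #LeniKiko2022"))
# Tara na, boto tayo!
```

This removes hashtags used mid-sentence as well, so a few words of content can disappear along with the labels.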


I trained each of these models in two stages: first training only the classifier head, then fine-tuning the model weights as well. During the fine-tuning stage, I added a regularization term, weighted by a hyperparameter alpha, that penalized the model for drifting far from its original weights. I experimented with several values of alpha and several learning rates.
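The post does not give the exact form of the regularization term. A common choice for penalizing drift from pretrained weights is the squared L2 distance to those weights, sometimes called L2-SP; the sketch below assumes that form, with parameters flattened into plain lists for readability. `regularized_loss` is a hypothetical helper.

```python
def regularized_loss(task_loss, weights, pretrained_weights, alpha):
    """Total loss = task_loss + alpha * ||weights - pretrained_weights||^2 (assumed form).

    Parameters are flat sequences here; a real model would sum the squared
    differences over every parameter tensor it fine-tunes.
    """
    if len(weights) != len(pretrained_weights):
        raise ValueError("weights and pretrained_weights must have the same length")
    drift = sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained_weights))
    return task_loss + alpha * drift


# With alpha = 0.0 the penalty vanishes; larger alpha pulls the weights back toward the start.
for alpha in (0.0, 0.1, 1.0, 10.0):
    print(alpha, regularized_loss(0.5, [1.0, 2.0], [1.0, 1.0], alpha))
```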

Since these models were large and there were many hyperparameter combinations to try, I trained them on a Google Cloud virtual machine with an NVIDIA T4 GPU. Training took about three days in total.

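Here are the results. For each model and each combination of learning rate and alpha (the regularization hyperparameter), I report the out-of-sample loss from the epoch where that loss was lowest; lower is better. The “CLS only” column is the first training stage, where only the classifier head was trained, so alpha does not apply.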

Learning rate   CLS only  1e-05     1e-05     1e-05     1e-05     1e-06     1e-06     1e-06     1e-06
Alpha           n/a       0.1       10.0      0.0       1.0       1.0       0.0       10.0      0.1
Model 1         0.250826  0.244631  0.180243  0.246818  0.237377  0.196726  0.194462  0.210757  0.196871
Model 2         0.263558  0.165568  0.165350  0.165003  0.172907  0.183602  0.181352  0.197714  0.182457
Model 3         0.406897  0.420875  0.234163  0.441585  0.388036  0.249107  0.247219  0.263557  0.247938
Model 4         0.227407  0.443742  0.182257  0.425447  0.387031  0.184038  0.183532  0.197998  0.185064
Model 5         0.224196  0.447964  0.185416  0.448696  0.380563  0.186712  0.185926  0.199786  0.185667
Model 6         0.222968  0.443559  0.190611  0.494647  0.423914  0.191236  0.190504  0.201342  0.190331
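Each cell above comes from picking the best epoch of one training run. A minimal sketch of that selection, assuming the per-epoch out-of-sample losses for each run are kept in a dictionary keyed by (model, learning rate, alpha); the structure and the toy numbers are hypothetical, not results from this project.

```python
def best_epoch(val_losses):
    """Return (epoch_index, loss) for the epoch with the lowest out-of-sample loss."""
    if not val_losses:
        raise ValueError("need at least one epoch")
    epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return epoch, val_losses[epoch]


def summarize(history):
    """Map each (model, learning_rate, alpha) run to its best out-of-sample loss.

    history is a hypothetical dict: (model, learning_rate, alpha) -> per-epoch losses.
    """
    return {run: best_epoch(losses)[1] for run, losses in history.items()}


# Illustrative numbers only, not results from this project.
toy_history = {
    ("Model 1", 1e-05, 0.1): [0.31, 0.27, 0.29],
    ("Model 1", 1e-06, 0.1): [0.33, 0.30, 0.28],
}
print(summarize(toy_history))
```

Selecting by out-of-sample loss rather than taking the last epoch guards against reporting a run after it has started to overfit.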

This writeup is in progress. Come back soon!