Geek Diaries: Blockchain and AI can defeat fake news!

Fake news is a term frequently used these days, especially in relation to the election of a certain U.S. president. Incorrect information on the Web is nothing new, so what is all this excitement about? Social media are now so deeply integrated into our lives that they have become the primary interface between ourselves and the (digital) world. They are the means for chatting with family and friends, reading the news, shopping and (soon) paying. This poses risks, and one of them is misinformation. On any given day we could read, through our social media apps, an article from a New York Times reporter and an article from an anonymous blogger and “fall into the trap” of thinking they have the same validity just because we are consuming them from the same platform. Combined with our human tendency to readily accept information we receive, especially when it agrees with our own world view, this leads to people believing things that are obviously debatable, to say the least.

This is a serious matter and we have to find defenses against it, because fake news has already produced very real results. Technology (e.g., social media) enabled this situation and technology must provide the defenses against it. The good (and truthful) news is that the tools to recognize fake stories already exist.

I read in a few articles that Facebook will try to fight back by using fact-checkers for news that many users have reported as fake. I do not know their exact implementation strategy, but relying solely on human fact-checkers is not going to cut it. The reason is obvious: there are significantly more people writing fake stories than we can have fact-checkers, and it takes more time to verify or disprove a story than to write a fake one. With the help of technology, however, fact-checking can be a very viable solution, specifically by using an approach similar to Blockchain, complemented by artificial intelligence.

The main idea behind the solution is simple: instead of trying to find out whether each individual story is factual, focus on examining its source and its distributors. Additionally, we need to automatically recognize whether the content of a story is very similar to stories already proven to be fake.

To give a simplified example, if it is proven (fact-checkers!) that Dennis has told many fake stories, we should treat any of his future stories with caution. Does it remind you of the story of the boy who cried wolf? Well, no one is blaming the villagers. Actually, Google is already taking a similar approach. Google APAC CMO Simon Kahn, in his talk at the British Chamber of Commerce, said “the issue that everyone in the industry is very focused on is how to curb it while still allowing people to have open information and networks. … One way of tackling it, is looking at sites that are purveyors of fake news and basically stopping advertising - cutting off oxygen so that they aren’t making money off it”. Cutting their advertising profits may decrease their motivation to peddle fake news, but it is not enough. Even if we can “strong-arm them” financially, there are others who can fund them and push their own agenda with fake stories. It is more “honest” to simply hold them accountable for the content they produce. It is very important to know who these purveyors of fake news are and to be able to prove that they indeed are.

That sets the requirement of knowing the source of a story and through whom it was disseminated. It turns out that there is already a technology that can do a pretty good job of that: Blockchain!

In case Blockchain does not ring a bell, I am pretty sure that Bitcoin does. Bitcoin is a popular peer-to-peer cryptocurrency and payment system that is already used widely. Well, Blockchain is the underlying technology that makes such a system possible. I will not go very deep into how Blockchain works, but I have to explain its workflow briefly so you can understand the benefits it provides and how they relate to our solution against fake news.

Imagine Blockchain as a ledger with entries like the table below. This ledger records all transactions between users in the network. Everyone involved (i.e., Nia, Mary, etc.) maintains their own exact copy of the ledger. In the metaphorical ledger below, each row describes a simple transaction between two people, such as transaction 2, where Mary sends 5$ to Helen. In Blockchain, instead of rows we have blocks (in real systems a block usually bundles many transactions, but one transaction per block is enough for this explanation). Each block is connected to the previous one, forming a chain. The data in each block is cryptographically hashed. A block contains a reference to the previous block (its hash), the details of the transaction, a timestamp and a proof of work that secures the block.

The Ledger
Transaction No.	From	To	Amount
1	Nia	Martin	10$
2	Mary	Helen	5$
3	Martin	Doug	9$

But how are the blocks chained to each other? Every time someone attempts a transaction, the transaction is published. At this point it is not yet in the chain because it has not been verified. The verification is performed by the so-called miners, who race each other to check that the transaction is valid (e.g., that the sender has enough funds and that the transaction is properly signed by the sender) and to solve the computational puzzle (the proof of work) that seals the block. The first miner to succeed adds the block to the chain and broadcasts it so that everyone else can update their ledgers. This requires a lot of computational power, and the miners do it because they are rewarded for the blocks they add, including the fees of the transactions they verify. Another important detail is that Blockchain allows us to know the “story” of everyone’s money. What I mean is that when Martin takes 10$ from Nia and gives 9$ of those to Doug, we know that the 9 dollars Doug now owns originally came from Nia. OK, so what do we gain from all that?

Blockchain enables:

A complete history of the transactions (or whatever information we care about) and their relations
This history is publicly available, transparent and cannot be doctored. To be more precise, it is extremely difficult to modify the information on the chain without being noticed, because of the hashing and because everyone has a synchronized copy, so an attacker would have to corrupt the data in many places at once
It is not controlled by anyone. It will be a public record for all of us, so we know who said what and, eventually, whether the things they claimed proved to be misleading
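
To make the "cannot be doctored" point concrete, here is a minimal sketch of a hash-linked ledger in Python. It is only an illustration under simplifying assumptions: one transaction per block, no timestamps, no proof of work and no network of copies. The function names (block_hash, append_block, is_valid) are my own and do not come from any Blockchain library.

```python
import hashlib
import json


def block_hash(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, transaction: dict) -> None:
    """Append a block that stores the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"index": len(chain), "prev_hash": prev_hash, "transaction": transaction}
    chain.append({**body, "hash": block_hash(body)})


def is_valid(chain: list) -> bool:
    """Recompute every hash and check every link back to the first block."""
    prev_hash = "0" * 64
    for block in chain:
        body = {key: value for key, value in block.items() if key != "hash"}
        if block["prev_hash"] != prev_hash or block["hash"] != block_hash(body):
            return False
        prev_hash = block["hash"]
    return True


# The three transactions from the ledger table above.
ledger = []
append_block(ledger, {"from": "Nia", "to": "Martin", "amount": 10})
append_block(ledger, {"from": "Mary", "to": "Helen", "amount": 5})
append_block(ledger, {"from": "Martin", "to": "Doug", "amount": 9})

print(is_valid(ledger))  # True: every hash and link checks out

# Quietly change an old entry: the stored hash no longer matches the data.
ledger[1]["transaction"]["amount"] = 500
print(is_valid(ledger))  # False: the tampering is detected
```

Because each block stores the hash of the one before it, changing an old entry means recomputing every later block, and doing so on most copies of the ledger at once. That is what makes an unnoticed modification so hard.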

Applying such technology in the context of fake news allows us to know who created and who disseminated each story. At any given point, if a news article has been fact-checked and found to be false, we are able to identify its origin and its distributors.
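
As a rough illustration of that lookup, the sketch below keeps an append-only log of "publish" and "share" events and traces a story back to its source and distributors. The ProvenanceLog class, the event names and the story identifiers are all hypothetical. In the scheme described here the log would live on a Blockchain-like public record instead of in a Python list.

```python
from typing import List, Optional, Tuple


class ProvenanceLog:
    """Hypothetical append-only record of who published and who shared a story."""

    def __init__(self) -> None:
        # Each entry is (event, story_id, account); entries are only ever appended.
        self._events: List[Tuple[str, str, str]] = []

    def publish(self, story_id: str, account: str) -> None:
        self._events.append(("publish", story_id, account))

    def share(self, story_id: str, account: str) -> None:
        self._events.append(("share", story_id, account))

    def trace(self, story_id: str) -> Tuple[Optional[str], List[str]]:
        """Return (source, distributors) for a story, in order of first appearance."""
        source: Optional[str] = None
        distributors: List[str] = []
        for event, sid, account in self._events:
            if sid != story_id:
                continue
            if event == "publish" and source is None:
                source = account
            elif event == "share" and account not in distributors:
                distributors.append(account)
        return source, distributors


log = ProvenanceLog()
log.publish("story-42", "Dennis")
log.share("story-42", "Eve")
log.share("story-42", "Frank")
log.publish("story-7", "Grace")

# Once fact-checkers flag story-42, we can look up who is behind it.
print(log.trace("story-42"))  # ('Dennis', ['Eve', 'Frank'])
```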

It is important to note that we will not “cut them off” by removing their stories but instead hold them accountable for the validity of their content. If they prove to be vendors of fake news, we assign them a deception score that applies to any information they distribute: a higher score for the source, but also a score for the subsequent distributors. When a story is published in the future, we can estimate its potential falsehood from the deception scores of its source and distributors (the exact algorithm still has to be defined).
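
Since the algorithm is still open, the following is only one possible sketch of such a scoring scheme, not a proposal for the final one. The penalty weights (1.0 for the source, 0.25 for each distributor), the way the scores are combined and the squashing into the range 0 to 1 are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable

# Assumed weights: the source of a fake story is penalized more than those who spread it.
SOURCE_PENALTY = 1.0
DISTRIBUTOR_PENALTY = 0.25


class DeceptionScores:
    """Hypothetical per-account deception scores, updated when a story is proven fake."""

    def __init__(self) -> None:
        self.scores = defaultdict(float)

    def record_fake(self, source: str, distributors: Iterable[str]) -> None:
        """Penalize the source and, to a lesser degree, every distributor."""
        self.scores[source] += SOURCE_PENALTY
        for account in distributors:
            self.scores[account] += DISTRIBUTOR_PENALTY

    def estimate(self, source: str, distributors: Iterable[str]) -> float:
        """Estimated deceitfulness of a new story, from 0 (no record) towards 1."""
        accounts = list(distributors)
        if accounts:
            spread = sum(self.scores.get(a, 0.0) for a in accounts) / len(accounts)
        else:
            spread = 0.0
        raw = self.scores.get(source, 0.0) + DISTRIBUTOR_PENALTY * spread
        return raw / (raw + 1.0)


scores = DeceptionScores()
scores.record_fake("Dennis", ["Eve"])  # first story proven fake
scores.record_fake("Dennis", ["Eve"])  # second story proven fake

print(round(scores.estimate("Dennis", ["Eve", "Frank"]), 2))  # 0.67
print(scores.estimate("Grace", []))                           # 0.0
```

The estimate never reaches 1, which fits the idea that a bad track record is a reason for caution rather than proof that the next story is false.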

So now we have a way to quickly compute the estimated deceitfulness of a story, but we need to scale it even more. For this we need to automatically recognize fake stories that are similar, meaning different stories that refer to the same fake event. If you think about it, a single fake story or article cannot gain traction. It is when different articles and sources mention the same fake events that this false “verification” arises and convinces people that it must be true. So when a story is fact-checked and proven to be fake, all related stories that make the same fake claims can be classified as fake too. This automated process can already be achieved by computers with a decent degree of precision, using Machine Learning and Natural Language Processing. Most of us have probably come across, or at least know of, such applications. A good example is software for identifying plagiarism in research papers. A more naive (but more common) example is when our email client classifies certain emails as spam “because emails with similar contents were found to be spam”.

Automatically identifying stories that contain content known to be fake is by no means an easy task, and the applications will not perform perfectly. In fact it will be more of a fuzzy approach, but we can use it to automatically rate the stories that the classifier categorizes as fake with a high level of confidence. The sources and distributors of these articles will again be assigned the appropriate deception score.
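
To give a feel for the "similar content" check, here is a deliberately naive sketch that compares word counts with cosine similarity and flags a story only when it is very close to one already proven fake. A real system would rely on trained Machine Learning and NLP models. The 0.8 threshold, the function names and the example sentences are assumptions made up for illustration.

```python
import math
import re
from collections import Counter
from typing import Iterable


def word_counts(text: str) -> Counter:
    """Lower-case the text and count its words."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the two word-count vectors (0 = nothing shared)."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[word] * cb[word] for word in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(count * count for count in ca.values()))
    norm_b = math.sqrt(sum(count * count for count in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_known_fake(story: str, known_fakes: Iterable[str], threshold: float = 0.8) -> bool:
    """Flag a story only when it is very similar to one already proven fake."""
    return any(cosine_similarity(story, fake) >= threshold for fake in known_fakes)


known_fakes = ["The mayor secretly sold the city park to a foreign company"]

rewrite = "Mayor secretly sold the city park to a foreign company, insiders say"
unrelated = "The city council approved a new budget for road repairs"

print(matches_known_fake(rewrite, known_fakes))    # True
print(matches_known_fake(unrelated, known_fakes))  # False
```

Keeping the threshold high means that only close rewrites are rated automatically, and borderline cases can still go to human fact-checkers.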

I said earlier that human fact-checkers alone cannot keep up with fake news purveyors. Well, an application that rates stories based on their sources and disseminators, and automatically recognizes stories with the same content, will be much faster. Start feeding it with data and it should quickly be able to produce accurate estimations.

The results of the application should be used by the platforms on which the stories are consumed. For instance, Facebook could show, next to each story, its estimated deceitfulness score. A crude sample of the idea is shown in the picture below:

But wait a minute! What if our analysis errs? What if the story is not fake after all? The source will have the option to dispute the evaluation and bring forward evidence that the story is true. After all, those claiming something to be true should be responsible for proving that it indeed is.