Booking.com customers learn the hard way that Unicode is tricky

(Image credit: Shutterstock)

It's easy to mistake an "l" for a "1" or an "I" with a poorly designed typeface. (Ahem.) Fortunately, modern fonts tend to use a variety of techniques to disambiguate those easily confused alphanumeric characters. But those designs rarely account for ambiguity that results from similarities across different character sets, as a recent phishing campaign targeting Booking.com users demonstrates.

BleepingComputer reported that "the attack, first spotted by security researcher JAMESWT, abuses the Japanese hiragana character 'ん' (Unicode U+3093), which closely resembles the Latin letter sequence '/n' or '/~', at a quick glance in some fonts." The attacker's hope is that people will gloss over the funky character, follow the malicious link, and then fall prey to the malware they're distributing via this campaign.
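
To see why the trick works on people but not on software, it helps to look at the raw code points. What follows is a minimal Python sketch, not a real phishing detector. The `flag_non_ascii` helper and the `suspicious` URL are made up for illustration (the URL is not the one used in the campaign), and only the standard `unicodedata` module is assumed.

```python
import unicodedata


def flag_non_ascii(url: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(url)
        if ord(ch) > 0x7F  # anything above 127 is outside ASCII
    ]


# Hypothetical lookalike URL: the hiragana character sits where a reader
# expects a slash, so everything after it is still part of the hostname.
suspicious = "https://example.comんlogin.attacker.test/"

for index, ch, name in flag_non_ascii(suspicious):
    print(f"position {index}: {ch!r} is U+{ord(ch):04X} ({name})")
# position 19: 'ん' is U+3093 (HIRAGANA LETTER N)
```

A check like this is deliberately crude. Plenty of legitimate internationalized URLs contain non-ASCII characters, so a real filter would need more context than whether a character falls outside ASCII.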

Unicode has been exploited like this many times before—this is a relatively common way for spammers to make it past email filters, for example, or for particularly dedicated trolls to harass people online despite the prevalence of profanity filters. Yet it remains a difficult problem to solve because text rendering, much like DNS, is more cursed than most people realize. So let's go through a crash course on characters.

Computers originally supported the minimal American Standard Code for Information Interchange, or, as sane people call it, ASCII. That was relatively simple: it allowed computers to deal with the 26 letters of the English alphabet in both their lowercase and uppercase forms, the digits 0 through 9, a smattering of critical punctuation, and various control codes that told the computer when to start a new line, indent text, etc.
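
A quick way to make that concrete, sketched in Python with nothing beyond the built-ins: every ASCII character is just a number below 128, including the lookalikes from the top of this article. The variable names here are illustrative only.

```python
# Each ASCII character maps to a number from 0 to 127 (7 bits).
lookalikes = {ch: ord(ch) for ch in "l1I"}
print(lookalikes)  # {'l': 108, '1': 49, 'I': 73}

# Control codes are ordinary entries in the same table.
newline_code = ord("\n")  # 10, "line feed"
tab_code = ord("\t")      # 9, "horizontal tab"

# str.isascii() (Python 3.7+) reports whether every character fits that range.
print("Booking.com".isascii())  # True
print("ん".isascii())            # False
```

However similar "l", "1", and "I" look on screen, the computer never confuses them, because it compares the numbers and not the shapes.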

ASCII only has room for 128 characters, though, which leaves out most of the world's writing systems. Unicode was created to fill that gap. The Unicode Consortium says that Unicode "can encode up to roughly 1.1 million characters, allowing it to support all of the world’s languages and scripts in a single, universal standard" and that "all modern operating systems, computing environments, programming languages, and applications support the core of the Unicode Standard." So now we can have cool things like emojis, a much wider range of punctuation, and non-English letters.
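
The "roughly 1.1 million" figure falls out of some simple arithmetic. Here is a sketch in Python. The constant names are made up for this example, and the plane and surrogate counts come from the Unicode Standard's code space layout (17 planes of 65,536 code points, with 2,048 of them reserved as surrogates), not from the Consortium quote above.

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536
SURROGATES = 0x800               # 2,048 code points (U+D800..U+DFFF) reserved for UTF-16

total_code_points = PLANES * CODE_POINTS_PER_PLANE  # 1,114,112
encodable = total_code_points - SURROGATES          # 1,112,064

# Python's highest valid code point is U+10FFFF, one less than the total.
highest = sys.maxunicode

# The same text costs a different number of bytes in UTF-8 depending on the
# character: Latin "l", hiragana "ん", and an emoji (U+1F600).
utf8_lengths = [len(ch.encode("utf-8")) for ch in ("l", "ん", "\U0001F600")]
print(total_code_points, encodable, hex(highest), utf8_lengths)
# 1114112 1112064 0x10ffff [1, 3, 4]
```

That headroom is what makes room for hiragana and emoji alongside Latin letters, and it is also why lookalike characters from different scripts can end up in the same string.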


Nathaniel Mott is a freelance news and features writer for Tom's Hardware US, covering breaking news, security, and the silliest aspects of the tech industry.
