Data Privacy: For All the Cars

In “All the Parties,” a track on his new album titled “For All the Dogs,” Drake raps, “I bought the Rolls just to take it apart,” making reference to a custom Rolls-Royce Cullinan he owns. Now, while Drake is clearly flaunting his lavish lifestyle, the idea of “taking apart” a car sparked a curiosity in me to dissect the evolving landscape of transportation.

One growing trend in the mobility space is on-demand ride hailing, offered by services such as Uber, Lyft, and even Northwestern’s Safe Ride. A recent market report found that the “Global Ride Hailing Services market size was valued at USD 54.5 billion in 2022 and is expected to expand at a CAGR of 21.5% during the forecast period, reaching USD 175 billion by 2028.”

The convenience of requesting rides from anywhere, however, comes with a hidden cost: data collection.

What’s the issue?

Companies, researchers, and other groups collect, store, and analyze anonymized data that includes “location stamps” – the geographical coordinates and time stamps of users. This information is sourced from mobile phone records, credit card transactions, public transportation cards, Twitter profiles, and mobile applications. Combining these datasets offers the potential for valuable insights into human travel patterns, which can be used to optimize transportation systems and urban planning.

On the other hand, location stamps are specific to individuals and can be exploited for malicious purposes. According to an MIT News article by Rob Matheson, research shows that someone can identify an individual and extract sensitive information about them from just a few randomly selected points in a mobility dataset, and that this becomes easier when datasets are combined.

What are the solutions?

Masking location data so that users cannot be identified after a data leak, misuse, or breach enhances user privacy. The information lost through masking, however, can make the data less useful and location-based systems less efficient.

What is the appropriate balance between ensuring data privacy and optimizing service performance when using a ride-hailing platform?

A paper that came out of MIT’s JTL Urban Mobility Lab in 2018 explored that delicate balance.

The study focused on transportation efficiency, measured by Vehicle Miles Traveled (VMT), and service quality (including waiting and riding times) in the context of daily home-to-work commuting by citizens in Pisa, Italy. The researchers chose this context because work commutes contribute significantly to traffic congestion and pollution and can reveal recurring route patterns and schedules.

The researchers covered three privacy-protecting techniques:

Obfuscation

K-anonymity

Cloaking

What do these terms mean?

The obfuscation mechanism used in the study replaces the real location of a worker requesting a commute ride with a random location generated within an area of a prescribed radius centered on the real one.
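
To make the idea concrete, here is a minimal Python sketch of that kind of obfuscation. It is not the researchers' code: the function names, the 500-meter radius, the approximate Pisa coordinates, and the choice to sample uniformly over a disc are all assumptions made for illustration.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for both helpers


def obfuscate(lat, lon, radius_m, rng=random):
    """Return a random point within radius_m meters of (lat, lon).

    Assumption: points are drawn uniformly over the disc's area, using a
    flat-Earth approximation that is adequate for city-scale radii.
    """
    # Taking the square root keeps samples uniform over the area
    # instead of clustering them near the center.
    distance = radius_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    north_m = distance * math.cos(bearing)
    east_m = distance * math.sin(bearing)
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return lat + dlat, lon + dlon


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Illustrative usage: hide a pickup point somewhere in Pisa (approximate coordinates).
real = (43.7228, 10.4017)
reported = obfuscate(*real, radius_m=500.0)
print(reported, haversine_m(*real, *reported))
```

A larger radius hides the rider better, but the vehicle is sent farther from the true pickup point, which is the privacy and efficiency trade-off the study measures.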

K-anonymity ensures that an individual’s specific location cannot be distinguished from those of at least “k - 1” other individuals. This is often achieved by adding dummy or fake locations to a user’s real location, creating a set of indistinguishable locations. For example, if “k” is set to 5, the technique will make it impossible to determine which of the 5 locations in the set belongs to a particular user.

The researchers introduce the concept of l-diversity on top of k-anonymity, which further enhances protection of user information by requiring that the cloaked region, which includes the “k” individuals, contain “l” different Points of Interest (POIs). It adds diversity to the locations and the types of places within the group. This additional layer helps prevent an attacker from inferring specific information about an individual’s movements or preferences even within a cloaked group of users.
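
Both ideas can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's implementation: the function names, the rectangular region, the degree-based spread for dummies, and the sample POI names are all hypothetical.

```python
import random


def build_anonymity_set(real, k, spread_deg, rng=random):
    """Return k shuffled points: the real location plus k - 1 dummies.

    Assumption: dummies are drawn uniformly from a square of half-width
    spread_deg (in degrees) around the real point.
    """
    lat, lon = real
    points = [real]
    while len(points) < k:
        dummy = (lat + rng.uniform(-spread_deg, spread_deg),
                 lon + rng.uniform(-spread_deg, spread_deg))
        if dummy not in points:
            points.append(dummy)
    rng.shuffle(points)  # so list position does not reveal the real point
    return points


def in_box(point, box):
    """True if point lies inside box = (min_lat, min_lon, max_lat, max_lon)."""
    lat, lon = point
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def is_k_anonymous_l_diverse(box, users, pois, k, l):
    """Check that a region holds at least k users and l distinct POIs.

    pois is a list of (name, (lat, lon)) pairs.
    """
    users_inside = sum(1 for user in users if in_box(user, box))
    distinct_pois = {name for name, point in pois if in_box(point, box)}
    return users_inside >= k and len(distinct_pois) >= l


# Illustrative usage with k = 5, matching the example in the text.
candidates = build_anonymity_set((43.7228, 10.4017), k=5, spread_deg=0.01)
print(candidates)
```

With dummies, an observer sees five plausible locations instead of one. The l-diversity check guards against the case where all k users sit next to a single place, such as one hospital, which would reveal where they are going even though no individual can be singled out.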

In cloaking, an actual location is substituted with a representative point, like a centroid, within the corresponding region. In the given context, the researchers employed cloaking by calculating centroid locations for census blocks. These centroids serve as proxies for groups of individuals within those blocks, obscuring their exact locations.
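
A rough Python sketch of centroid-based cloaking follows. Real census blocks are polygons; to keep the example short, it assumes rectangular blocks, and the block IDs, coordinates, and function names are invented for illustration.

```python
# Hypothetical blocks, each given as (min_lat, min_lon, max_lat, max_lon).
BLOCKS = {
    "block_A": (43.70, 10.38, 43.72, 10.40),
    "block_B": (43.72, 10.40, 43.74, 10.42),
}


def contains(box, point):
    """True if point falls in box; upper edges are excluded so blocks do not overlap."""
    lat, lon = point
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat < max_lat and min_lon <= lon < max_lon


def centroid(box):
    """Center of a rectangular block."""
    min_lat, min_lon, max_lat, max_lon = box
    return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)


def cloak(point, blocks):
    """Replace a point with the ID and centroid of the block that contains it.

    Returns None if the point falls outside every known block.
    """
    for block_id, box in blocks.items():
        if contains(box, point):
            return block_id, centroid(box)
    return None


# Two different riders in the same block report the same cloaked location.
print(cloak((43.7228, 10.4017), BLOCKS))
print(cloak((43.7350, 10.4150), BLOCKS))
```

Because every rider in a block reports the same point, the size of the blocks sets the level of privacy: larger blocks hide more people behind each centroid but place the reported pickup farther from the real one.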

Study Findings

The study’s findings demonstrate that better VMT outcomes can be achieved when users are willing to trade some convenience for privacy, mainly by opting for longer travel times rather than extended waiting times. For example, if users are willing to tolerate a detour time of at least 5 minutes, the increase in VMT due to privacy preservation is minimal, at less than 10%. This suggests that, by compromising on convenience, it is feasible to protect privacy with only a minor effect on VMT.

Among the privacy methods assessed, k-anonymity consistently surpasses obfuscation, while cloaking becomes the most effective approach when the spatial scope of k-anonymity widens.

The researchers suggest future directions for study: hiding only the employee’s origin for commuting services and exploring temporal privacy by concealing departure times. They also mention exploring more advanced location privacy and anonymization methods, such as differential privacy.

Link to original article published on 10/10/2023

IAN LEI
