Categories
Technology Thoughts

Privacy Nutrition Labels for the Top Apps of 2020

With the release of iOS and iPadOS 14.3, all app updates in the App Store are now required to include Privacy Details, or “nutrition labels”.

App Privacy Labels

At a high level, there are three categories of nutrition label:

  • Data Used to Track You
    • “May be used to track you across apps and websites owned by other companies”
  • Data Linked to You
    • “May be collected and linked to your identity”
  • Data Not Linked to You
    • “May be collected but it is not linked to your identity”

Within each category, there is additional info split into types of data collected and ways data is used.

Types of data an app can collect include:

  • contact info
  • health & fitness
  • financial info
  • location
  • sensitive info
  • contacts
  • user content
  • browsing history
  • search history
  • identifiers
  • purchases
  • usage data
  • diagnostics
  • other data

Ways data is used include:

  • third-party advertising
  • developer’s advertising or marketing
  • analytics
  • product personalization
  • app functionality
  • other purposes
App Privacy
See Details
The developer, Zoom, indicated that the app's privacy practices may include handling of data as described below. For more information, see the developer's privacy policy.
Data Linked to You
The following data may be collected and linked to your identity: Location, Contact Info, User Content, Identifiers, Usage Data, Diagnostics
Privacy practices may vary, for example, based on the features you use or your age. Learn More
Zoom Privacy Details – apps.apple.com

Putting it all together, when you look at an app in the store, such as Zoom, you can see that the app collects your location, contact info, user content, identifiers, usage data, and diagnostics, and links that data to you. If this data were in the “not linked to you” category, it would still be collected, but anonymously.
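One way to picture this structure is as a small record with one set of data types per label category. The sketch below is only an illustration: the `PrivacyLabel` class and its field names are hypothetical (nothing Apple publishes), and the Zoom entry is filled in only from the top-level summary quoted above.

```python
from dataclasses import dataclass, field
from typing import Set

# The 14 data types listed earlier in this post.
DATA_TYPES = {
    "contact info", "health & fitness", "financial info", "location",
    "sensitive info", "contacts", "user content", "browsing history",
    "search history", "identifiers", "purchases", "usage data",
    "diagnostics", "other data",
}


@dataclass
class PrivacyLabel:
    """Hypothetical model of one app's label: a set of data types per category."""
    app: str
    used_to_track_you: Set[str] = field(default_factory=set)
    linked_to_you: Set[str] = field(default_factory=set)
    not_linked_to_you: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Reject anything that is not one of the listed data types.
        for types in (self.used_to_track_you, self.linked_to_you,
                      self.not_linked_to_you):
            unknown = types - DATA_TYPES
            if unknown:
                raise ValueError(f"unknown data types: {sorted(unknown)}")

    def total_types(self) -> int:
        # A data type is counted once per category it appears in.
        return (len(self.used_to_track_you) + len(self.linked_to_you)
                + len(self.not_linked_to_you))


# Zoom, based only on the top-level summary shown above.
zoom = PrivacyLabel(
    app="Zoom",
    linked_to_you={"location", "contact info", "user content",
                   "identifiers", "usage data", "diagnostics"},
)
print(zoom.total_types())
```

Counting a data type once per category is an assumption made here so that an app reporting the same type under two categories shows up in both.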

The top level information tells you what data the app collects, but to see how the data is used, you need to select the “See Details” link at the top right of the App Privacy section.

From the expanded view, you can see that Zoom collects data for advertising & marketing, analytics, and general app functionality. This may look like a lot, but Zoom’s data use is comparatively short. Details for Facebook’s data use scroll for days.

And the distinction between data collection and data use is important. For example, an app may collect your location and use it to tell you the weather nearby. Granting access to your location would make sense if you are downloading a weather app. But an app may also collect your location and use it to tell ad providers all the places you go. In that case, granting access to your location would be sketchy, especially if you were downloading a calculator app.

There is also an inherent level of trust built into Apple’s new model for privacy details. As Apple tells app developers:

“You’re responsible for keeping your responses accurate and up to date.”

This means that, to apply these new privacy labels, app developers must self-report their data use when submitting updates to the App Store. Apple does not read through all the code or monitor network traffic to automatically create an app’s privacy details.

Apps can change their behavior with any update, but developers are responsible for updating the privacy details themselves. App reviewers do not flag when the privacy details need an update.

So while the longevity and robustness of the new privacy nutrition labels remain to be seen, we can take a look at how the most popular apps of 2020 report their privacy nutrition details.

Top 2020 Apps

If you have updated to iOS 14.3, it’s interesting to flip through some of the apps you use to see how they report their data collection and use. It’s not exactly easy to compare two apps, though.

Since Apple recently unveiled the top games and apps of 2020, you can look at all the privacy nutrition label details in search of trends from the apps everyone is using.

So I did. And compiled the Privacy Nutrition Label Data for the Top Apps of 2020.

This starts off with general info regarding what data is collected, then looks at how specific apps and games report data use, and finally lists insights and questions from the investigation. (All the spreadsheets and data are included at the end.)

Nutrition Label Data

General statistics
  • 80 total apps
    • 20 free apps
    • 20 paid apps
    • 20 free games
    • 20 paid games
  • 51 updated to report privacy data
    • 32 apps
    • 19 games
  • Top collected data types across all three categories
    • identifiers (70)
    • usage data (70)
    • diagnostics (59)
    • purchases (46)
    • location (42)
    • user content (36)
    • contact info (35)
    • other data (21)
    • search history (16)
    • contacts (14)
    • financial info (12)
    • browsing history (11)
    • sensitive info (7)
    • health and fitness (6)
  • Top collected data types (used to track you)
    • identifiers (27)
    • usage data (23)
    • purchases (12)
    • contact info (10)
    • diagnostics (10)
    • location (10)
    • other data (8)
    • user content (4)
    • browsing history (3)
    • contacts (1)
    • financial info (1)
    • health and fitness (1)
    • search history (1)
    • sensitive info (1)
  • Top collected data types (linked to you)
    • usage data (30)
    • identifiers (28)
    • diagnostics (26)
    • user content (24)
    • purchases (23)
    • location (22)
    • contact info (22)
    • search history (13)
    • contacts (12)
    • other data (11)
    • financial info (10)
    • browsing history (7)
    • health and fitness (4)
    • sensitive info (4)
  • Top collected data types (not linked to you)
    • diagnostics (23)
    • usage data (17)
    • identifiers (15)
    • purchases (11)
    • location (10)
    • user content (8)
    • contact info (3)
    • sensitive info (2)
    • search history (2)
    • other data (2)
    • health and fitness (1)
    • financial info (1)
    • contacts (1)
    • browsing history (1)
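
The “across all three categories” totals appear to be the sum of the three per-category lists. As a quick sanity check, here is a minimal sketch that adds the per-category counts copied from the lists above; the variable names are just for illustration.

```python
from collections import Counter

# Per-category counts copied from the lists above.
tracking = Counter({
    "identifiers": 27, "usage data": 23, "purchases": 12, "contact info": 10,
    "diagnostics": 10, "location": 10, "other data": 8, "user content": 4,
    "browsing history": 3, "contacts": 1, "financial info": 1,
    "health and fitness": 1, "search history": 1, "sensitive info": 1,
})
linked = Counter({
    "usage data": 30, "identifiers": 28, "diagnostics": 26, "user content": 24,
    "purchases": 23, "location": 22, "contact info": 22, "search history": 13,
    "contacts": 12, "other data": 11, "financial info": 10,
    "browsing history": 7, "health and fitness": 4, "sensitive info": 4,
})
not_linked = Counter({
    "diagnostics": 23, "usage data": 17, "identifiers": 15, "purchases": 11,
    "location": 10, "user content": 8, "contact info": 3, "sensitive info": 2,
    "search history": 2, "other data": 2, "health and fitness": 1,
    "financial info": 1, "contacts": 1, "browsing history": 1,
})

# Adding Counters sums the counts key by key.
overall = tracking + linked + not_linked

for data_type, count in overall.most_common():
    print(f"{data_type} ({count})")
```

The summed counts match the overall list, so an app that reports the same data type under two categories is counted twice in the overall numbers.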
By Apps and Games
  • Most types of data collection (17)
    • Facebook
    • Instagram
    • Spotify
    • Twitter
  • No data collection (* these are all paid apps/games)
    • HotSchedules
    • AutoSleep Track Sleep on Watch
    • Shadowrocket
    • EpocCam Webcamera for Computer
    • Arcadia – Arcade Watch Games
  • Only collects data not linked to you
    • Widgetsmith
    • Among Us!
  • Most data types used to track you
    • Twitter (7)
    • Subway Surfers (6)
    • Spotify (5)
Free vs Paid
  • Average types of data collected (overall)
    • Free (10.5)
    • Paid (3.6)
  • Median types of data collected (overall)
    • Free (10)
    • Paid (4)
  • Average types of data (used to track you)
    • Free (2.9)
    • Paid (0.3)
  • Average types of data (linked to you)
    • Free (6.3)
    • Paid (1.1)
  • Average types of data (not linked to you)
    • Free (1.3)
    • Paid (2.2)
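
The per-category averages above should add up to the overall averages, which is an easy thing to check. The sketch below does that with the numbers from the list, then shows how an average and median are computed; the `example_counts` values are made up for illustration and are not from the spreadsheet.

```python
from statistics import mean, median

# Per-category averages copied from the list above.
averages = {
    "free": {"used to track you": 2.9, "linked to you": 6.3, "not linked to you": 1.3},
    "paid": {"used to track you": 0.3, "linked to you": 1.1, "not linked to you": 2.2},
}

totals = {}
for tier, by_category in averages.items():
    # Round to one decimal to match the precision of the reported numbers.
    totals[tier] = round(sum(by_category.values()), 1)
    print(tier, totals[tier])

# HYPOTHETICAL per-app counts, only to show how a mean and median are computed.
# These are not values from the spreadsheet.
example_counts = [2, 4, 4, 5, 10]
example_mean = mean(example_counts)
example_median = median(example_counts)
print(example_mean, example_median)
```

The totals come out to the reported overall averages (10.5 for free, 3.6 for paid). In the made-up list, one heavy collector pulls the mean above the median, which is why it helps to report both.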

Insights and Questions

Many of these points stem from the descriptions of Types of data and Data use sections of Apple’s privacy details page.

Free apps
On Apple’s categories:
  • “Identifiers” is a vague name, but it’s related to device and user IDs. These types of IDs are often static and used to link your information across apps and services
  • “User content” from apps not creating user content is interesting (Disney Plus and Netflix). Guessing these are related to the “Customer Support” category.
    • And how does an app have “User Content” not linked to you?
  • “Purchases” is not included by Netflix (as you can’t subscribe in the app)
On companies:
  • Google hasn’t updated info for any of their apps yet
  • Widgetsmith was a breakout iOS 14 app of the year. It only collects anonymous purchase and diagnostic data.
  • WhatsApp is Facebook’s least offensive app.
  • What is Spotify doing with browsing history?
  • Twitter is doing a lot of tracking
On trends:
  • “Data linked to you” is the largest category and shows the most first-party data use
    • “Data used to track you” covers tracking across apps and websites “owned by other companies”
  • Companies should move usage data and diagnostics collection from “linked” to “not linked” categories
    • Free games do a somewhat better job collecting anonymous data (but also use the same data types to track you)
  • Top free apps do less data sharing (tracking) than expected

Overall, the rules are new, so companies are still getting used to the categories. Guessing they’ve over-reported, as it is easier to move to a more private usage category later. Companies may also interpret the rules differently (Twitter vs Facebook vs TikTok, why so different?).

Free games
Paid apps
  • Top paid apps do less tracking and data collection overall
    • Also have most non-updated apps in the top 2020 list
  • “Data Not Collected” is a tag (took going through a lot of apps to find that out…)
App Privacy
The developer, HotSchedules, indicated that the app's privacy practices may include handling of data as described below. For more information, see the developer's privacy policy.
Data Not Collected
The developer does not collect any data from this app.
Privacy practices may vary, for example, based on the features you use or your age. Learn More
Paid games
  • Very few top games have updated
  • Seems Facebook SDK could require Identifiers, location, usage data, diagnostics
Overall
  • Apple, what’s up with the random ordering of data types? Seems to be consistent by count, but not across all apps
  • Health and fitness apps were not very popular this year
  • How do changes to data collection and use get reported? Is there a notification added to the nutrition label?

Wrap up

Probably can do a lot more analysis on all this data, but it’s the holidays and everyone is asking me why I’m working. So I’ll leave it at that. As more apps update with their privacy nutrition details, we can expect to learn more about how the apps we use use our data, and how Apple’s new system changes with time.

Charts and Graphs

Here is all the raw data if you want to compare: Top 2020 Apps – Privacy Summary

☃️ 🛷 ❄️


What we learned from Facebook this week

For all the talk with Facebook CEO Mark Zuckerberg in the US Senate and House this week, there was very little surprising content. We give consent to use the Facebook service, we upload images, write posts, and like articles. We have control at every step of our interaction to decide how much to share with Facebook, and what we give the company is exactly what is given back to us in the data archive download tool. It’s shocking to see every interaction you’ve ever made on Facebook in one place, but there is nothing here we don’t expect. There is no post we didn’t make or image we didn’t take. Facebook remembers what we do on the service as long as we have an account.

But that doesn’t mean everything from the last week was old information.

What was clarified?

An important point Zuckerberg reiterated is that Facebook does not sell user data. This would be a silly business move because Facebook’s value to advertisers is in the uniqueness of its data. It is in Facebook’s best interests to keep its trove of data secure, because that exclusivity keeps advertisers coming back. There’s no other place advertisers can go to get the same level of targeting.

Instead of selling data, Facebook actually collects all the details from every person “in the community” and compiles the best advertising opportunity for a given ad. Facebook assures advertisers their ad placement will reach the intended audience with the greatest possibility of interaction. It is this assurance that gives Facebook its gazillion-dollar market cap.

The Cambridge Analytica case was different, but still Facebook never sold data. Instead, Cambridge Analytica got raw Facebook user data from an app developer who used a survey app to harvest data. In 2014, it was within Facebook’s terms for a third-party app developer to use the Facebook developer platform to collect just about all the information about you and all your friends ever entered onto the site.

Listen to Exponent episode 146 “Facebook’s Real Mistake” (link at the end) for background on how Facebook’s past push to be a platform landed the company in this situation. The takeaway? Had Facebook realized its value as an ad network, the company would never have given the same level of data access in the first place.

This is why the current Facebook fiasco is not a data security breach, but a data privacy leak. Hackers did not break into Facebook systems to obtain user data; instead, a developer (who could have been anyone) used Facebook-sanctioned tools to collect your information. Facebook has since locked down its platform to prevent such unrestricted access to user data, but it does not change the fact that massive amounts of user data left the platform seemingly without the consent of its users. And yes, it’s true that by signing up you agreed to the terms that allowed developers to leverage the wide open API to gather profile information, but did you really know that was part of the agreement?

What was surprising and novel?

Did you check if your info was collected by Cambridge Analytica? Go ahead, I’ll wait ⌚😊

After you’ve read through your activity log and exported your data, take a minute and think about what stands out from the content (I think this tinfoil hat scandal is all a ploy to get us to go on Facebook even more. Feel free to finish reading in the meantime, the export takes a while). Once you get to the details, you can see the majority of the information came from you, but there is a small subset which reveals the inner workings of the Facebook machine.

To put things in perspective, focus on your ad preferences and take a look at your ad demographics information. This is a window to the 9698 categories from the Senate hearing. Advertiser demographic is the result of running all our interactions on Facebook through a proprietary algorithm. Of all the information in the data archive, this piece is novel. We didn’t explicitly tell Facebook this information, but they determined it based on what we’ve done on the site.

This is why the Facebook hearing this week is only the tip of the iceberg. If we are concerned that Cambridge Analytica could sway an election with a slice of our data, what kind of power does Facebook have? Sure, we didn’t entrust Cambridge Analytica with our data, but why does opting into a puppy video sharing service change our perception of possible psychological manipulation?

What does Facebook do with all our data? And what can they do?

We need greater transparency on how our data is used. I can control and know what I upload, but what happens with the data “I own” once it’s handed over?

When I upload a photo to Facebook, what algorithms are tuned as a result? How does the content of the photo affect ads I see?

WhatsApp communication is encrypted, so it’s private between those in the conversation, but in what way does Facebook link my WhatsApp, Instagram, and Facebook accounts? I’ve logged into all three on the same device so they must know it’s the same person (even though I signed up for all three as separate users).

And what about activity coming from the same IP address or GPS location? Does Facebook correlate data of those physically closest to me, outside of our connections on its services? What about when I’m on Facebook but signed out?

The consumer-facing fun part seems like a front for the stingy advertising business on the back end. What is the difference between the two? It’s telling that Zuckerberg doesn’t fully understand the difference (from questioning by Brian Schatz). From Facebook’s perspective, the “fun part” is the user feature set that drives advertising revenue. It’s the top of the funnel for all of Facebook’s algorithms and drives the company’s valuation.

For a platform that relies on its users to generate value, the company doesn’t provide much information to said users on how the internal cogs work. Perhaps it’s best to be blissfully unaware, or maybe it’s not a requirement, but when 2 billion people feel like the product and not the customer, it’s reasonable for them to want a little more information on how they’re being used.

And if this is Facebook, what about Google? (You can also export Google data)

What can you do to stay in control?

  1. Adjust log-in behavior to prevent future data leaks
     Check permissions when using Facebook (or Google or any other service) to sign up for a new site. To keep the same convenience, sign up for a password manager like Dashlane or LastPass, which can generate and remember a new login for each site you visit. This adds a layer of security to your accounts and removes the possibility of another Cambridge Analytica style data leak.
  2. Prevent cross-site tracking
     Use a separate browser just for Facebook. Only log in to Facebook on that browser and do all your other web stuff in another. Or use extensions like Ghostery (which also tracks your trackers, so maybe just turn off the internet for the day…) or the Facebook Container for Firefox.
  3. Limit sharing data
     Just use Facebook less? Deactivate for a week and see how you feel. You can always reactivate.
    Go old school and use an RSS reader.
    Stick with iMessage/FaceTime.
    This is always an option.

All sorts of links

Video of Zuckerberg’s Senate hearing (transcript) and appearance before House committee (transcript)
Day 2 from MIT Technology Review
What was Facebook Thinking by James Allworth
The Facebook Current and The Facebook Brand from Stratechery
Facebook and Cambridge Analytica Explained from NYTimes
Facebook’s Real Mistake and Facebook Fatigue from Exponent Podcast
Mark Zuckerberg is Either Ignorant or Deliberately Misleading Congress from The Intercept
Mark Zuckerberg on Facebook’s hardest year, and what comes next from Vox
What is GDPR?
General Data Protection Regulation
Coachella streams 1, 2, and 3