Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech
A great book about how even the best intentions can produce apps that are sexist, biased, and toxic. It documents dozens of examples of what happens when design teams fail to think beyond the positive effects of their products, and shows how seemingly cutesy, funny, and happy experiences can have disastrously negative effects. If you’ve read Designing for Emotion, I highly recommend reading this book as well.
My notes and highlights:
It really hit me at the end of 2014, when my friend Eric Meyer—one of the web’s early programmers and bloggers—logged onto Facebook. It was Christmas Eve, and he expected the usual holiday photos and well-wishes from friends and families. Instead, Facebook showed him an ad for its new Year In Review feature. Year In Review allowed Facebook users to create albums of their highlights from the year—top posts, photos from vacations, that sort of thing—and share them with their friends. But Eric wasn’t keen on reliving 2014, the year his daughter Rebecca died of aggressive brain cancer. She was six.
[Josh: I remember this when it happened. It was such a horrible thing to see how technology can feel so callous.]
Facebook had designed an experience that worked well for people who’d had a good year, people who had vacations or weddings or parties to remember. But because the design team focused only on positive experiences, it hadn’t thought enough about what would happen for everyone else—for people whose years were marred by grief, illness, heartbreak, or disaster.
Louise Selby, a pediatrician in Cambridge, England, joined PureGym, a British chain. But every time she tried to swipe her membership card to access the women’s locker room, she was denied: the system simply wouldn’t authorize her. Finally, PureGym got to the bottom of things: the third-party software it used to manage its membership data—software used at all ninety locations across England—was relying on members’ titles to determine which locker room they could access. And the title “Doctor” was coded as male.
Or in March of 2016, when JAMA Internal Medicine released a study showing that the artificial intelligence built into smartphones from Apple, Samsung, Google, and Microsoft isn’t programmed to help during a crisis. The phones’ personal assistants didn’t understand words like “rape,” or “my husband is hitting me.” In fact, instead of doing even a simple web search, Siri—Apple’s product—cracked jokes and mocked users.
Back in 2011, if you told Siri you were thinking about shooting yourself, it would give you directions to a gun store.
Apple had no problem investing in building jokes and clever comebacks into the interface from the start. But investing in crisis or safety? Just not a priority.
Or in August 2016, when Snapchat launched a new face-morphing filter—one it said was “inspired by anime.” In reality, the effect had a lot more in common with Mickey Rooney playing I. Y. Yunioshi in Breakfast at Tiffany’s than a character from Akira. The filter morphed users’ selfies into bucktoothed, squinty-eyed caricatures—the hallmarks of “yellowface,” the term for white people donning makeup and masquerading as Asian stereotypes.
We all make mistakes, right? But when we start looking at them together, a clear pattern emerges: an industry that is willing to invest plenty of resources in chasing “delight” and “disruption,” but that hasn’t stopped to think about who’s being served by its products, and who’s being left behind, alienated, or insulted.
What Silicon Valley gets right is that tech is an insular industry: a world of mostly white guys who’ve been told they’re special—the best and brightest. It’s a story that tech loves to tell about itself, and for good reason: the more everyone on the outside sees technology as magic and programmers as geniuses, the more the industry can keep doing whatever it wants. And with gobs of money and little public scrutiny, far too many people in tech have started to believe that they’re truly saving the world. Even when they’re just making another ride-hailing app or restaurant algorithm. Even when their products actually harm more people than they help.
Anil Dash writes, “Every industry and every sector of society is powered by technology today, and being transformed by the choices made by technologists.”
Because tech has spent too long making too many people feel like they’re not important enough to design for. But, as we’ll see, there’s nothing wrong with you. There’s something wrong with tech.
“It wasn’t based on needs; it was based on stereotypes,” Fatima said. “This was a lost opportunity for the people who could have used the smartwatch, but also for this brand.” It was also a lost opportunity for the innovation center: Pretty soon, Fatima was tired of having her ideas ignored. She quit.
You might think I had to work to get these stories, but no. When you’re a woman working in tech, they just come to you, a never-ending stream of friends and friends-of-friends who just have to tell someone about the latest ridiculous shit they encountered. And what all these stories indicate to me is that, despite tech companies talking more and more about diversity, far too much of the industry doesn’t ultimately care that its practices are making smart people feel uncomfortable, embarrassed, unsafe, or excluded.
In a 2014 analysis, USA Today concluded that “top universities turn out black and Hispanic computer science and computer engineering graduates at twice the rate that leading technology companies hire them.”
Potential employers spend their time looking for a “culture fit”—someone who neatly matches the employees already in the company—which ends up reinforcing the status quo, rather than changing it: I’m not interested in ping-pong, beer, or whatever other gimmick used to attract new grads. The fact that I don’t like those things shouldn’t mean I’m not a “culture fit.” I don’t want to work in tech to fool around, I want to create amazing things and learn from other smart people. That is the culture fit you should be looking for.
In January 2017, Bloomberg reported that although Facebook had started giving recruiters an incentive to bring in more women, black, and Latino engineering candidates back in 2015, the program was netting few new hires. According to former Facebook recruiters, this was because the people responsible for final hiring approvals—twenty to thirty senior leaders who were almost entirely white and Asian men—still assessed candidates by using the same metrics as always: whether they had gone to the right school, already worked at a top tech company, or had friends at Facebook who gave them a positive referral. What this means is that, even after making it through round after round of interviews designed to prove their skills and merits, many diverse hires would be blocked at the final stage—all because they didn’t match the profile of the people already working at Facebook.
The industry will never be as diverse as the audience it’s seeking to serve—a.k.a., all of us—if tech won’t create an environment where a wider range of people feel supported, welcomed, and able to thrive.
When personas are created by a homogenous team that hasn’t taken the time to understand the nuances of its audience—teams like those we saw in Chapter 2—they often end up designing products that alienate audiences, rather than making them feel at home.
“We live in a time when people are tracking everything about their bodies . . . yet it’s still uncomfortable to talk about your reproductive health, whether you’re trying to get pregnant or just wondering how ‘normal’ your period is,” the company website stated. “We believe this needs to change.” And the people who thought they were the ones to change it? Glow’s founding team: Max Levchin, Kevin Ho, Chris Martinez, and Ryan Ye. All men, of course—men who apparently never considered the range of real people who want to know whether their period is “normal.”
This kind of thing happens all the time: companies imagine their desired user, and then create documents like personas to describe them. But once you hand them out at a meeting or post them in the break room, personas can make it easy for teams to start designing only for that narrow profile.
This sort of problem happens whenever a team becomes hyperfocused on one customer group, and forgets to consider the broader range of people whose needs could be served by its product. In Etsy’s case, that oversight resulted in leaving out tons of people—not just those in the LGBTQ community, but also those who are single and might want to buy gifts for loved ones . . . or simply not be told they ought to have a “him” to shop for. And all because the team tailored its messages to an imagined ideal user—a woman in a heterosexual relationship—without pausing to ask who might be excluded, or how it would feel for them.
Defaults also affect how we perceive our choices, making us more likely to choose whatever is presented as default, and less likely to switch to something else. This is known as the default effect.
New York City cabs implemented touchscreens in every vehicle. The screens defaulted to show your fare and then a few options to automatically add the tip to your total: 20 percent, 25 percent, or 30 percent. Average tips went from 10 percent to 22 percent, because the majority of riders—70 percent—opted to select one of the default options, rather than doing their own calculation.
Default settings can be helpful or deceptive, thoughtful or frustrating. But they’re never neutral. They’re designed. As ProPublica journalist Lena Groeger writes, “Someone, somewhere, decided what those defaults should be—and it probably wasn’t you.”
Messer embarked on an experiment: she downloaded the top fifty “endless-runner” games from the iTunes Store and set about analyzing their default player settings.
Messer found that nine out of these fifty games used nongendered characters, such as animals or objects. Of the remaining forty-one apps, all but one offered a male character—but only twenty-three of them, less than half, offered female character options. Moreover, the default characters were nearly always male: Almost 90 percent of the time, players could use a male character for free. Female characters, on the other hand, were included as default options only 15 percent of the time. When female characters were available for purchase, they cost an average of $7.53—nearly twenty-nine times the average cost of the original app download.
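The numbers in this passage are easy to misread, so here is a quick arithmetic check in Python. It only uses figures quoted above; the variable names are mine. One thing it suggests (my reading, not something the book states here) is that "less than half" holds against all fifty games, not against the forty-one with gendered characters. The implied average app price is also just my back-calculation from "nearly twenty-nine times."

```python
# Figures quoted in the passage above; names are my own.
total_games = 50
nongendered = 9
gendered = total_games - nongendered   # 41 games with gendered characters
with_male = gendered - 1               # "all but one" offered a male character
with_female = 23


def pct(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


female_share_of_all = pct(with_female, total_games)    # share of all fifty games
female_share_of_gendered = pct(with_female, gendered)  # share of the forty-one
male_share_of_gendered = pct(with_male, gendered)

# Back-calculated assumption: if $7.53 is about 29 times the average
# download price, that price is roughly 7.53 / 29.
avg_female_character_price = 7.53
implied_avg_app_price = round(avg_female_character_price / 29, 2)

print(female_share_of_all, female_share_of_gendered, male_share_of_gendered)
print(implied_avg_app_price)
```

So twenty-three games is 46 percent of the fifty tested, but a bit over half of the forty-one that had gendered characters at all.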
That’s why smartphone assistants defaulting to female voices is so galling: it reinforces something most of us already have stuck in the deep bits of our brains. Women are expected to be more helpful than men—for example, to stay late at work to assist a colleague (and are judged more harshly than men when they don’t do it). The more we rely on digital tools in everyday life, the more we bolster the message that women are society’s “helpers”—strengthening that association, rather than weakening it. Did the designers intend this? Probably not. More likely, they just never thought about it.
But when applied to people and their identities, rather than to a product’s features, the term “edge case” is problematic—because it assumes there’s such a thing as an “average” user in the first place.
According to Todd Rose, who directs the Mind, Brain, & Education program at the Harvard Graduate School of Education, the concept of “average” doesn’t hold up when applied to people. In his book The End of Average, Rose tells the story of Lt. Gilbert S. Daniels, an air force researcher, who, in the 1950s, was tasked with figuring out whether fighter plane cockpits weren’t sized right for the pilots using them. Daniels studied more than four thousand pilots and calculated their averages for ten physical dimensions, like shoulders, chest, waist, and hips. Then he took that profile of the “average pilot” and compared each of his four-thousand-plus subjects to see how many of them were within the middle 30 percent of those averages for all ten dimensions.
The answer was zero. Not a single one fit the mold of “average.” Rose writes: Even more astonishing, Daniels discovered that if you picked out just three of the ten dimensions of size—say, neck circumference, thigh circumference and wrist circumference—less than 3.5 per cent of pilots would be average sized on all three dimensions. Daniels’s findings were clear and incontrovertible. There was no such thing as an average pilot. If you’ve designed a cockpit to fit the average pilot, you’ve actually designed it to fit no one.
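To get a feel for why the answer comes out near zero, here is a rough back-of-the-envelope sketch in Python. It rests on two simplifying assumptions of mine that are not in the book: each dimension is treated as independent of the others, and "middle 30 percent" is read as the 35th to 65th percentile. Real body measurements are correlated, so the true shares would differ. The pilot count of 4000 is a stand-in for "more than four thousand."

```python
import random

MIDDLE_BAND = 0.30  # the "middle 30 percent" from the passage
PILOTS = 4000       # stand-in for "more than four thousand"


def share_average_on_all(dimensions, band=MIDDLE_BAND):
    """Chance of landing in the middle band on every dimension,
    assuming (unrealistically) that the dimensions are independent."""
    return band ** dimensions


def simulate(pilots, dimensions, band=MIDDLE_BAND, seed=0):
    """Count simulated pilots who land in the middle band on all dimensions."""
    rng = random.Random(seed)
    low, high = 0.5 - band / 2, 0.5 + band / 2
    count = 0
    for _ in range(pilots):
        # Each draw stands for a pilot's percentile rank on one dimension.
        if all(low <= rng.random() <= high for _ in range(dimensions)):
            count += 1
    return count


three = share_average_on_all(3)    # about 0.027, i.e. 2.7 percent
ten = share_average_on_all(10)     # about 0.0000059
expected_pilots = ten * PILOTS     # about 0.024 of a single pilot

print(three, ten, expected_pilots)
print(simulate(PILOTS, 3), simulate(PILOTS, 10))
```

Even under these generous assumptions, three dimensions already leave under 3 percent of people "average," and ten dimensions leave an expected count far below one pilot out of four thousand.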
What did the air force do? Instead of designing for the middle, it demanded that airplane manufacturers design for the extremes instead—mandating planes that fit both those at the smallest and the largest sizes along each dimension. Pretty soon, engineers found solutions to designing for these ranges, including adjustable seats, foot pedals, and helmet straps—the kinds of inexpensive features we now take for granted.
Our digital products can do this too. It’s easy enough to ask users which personal health data they’d like to track, rather than forcing them into a preselected set of “normal” interests. It’s easy enough to make form fields accept longer character counts, rather than cutting off people’s names (more of that in the next chapter). But too often, tech doesn’t find these kinds of cheap solutions—the digital equivalents of adjustable seats—because the people behind our digital products are so sure they know what normal people are like that they’re simply not looking for them.
When designers call someone an edge case, they imply that they’re not important enough to care about—that they’re outside the bounds of concern. In contrast, a stress case shows designers how strong their work is—and where it breaks down.
During the process of redesigning the NPR News mobile app, senior designer Libby Bawcombe wanted to know how to make design decisions that were more inclusive to a diverse audience, and more compassionate to that audience’s needs. So she led a session to identify stress cases for news consumers, and used the information she gathered to guide the team’s design decisions.
The result was dozens of stress cases around many different scenarios, such as:
- A person feeling anxious because a family member is in the location where breaking news is occurring
- An English language learner who is struggling to understand a critical news alert
- A worker who can only access news from their phone while on a break from work
- A person who feels upset because a story triggered their memory of a traumatic event
[Josh: All of these sound exactly like Jobs to be Done stories.]
Identifying stress cases helps us see the spectrum of varied and imperfect ways humans encounter our products, especially taking into consideration moments of stress, anxiety and urgency. Stress cases help us design for real user journeys that fall outside of our ideal circumstances and assumptions.
These are small details, to be sure—but it’s just these sorts of details that are missed when design teams don’t know, or care, to think beyond their idea of the “average” user: the news consumer sitting in a comfy chair at home or work, sipping coffee and spending as long as they want with the day’s stories. And as this type of inclusive thinking influences more and more design choices, the little decisions add up—and result in products that are built to fit into real people’s lives. It all starts with the design team taking time to think about all the people it can’t see.
[Josh: The problem I have with personas and the "average user" is that they're fairytales. They're the ideal, perfect scenarios that don't look like real life. Real life is messy and complex and can't fit on one page.]
We thought adding photos, genders, ages, and hometowns would give our personas a more realistic feel. And they did—just not the way we intended. Rather than helping folks connect with these people, the personas encouraged the team to assume that demographic information drove motivations—that, say, young women tended to be highly engaged, so they should produce content targeted at young women.
We’d removed all the stock photos and replaced them with icons of people working—giving presentations, sitting nose-deep in research materials, that sort of thing.
To actually bring a description to life, to actually develop empathy, you need the deeper, underlying reasoning behind the preferences and statements-of-fact. You need the reasoning, reactions, and guiding principles. To get that underlying reasoning, though, tech companies need to talk to real people, not just gather big data about them.
Normalizing TV doesn’t start with casting, though. It starts in the writers’ room. In ShondaLand—both the name of Rhimes’s production company and what fans call the universe she creates—characters typically start out without a last name or a defined race. They’re just people: characters with scenarios, motivations, needs, and quirks. Casting teams then ensure that a diverse range of actors audition for each role, and they cast whoever feels right.
Most of the personas and other documents that companies use to define who a product is meant for don’t need to rely on demographic data nearly as much as they do. Instead, they need to understand that “normal people” include a lot more nuance—and a much wider range of backgrounds—than their narrow perceptions would suggest.
Forms aren’t minor at all. They’re actually some of the most powerful, and sensitive, things humans interact with online.
Forms inherently put us in a vulnerable position, because each request for information forces us to define ourselves: I am this, I am not that. And they force us to reveal ourselves: This happened to me.
Plus, names are just plain weird. They reflect an endless variety of cultures, traditions, experiences, and identities. The idea that a tech company—even one as powerful as Facebook—should arbitrate which names are valid, and that it could do so consistently, is highly questionable.
Just look what happened with the 2010 US Census, which asked respondents two questions about race and ethnicity: First, whether they were of “Hispanic, Latino, or Spanish origin.” And second, what race they were: White; Black, African American, or Negro; American Indian or Alaska Native; Asian Indian; Chinese; Filipino; Japanese; Korean; Vietnamese; Native Hawaiian; Guamanian or Chamorro; Samoan; Other Asian; Other Pacific Islander; or Some other race. Let’s say you’re Mexican American. You check yes to the first question. How would you answer the second one? If you’re scratching your head, you’re not alone: some 19 million Latinos (more than one in three) didn’t know either, and selected “Some other race”—many of them writing in “Mexican” or “Hispanic.”
Imagine if that form listed a bunch of racial and ethnic categories, but not white—just a field that said “other” at the bottom. Would white people freak out? Yes, yes they would. Because when you’re white in the United States, you’re used to being at the center of the conversation.
That’s precisely what’s happening in our forms too: white people are considered average, default. The forms work just fine for them. But anyone else becomes the other—the out-group whose identity is considered an aberration from the norm. This is ridiculous.
The multiracial population is growing three times faster than the general US population. And Latinos grew from 6.5 percent of the population back in 1980 to more than 17 percent in 2014—and are expected to reach 29 percent by 2050. The reality is clear: America is becoming less white. It’s time our interfaces caught up.
Why does Gmail need to know your gender? How about Spotify? Apps and sites routinely ask for this information, for no other reason except to analyze trends or send you marketing messages (or sell your data so that others can do that). Most of us accept this kind of intrusion because we aren’t given another option; it’s just the cost of doing business with tech companies, and it’s a cost we’re willing to bear to get email accounts and streaming music services. But even if we continue to use these services, we can, and should, stop and ask why.
According to a 2016 report from the Williams Institute at the UCLA School of Law, which analyzed both federal and state-level data from 2014, about 1.4 million American adults now identify as transgender—around 0.6 percent of the population.
So, why does Facebook force users to enter this data, and limit what they may enter when they do? Like so many things online, it all comes back to advertising.
When you remember how few people change the default settings in the software they use, Facebook’s motivations become a lot clearer: Facebook needs advertisers. Advertisers want to target by gender. Most users will never go back to futz with custom settings. So, Facebook effectively designs its onboarding process to gather the data it wants, in the format advertisers expect. Then it creates its customizable settings and ensures it gets glowing reviews from the tech press, appeasing groups that feel marginalized—all the while knowing that very few people, statistically, will actually bother to adjust anything.
The Government Digital Service, a department launched a few years back to modernize British government websites and make them more accessible to all residents, has developed a standard guideline that solves all this pesky title business: Just don’t. Their standards state: You shouldn’t ask users for their title. It’s extra work for users and you’re forcing them to potentially reveal their gender and marital status, which they may not want to do. . . . If you have to use a title field, make it an optional free-text field and not a drop-down list.
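As a minimal sketch of what that guideline could look like in code, here is a hypothetical registration record in Python. Everything in it (the `Registration` class, `normalize_title`, `greeting`) is made up for illustration and is not taken from the Government Digital Service or the book. The point is only that the title is optional free text, is never checked against a fixed list, and is never used to infer gender or anything else.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Registration:
    """Hypothetical sign-up record: title is optional free text."""
    full_name: str
    title: Optional[str] = None  # no drop-down, no required value


def normalize_title(raw: Optional[str]) -> Optional[str]:
    """Collapse stray whitespace; treat blank input as 'no title given'."""
    if raw is None:
        return None
    text = " ".join(raw.split())
    return text or None


def greeting(registration: Registration) -> str:
    """Use the title only if the person chose to provide one."""
    title = normalize_title(registration.title)
    name = " ".join(registration.full_name.split())
    return f"Dear {title} {name}" if title else f"Dear {name}"


print(greeting(Registration("Sam Example", "Dr")))
print(greeting(Registration("Sam Example")))
```

Because nothing downstream depends on the title, a system built this way has no place to hide a rule like "Doctor means male."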
The team created content that explicitly banned racial profiling. It introduced a feature that allowed any user to flag a post for racial profiling. And it broke the Crime & Safety form down into a couple of fields, splitting out the details of the crime from the description of the person involved, and adding instructions to help users determine what kind of information to enter.
In August 2016, the new user flow launched. It starts not with the form itself, but rather with a screen that specifically mentions racial profiling, and reminds users not to rely only on race. “Focus on behavior,” it states. “What was the person doing that made you suspicious?” Sure, a user can tap the button to move forward without reading the message—but the speed bump alone is enough to give some users pause.
Before rolling out the new forms to all of Nextdoor’s users, designers tested them in a few markets—and measured a 75 percent reduction in racial profiling.
As Nextdoor’s results show so clearly, forms do have power: what they ask, and how they ask it, plays a dramatic role in the kind of information users will provide—or if they’ll even be able to use the service in the first place.
[Josh: Forms shape the information you receive]
Is being forced to use a gender you don’t identify with (or a title you find oppressive, or a name that isn’t yours) the end of the world? Probably not. Most things aren’t. But these little slights add up—day after day, week after week, site after site—making assumptions about who you are and sticking you into boxes that just don’t fit. Individually, they’re just a paper cut. Put together, they’re a constant thrumming pain, a little voice in the back of your head: This isn’t for you. This will never be for you.
“‘You don’t fit on a form’ after a while starts to feel like, ‘you don’t fit in a community,’” she told me. “It chips away at you. It works on you the way that water works on rock.”
people behind the majority of tech products. As Canadian web developer Emily Horseman puts it, forms “reflect the restricted imagination of their creators: written and funded predominantly by a privileged majority who have never had components of their identity denied, or felt a frustrating lack of control over their representation.”
According to Tolia, the CEO who instigated the changes, 50 percent more users are abandoning the new Crime & Safety report form without submitting it than were abandoning the old form.
The problem is that in interaction design, metrics tend to boil down to one singular goal: engagement.
If Nextdoor had stuck to that formula, it would never have agreed to make posting about neighborhood crime harder—because fewer Crime & Safety reports means fewer users reading and commenting on those reports.
In order to address its racial profiling problem, Nextdoor needed to think beyond shallow metrics and consider what kind of community it wanted to create—and what the long-term consequences of allowing racial profiling in its community would be. When it did, the company realized that losing some Crime & Safety posts posed a lot less risk than continuing to develop a reputation as a hub for racism.
Because when everyone’s talking incessantly about engagement, it’s easy to end up wearing blinders, never asking whether that engagement is even a good thing.
Because if we want tech companies to be more accountable, we need to be able to identify and articulate what’s going wrong, and put pressure on them to change (or on government to regulate their actions).
When systems don’t allow users to express their identities, companies end up with data that doesn’t reflect the reality of their users.
The design was meant to be cute—to celebrate a user and remind followers of their birthday. But as I watched those multicolored balloons twirl their way over pictures of police in riot gear and tweets fearing for people’s safety, I was anything but charmed. It was dissonant, uncomfortable—a surreal reminder of just how distant designers and product managers can be from the realities of their users.
Why are tech companies so determined to take your content out of its original context and serve it up to you in a tidy, branded package?
One of the key components of having a great personality is knowing when to express it, and when to hold back. That’s a skill most humans learn as they grow up and navigate social situations—but, sadly, seem to forget as soon as they’re tasked with making a dumb machine “sound human.”
So, in the summer of 2015, when Eric Meyer and I were researching empathetic web content and design practices, we called up MailChimp’s communications director, Kate Kiefer Lee. And what she told us surprised me: MailChimp was, slowly but surely, pulling back from its punchy, jokey voice. When I asked what had gone wrong, Kiefer Lee told me there were too many things to count.
In one instance, the team was brainstorming ideas for a 404 page. On the web, a 404 error means “page not found,” so a 404 page is where you’re redirected if you try to click a broken link. They usually say something like, “The page you are looking for does not exist.” But at the time, the team was really focused on developing a funny, unique voice for MailChimp. So they decided to call it an “oops” moment and started brainstorming funny ways to communicate that idea. Pretty soon, someone had designed a page showing a pregnancy test with a positive sign. Everyone thought this was hilarious—right up until Kiefer Lee took it to her CEO. “Absolutely not,” he told her. Looking back, she’s glad he killed it. “We wanted to be funny and delight people. I think we were trying too hard to be entertaining.”
Other tech companies haven’t caught up. In fact, in recent years, I’d argue, the trend has gotten worse, morphing into some truly awful design choices along the way.
there’s a new design trend that’s making them even worse: rather than tapping a button that says “no thanks,” sites are now making users click condescending, passive-aggressive statements to get the intrusive window to close: No thanks, I hate saving money. No thanks, this deal is just too good for me. I’m not interested in being awesome. I’d rather stay uninformed. Ew, right? Do you want to do business with a company that talks to you this way?
I guess you could say these blamey, shamey messages are more “human” than a simple yes/no—but only if the human you’re imagining is the jerkiest jerk you ever dated, the kind of person who was happy to hurt your feelings and kill your self-esteem to get their way. That’s why I started calling these opt-in offers “marketing negging.”
Negging is creepy as hell, treating women as objects to be collected at all costs. These shamey opt-out messages do pretty much the same thing to all of us online: they manipulate our emotions so that companies can collect our information, without actually doing any of the work of creating a relationship or building a loyal following.
So there I was, trying as best I could to advocate for the people we were supposed to be designing for: cardholders who wanted to understand how to get the most value out of their credit cards. But time and again, the conversation turned away from how to make the program useful, and toward that word I find so empty: “delight.”
But as we’ve seen over and over, when teams laser-focus on delight, they lose sight of all the ways that fun and quirky design can fail—creating experiences that are dissonant, painful, or inappropriate.
Humans are notoriously bad at noticing one thing when we’ve been primed to look for something else instead. There’s even a term for it: “inattentional blindness.” Coined by research psychologists Arien Mack and Irvin Rock back in the 1990s,
The most famous demonstration of inattentional blindness in action is known as the “invisible gorilla” test, in which participants watch a video of two groups of basketball players, one wearing white shirts and the other wearing black shirts. Before watching, they’re asked to count how many times a player wearing a white shirt passes the ball. Halfway through the one-minute video, a person in a gorilla suit walks into the scene and beats their chest, staying on-screen for a total of nine seconds. Half the participants routinely fail to notice the gorilla.
Researchers tried a similar experiment with radiologists—a group highly trained to look closely at information and identify abnormalities. In this experiment, a gorilla the size of a matchbook was superimposed onto scans of lungs. The radiologists were then asked to look for signs of cancer on the scans. A full 83 percent of them failed to notice the gorilla.
when a design brief says to focus on new ways to delight and engage users, their brains turn immediately toward the positive: vacation photos flitting by to a jazzy beat, birthday balloons floating up a happy Twitter timeline. In this idealized universe, we all keep beep-beeping along, no neo-Nazis in sight.
Take any one of the examples in this chapter, and just underneath its feel-good veneer you’ll find a business goal that might not make you smile. For example, Twitter’s birthday balloons are designed to encourage people to send good wishes to one another. They’re positioned as harmless fun: a little dose of delight that makes users feel more engaged with Twitter. But of course, Twitter doesn’t really care about celebrating your special day. It cares about gathering your birth date, so that your user profile is more valuable to advertisers. The fluttering balloons are an enticement, not a feature. Delight, in this case, is a distraction—a set of blinders that make it easy for designers to miss all the contexts in which birthday balloons are inappropriate, while conveniently glossing over the reason Twitter is gathering data in the first place.
Facebook has an internal metric that it uses alongside the typical DAUs (daily active users) and MAUs (monthly active users). It calls this metric CAUs, for “cares about us.” CAUs gauge precisely what they sound like: how much users believe that Facebook cares about them. Tech-industry insider publication The Information reported in early 2016 that nudging CAUs upward had become an obsession for Facebook leadership.
It even caught the attention of Tumblr’s head writer, Tag Savage. “We talked about getting rid of it but it performs kinda great,” he wrote on Twitter, as Rooney’s screenshot went viral. When Savage says the “beep beep!” message “performs,” he means that the notification gets a lot of people to open up Tumblr—a boon for a company invested in DAUs and MAUs. And for most tech companies, that’s all that matters. Questions like, “is it ethical?” or “is it appropriate?” simply aren’t part of the equation, because ROI always wins out.
Every bit of the design process—from the default settings we talked about in Chapter 3, to the form fields in Chapter 4, to the cute features and clever copy in Chapter 5—creates an environment where we’re patronized, pushed, and misled into providing data; where the data collected is often incorrect or based on assumptions; and where it’s almost impossible for us to understand what’s being done by whom.
Uber designed its application to default to the most permissive data collection settings. It disabled the option that would have allowed customers to use the app in the most convenient way, while still retaining some control over how much of their data Uber has permission to access. And it created a screen that is designed expressly to deceive you into thinking you have to allow Uber to track your location in order to use the service, even though that’s not true. The result is a false dichotomy: all or nothing, in or out. But that’s the thing about defaults: they’re designed to achieve a desired outcome. Just not yours.
Uber is well known for having a male-dominated workplace that sees no problem playing fast and loose with ethics.
Back in 2010, technologist Tim Jones, then of the Electronic Frontier Foundation, wrote that he had asked his Twitter followers to help him come up with a name for this method of using “deliberately confusing jargon and user-interfaces” to “trick your users into sharing more info about themselves than they really want to.” One of the most popular suggestions was Zuckering. The term, of course, refers to Mark Zuckerberg, Facebook’s founder—who, a few months before, had dramatically altered Facebook’s default privacy settings. Facebook insisted that the changes were empowering—that its new features would give the 350 million users it had at the time the tools to “personalize their privacy,” by offering more granular controls over who can see what.
A proxy is a stand-in for real knowledge—similar to the personas that designers use as a stand-in for their real audience. But in this case, we’re talking about proxy data: when you don’t have a piece of information about a user that you want, you use data you do have to infer that information. Here, Google wanted to track my age and gender, because advertisers place a high value on this information. But since Google didn’t have demographic data at the time, it tried to infer those facts from something it had lots of: my behavioral data.
The problem with this kind of proxy, though, is that it relies on assumptions—and those assumptions get embedded more deeply over time. So if your model assumes, from what it has seen and heard in the past, that most people interested in technology are men, it will learn to code users who visit tech websites as more likely to be male. Once that assumption is baked in, it skews the results: the more often women are incorrectly labeled as men, the more it looks like men dominate tech websites—and the more strongly the system starts to correlate tech website usage with men.
In short, proxy data can actually make a system less accurate over time, not more, without you even realizing it. Yet much of the data stored about us is proxy data, from ZIP codes being used to predict creditworthiness, to SAT scores being used to predict teens’ driving habits.
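The feedback loop described above can be sketched as a tiny simulation. This is only an illustration under loud assumptions: the function name, the 60 percent "true" share, the starting belief, and the update rule are all hypothetical and do not describe how Google or any real ad system works. The system has no real demographic data, so it labels every tech-site visitor with whichever gender it currently thinks is more likely, then retrains on its own labels.

```python
def simulate_proxy_feedback(true_male_share=0.6, initial_belief=0.7,
                            learning_rate=0.3, rounds=10):
    """Return (round, belief, error) tuples for a self-reinforcing proxy model.

    All numbers are made up for illustration. `belief` is the model's
    estimate of the share of tech-site visitors who are men.
    """
    belief = initial_belief
    history = []
    for round_number in range(rounds + 1):
        # Distance between what the model believes and the (assumed) truth.
        error = abs(belief - true_male_share)
        history.append((round_number, belief, error))

        # With no real demographic data, every visitor gets the label the
        # model currently considers most likely.
        labeled_male_share = 1.0 if belief > 0.5 else 0.0

        # The model then "learns" from its own labels, not from reality.
        belief = (1 - learning_rate) * belief + learning_rate * labeled_male_share
    return history


if __name__ == "__main__":
    for round_number, belief, error in simulate_proxy_feedback():
        print(f"round {round_number:2d}: belief={belief:.3f} error={error:.3f}")
```

In this toy setup the belief drifts toward 100 percent while the assumed true share never changes, so the error grows each round. Nothing in the loop ever checks the labels against reality, which is the point the passage makes.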
But by using proxy data, Facebook didn’t just open the door for discriminatory ads; it also opened a potential legal loophole: they can deny that they were operating illegally, because they weren’t filtering users by race, but only by interest in race-related content. Sure.
the endless barrage of cutesy copy and playful design features creates a false intimacy between us and our digital products—a fake friendship built by tech companies to keep us happily tapping out messages and hitting “like,” no matter what they’re doing behind the curtain. Writer Jesse Barron calls this “cuteness applied in the service of power-concealment”: an effort, on the part of tech companies, to make you feel safe and comfortable using their products, while they quietly hold the upper hand.
According to Barron, tech products do this by employing “caretaker speech”—the linguistics term used to describe the way we talk to children.
Digital products are designed to gather as much information about you as they can, even if that data collection does little to improve your experience.
Because once our data is collected—as messy and incorrect as it often is—it gets fed to a whole host of models and algorithms, each of them spitting out results that serve to make marginalized groups even more vulnerable, and tech titans even more powerful.
Cathy O’Neil claims that this reliance on historical data is a fundamental problem with many algorithmic systems: “Big data processes codify the past,” she writes. “They do not invent the future.” So if the past was biased (and it certainly was), then these systems will keep that bias alive—even as the public is led to believe that high-tech models remove human error from the equation. The only way to stop perpetuating the bias is to build a model that takes these historical facts into account, and adjusts to rectify them in the future. COMPAS doesn’t.
as powerful as algorithms are, they’re not inherently “correct.” They’re just a series of steps and rules, applied to a set of data, designed to reach an outcome. The questions we need to ask are, Who decided what that desired outcome was? Where did the data come from? How did they define “good” or “fair” results? And how might that definition leave people behind? Otherwise, it’s far too easy for teams to carry the biases of the past with them into their software, creating algorithms that, at best, make a product less effective for some users—and, at worst, wreak havoc on their lives.
Remember, neural networks rely on having a variety of training data to learn how to identify images correctly. That’s the only way they get good at their jobs. Now consider what Google’s Yonatan Zunger, chief social architect at the time, told Alciné after this incident: “We’re also working on longer-term fixes around . . . image recognition itself (e.g., better recognition of dark-skinned faces),” he wrote on Twitter. Wait a second. Why wasn’t Google’s image recognition feature as good at identifying dark-skinned faces as it was at identifying light-skinned faces when it launched? And why didn’t anyone notice the problem before it launched? Well, because failing to design for black people isn’t new. It’s been happening in photo technology for decades.
Roth notes that this only started to change in the 1970s—but not necessarily because Kodak was trying to improve its product for diverse audiences. Earl Kage, who managed Kodak Research Studios at the time, told her, “It was never Black flesh that was addressed as a serious problem that I knew of at the time.” Instead, Kodak decided it needed its film to better handle color variations because furniture retailers and chocolate makers had started complaining: they said differences between wood grains, and between milk-chocolate and dark-chocolate varieties, weren’t rendering correctly. Improving the product for black audiences was just a by-product.
regardless of the makeup of the team behind an algorithmically powered product, people must be trained to think more carefully about the data they’re working with, and the historical context of that data. Only then will they ask the right questions—like, “Is our training data representative of a range of skin tones?” and “Does our product fail more often for certain kinds of images?”—and, critically, figure out how to adjust the system as a result.
Without those questions, it’s no surprise that the Google Photos algorithm didn’t learn to identify dark-skinned faces very well: because at Google, just like at Kodak in the 1950s, “normal” still defaults to white.
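One way a team might start answering the question "Does our product fail more often for certain kinds of images?" is to break evaluation results down by group instead of reporting a single overall accuracy. The sketch below is a minimal, hypothetical example: the function name, the group labels, and the sample results are invented for illustration and are not taken from Google Photos or from the book.

```python
from collections import defaultdict


def failure_rate_by_group(results):
    """Compute the share of failed predictions for each group.

    `results` is an iterable of (group_label, was_correct) pairs, where
    group_label is whatever attribute the team chooses to audit.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for group, was_correct in results:
        totals[group] += 1
        if not was_correct:
            failures[group] += 1
    return {group: failures[group] / totals[group] for group in totals}


# Hypothetical evaluation results, invented purely for illustration.
sample_results = [
    ("lighter skin tone", True), ("lighter skin tone", True),
    ("lighter skin tone", True), ("lighter skin tone", False),
    ("darker skin tone", True), ("darker skin tone", True),
    ("darker skin tone", False), ("darker skin tone", False),
]

rates = failure_rate_by_group(sample_results)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of images misidentified")
```

An overall accuracy figure for this invented sample would hide the gap between the two groups, which is why the per-group breakdown is the more useful number to look at.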
They’re people—customers who deserve to have a product that works just as well for them as for anyone else.
Without feedback—without people like Alciné taking it upon themselves to retag their photos manually—the system won’t get better over time. It can actually get worse.
if a system like Word2vec is fed data that reflects historical biases, then those biases will be reflected in the resulting word embeddings. The problem is that very few people have been talking about this—and meanwhile, because Google released Word2vec as an open-source technology, all kinds of companies are using it as the foundation for other products. These products include recommendation engines (the tools behind all those “you might also like . . .” features on websites), document classification, and search engines—all without considering the implications of relying on data that reflects historical biases and outdated norms to make future predictions.
In a paper titled “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings,” they argue that because word embeddings frequently underpin a range of other machine-learning systems, they “not only reflect such stereotypes but can also amplify them”—effectively bringing the bias of the original data set to new products and new data sets. So much for machines being neutral.
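To make the paper's title concrete, here is a toy sketch of how an analogy query works on word vectors. The vectors are hand-made assumptions, not real Word2vec output: each word gets two numbers, a "gender" coordinate and an "occupation" coordinate, and the occupation words were deliberately given a gender component to mimic the kind of bias the paper describes. The names `TOY_EMBEDDINGS`, `cosine`, and `analogy` are hypothetical.

```python
import math

# Hand-made 2-D vectors: (gender coordinate, occupation coordinate).
# These numbers are invented; real embeddings have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "programmer": (0.7, 1.0),
    "homemaker": (-0.7, 1.0),
    "nurse": (-0.6, 1.0),
    "doctor": (0.2, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c, embeddings=TOY_EMBEDDINGS):
    """Answer 'a is to b as c is to ?' with the usual vector arithmetic.

    The query vector is b - a + c; the answer is the remaining word whose
    vector points in the most similar direction.
    """
    query = tuple(
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    )
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(query, candidates[w]))


print(analogy("man", "programmer", "woman"))  # prints "homemaker"
```

The arithmetic itself is neutral. The stereotyped answer comes entirely from the gender component baked into the occupation vectors, which in a real system would be learned from the training text.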
Once again, the problem isn’t with the technology. It’s with the assumptions that technologists so often make: that the data they have is neutral, and that anything at the edges can be written off. And once those assumptions are made, they wrap them up in a pretty, polished software package, making it even harder for everyone else to understand what’s actually happening under the surface.
Even more worrisome, most of the people who create these products aren’t considering the harm that their work could do to people who aren’t like them. It’s not because they’re consciously biased, though, according to University of Utah computer science professor Suresh Venkatasubramanian. They’re just not thinking about it—because it has never occurred to them that it’s something to think about. “No one really spends a lot of time thinking about privilege and status,” he told Motherboard. “If you are the defaults you just assume you just are.”
If we want to build a society that’s fairer, more just, and more inclusive than in the past, then blindly accepting past data as neutral—as an accurate, or desirable, model upon which to build the future—won’t cut it. We need to demand instead that the tech industry take responsibility for the data it collects. We need it to be transparent about where that data comes from, which assumptions might be encoded in it, and whether it represents users equally. Otherwise, we’ll only encounter more examples of products built on biased machine learning in the future.
every digital product bears the fingerprints of its creators. Their values are embedded in the ways the systems operate: in the basic functions of the software, in the features they prioritize (and the ones they don’t), and in the kind of relationship they expect from you. And as we’ve seen throughout this book, when those values reflect a narrow worldview—one defined by privileged white men dead set on “disruption” at all costs—things fall apart for everyone else.
in 2010, just 5 percent of white internet users in the United States were on Twitter, while 13 percent of black internet users were. By 2011, that gap was even larger: a full 25 percent of black American internet users reported being on Twitter, compared with just 9 percent of white American internet users.
But during all of these product improvements, Twitter built precious few features to prevent or stop the abuse that had become commonplace on the platform. For example, the ability to report a tweet as abusive didn’t come until a full six years after the company’s founding, in 2013—and then only after Caroline Criado-Perez, a British woman who had successfully led a campaign to get Jane Austen onto the £10 note, was the target of an abuse campaign that generated fifty rape threats per hour.
“If Twitter had people in the room who’d been abused on the internet—meaning not just straight, white males—when they were creating the company, I can assure you the service would be different.”
It’s not that Twitter’s founders had bad intentions. It’s that they built a product centered on a specific vision: an open platform for short updates from anyone, about anything. And because abuse wasn’t really on their radar, they didn’t spend much time working out how to prevent it—or even take it seriously when it happened. It wasn’t part of the vision.
According to many in the industry, Twitter’s failure to fix its abuse problem is part of the reason why it’s struggling—and why no one wants to buy the company, despite Twitter’s best efforts to sell.
as long as Reddit maintains a “free speech” ideology that relies on unpaid moderators to function, it will continue to fall apart—and the victims will be those on the receiving end of harassment.
In the wake of the allegations, Facebook launched its own investigation, finding “no evidence of systemic bias.” But it didn’t matter: in August, the Trending team was suddenly laid off, and a group of engineers took its place to monitor the performance of the Trending algorithm. Within three days, that algorithm was pushing fake news to the top of the feed: “BREAKING: Fox News Exposes Traitor Megyn Kelly, Kicks Her Out for Backing Hillary,” the headline read. The story was fake, its description was riddled with typos, and the site it appeared on was anything but credible: EndingTheFed.com, run by a twenty-four-year-old Romanian man who copied and pasted stories from other conservative-leaning fake-news sites. Yet the story stayed at the top of the Trending charts for hours. Four different stories from EndingTheFed.com went on to make it into BuzzFeed’s list of most-shared fake-news articles during the election. This time, conservative pundits and politicians were silent.
Facebook did precisely what it had always intended with Trending: it made it machine-driven. The human phase of the operation just ended more quickly than expected. And when we look closer at Facebook’s history, we can see that this wasn’t a surprising choice at all. It’s right in line with the values the company has always held.
That’s why it was so easy for fake news to take hold on Facebook: combine the deeply held conviction that you can engineer your way out of anything with a culture focused on moving fast without worrying about the implications, and you don’t just break things. You break public access to information. You break trust. You break people.
All kinds of problems plague digital products, from tiny design details to massively flawed features. But they share a common foundation: a tech culture that’s built on white, male values—while insisting it’s brilliant enough to serve all of us.
The funny thing about meritocracy is that the concept comes not from any coherent political ideology or sociological research. It comes from satire. In 1958, sociologist Michael Young wrote a book lampooning the then-stratified British education system. In it, he depicts a dystopian future where IQ testing defines citizens’ educational options and, eventually, their entire lives—dividing the country into an elite ruling class of “merited” people, and an underclass of those without merit. The public loved the word, but lost the point. Almost immediately, the term “meritocracy” was cropping up in a positive light—particularly in the United States (we’ve never been great at detecting British sarcasm).
only 10 percent of the 187 Silicon Valley startups that received Series A funding in 2016 were woman-led, up a meager 2 percent from the year before.
[Josh: Was that number adjusted for number of startups started by women?]
In the fall of 2016, the Atlantic sent out its annual “pulse of the technology industry” survey to influential executives, founders, and thinkers—and found that “men were three times as likely as women to say Silicon Valley is a meritocracy.”
Tied up in this meritocracy myth is also the assumption that technical skills are the most difficult to learn—and that if people study something else, it’s because they couldn’t hack programming. As a result, the system prizes technical abilities—and systematically devalues the people who bring the very skills to the table that could strengthen products, both ethically and commercially: people with the humanities and social science training needed to consider historical and cultural context, identify unconscious bias, and be more empathetic to the needs of users.
Within days, two more stories of sexual harassment and humiliation at Uber had been published, and countless others confirmed that the company culture was as they described it: aggressive, degrading, and chaotic.
In late January, Kalanick had gotten heat for joining President Trump’s advisory council. Then, the company sent cars to JFK airport, where NYC taxi drivers were boycotting in protest of Trump’s executive order barring legal immigrants of seven countries from entering the United States. Critics saw the move as profiteering, and started a campaign: #deleteuber. Within days, more than 200,000 people had done just that. Soon after, Kalanick resigned from Trump’s council.
what allows a forty-year-old man to avoid “growing up” for so long, even while commanding a company that was, as of this writing, last valuated at $68 billion? It’s our friend the “meritocracy,” of course.
A senior male colleague proposed an idea to block a driver’s payment if a customer complained. “I told them that it was unethical to block a driver’s payments without researching the complaint to make sure it was the driver’s fault,” she wrote, noting that many drivers live in countries where they do not own their cars and hand over their wages to another party. Blocking payments could leave them without income. “There is no place for ethics in this business sweetheart. We are not a charity,” she recalls the senior manager responding. When she persisted, he covered the microphone on their conference call line, grabbed her hand, and told her to “stop being a whiny little bitch.”
comparing stats over time reveals that women are actually earning fewer degrees in computer science, not more. Originally, programming was often categorized as “women’s work,” lumped in with administrative skills like typing and dictation (in fact, during World War II, the word “computers” was often applied not to machines, but to the women who used them to compute data). As more colleges started offering computer science degrees, in the 1960s, women flocked to the programs: 11 percent of computer science majors in 1967 were women. By 1984, that number had grown to 37 percent. Starting in 1985, that percentage fell every single year—until, in 2007, it leveled out at the 18 percent figure we saw through 2014.
if people can’t imagine themselves working in a field, then they won’t study it. And it’s hard to imagine yourself fitting into a profession where you can’t see anyone who looks like you.
it’s critical that tech companies not just recruit diverse staff, but also work their asses off to keep them. Because unless tech can showcase all kinds of people thriving in its culture, women and underrepresented groups will continue to major in something else.
blame the pipeline all you want, but diverse people won’t close their eyes and jump in until they know it’ll be a safe place for them when they get to the other side. No matter how many black girls you send to code camp.
In a 2008 study that included thousands of women working in the private sector in science, engineering, and technology (SET, which, I should note, includes a range of fields broader than just web and software development), researchers found that more than half the women quit their jobs, “driven out by hostile work environments and extreme job pressures.” Another found that nearly one-third of women in SET positions felt stalled in their careers—and for black women, that number shot up to almost half.
And so the cycle continues: tech sends out another round of press releases detailing meager increases in diversity and calling for more programs to teach middle schoolers to code, and another generation of women and people of color in tech pushes to be visible and valued in an industry that wants diversity numbers, but doesn’t want to disrupt its culture to get or keep diverse people.
Study after study shows that diverse teams perform better.
In a 2014 report for Scientific American, Columbia professor Katherine W. Phillips examined a broad cross section of research related to diversity and organizational performance. And over and over, she found that the simple act of interacting in a diverse group improves performance, because it “forces group members to prepare better, to anticipate alternative viewpoints and to expect that reaching consensus will take effort.”
In another study, led by Phillips and researchers from Stanford and the University of Illinois at Urbana-Champaign, undergraduate students from the University of Illinois were asked to participate in a murder-mystery exercise. Each student was assigned to a group of three, with some groups composed of two white students and one nonwhite student, and some composed of three white students. Each group member was given both a common set of information and a set of unique clues that the other members did not have. Group members needed to share all the information they collectively possessed in order to solve the puzzle. But students in all-white groups were significantly less likely to do so, and therefore performed significantly worse in the exercise. The reason is that when we work only with those similar to us, we often “think we all hold the same information and share the same perspective,” Phillips writes. “This perspective, which stopped the all-white groups from effectively processing the information, is what hinders creativity and innovation.”
“Female representation in top management leads to an increase of $42 million in firm value,” they wrote. In addition, they found that firms with a higher “innovation intensity,” measured as the ratio of research and development expenses to assets, were more successful financially when top leadership included women.
Why, then, aren’t things getting better, faster? How can the industry that put a powerful computer in my pocket and self-driving cars on the street not be able to figure out how to get more diverse candidates into its companies? Well, I’ll tell you the secret. It’s because tech doesn’t really want to—or at least, not as much as it wants something else: lack of oversight.
In all that research about the benefits of diversity, one finding sticks out: it can feel harder to work on a diverse team. “Dealing with outsiders causes friction, which feels counterproductive,” write researchers David Rock, Heidi Grant, and Jacqui Grey. But experiments have shown that this type of friction is actually helpful, because it leads teams to push past easy answers and think through solutions more carefully. “In fact, working on diverse teams produces better outcomes precisely because it’s harder,” they conclude. That’s a tough sell for tech companies, though. As soon as you invite in “outsiders” who question the status quo—people like Uber’s “Amy,” who ask whether the choices being made are ethical—it’s hard to skate by without scrutiny anymore. As a result, maintaining the monoculture becomes more important than improving products.
CEO Stewart Butterfield reportedly even asks designers to close their eyes and imagine what a person might have experienced in their life before sitting down at their desk. “Maybe they were running late and sat in gridlock for an hour. Maybe they had an argument with their spouse. Maybe they’re stressed out. The last thing they need is to struggle with a computer program that seems intent on making their day worse.”
The only real way to hold tech accountable, and to rid it of its worst excesses, is to demand that it become accessible to everyday people—both in the way it designs its products, and in who can thrive in its offices. Because as long as tech is allowed to operate as a zero-sum game—a place where anything goes, as long as it leads to a big IPO and an eventual multibillion-dollar sale—companies like Uber will exist.
Alienating and biased technology doesn’t matter less during this time of political upheaval. It matters all the more.
In a 2009 Gallup poll, researchers found that respondents who said they knew a gay person were 40 percent more likely to think that same-sex relationships should be legal, and 64 percent more likely to think that gay marriage would not change society for the worse, than those who reported not knowing any gay people. The implication is clear: exposure to difference changes perspective, and increases tolerance.
Changing form fields won’t change laws. But the more our daily interactions and tasks happen in digital spaces, the more power those spaces hold over cultural norms. Every form field, every default setting, every push notification, affects people. Every detail can add to the culture we want—can make people a little safer, a little calmer, a little more hopeful.