Tuesday, October 11, 2016

CS50 Lecture by Mark Zuckerberg

MICHAEL D. SMITH: This afternoon I have the pleasure
of introducing Mark Zuckerberg, who is one of our guest speakers
this semester to come and talk a little bit about computer science
in the real world.
As most of you probably know, as you guys all do this much more
than I do, founder of Facebook.com, which is a social networking
program, whatever you want to call it.
Used at over 2000 schools across the nation, and possibly the world too.
Is it the world too, or just the nation?
So good influence for doing some things in computer science.
He's going to tell us some of the background of it
and what's been important and so forth.
So please join me in welcoming.

MARK ZUCKERBERG: All right, cool.
This is the first time I've ever had to hold one of these things.
So I'm just going to attach it really quickly, one second.

All right.
Can you hear?
Is this good?
Is this amplified at all?
>> AUDIENCE: Yeah.
MARK ZUCKERBERG: This is like one of the first times I've been to a lecture at Harvard.

I guess what's probably going to be most useful for you guys is if I just
take you through some of the courses that I took at Harvard where I actually
did go to lecture sometimes.
I was joking.
And sort of, like, how different decisions
that I had to make when I was moving along with Facebook
got impacted by different stuff that I was learning in the classes
that I was taking.
And if all goes according to plan, then maybe some of you guys
will come out of this thinking that taking CS or engineering stuff
at Harvard is actually sort of useful.
So that's the game plan.
>> I think that this is slotted for two hours.
There's no way I'm going to speak for two hours.
I'll probably speak for like 20 minutes, or 15 minutes,
and then I'll just let you guys ask questions.
Because I'm sure you guys have more interesting stuff
to ask me than I could come up with to talk about myself.
>> So I guess I'll just kind of get started.

When I was here, I started off taking 121.
I never actually took 50.
You should have gotten the other guy who was
doing Facebook, Dustin Moskovitz, who was my roommate.
When we got started the site was written in PHP, which isn't something
that you learned in one of these classes.
But fortunately, if you have a good background in C,
the syntax is very similar, and you can pick it up in a day or two.
>> So I started writing the site and launched it at Harvard
in February 2004.
So I guess almost two years ago now.
And within a couple of weeks, a few thousand people had signed up.
And we started getting some emails from people
at other colleges asking for us to launch it at their schools.
>> And I was taking 161 at the time.
So I don't know if you guys know the reputation of that course,
but it was kind of heavy.
It was a really fun course, but it didn't leave me with much time
to do anything else with Facebook.
So my roommate Dustin, who I guess had just finished CS50,
was like, hey, I want to help out.
I want to do the expansion and help you figure out how to do the stuff.
So I was like, you know, that's pretty cool dude,
but you don't really know any PHP or anything like that.
So that weekend he went home, bought the book Perl for Dummies,
came back and was like, alright, I'm ready to go.
I was like dude, the site is written in PHP, not Perl, but you know,
that's cool.
>> So he picked up PHP over a few days because, I
promise that if you have a good background in C, then
PHP is a very simple thing to pick up.
And he just kind of went to work.
So I mean, the first big decision that we really had to make
was in how to kind of expand the architecture
to go from the single school type set up that we had when it was just at Harvard
to something that supported multiple schools.
>> So this was a decision that had to be made on a bunch of levels,
both in the product and how we wanted privacy to work,
but I think that one really important decision that's
helped us scale pretty well is how we decided to distribute the data.
>> So I don't know how much of complexity stuff like big O notation you guys
do in this class.
So I mean, one of the most complicated computations that we do on the site
is the computation to tell how you're connected to people.
>> Because if you can imagine, that's stored
as sort of a series of undirected-- it's not weighted-- so undirected,
unweighted pairs of ID numbers of people in the database.
Then if you want to figure out who is friends with someone,
you have to look at all their friends.
So that's maybe like 100 or 200 people.
>> But then if you want to figure out who's a friend of a friend,
or what the closest connection is there, then you kind of
have to look at the 100 or 200 friends of each of those friends.
So it becomes at each level there's another factor of n multiplied in, where
n is the number of friends that each of your friends has.
So you can see that this kind of becomes exponentially
difficult to solve for the shortest path between people.
So if you're just looking for a friend of a friend, that's n squared.
If you're looking for a friend of a friend of a friend, that's n cubed.
And that's something that traditionally was
pretty difficult for a lot of the predecessor sites to Facebook.
And for example Friendster had large problems with this
because they were trying to compute paths six degrees out,
or like seven degrees out.
>> And that's something that when you're doing like n to the seventh,
that just is really very hard and it took down their site for a while.
So one of the things that we kind of had in mind when we were figuring out
how to do this was how do you distribute the database in such a way
that this computation becomes manageable.
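To make the friend-of-a-friend computation concrete, here is a minimal Python sketch of a breadth-first search over undirected, unweighted ID pairs, as described above. The function names, the sample pairs, and the `max_depth` cutoff are illustrative assumptions, not the site's actual code. Each extra degree of separation multiplies the work by roughly the number of friends per person, which is the growth described in the talk.

```python
from collections import defaultdict, deque


def build_adjacency(pairs):
    """Turn undirected, unweighted (id_a, id_b) pairs into an adjacency map."""
    friends = defaultdict(set)
    for a, b in pairs:
        friends[a].add(b)
        friends[b].add(a)
    return friends


def degrees_of_separation(friends, start, target, max_depth=3):
    """Breadth-first search. Returns the hop count, or None if the target
    is not reachable within max_depth hops."""
    if start == target:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the cutoff
        for friend in friends.get(person, ()):
            if friend == target:
                return depth + 1
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return None


# Hypothetical data: a simple chain 1 - 2 - 3 - 4 - 5.
pairs = [(1, 2), (2, 3), (3, 4), (4, 5)]
friends = build_adjacency(pairs)
print(degrees_of_separation(friends, 1, 3))  # friend of a friend
print(degrees_of_separation(friends, 1, 5))  # outside the 3-hop cutoff
```

Capping the depth keeps the search bounded, at the cost of reporting no connection for people who are further apart than the cutoff.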
>> So what we decided was that everyone on the site
does most of their activity at the school that they're kind of based at.
So if you're at Harvard, then most of the people
who you're going to be seeing and transacting with on the site
are going to be at Harvard.
It's actually probably like 90% of the stuff that you do on the site.
>> So we decided to split up the databases and create
one instance of MySQL database for each school in the network.
And in doing that, if you notice the paths that we compute
are only within the school.
So instead of say, like now we're at six million users,
and instead of having to do n cubed over some portion of six million,
it's just n cubed over 10,000, which is a much more
manageable type of computation.
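The per-school split can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name, the school names, and the in-memory sets standing in for one MySQL instance per school are all hypothetical. The point it shows is that a lookup scans only one school's pairs, never the whole network.

```python
from collections import defaultdict


class SchoolShards:
    """One independent friendship store per school.

    Each set stands in for a separate MySQL instance (an assumption made
    for illustration; the talk does not describe the schema)."""

    def __init__(self):
        self._edges = defaultdict(set)  # school -> {(low_id, high_id), ...}

    def add_friendship(self, school, a, b):
        # Store each undirected pair once, in a canonical order.
        self._edges[school].add((min(a, b), max(a, b)))

    def friends_of(self, school, user):
        """Scan only the named school's pairs."""
        found = set()
        for a, b in self._edges[school]:
            if a == user:
                found.add(b)
            elif b == user:
                found.add(a)
        return found

    def pairs_scanned(self, school):
        """How many pairs a lookup in this school has to consider."""
        return len(self._edges[school])


shards = SchoolShards()
shards.add_friendship("harvard", 1, 2)
shards.add_friendship("harvard", 2, 3)
shards.add_friendship("stanford", 100, 101)
print(shards.friends_of("harvard", 2))
print(shards.pairs_scanned("harvard"))
```

The trade-off in this design is that connections across schools are invisible to a single shard, which matches the remark that paths are computed only within a school.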
>> So that was sort of the first big architectural decision
that we had to make that contributed to us not dying a few months later.
And it was probably a pretty important one.
>> So when we first set up the site we had just one computer that we were running.
It wasn't in our dorm room.
We were renting it.
I kind of learned my lesson for trying to run a site out of my dorm
room a few months earlier, and Harvard almost tried to kick me out.
>> So I ended up renting a server off site this time.
And I guess running originally the database and the web server.
So Apache is what we were using in this instance
to serve the pages from the same machine.
And because we distributed the databases in the way that we did,
we were able to, as time went on, just add more machines linearly and sort of
grow the site without having any kind of exponential expansion
on the amount of machinery that we had.
>> But after we hit about like 30 or 50 schools,
we started realizing that we could start getting more performance out
of MySQL or Apache.

Some of the way that stuff was set up just wasn't as optimal as it could be.
>> So for example, when you have MySQL machines and Apache
running on the same server, then if something happens to that server,
then not only does the database for that school or the schools
on that server just stop kind of responding
in a way that will get you anything useful,
but you can't even load any web pages.
So you get page not founds.
And that kind of sucks.
>> But another issue is that the variance in the use from school to school
is also not going to be perfect.
So some schools are always going to have heavier use.
We have schools now like Penn State that have 50,000 users.
And then the majority of the schools still have less than 2000 users.
Because there's a lot of small schools and a lot of schools
that don't have complete ubiquity.
>> So in trying to deal with this issue and make it
so that you could deal with the fact that Penn State had
50,000 people and just a ton of users all the time,
and then you have some schools that don't, what we decided to do
is separate out some of the web servers from the database servers.
And make it so that we just had a pool of Apache web servers
that we could load balance between.
And make it so that you can use those uniformly
while just having the database layer be sort of consistent.
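A rough Python sketch of that separation: any web server in a shared pool can serve any school, while the database is still chosen by school. Round-robin selection, the server names, and the `db-` naming scheme are assumptions for illustration; the talk does not say which balancing policy was used.

```python
from itertools import cycle


class WebPool:
    """A shared pool of web servers handed out in round-robin order."""

    def __init__(self, servers):
        self._rotation = cycle(list(servers))

    def pick(self):
        return next(self._rotation)


def route(pool, school):
    """Return (web server, database) for a request.

    The web server comes from the shared pool; the database is still
    determined by the school (hypothetical naming scheme)."""
    return pool.pick(), "db-" + school


pool = WebPool(["web1", "web2", "web3"])
for school in ["pennstate", "pennstate", "pennstate", "harvard"]:
    print(route(pool, school))
```

With this arrangement a heavily used school spreads its page rendering across the whole pool instead of saturating the one machine that also holds its database.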
>> So I don't know if this stuff is interesting to you guys at all.
Or if this is anything that matters to what you guys are studying now.
So if there's more stuff that you guys would rather
know about in terms of the architecture, then I'll leave that open to questions,
so I don't spend a lot of time just talking about random applications
that you guys might not ever care to use.
>> Let me try to find some interesting examples.

So I mean, I guess one of the things that was pretty interesting
was when we got to a point in terms of traffic
where we started maxing out the performance of some
of these open source applications that are generally pretty performant.
>> So for example, MySQL is a really good open source database.
I don't know if any of you guys sort of in your own time mess
around and make anything with MySQL or have used it in any way.
But it's pretty easy to use.
It's also decently quick.
Indices work pretty well.
It's not as fully featured as something like Oracle, but it's pretty good.
>> And we got to a point where, I think around
when we started doing like maybe 100 million pages a day,
that we started running into some bottlenecks on that.
So for example, a typical query on MySQL might take two to four milliseconds.
And that's not that much.
But when you're doing 100 million page views a day,
and each page view might have 30 to 50 queries,
especially if you're doing something like a profile view that
queries all kinds of different information, then that starts to suck.
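The arithmetic behind that bottleneck can be worked through in a few lines of Python. The inputs are the figures quoted in the talk (taking the 100 million pages a day figure); the variable names and the "databases kept busy" reading are illustrative assumptions that ignore daily peaks and parallelism.

```python
PAGE_VIEWS_PER_DAY = 100_000_000  # "100 million pages a day"
QUERIES_PER_PAGE = (30, 50)       # "30 to 50 queries" per page view
QUERY_MS = (2, 4)                 # "two to four milliseconds" per query
SECONDS_PER_DAY = 86_400


def db_seconds_per_day(page_views, queries_per_page, query_ms):
    """Total database time per day, in seconds, if queries ran back to back."""
    return page_views * queries_per_page * query_ms / 1000


low = db_seconds_per_day(PAGE_VIEWS_PER_DAY, QUERIES_PER_PAGE[0], QUERY_MS[0])
high = db_seconds_per_day(PAGE_VIEWS_PER_DAY, QUERIES_PER_PAGE[1], QUERY_MS[1])

# Query time spent on a single page view, in milliseconds.
per_page_ms = (QUERIES_PER_PAGE[0] * QUERY_MS[0], QUERIES_PER_PAGE[1] * QUERY_MS[1])

print(per_page_ms)                                       # (60, 200)
print(low, high)                                         # 6000000.0 20000000.0
print(low / SECONDS_PER_DAY, high / SECONDS_PER_DAY)     # about 69 and 231
```

Under these assumptions, queries alone add 60 to 200 milliseconds to every page and amount to the equivalent of roughly 69 to 231 databases busy around the clock, which is why a faster layer in front of the database pays off.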
>> So we started to develop a caching layer that
allowed quicker access to some of the information.
And originally we were using another open source application Memcache,
which I don't know if any of you guys have any experience with that.
But it was pretty quick.
It got access times down to I guess the 0.3
to 0.5 milliseconds, which is pretty good.
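A caching layer of this kind is commonly used in a "check the cache first, fall back to the database" pattern. The Python sketch below shows that pattern with plain dictionaries standing in for the cache and for MySQL; the class, its method names, and the invalidate-on-update choice are assumptions for illustration, not a description of Facebook's code or of the Memcache client API.

```python
class CachedStore:
    """Read-through cache in front of a slower store."""

    def __init__(self, database):
        self._database = database  # dict standing in for MySQL
        self._cache = {}           # dict standing in for a cache box
        self.db_reads = 0          # how often we had to hit the database

    def get(self, key):
        if key in self._cache:
            return self._cache[key]    # fast path: cache hit
        self.db_reads += 1             # slow path: cache miss
        value = self._database[key]
        self._cache[key] = value       # remember it for next time
        return value

    def update(self, key, value):
        self._database[key] = value
        self._cache.pop(key, None)     # invalidate so readers see the new value


store = CachedStore({"profile:1": "old bio"})
store.get("profile:1")
store.get("profile:1")
print(store.db_reads)  # 1: the second read was served from the cache
store.update("profile:1", "new bio")
print(store.get("profile:1"), store.db_reads)
```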
>> But it also has a bunch of distribution issues.
It's supposed to be a distributed hash table sort of application,
where you can just attach any number of Memcache boxes in a cluster
and be able to hook it up and have it go.
But we ran into a lot of issues there where
different Memcache boxes would go down.
And there was no redundancy on the information.
So when a Memcache box went down and you had a cache miss,
then all of a sudden you had a lot more traffic
going to a specific set of databases.
And that would suck.
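The failure mode described here can be illustrated with a small Python sketch. It assumes keys are assigned to cache boxes by a stable hash modulo the number of boxes, which the talk does not confirm; the box names and key format are hypothetical. With no redundancy, every key owned by a box that goes down becomes a miss and falls through to the database.

```python
import zlib

BOXES = ["cache1", "cache2", "cache3"]  # hypothetical box names


def box_for(key, boxes=BOXES):
    """Map a key to exactly one box using a stable hash (CRC32)."""
    return boxes[zlib.crc32(key.encode("utf-8")) % len(boxes)]


def reads_hitting_database(keys, down):
    """With a warm cache and no replicas, only keys owned by a down box miss."""
    return [key for key in keys if box_for(key) in down]


keys = ["profile:%d" % i for i in range(1000)]
print(len(reads_hitting_database(keys, down=set())))       # 0: every box is up
print(len(reads_hitting_database(keys, down={"cache2"})))  # every key cache2 owned
```

Because each key lives on exactly one box, losing a box sends its whole share of reads to the databases at once, which is the extra traffic the talk refers to. Keeping a second copy of each key on another box is one way to add the redundancy mentioned next.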
>> So as time went on, we even outgrew Memcache and the indices on MySQL.
We still use that stuff.
But we had to build on top of that extra redundancy.
And I think that's something that's probably maybe a little interesting.
But I'll let you guys ask me more questions about that later.
>> I'm not really sure what would be interesting to talk about right now.
Maybe you guys could help out a little?

Go for it.
>> AUDIENCE: I'm curious about, thinking of [INAUDIBLE]
going into an online business like this, how you felt the atmosphere was
with big players all bringing it to market and other big players
who you thought might [INAUDIBLE] to mark,
or what your experience was with that.
I'd be interested, just on a technical side, [INAUDIBLE] just ramping
up and technically how you [INAUDIBLE].

>> MARK ZUCKERBERG: Yeah, so that's not a technical question at all.
But I guess I'll just like go into question time now.
Because I'm not really sure what's relevant stuff for me to be discussing.
So I'll just answer this.
Then anyone else who wants to ask me questions can just go for that.

>> I guess I'd never really spent a lot of time worrying about stuff like-- I
mean, there are companies out there like Google
that could just get into your space and do whatever you want at any time.
And I think one of the cool things about this time in technology
is that individuals are leveraged and able to do way more than they've really
ever been able to do before.
>> And even four years ago when Google was started,
now they have hundreds of thousands of machines
and probably billions of dollars spent on equipment.
I think the generation before Google, you couldn't even
make a site without some big piece of hardware.
I think eBay, for example, ran off of two $50,000 machines.
You just can't start doing that if you're just a kid in a dorm room.
>> So I think the fact that we could rent machines for $100 a month
and use that to scale up to a point where we had 300,000 users
is pretty cool.
It's a pretty unique thing that's going on in technology right now.
It makes it so that instead of worrying about who is the big player
and what is Google going to do next, you can do more of-- you
can just get a lot of stuff done.
>> And instead of having to go out and have some of the traditional business
problems, like you have to raise capital before you can make anything,
that's no longer an issue.
So you're leveraged to do a lot more on your own now.
I don't know if that answers the question that you're asking.
>> But I mean, it's one of the reasons why I think that, at this point,
it makes a lot of sense to be studying this stuff.
Because at no point in the past could you leverage such a small amount
of money to get powerful enough technology
to really touch people in the way that you can today.
Google does about 250 million page views a day.
They have hundreds of thousands of machines and 5,000 employees.
>> Facebook does 400 million page views a day.
That's a lot more than Google does.
And we have hundreds of machines.
And we just passed 50 employees.
And that's just a technical generation of three or four
years in the architectures that were created.
>> And then you go three or four years back before that from like eBay to Google,
and it's just completely different.
Because at least Google is running off of a lot of distributed equipment
that they have hundreds of thousands of machines,
but the idea there was to get a lot of shitty machines that are really cheap.
I mean, that's a big step up.
>> Because then it's like, OK, that's more redundant.
They're not losing information.
They don't expect stuff to always work.
It's a much more mature attitude than eBay's, which
was the only thing that they could do at the time.

>> AUDIENCE: I have a question about the DHT stuff.
AUDIENCE: The Distributed Hash Table stuff.
MARK ZUCKERBERG: Yeah, which one?
AUDIENCE: I was just wondering if you [INAUDIBLE]
all your extensions for Memcache, because one thing I've noticed
is that, yeah, there aren't really good available libraries for DHT stuff.
There's all this wonderful research, but in terms
of implementations that actually deal with all the redundancy issues and all
those things--
>> MARK ZUCKERBERG: Yeah, a lot of the stuff-- we
didn't necessarily extend Memcache.
We built a bunch of stuff ourselves.
Right now, it's not open source.
We considered doing it.
And I mean, there's a lot of work that goes into making stuff open source.
And it's on top of whether or not you want to lose the competitive advantage.
It's kind of unfortunate.
>> Because I think that if it were just easier to make something like that,
then you could do it.
You could just release the code.
But then there's a lot of support and licensing and all that stuff.
We found that it's been annoying.
>> One of the things that we actually considered making open source
was this search server that actually that guy sitting right there
made while he was still out in California.
And I guess we got to a point where MySQL was lagging a little on some
of the searches that we were trying to do.
And we decided that it would be a cool thing
to do to make a series of distributed machines
that could-- he doesn't use a hash table.
What's the structure that you use, McCollum?
MARK ZUCKERBERG: So, yeah, we thought about making that open.
But that's when we kind of had to do all this work to come up with a license.
And we're just like, all right, screw that.

>> AUDIENCE: What do you spend most of your work time doing these days?
>> MARK ZUCKERBERG: Hiring people.

I guess when, as you grow, the most important thing
is to have smart people.

If you think about how, the technical leverage stuff that I was talking about
in answering that guy's question, as technology becomes
more generic and less expensive, the leverage point
becomes more in the people.
So if you think about this from a perspective
of a person to people time spent or user time spent, or page view
analysis, because of technology now, people
are much more leveraged to do more things
and be more important in the equation.
>> Because of that, it's really important to get the most intelligent people.
And also, I mean, when you're a small company, you can be really nimble
and get a lot of stuff done.
And there's relatively little bureaucracy.
So if you have smart people who can take advantage of that to build cool things,
then that's awesome.

>> I guess, besides that, designing new things.
There's not much corporate bureaucracy yet.
So I don't have to waste much time on that.

Keep on going?
>> AUDIENCE: Yeah, how much have you spoken and consulted with lawyers so far?
>> MARK ZUCKERBERG: I have a lawyer who works for me full-time.
>> AUDIENCE: OK, is it a big part of running a business?
Would you recommend working on [INAUDIBLE] early on?

>> MARK ZUCKERBERG: We didn't.
And that, I guess, provided some annoyance later on.
Getting stuff set up really well is good.
Getting stuff clean is really good.
>> And, I mean, no one's ever going to tell you a lawyer is bad.
It's all just a question of opportunity cost and what you prioritize.
I guess that, in our case, we now have to deal with a bunch of stuff that
wasn't set up properly in the beginning.
Most of the stuff is dealt with.
It's not even a big deal anymore.
>> But instead of talking to lawyers early on, we were making stuff.
And I think that that was probably the right use of our time.
I think that one cool characteristic of a lot of the companies that end up
being really successful, not that we are really successful,
but I guess we also fall into this bucket,
is that they started off as someone trying to make something
cool and not someone trying to make a company.

You kind of have-- Google came out of Larry and Sergey's PhD Dissertation
at Stanford, and Yahoo came out of just, I guess, also some Stanford guys
just kind of screwing around in their dorm room.
And eBay came out of some guy trying to build a marketplace for his girlfriend
to exchange PEZ dispensers.
Amazon was a little more calculated.

>> So I can't imagine that any of those people really had that much advice,
and it seems to have worked out OK for them.
But, I mean, at the same time I'm not going to sit here
and tell you not to get advice on stuff.
And a lot of times people are just too careful, too.
I think it's more useful to make things happen and then apologize later
than it is to make sure that you dot all your I's now and then
just not get stuff done.

Go for it.
>> AUDIENCE: When do you think that Facebook will reach the point where
it could become that big company [INAUDIBLE] new idea, [INAUDIBLE]?
Do you think it will reach that point any time soon?
How would you keep it from [INAUDIBLE]?
>> MARK ZUCKERBERG: Well, I mean, I think that-- I
think you're kind of always at that point.

I mean, most companies are started on like a couple of ideas,
and those are a few things that they do well.
So, I mean, Yahoo's was like we're going to organize all this information
in the world like by directory.
And that was what they started off doing,
and then they kind of diversified out as time went on and built more stuff.
And a lot of that stuff is like the core of their business now.
I mean, it's like they didn't originally do search.
And now directory just doesn't exist.
It sucks.
There's no utility for it.
>> I mean, Google's big thing was just like they did PageRank.
And then, I guess, out of PageRank, they have search.
And now they kind of extend that to do other similar type of algorithms,
searching in other spaces.
But, I mean, you can kind of tell how all the other stuff that they're doing
is sort of tangential.
And it's like they're trying really hard to make PageRank
and other types of algorithms that are very
similar to that work in their spaces, and it's just not as elegant
or pure of an idea as the original one was.
>> So in Facebook, for example, when it just got started,
what I thought was the most interesting thing was just
to be able to type in someone's name and find out information about them.
And there was hardly any of the stuff that is there now.
There was no groups.
There was no messages even.
There was poking.

>> Yeah.
I mean, so it's like you kind of get started on some kind of core idea.
And generally, the company will do well, because I
guess the people who are starting off working on that core idea
kind of understand that single core idea in some sort of unique way.
But that doesn't imply that they have any better understanding of anything
else, than anyone else.
So that's why surrounding yourself with a lot of smart people
is really important.
>> AUDIENCE: What was-- was there any sort of model
that was [INAUDIBLE] photo features [INAUDIBLE] on Facebook?
Was there any sort of [INAUDIBLE]?

MARK ZUCKERBERG: I mean, there's a lot of applications on the internet
now that do that stuff.
So, I mean, Flickr's a pretty good photo application.
Although I think in three weeks we passed them in the number of photos
that we had on our site.

I mean, I think that the coolest thing about photos
is that you can tag them and the way that
makes them link to people's profiles.
And I think that that's something that you can really
only do if you have the context of everyone around you on the site.
That kind of requires the ubiquity of usage.

So I don't know if any of the other guys would have done that if they had that
kind of use, but they didn't.

>> I don't know.
Don't any of you guys have any CS questions?
>> AUDIENCE: I'm curious.
How do you decide as you're moving forward with the company
to pursue a technology or not pursue a technology?
MARK ZUCKERBERG: What's an idea?
What's an example?
>> AUDIENCE: Well, I actually don't know much about Facebook.
What's the next thing you want to do with pictures
and linking people together?
How do you figure out which technologies are good ones?
How do you mine to find technology?
Do you have any processes in place today that
are directed towards those sorts of things,
or does technology just come into the company
because you're out someplace and somebody
mentioned something you might want to do in terms of Facebook?
>> MARK ZUCKERBERG: So I think that our process for filtering what technologies
to use is to trust the smart people.
So we definitely have some people at the company who are just really smart,
and I think that most of the people at the company are generally pretty smart.
>> But there are a few guys in particular-- I'm
not one of them-- who I think that when they say that something is a generally
good practice to go at it, then it's relatively-- then
they can get support for that pretty easily.
And I think that a lot of the engineers sort of build a consensus around that.

I'm trying to think of a good example.
>> I think it's somewhat goal oriented.
So then with photos, we knew that we wanted
to support just people uploading unlimited photos.
So, I mean, there's no real concept of unlimited.
It's just you have to keep on adding stuff, keep on adding storage.
And you want to make it so that it kind of works as seamlessly as possible.
So the first thing that we were trying to do
is, well, let's evaluate these companies that
just do large storage for a living.
Or it's like NetApp or something, Network Appliance.
So we talk to them for a while.
And then we're like, all right.
Well, we don't really want to go with this single, big box approach.
We want to go with having just a series of distributed smaller
boxes with a lot of hard drive and a lot of RAM.
>> And so I think that the architecture that we first built
was one where we had a bunch of those machines
with relatively slow but very stable disk behind a level of-- a layer
of caching boxes with a ton of RAM that could hold most of the thumbnails
and the most frequently accessed images in-- I guess in RAM at any time.
And then right before we launched, it occurred to us
that we were going to have some issues with this.
And the issues that we were going to have
were going to be network issues, not hardware issues.
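The storage design described above, slow but stable disks behind a RAM caching layer that holds the most frequently accessed images, can be sketched in Python. The eviction policy is not stated in the talk; least-recently-used is assumed here purely for illustration, and the class and its names are hypothetical.

```python
from collections import OrderedDict


class ImageCache:
    """RAM cache of bounded size in front of a slower disk store."""

    def __init__(self, capacity, disk):
        self._capacity = capacity
        self._disk = disk            # dict standing in for the disk boxes
        self._ram = OrderedDict()    # oldest entry first, newest last
        self.disk_reads = 0

    def get(self, image_id):
        if image_id in self._ram:
            self._ram.move_to_end(image_id)  # mark as recently used
            return self._ram[image_id]
        self.disk_reads += 1                 # miss: go to the slow disk
        data = self._disk[image_id]
        self._ram[image_id] = data
        if len(self._ram) > self._capacity:
            self._ram.popitem(last=False)    # evict the least recently used
        return data


disk = {"a": b"A", "b": b"B", "c": b"C"}
cache = ImageCache(capacity=2, disk=disk)
for image_id in ["a", "b", "a", "c", "a", "b"]:
    cache.get(image_id)
print(cache.disk_reads)  # 4: "a" stayed hot, "b" was evicted and re-read
```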
>> So, for example, if you take a photo album of 30 photos
and each of your photos is three megabytes,
then you can upload 90 megabytes to Facebook.
And that kind of sucks.
All right.
I mean, it sucks because people tend to have not optimal connections
and because our router-- I guess most routers are set up
to only be able to handle a gigabit at a time,
and routers are kind of expensive.
They are big pieces of equipment.
I don't think that there is a distributed version of that yet.
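The upload numbers can be checked with a short Python calculation. It uses the figures from the talk and assumes that "a gigabit at a time" means one gigabit per second and that a megabyte is 8 megabits; both are interpretations made for this sketch.

```python
PHOTOS_PER_ALBUM = 30
MEGABYTES_PER_PHOTO = 3
ROUTER_MEGABITS_PER_SECOND = 1000  # assumed reading of "a gigabit at a time"

album_megabytes = PHOTOS_PER_ALBUM * MEGABYTES_PER_PHOTO  # size of one album
album_megabits = album_megabytes * 8                      # same size in megabits

# How long one uncompressed album occupies the whole router.
seconds_of_router_per_album = album_megabits / ROUTER_MEGABITS_PER_SECOND

# How many such albums per second would saturate the router.
albums_per_second_at_saturation = ROUTER_MEGABITS_PER_SECOND / album_megabits

print(album_megabytes, album_megabits)          # 90 720
print(seconds_of_router_per_album)              # 0.72
print(round(albums_per_second_at_saturation, 2))  # 1.39
```

Under these assumptions a single uncompressed album ties up the entire router for most of a second, so fewer than two simultaneous album uploads per second would saturate it. That is the network limit that made shrinking photos before upload attractive.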

>> So we couldn't, in the time frame that we wanted to launch it,
just get a new router and get it set up.
So what we ended up doing was building a Java applet and an ActiveX control that
coupled the choosing of the photos that people wanted
to upload with compression on the client side to make it smaller,
and then that way people can just upload their photos relatively quickly.
We also saved CPU on our side because we don't
have to do the decompression on our side,
although that wasn't that huge of a bottleneck.
So that worked.
>> And then we got it to a point where we were
having uploads at a rate of 100 a second,
and people were using the feature way more than we thought we were going to.
And even though we had this caching tier setup,
it just still wasn't fast enough.
I'm sure you guys remember this.
A few weeks ago, the site was not having a good time.

>> So what we ended up doing at that point was
using edge caching, like Akamai type of stuff
to make these photos which are static content just be closer to people.
So that way we can sort of offload some of the equipment and the-- sort
of having to transfer these still somewhat large files to people.
So that's where we are now, and it seems to be working pretty well.
>> It wasn't that we had any upfront technical genius about it.
It was just sort of that at each point we sort of anticipated the issues
or picked them out pretty quickly and then
had enough competence to evaluate, I think,
what the options were that we had and make
what I think were decent decisions about how to execute on them.
What's that?
>> AUDIENCE: Take that to the next level, too, in terms of the problems
you just talked about.
>> AUDIENCE: Students get one year of-- you know, one computer science working
with, like, I go sit in the corner, type on my [INAUDIBLE].
How did the company work through-- what do the software engineers do
when you guys all have to put curly braces in the same place?
>> MARK ZUCKERBERG: What's that?
AUDIENCE: Curly braces for the programmers in the same place.
How is the structure of the software engineering actually done [INAUDIBLE]?

>> MARK ZUCKERBERG: So the way that-- I guess the methodology that we have is
that I want it to be sort of-- as much of a meritocracy as possible
where the people who can come up with the coolest solutions
and implement them the quickest and have like the fewest bugs get
to work on the stuff that they think is the most interesting
and go off and have the most influence in the company.
>> So we're also on-boarding a lot of people,
because we're hiring relatively quickly.
And in doing so, we sort of have-- we pair up
new people who are coming in with some-- like the better people
who are sort of at the top of the chain, and then we
have them sort of work with those people when they first come in,
to learn the stuff that they're working on that-- so
that the new guys, like the incoming class,
can sort of learn what some of the people that are currently
at the company are working on.
And I think in doing that, they pick up the style and the methods that we
use for doing stuff.
>> But I think that it changes pretty quickly.
I think one difference between the way stuff works in a company
and the way stuff works in school is that this is a very iterative process.
And it's nice when you get stuff right the first time, but we don't need to.
And I think that a lot of companies go through phases, or stages,
where they don't get stuff right the first time.
>> Like Microsoft-- I mean, I don't know when
the last time was that they had a good product before Version 4.
But by the time they get to Version 4, it's
like always good for the most part.
And I think that works out pretty well for them.
And, I mean, Google always releases their stuff in beta.
>> So I guess we try to have multiple people work on the same thing,
so everyone can learn from each other and kind of pick off
some of the mistakes that might be made that we can reduce pretty quickly.
But like, I guess in general, the idea is
that it doesn't have to be perfect the first time around.
And as long as you get the architecture as right as possible,
then a lot of the other implementation stuff
isn't going to be as big of a deal, and you can sort of
work that out at any time.
I don't know if that's sort of answering the question that you asked me.
>> AUDIENCE: So now, when you find something
that you want to do that you don't know so much about,
you can ask some of these people that are working for you,
or you can get new people.
But when you started, it was just sort of you and your roommate as a student.
And obviously, there were domain knowledge issues of computer science
that you had to deal with and you didn't know about.
>> I mean, how did you go about figuring out how to do things?
Did you decide to take certain classes?
Did you get books?
Did you go hire or get involved with some more people?
How did you work through those issues of learning
computer science as you worked through this?
MARK ZUCKERBERG: The internet is a pretty good tool.

I think that that's how we did most of it.
I mean, we kind of make a point of not hiring people for skills,
because I guess the theory is if someone has skills in an area
and has been doing it for 10 or 15 years,
then that's probably what they can do.
And that's good, and that means that they can do that.
>> But if you hire someone, say, right out of college,
or someone younger who you're just hiring them for raw intelligence,
then the idea is that they're going to be able to learn stuff really quickly.
And there's a lot of information available all over the place,
and now, within recent years, there's good tools for sorting through that.
And I think that the most performant people we have
are sort of younger people, who didn't necessarily know that much about
anything specific coming out of college.
>> I mean, a good example is-- Dustin, my roommate at Harvard
wasn't even a CS major.
He was an economics major.
And he's just a really smart dude, and was able to pick it up.
Some of the other good people we have are
EE majors out of Stanford or Berkeley.
And they aren't even CS all the time.
Like math people-- if you studied math, you
can learn the stuff relatively quickly a lot of the time.

>> AUDIENCE: I guess, since you have the infrastructure in place, right now,
when you focus on your hiring, do you still look for tech skill people?
Or do you look for people who might have the business knowledge to help grow you
further and make more money?
What's actually the priority right now in growing the company?
MARK ZUCKERBERG: I never really hire people
just because they have business skills.
It's actually kind of funny, but knowledge of a lot of core CS stuff
is really important in business, too.
One of the main things that you learn when you're studying CS
is complexity and scale, and that is a huge issue in business, too.
How do you go from having five people to 100 people,
and what's the change in the dynamic there?
And like, how are certain processes-- how
is a sales force going to scale from five people to 100 people?
>> It's like the same type of intelligence that
can figure out both of those problems.
And it might be a different type of person who cares to solve the problems.
>> But I think that the second part of my answer to what you said
is that I think we're sort of continually
in the process of building out infrastructure,
and I don't think you ever get out of that process.
And we're kind of focusing not on just building something
and figuring out how to make money off of it
and sort of maximizing the value of our business in the short term--
but instead, sort of always looking to maximize
what the long term value would be.
And I think that in doing that, you kind of
need to always just be building out your base, and not at any time
be worried about maximizing your money.

>> AUDIENCE: This is sort of back to the [INAUDIBLE]
Facebook, but do you guys have issues like the day after college,
maybe something like that, with everybody uploading pictures
all at the same time, [INAUDIBLE]?
MARK ZUCKERBERG: Our peaks are pretty strong.
So like at 5:00 in the morning, no matter
how many users we have signed up, there's always like 5,000 people,
and that's it.
And then if you get to 9:00 PM Pacific-- so like midnight here--
which I guess is like the peak across the country,
it's close to 400,000 people using it simultaneously.
>> And it's actually kind of interesting, because we monitor these graphs
and we have this huge LCD in our office, and whenever
there's a blip in the traffic, we're like, oh crap, what happened?
And a lot of times it's like Laguna Beach.
>> But usually it doesn't swing that far the other way.
>> AUDIENCE: With your archive [INAUDIBLE], if someone deletes something
from their profile, do you keep a cache of that, and how long?
MARK ZUCKERBERG: Right now, we don't.
But we may at some point in the future.
>> AUDIENCE: To follow up on that, what kind of issues
do you talk about at the company in terms
of privacy and security, all those things?
Are you worried about it at all?
You've put your [INAUDIBLE] privacy and security statement online.
So you just put it up and then not worry about it?
>> MARK ZUCKERBERG: Well, I think that what makes Facebook fun
and useful is that there's a lot of information about a lot of people
that you can get.
But what's more important is that the information
is available to the people who that person wants that information
to be available to.
And the flip side of that is that the information
is available to the people that want to have access to that information.
>> So one of the kind of core decisions that we made
was only to let people at the same school see each other's profiles.
And I guess the idea behind that was that you're at Harvard.
You probably wouldn't have that hard of a time just letting
someone else at Harvard see your information.
But at the same time, it's like only people at Harvard,
who you're probably going to see on a day-to-day basis and maybe meet,
who are ever going to want to look you up.
It's not like some kid out at Stanford who you will never
talk to is going to be interested in knowing what your cell phone number is
or what you're interested in.
>> So by limiting the scope of the information
to sort of as narrow as makes sense, I think
that we've solved a lot of those issues.
And then, we also give people complete control
over what parts of their profile get showed.
So we don't force anyone to show anything,
and we give people granular control over some of the more sensitive stuff.
>> So like, right next to the cell phone field,
there's another field that's like, who do you want to show this to?
Just your friends, just people at your school, what?
We care about it, because if people stop--
if people feel like their information isn't private,
then that screws us in the long term, too.
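
The visibility rules described in this answer can be sketched roughly as follows. This is a minimal illustration, not Facebook's actual code: the `Audience` enum, the `can_see_field` function, the dictionary layout, and the choice to hide any field without a setting are all assumptions made for the example.

```python
from enum import Enum


class Audience(Enum):
    """Hypothetical per-field settings: who the owner wants to show a field to."""
    FRIENDS = 1
    SCHOOL = 2


def can_see_field(viewer, owner, field):
    """Return True if viewer may see the given field on owner's profile."""
    # Profiles are only visible to people at the same school.
    if viewer["school"] != owner["school"]:
        return False
    setting = owner["visibility"].get(field)
    # Assumption: a field with no setting is hidden, since nobody is
    # forced to show anything.
    if setting is None:
        return False
    if setting is Audience.FRIENDS:
        return viewer["id"] in owner["friends"]
    return True  # Audience.SCHOOL: anyone at the same school
```
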
>> AUDIENCE: Just furthering on that, I guess even though you
put the information up yourself, what's the recourse in case,
say, you have a photo, and somebody puts that photo up
on some message board or some Hot or Not type site.
How do you control what users do with the information that's
input onto your servers?
MARK ZUCKERBERG: It's very hard to control what people do with information
that they have access to.
I mean, the best that we can do is give people control over their information
and who can see it.
And then once they let someone see it, it's sort of out of anyone's control.

>> AUDIENCE: I'm curious a bit about [INAUDIBLE] Wall feature.
It seemed to start out maybe more like blackboard type of thing, and then it
completely changed around. [INAUDIBLE] like one or the other,
or if there was something that you were thinking of?
Or was there a design change in the process of doing [INAUDIBLE]?
>> MARK ZUCKERBERG: So I originally threw that together in like a half an hour.
And I guess it was pretty complicated, because-- or it
was more complicated than I thought it was going to be.
And I think part of the reason why we changed
it was because it didn't work as well as we wanted it to.
I mean, the original goal was to sort of make it
so that you can have this wiki type thing on people's profiles,
that when you moused over something, it showed who added that part of it.
>> But I guess there were a lot of cases that we missed,
or it just wasn't well designed by me.
And I don't know if you guys remember, but you used to mouse over stuff,
and it just wasn't as good.
And like, it might tell you the wrong person,
or it might highlight more than it was supposed to.
>> So I kind of coupled that with thinking, this isn't even the best feature.
It would be much more interesting if instead of having to mouse over stuff,
people could just see the picture and the name of the person who
posted everything, without having to go through the whole wall.
So over the summer, we just kind of went through
and wrote a better parser for the walls and tried to decompose them.
And then, going forward, we made it so that you just added a post,
and it went to the top of the wall.

>> AUDIENCE: [INAUDIBLE] question.
Where'd you get the idea from, for creating Facebook?
>> MARK ZUCKERBERG: I just wanted to make something
where people can type in someone's name and get
some information about a person.
I thought that would be cool.

Oh, yeah?
>> AUDIENCE: I'm interested in the feature that you
could SMS some [INAUDIBLE] information if you wanted and send it back.
I didn't know about people using it.
So I'm just wondering if there are actual considerations [INAUDIBLE]?
>> MARK ZUCKERBERG: So the SMS Gateways also have an email counterpart,
so if your phone number is x and you have Cingular as your provider,
then you could email x@cingular.com or some variant of that,
and the text message would go to your phone.
And that's a free gateway.
So, you know when you text message people, a lot of times
depending on what your cell phone plan is, it will cost you money.
If you do it through email, it actually doesn't cost any money.
So that's how we chose to do it.
We were doing a high volume of them and we
decided that it would just be a better thing for us to-- to actually do it
the legit way and send a text message directly to the cell phone,
as opposed to going through the email gateways.
So we're kind of in the process of getting that set up now.
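
A minimal sketch of the email-gateway approach described in this answer, assuming a lookup table from carrier to gateway domain. The `GATEWAY_DOMAINS` table, the `build_gateway_message` helper, and the sender address are hypothetical; the talk only says the address is "x@cingular.com or some variant of that", so the real domain may differ. The sketch only builds the message and does not send it.

```python
from email.message import EmailMessage

# Hypothetical carrier-to-domain table, taken from the example in the talk.
GATEWAY_DOMAINS = {"cingular": "cingular.com"}


def build_gateway_message(phone_number, carrier, text):
    """Build an email addressed to the phone's email-to-SMS gateway."""
    # Keep only the digits, so "617-555-0100" becomes "6175550100".
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    domain = GATEWAY_DOMAINS[carrier.lower()]
    msg = EmailMessage()
    msg["To"] = f"{digits}@{domain}"
    msg["From"] = "notifications@example.com"  # placeholder sender
    msg.set_content(text)
    return msg
```
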

>> MARK ZUCKERBERG: I think that we're always looking for more stuff to do.
I don't think that we're competing with Myspace.
And I think it's kind of a different type of application.

AUDIENCE: I'm just curious.
Is there a particular reason why on a person's profiles and school emails
and stuff [INAUDIBLE] and not as text can be copied and pasted?
Is that [INAUDIBLE]?
>> MARK ZUCKERBERG: So I did that so that people
couldn't go through and scrape the pages.
We have a lot of stuff that we put in place
to make sure that people don't aggregate information off of Facebook.
Obviously, you can't see profiles of people at other schools.
But also if you try to view a lot of profiles,
it picks up that you're just viewing an abnormal number of profiles.
>> And we also sort of-- just by analyzing user activity,
we've built these Bayesian filters that I guess just let us pick out
abnormal activity, like really quickly, and just kind of show
very limited information to those users.
But one of the things that we wanted to do,
we want to make sure-- we want to make it especially difficult for anyone
to try to scrape email addresses, because that's
really annoying-- if people get spammed.
So we figured that by making it an image,
instead of plain text, that just added an extra level of complexity
in terms of scraping.
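
The idea of noticing an abnormal number of profile views can be sketched with a sliding-window counter. This is a deliberately simpler stand-in for the Bayesian filters mentioned in the answer, not a description of them; the `ViewMonitor` class and the window and threshold values are assumptions chosen for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # assumed window length
MAX_VIEWS = 30       # assumed number of views allowed per window


class ViewMonitor:
    """Flag viewers who load more profiles than max_views within the window."""

    def __init__(self, window=WINDOW_SECONDS, max_views=MAX_VIEWS):
        self.window = window
        self.max_views = max_views
        self._views = defaultdict(deque)  # viewer id -> recent view timestamps

    def record_view(self, viewer_id, now):
        """Record one profile view; return True if the viewer looks abnormal."""
        recent = self._views[viewer_id]
        recent.append(now)
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        return len(recent) > self.max_views
```

A flagged viewer could then be shown only limited information, as the answer describes.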

>> AUDIENCE: [INAUDIBLE] pretty valuable resources that [INAUDIBLE].

Do you do anything [INAUDIBLE]?
>> MARK ZUCKERBERG: Well, we can use it to target posters to you, for example.
I don't know if any of you bought posters off of that.
But we sort of-- we're trying to figure out what we can do with that,
but we're obviously really sensitive to people's privacy.
And what's that?
>> AUDIENCE: Not so much for individual [INAUDIBLE],
but just as a whole [INAUDIBLE]?
>> MARK ZUCKERBERG: I think we're actually going to be releasing something
late this week or next week that shows some aggregate statistics that we
think are interesting.
I mean, this stuff is kind of cool, but it's not the type of thing
that you come back to every day.

No CS questions?

MICHAEL D. SMITH: Do you have any questions for Mark?
He might be willing to stay around for a couple of minutes,
in case people want to not ask you in public, but have a--
>> MARK ZUCKERBERG: Disappointed that Will Chen didn't ask me any questions.
>> MICHAEL D. SMITH: We'll work on Will later.
That's it?
No more?
We've got a couple more.
AUDIENCE: Do you ever procrastinate on Facebook,
like everyone else in the room?
>> MARK ZUCKERBERG: What's that?
>> AUDIENCE: Do you ever procrastinate on Facebook?
>> MARK ZUCKERBERG: Of course.
I mean, I think that there's
a value to what people do on the site.

>> AUDIENCE: I just know that probably many of us
would feel that the hours [INAUDIBLE].

>> MARK ZUCKERBERG: Yeah, of course.
AUDIENCE: I don't know if you can say this, but what kinds of features can
we expect in the future?
>> MARK ZUCKERBERG: Well, I can tell you what we're going to do in the next two weeks.
There's the thing that I just mentioned before,
where we're aggregating a bunch of stats, and just show what's hot
and what's changing.
And also surprising statistics that we've
found, like 2% of people at Harvard are Libertarian, for example,
or something like that.

I think another thing that we're going to launch hopefully
sometime either late this week or next week,
is something that allows people to clarify
their relationships with other people.
>> So a lot of the problems that we kind of deal with at Facebook
aren't always technical, but sometimes they're social problems.
And it's like-- one thing that I think is
really interesting is-- if you have 100 or 150 friends, how well do
you know each of those people, and who are maybe like the five people
who you actually care about, like a lot.
And that's not something that you can really
answer right now, because the connections are binary.
You either are connected or you're not.
So I've been trying to think for a while about how we could design something
that would make it so that people could express how close they were
to people, in sort of an unbiased way.
>> So you can imagine, if you made a feature that was just like-- rate
your friendship on a scale of 1 to 10, that would not work.
Because first of all, no one would want to do
that because you're insulting someone if you're like, you're a three.
But it's also kind of boring, and so no one
would want to do it because of that.
And it would just be skewed by social pressure in the same way
that the friends are.
Some people have a different sense of what a friend is to them,
than another person would.
So if someone has 30 friends and another person has 150 friends,
does that person actually have more friends in real life?
Maybe or maybe not, and maybe the person with 30 just
has a higher threshold for making someone a friend on Facebook.
>> So I mean, I guess that the solution that we came up with for this
was to make-- to judge relationships based
on bi-directional, factual statements.
So for example, I took CS50 with this person.
Or I lived in a house with this person.
And there's just kind of a bunch of different ways to do stuff like that.
But I figured that that would probably be a little more accurate,
because no one is going to-- there's no pressure
to lie about something like that.
It's not like, what are you talking about?
I didn't take CS50 with you.
But if someone aggregates a lot of different connections,
then that kind of means something.
So when you take someone like Dustin, who's my roommate here,
and it's like OK, well we lived together at Kirkland House.
Then we worked on Facebook.
Then we moved out to Palo Alto, and now we're still working on Facebook-- then
maybe that's enough connections to say OK, well this person clearly
has a lot to do with this person.
Whereas if the only category that you know someone through is,
this person's my Facebook friend, then that also means something.
So I don't know.
We'll see how it works.
Nothing is for sure.
What's up?
>> AUDIENCE: Do you actually [INAUDIBLE] people typing in information

>> MARK ZUCKERBERG: It's a combination.
So I think that another thing that's pretty important for each
of these events is the date at which they occur.
So if you had, for example, a date on each person's friendship
with each person then that would give you a more accurate representation
of what that meant, because right now you
don't know what friend means to each of the people on the network.
And because you don't know when that friendship was formed,
you don't know what has changed in that relationship
since that friendship was formed.
>> I mean, if the person-- even if friendship means very little to someone,
if you know that that happened yesterday, that they became friends,
you still know that there's some-- that there's some strength.
It's like a certainty thing.
There's a lower certainty that their relationship
has diverged since that point if the date at which the action occurred
was sooner.

Sorry, more recent.
So I think that's one of the things that we're focusing on here.
So saying I took a course-- I took CS50 with someone
this term is a lot different than saying I'm a senior now
and I took CS50 with this person when I was a freshman.
>> A lot of these-- the analysis of how people look at this
and see the relationships isn't necessarily--
Facebook isn't going to rate the relationship.
It's sort of-- people have an implicit understanding
of what the difference is between having taken CS50 with someone this term
and having taken CS50 with them three years ago.
And I think that will kind of help out.
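
The bi-directional, dated statements described in this answer could be modeled roughly as below. This is only a sketch under stated assumptions: the tuple layout and the `shared_connections` function are hypothetical, and the function computes no rating, in keeping with the point that Facebook would not rate the relationship. It lists the facts both people have asserted, newest first, and leaves the interpretation to the reader.

```python
def shared_connections(claims, a, b):
    """Return (fact, when) pairs that both a and b have asserted, newest first.

    claims is an iterable of (claimant, other_person, fact, when) tuples,
    where `when` is any sortable date value. A fact counts only if each
    person has asserted it about the other with the same date.
    """
    from_a = {(fact, when) for who, other, fact, when in claims
              if who == a and other == b}
    from_b = {(fact, when) for who, other, fact, when in claims
              if who == b and other == a}
    # Keep only statements confirmed in both directions.
    return sorted(from_a & from_b, key=lambda pair: pair[1], reverse=True)
```
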

What's up?
>> AUDIENCE: When you get a new idea and you
think it's pretty cool, how [INAUDIBLE] with how you go about it?

>> MARK ZUCKERBERG: Because I think that a lot of the stuff, we sort
of have a very unique platform for building it.
I don't think there's any other company or group of people
in the world who could develop this right now.

I mean even Google, with their like 5,000 engineers
is not in the place to make an application that sort
of characterizes people's relationships like this.
>> And it's like the same thing with the photo tagging.
We can do that because photo tagging only works if everyone around you
is on the site.
Because otherwise you're going to get a type of use
for it where you go and you upload a photo
and you go to tag a bunch of people, and they're not there, and that sucks.
So even if 50% of the people at Harvard were on Facebook, then the tagging
and the way that we set it up would still suck.
So it only works because 97% of the people at Harvard are on Facebook,
or whatever.
So because of that, it's like not that big of a concern.

>> AUDIENCE: So from sort of a software engineering,
sort of dynamic [INAUDIBLE] way, when somebody
has one of these ideas-- like let's aggregate this [? wider ?] statistic
and tell people, or I have a way to measure this, that, and the other
about these people and mark up this thing on people's profiles--
how do they go about getting the go-ahead from everyone
else in the company to spend some of their time technically working on that?
Or get other people to work on it with them, and stuff like that?
>> MARK ZUCKERBERG: I think that a lot of people-- I mean, the people who work at Facebook really
like working at Facebook, I think, for the most part,
and spend a lot of their time doing that.
And like, a lot of the time that they're spending,
they spend working on stuff that might be
sort of strategically important to what we're trying to do at that point.
But also, a lot of people just mess around with the code base,
and kind of put if-statements in there that's like, if the user is me,
then put this in there.
>> And so I walk around to different people's places during the day,
or people come and talk to me.
Like, I hold CEO office hours as a joke, like from 2:00 to 4:00 every day--
not today.
And people just come and show me different stuff that they're doing,
and a lot of it is relatively cool, and stuff
that I wouldn't have necessarily thought of.
>> So I mean, you asked before if we were saving,
if we were archiving, old profile information, and one of the reasons
why I said that we might start doing it is
because one of the guys at the company came up with something where it's like,
so you go to your friend's page, and it shows your recently updated friends.
And then you click on that, and it shows their new profile.
But there's no indication of what changed.
>> So one of the guys made something that keeps an old version of his profile,
and then makes it so that when you go to his profile when he updates it,
it highlights in yellow the parts of it that were changed.
And I think that that's pretty cool.
And it's not a huge project-- I mean, it actually kind of is,
if we have to start storing everyone's information.
>> But I mean, it's somewhat cool.
It's not the type of thing that you necessarily are bound to come up with,
but I definitely think it's a pretty big improvement over what we have now.
Now, it's really hard to go to someone's profile and tell what changed.
And that's just the most recent example that I have.
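
The highlight-what-changed idea from this answer comes down to keeping an old snapshot of a profile and comparing it with the current one. A minimal sketch, assuming profiles are stored as plain dictionaries; the `changed_fields` helper is hypothetical, and how the employee's mock-up actually stored old versions is not described in the talk.

```python
def changed_fields(old_profile, new_profile):
    """Return the sorted names of fields that differ between two snapshots.

    A field that was added or removed counts as changed.
    """
    keys = old_profile.keys() | new_profile.keys()
    return sorted(k for k in keys if old_profile.get(k) != new_profile.get(k))
```

A page renderer could then highlight exactly the fields this returns.
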
>> AUDIENCE: Do you have time to allow people to change the look of each page?

>> MARK ZUCKERBERG: So, I don't want to do that.
And the reason is because I think that Facebook is a directory,
and the primary purpose is to look up someone.
Like type in their name and get some information about them.
And one of the things that's really useful
is that everyone's page is structured in the same way.
>> So if you want to see if someone's single,
you don't have to scan down the columns until you get to relationship status.
You just know where that is.
So you click, go-- your eyes just go to that thing.
But if you had different people changing their CSSes in different ways,
then that could become annoying-- especially
if people are doing stuff like dark blue text on black backgrounds.
It just gets kind of obnoxious.
>> AUDIENCE: How successful has the Facebook [INAUDIBLE] been,
and what do you see as differences in the purpose [INAUDIBLE]?

>> MARK ZUCKERBERG: The purpose-- for me, the high school one was the same.
I think that the application-- this is going to probably
sound pretty stupid-- but wanting to look people up, I think,
is kind of a core human desire.
I think that people just want to know stuff about other people.
So I think that providing an interface where people can just
type in someone's name and get some information about them
is generally a pretty useful thing.
So growth has been pretty good.
>> It was tough to figure out exactly how to gauge it,
because when we did college, we opened it up at Harvard.
Then we opened it up at a couple colleges around Harvard.
And the idea was always, we were really short on money and equipment.
So while getting as little equipment as possible,
we want to maximize our growth.
So we want to launch at the schools that we
think are going to grow the quickest, based on the fact
that the people at those schools are going to have the most
number of friends at the schools that we're already at.
We took a different approach for high school,
because we could just launch it everywhere at the same time.
So we didn't really know how it was going to grow.
I think it's growing at more than 5,000 people a day, which is pretty good.
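
The college launch ordering described earlier in this answer, picking the schools whose students have the most friends at schools already on the site, can be sketched as a greedy choice. The `next_school_to_launch` function and the `friend_counts` input are assumptions for illustration; the talk does not say how those friendship numbers were estimated.

```python
def next_school_to_launch(friend_counts, launched):
    """Pick the unlaunched school with the most friendships into launched schools.

    friend_counts maps (school_a, school_b) pairs to an estimated number of
    cross-school friendships; launched is the set of schools already live.
    Returns None if no unlaunched school has a link to a launched one.
    """
    totals = {}
    for (a, b), count in friend_counts.items():
        # Each pair is undirected, so check it in both directions.
        for candidate, existing in ((a, b), (b, a)):
            if candidate not in launched and existing in launched:
                totals[candidate] = totals.get(candidate, 0) + count
    return max(totals, key=totals.get) if totals else None
```
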

>> AUDIENCE: When you started Facebook, did you
intend for it to become this full-fledged business?
AUDIENCE: Well, how did you [INAUDIBLE]?

>> MARK ZUCKERBERG: I remember thinking that it would be cool
if you could have a directory of everyone.
I remember arguing with my parents about this, because after I almost
got kicked out of school for this project that I did before Facebook,
they were like, what good could possibly come of doing something new?
And I'm like, no, this is pretty cool.
Just imagine how cool it would be if you could just type in someone's name
and get some information about them.
And they were just like, I don't see it.
And I'm like, well, we'll just do it at Harvard for now,
but imagine what happens if one day, you can just type in anyone's name
and get some information about them.
And like, that would be kind of cool, right?

So they didn't buy it, but now they do.
>> Yeah, so I don't know.
I guess at each phase, we're just kind of looking at a natural way
to preserve the integrity of the network,
and also to make it so that it's more useful-- I
guess is the answer to that question.
>> AUDIENCE: Are there certain skills, particularly [INAUDIBLE],
that you [INAUDIBLE] or you would suggest for someone to study?

MARK ZUCKERBERG: I just suggest that you take the hardest courses that you can,
because you learn the most when you challenge yourself, right?
So like 161 just ruined my life, and I learned so much from it.
121 I also found pretty hard.
124 kind of changed the way I thought about stuff.

>> What 124 taught me that I think was really useful
was that there are-- I think a lot of people focus
on how to do stuff as well as possible, and how
to make the most efficient algorithm.
But what has always gotten us by isn't doing stuff in the most efficient way,
but laying the framework in a pretty efficient way.
So I mean, it kind of teaches you both sides of the problem,
like data structures and algorithms, and how the setup is really important.
And that's definitely saved our ass in scaling a lot of times.

>> I don't know.
Work with smart people.
Learn from people.

AUDIENCE: One of the things that I've noticed about Facebook, compared
to other social networking sites, is that it's actually a lot easier to use.
Do you have people-- like your employees just putting whatever pieces they think
are cool.
Do you have separate stability people to ensure it all works all together?
>> MARK ZUCKERBERG: People can make whatever they want,
but that doesn't mean they can put it on the site.
So I think that before stuff goes on the site, a lot of people see it.
I mean, I definitely check off on it before it can go live.
But I mean, I think that people have a lot of creativity to do cool stuff.
And a lot of times, it's like someone can come up with a cool idea,
but that doesn't mean it's the final way that it would happen.
>> So for example, people highlighting in yellow what the changes are
in their profile-- I think that just the concept of highlighting
stuff that has changed is really good, but the interface
that that guy used for it isn't what I think is the best one.
And the way that he's storing the old profile information
isn't optimal either.
And that kind of is cool, because he was just doing it for himself.
But if we were ever going to make something live out of that, which
I want to, we'd do it in a different way.
And it's more just like a mock-up.
>> AUDIENCE: So like, the ideas come from the ground, up,
and then [? it's just ?] [? tossed ?] [? down the line? ?]
>> MARK ZUCKERBERG: I mean, it goes both ways.
And I'm not completely unopinionated.


>> AUDIENCE: I actually have a question about the [INAUDIBLE].
So, going back about the [INAUDIBLE] and [INAUDIBLE] privacy.
And it's a different platform?
>> AUDIENCE: So college people are over 18 and allowed
to post whatever pictures they want, and they're not really
incriminating themselves, except possibly for drugs and alcohol?
I've seen pictures on Facebook where my younger
cousins are drinking and stuff like that.
But when you go to the high school kids, they're 15 and 16 and younger.
>> And are you guys just saying, it's the internet,
and if they want to incriminate themselves and things like that,
is that OK?
Or do you guys filter the pictures that high school students put up
and the information they write?
Or do you just [INAUDIBLE]?
MARK ZUCKERBERG: So a lot of the solutions that we come up with for stuff
aren't technical or organizational, but just applying social pressure
in good ways.
So Myspace has-- almost a third of their staff
is monitoring the pictures that get uploaded for pornography.
We hardly ever have any pornography uploaded,
and I think that a lot of the reason is that people
use their real names on Facebook, and your real email address for school.
And if you have that, then you're not going to upload pornography.
And I think that that's a really simple social solution
to a possibly complex technical issue.
>> So that said, we changed some of the features around for high school.
For example, we took parties out, because we
figured that parents would get pissed off
or they would just break up all the keg parties really quickly,
and that would suck for everyone.
>> I don't know.
We deemphasize contact information in high school.
MICHAEL D. SMITH: All right, we end here.
If you have other questions, feel free to come down and talk to Mark.
Thank you very much.