Originally aired on August 20, 2020 @ 4:00 PM - 4:30 PM EDT
Founder Focus is a “Humans of New York” style spotlight on the human stories behind diverse startup founders, their life experiences and perspectives, the origin stories of their startups, and the path they took to where they are today.
Hello, everyone. I'm your host, Jade Wang, and welcome to another episode of Founder Focus. We are joined today by our guest, Ivan Lee of Datasource. Hi, Ivan. Welcome to the show.
Hey, Jade. Thanks for having me.
So, very briefly, can you tell us what your startup does?
Yeah. So, over the past decade, we've seen a lot of AI models being deployed in the world around us. And in order for that AI to learn about our real world, we often have to provide it with a lot of labeled training data so that it can learn about specific facets of our work. So, as a toy example, let's say you have a bunch of restaurant reviews on a site like Yelp. A customer might be complaining about the customer service, but praising the food. And it's important to understand that. You know, we can label those key terms and say, hey, the customer service was poor, but the food was great. And we can teach the model to read and understand these reviews at scale. And so, Datasource provides a platform where you can apply these labels very easily.
Cool. Can you tell me about the origin of your startup? How you started to work on this?
Yeah, absolutely. So, it was my job. I've been a machine learning product manager for the last seven years at a number of companies, most recently at Apple. And it was my job to go and gather and label this data. I've spent millions of dollars just capturing this labeled data. And so, it was just a matter of pattern matching. I saw all three of the last companies reinvent the tools and the processes to capture this labeled data. And I recognized it was going to be this thing where over the coming decade, we expect AI to be more ubiquitous. And, you know, a lot of companies will have to reinvent this wheel. And so, I left to start this startup and solve it for the industry once and for all.
What tools were you reinventing the wheel with prior to that? Was it like spreadsheets?
Yeah, it's still very early stages.
People are still figuring out what the ideal AI process is. So, like you said, some folks are using spreadsheets, you know, Google Sheets, Excel. Others are using some open source tools built by academics that aren't necessarily kept up to date. And yet others still are investing a lot of money and time building their own in-house tools. And so, it's pretty chaotic. And we're trying to build some semblance of standardization.
So, who are the typical users? Are they mostly the large companies? Are research labs also using this?
Yeah. So, we are focused on a branch of machine learning called natural language processing. And so, NLP, as it's called, is specifically focused on text data. And one of the pleasant surprises as we were starting this company and talking to users and customers is we found there were so many different use cases of NLP across many different projects and industries. So, to your question, you know, our users can be the machine learning teams at large companies. We're also talking to startups, of course, but also to academics and nonprofits and also folks from traditional industries who are interested in AI and what it can bring to their companies. So, really, anybody who's working on NLP, on natural language processing, can come and use Datasource.
So, for our audience, can you give an idea of the scope of the kinds of questions that are feasibly addressed by NLP and also maybe some of the questions that are not so suitable for NLP? Just to give us an idea of, like, the bounds of where it's suitable. Can it feasibly detect fake news or use supervised learning for, like, the truth of statements and political ads? Or can it detect fraudulent bank transactions?
Yeah, those are all excellent examples, Jade. So, one of the use cases we're really proud to support is we were working with three different organizations who are trying to detect misinformation online.
And this is a really important problem, not only because it's something that's facing society today, but it's also something where you can imagine the number of articles that are published online any given second, right? And you can't have a human army go and track down every social media post, every news article, and label and detect which ones might be fake. So, this is an excellent use case where you want to build a model to read all of these. It won't detect with 100% accuracy, but I'm a big believer in a trend called human in the loop, where the machine will actually flag suspicious articles, and then they'll have a human go and review and verify and say, hey, is this actually fake? Maybe it's just a particularly new take. Maybe somebody's breaking news, right? And this is actually really important to understand. But other times, it's just misinformation, and it's something that should be shut down right away. So, cases for NLP and for AI, I think anything where you want to read text and understand text at scale, and that has a whole broad spectrum of use cases.
Do you think it can also enforce civility on a platform where there's a lot of bullying, like on Twitter, for instance, or other groups that have community guidelines?
I think it can. But this is now a very interesting philosophical question, right? I've actually been in charge of making sure that content on a certain platform is friendly to the users. And I approached this originally fairly naively. I thought, oh, well, we've got ratings for movies and such, right? We know the list of things that are definitely inappropriate. But it turns out that there's a whole spectrum of use cases, and it's not always easy to detect. And humans are always trying to get around the rules. So, for example, if you ban one explicit word, people will be very creative, for better or for worse, in finding ways around that rule.
So, how can you always keep up to date with the ways that people are going to game the system? How are you going to make sure that you're enforcing these rules and standards safely? So, I do think that technology will have an important role to play. I've spoken with the director at YouTube who's in charge of making sure that the comments are always going to be fair and non-toxic.
Well, that sounds like a very tall order.
Oh, absolutely, right? And the thing is, it's a never-ending problem. And there's essentially an infinite supply. But again, that's why technology can be important here in flagging the ones that are particularly egregious.
So, how far along are you on your startup adventure?
Yeah, so we started in the beginning of 2019. And, you know, we've been trying to focus. We're going to eventually expand this platform beyond text into images and video. But right now, we've been focused on this core space of natural language processing. We went through the startup accelerator, Y Combinator, in winter of 2020. And we were really lucky to raise funding, not only from YC, but from Initialized Capital, one of the best seed investors in the area, along with the CTOs of OpenAI and SegmentIO. And so, we've been able to use that backing to really double down this year and focus on expanding the platform. The other thing is that because of this shelter-in-place order, because of people not being able to go to work, we've seen a lot of companies double down on investing in AI. And because of that, we've seen usage of our platform grow by a couple orders of magnitude over the last four months. We've been beneficiaries of seeing that additional need for labeled data.
Silver lining to 2020 as an interesting year. It's a good time to be buckling down on a startup, if nothing else. How has 2020 been treating you otherwise, in terms of you as a company? Have you had to make adjustments at all?
Yeah, so, well, it's a funny story because back in 2019, I had to make a hard decision. We had this unfair advantage in the sense that because of certain connections, I had access to a remote team of world-class engineers. But it was a difficult decision because I'd never worked this closely or relied this much on a remote team before. And I was worried it would be difficult to coordinate across time zones. I was worried about how investors would view this if you don't have your entire founding team in one place. But we went forward with it. We set things up, learned a lot about how to set things up remotely in 2019. And lo and behold, we're in 2020 and the rest of the world has decided to join us. So it actually worked out really well, right? We didn't have to go through a particularly difficult transition. We had these remote practices to begin with. And so 2020, it's certainly tough on all of us. You know, the pressures of the outside world and everything certainly get to us and our team. But at least the remote practice was something that we had down pat.
Now, let's shine the spotlight a little bit more on you personally and your own journey up to this point. How did you get into NLP research?
So, Jade, we have to go way back.
Right, when we first met, you were working in gaming.
That's right. That's right. So I've always had a lifelong passion for games. And right out of school, my first adventure was actually to create a gaming startup. It was always a lifelong passion and dream of mine to create my own games. And so I've been there, done that, had a ton of fun building out a game called Geomon. It was a location-based monster capturing game back in 2010, before the days of Pokemon Go. But we had a lot of fun building that out for a couple of years. We ended up selling the company to Yahoo. And that was the kind of big career pivot for me.
When we joined Yahoo, I thought I'd be working on revitalizing their Yahoo Games portal, something else that I had spent a lot of time on as a kid. But the priorities shifted at the company. And instead of working on Yahoo Games, we were actually asked to focus on a new priority for the company, which was to refocus on their mobile search efforts. And so my manager tasked me with replacing some of the algorithms behind their search. We replaced these models that had been built and fine-tuned for over a decade with a machine-learned model. And it took us less than three months. And that was an eye-opening experience for me. Because this was, you know, I had used Yahoo Search as a kid growing up. The fact that I could replace and improve upon it in under three months was mind-blowing. And so I was hooked. And I continued to work as a product manager for machine learning for the better part of the last decade. And that's how I got to be part of this hype around machine learning and AI.
How did you then make the leap from B2C to B2B?
Yeah, so that was something that I struggled with. As you may know, sometimes people break the world of tech into two primary divisions, right? Selling to businesses and selling to consumers. I'd always gravitated towards B2C, towards selling to consumers. I loved being able to work on products that my friends and family could use, that I could explain really easily. And I spent the first decade of my career honing my skills on the B2C side. And there are a lot of skills that are kind of tailored towards that. Just the fact that I was working in gaming during the era of Zynga. We were very data-driven in our approach. We looked at all of these metrics and stats on how consumers at scale were using our product. And so that was how I always functioned. And when it came time to start my second startup, I had this idea around building this tool for other businesses.
But I was hesitant because I felt like I was throwing away a decade's worth of work, of kind of practice on the B2C side. And I was worried about what it meant to learn sales as a B2B founder. But I decided that, you know, I thought hard about this. I spent three months really writing down the pros and cons and evaluating whether this might work or not. And I realized AI is the next wave. Just like mobile has become ubiquitous for us in the past decade, I think AI will become ubiquitous in the next decade. So to be a part of that movement, and to be part of that surge, I think will be really powerful. And on top of that, I came to turn this around and view it as an opportunity to go back to basics, to learn about the world of sales, to maybe apply some of the skills that I've learned as a B2C founder, as a product manager, to the world of B2B. And so at Datasource, one of our core pillars is to create a really friendly experience. And that's not something that's always associated with B2B tools. We've designed this from the ground up to be really easy to get started with, to have a very user-friendly interface, to make sure that people are as efficient on it as possible. So it's been really neat to kind of merge those two worlds together.
So many startup founders have a story about being close to the brink, on the edge of survival, sort of like Airbnb's cereal story. Do you have one to share?
Yeah. So fortunately, we haven't been there with Datasource. So far, right? We've been really fortunate. We raised that round of funding earlier this year, and we're just getting started on our journey. But there was certainly a story with my first startup. We'd always focused, I think the focus back then was on growth at all costs. It's what we had seen at companies like Zynga at the time. And so we'd always focused on customer traction and usage and retention.
But there was a specific moment where I had to sit down with my co-founders and let them know, hey, we've got less than a year's worth of runway left. We're counting down the months where we can even pay payroll for our team. So we can't just continue building out the game features that we are passionate about, that our users are asking about. We have to focus on monetization. And so it was an epiphany of sorts and a turning point for us. And we really focused on increasing monetization over the next six months. And we were able to increase that revenue tenfold and actually bring our startup to profitability. And that was just a really powerful, important learning experience from early on. Once we were profitable, it gave us options. We weren't desperate to raise another round of funding. We didn't have to go with a fire sale or even shut down the company. And so that left a really important mark on how I approached all of my products from there on out. As product managers, many of us like to just focus on creating the best user experience. But I learned how to balance that with the importance of having revenue and being self-sustaining. And so that's a lesson that I took home with me. I brought it to a lot of my experiences at other companies, and I'm bringing it with me to Datasource now.
So you started your first company straight out of school. How did you decide that entrepreneurship was right for you?
That's always been a really interesting question. I wasn't someone who, you know, you hear of these stories of 14-year-old whiz kids who were determined to go and create their own startup and knew exactly what they wanted to do. That wasn't my story at all. In fact, we, if anything, stumbled into the path of entrepreneurship. We were just four friends who were really passionate about gaming. And we'd been given a class project to say like, hey, we've got this new phone, this device. The iPhone was relatively nascent at that time.
The App Store was still relatively new. The class project was to brainstorm what you could do with location or with some of the new features of this tool. And some people went and explored the camera. And we decided to explore the fact that we had location. And one of my friends and I were brainstorming and they were like, hey, what if we could bring some of the games that we played with, you know, a Zelda type game or a Pokemon-like game to real life? And as soon as they said that, like Pokemon for real life, we were hooked. So we still didn't set out to start a company. We actually wanted to just, you know, let's forego our internships for the summer. And let's just build this out and see where it takes us. More than anything, we just wanted to play this game ourselves. So we spent the summer building it out. It was a lot more ambitious than we had expected. So then we spent the next three months, next six months building it out.
Is that the summer of 2010?
Yes. 2010. Right. And at some point, we ran out of savings. And so the next step was, well, we need to pay ourselves just a little bit, just to like get a house that we can live in together to save more money. So then we reached out to some venture capitalists because that's what we had seen our friends do. So then we raised money. So we really just stumbled into this one piece at a time. Right. We were just innocent and naive and figuring out how to get this game out there. But yeah, you know, we were very fortunate to be in Silicon Valley where there was a support ecosystem for what we were trying to do. And one thing led to another and we were doing a startup.
Kudos also to the professor who assigned a school project and asked you what you wanted to do with phones, right?
Yeah, absolutely. It's funny, right? The classmates that we had, I mean, you know, a year afterwards, Snapchat was founded to take advantage of the camera on the phone.
A lot of my classmates in that original iPhone class went on to work at a lot of the early iPhone companies. I remember my TA was one of the first hires at Flipboard. So these stories are always very interesting to track. Like, okay, how do we go from a very academic setting to real things that people want to use?
So if you, as you are now, with all of the knowledge that you've accumulated over the years, got plopped into a magical Zoom call with yourself from the early days of Geomon, what would that conversation sound like, look like?
So first of all, I'd be first in line to get that Zoom feature because that sounds really neat.
Inspired by a YouTube skit.
So, sorry, brief tangent there. I recently got access to GPT-3, which is, you know, this really interesting new algorithm from OpenAI, from one of our investors. And people have been really hyped about it. And one of the interesting use cases is that it can write like Shakespeare or like James Joyce or whatever it is. And then you can ask them their opinions on much more recent and modern events, right? So you can ask them what their opinions are, for example, on Obama or on climate change. And it's super fascinating to get a sneak peek at how someone who wrote like one of these historical authors would write about problems in today's world. So I think that's just a really interesting parallel. Back to your question. There's no one thing that I think was like the breaking point epiphany for me in the past decade. My only advice, and this is something I, you know, I talk to current Stanford students about as well: I think it's really important to double down on learning and growing. There have been a lot of pros and cons to having this World Wide Web. But at the end of the day, I really think that any resource you want, you can take the best class from, you know, Harvard or Stanford on filmmaking. You can go to something like Coursera. There are free resources for anything you might want to learn in the world.
So I actually think one of the most important human traits moving forward is the ability to stay curious and the willingness to constantly learn and grow, right? When I stepped into Datasource and I was learning sales, it was back to square one for me. I felt very much like a newbie all over again. There was so much support in my network and so many materials online that I can learn from. I really think that's the one piece of advice that I would impart to myself.
Never stop retraining the model on new data.
Well put, well put, Jade.
Let's see, before we go to audience questions, are there any pop culture art recommendations you have for the audience, film or book, TV show, comic book?
Yeah, so there's a couple things I'm particularly passionate about, but top of mind, there are two shows that immediately come to mind. One I'm really sad about because I just heard that Patriot Act with Hasan Minhaj was canceled on Netflix, but that was something that I really enjoyed. It was something very interesting to me as well because not only was it entertainment and educational about the world, but even as a founder, as a salesperson, I actually studied a lot of how he spoke and how he delivered his message. I actually took notes on how he delivered his messages and sought to apply that to my own profession. The other thing that a lot of people have been really hyped about, and I'm really excited to see have a resurgence, is Avatar: The Last Airbender. No real wisdom on that, except it's a phenomenally written show and ahead of its time.
I actually have watched Patriot Act. I'm not caught up to the end, but I will definitely have to check out Avatar: The Last Airbender. Could you share what you learned from Hasan Minhaj?
Yeah, this is a really nerdy thing, and my friends tend to laugh at me about this, but I think stand-up comedy is fascinating. Over time, I've gotten comfortable with public speaking, which the me of 10 years ago would never have believed.
But stand-up comedy still is one of the things that I would fear the most, being up on stage and that idea of having a joke fall flat. I like watching stand-up comics and thinking about how they deliver a punchline, how they do the setup, how they use their hands on stage and their body language, and how they use the fluctuation and intonation of their voice. Because oftentimes it's a monologue, right? So how do you keep people engaged? There's a rule that I saw about Zoom calls. It's like, never talk for more than three minutes, because you will absolutely lose your audience. They will stop paying attention, right? And three minutes is definitely pretty much the upper bound on that. Right. And that's not a coincidence either. But a stand-up comic has to be on stage for like an hour and keep people engaged. So how do they do that? There's a lot of tips and tricks that we can learn from that.
Mm-hmm. Yeah. Do you have any other favorite stand-up comics?
Well, since you ask, another that I absolutely love and admire is Bo Burnham. He has two stand-up specials on Netflix. I realize I am very much hyping up Netflix right now. But his two stand-up specials are out of this world. And again, it's just kudos. He was somebody who became famous in the early days of YouTube, turned it into a full-fledged career in Hollywood. But his specials are some of the smartest that I've seen.
Cool. Thank you. We don't have any audience questions yet, but anything else you want to share with the audience?
Yeah, I think... I just wanted to take a moment and maybe touch on the topic of AI in society. I think it's a very controversial one right now. Even with GPT-3, there was a lot of controversy. And some of it was from the creators themselves, right? When they created the precursor to this, GPT-2, last year, they actually said that they would not make this available to the public because they were afraid of how people would misuse it.
One of the things that you can do with this is actually create a bunch of misinformation. And it becomes really trivial. The scary thing about this is that people have a really hard time when given an article written by this algorithm. They had a really hard time guessing whether it was written by an algorithm or by a real person. So we're starting to reach that threshold where it becomes increasingly hard to differentiate what technology is doing versus what somebody else is actually doing. So I think it's a double-edged sword. The thing about technology is it will always come. So we have to... The onus is on us to figure out how to use it responsibly. I'm really proud that we have customers and users of Datasource who are using this to fight misinformation. There's a lot of good that can come out of this and it can propel humankind forward. But we have to be really conscientious about all the biases that are embedded in our AI. We have to be... And I think labeled data is something that a lot of people don't think about. But I think it's something that can be instrumental, and we have to think a lot harder about it in order to help make sure that the technology is moving in the right direction.
That the biases that the humans have aren't then also reflected in the AI.
That's right.
I remember reading an article about resumes being read by AI reflecting the same biases that we have.
That's right. And reading resumes is one of the top use cases for our users. And so I'm constantly talking to them about, hey, you need to be really careful. It's not just about labeling the fact that Photoshop is a skill. It's also being able to recognize the biases of your human labelers and making sure that that doesn't translate into your model because that can have negative repercussions for your business moving forward.
That's a good bunch of food for thought. Well, thank you so much for coming on our show.
Thank you, Jade. Thank you, Claire.