It's nice to meet you! I'm Deepayan, one of the Neevlets from the 2021 intern class. I welcome you to learn more about Neeva through my internship reflections and hope you find my perspective useful!
Neeva is a wonderful place to work. You'll own important components of the product and organization start-to-finish, contribute to an exciting challenge with strong market potential, and make lifelong connections as you become part of the work-family.
I had a rather atypical application process at Neeva. Entering my final year of university but thinking about returning for graduate school, I had a tough choice between applying for full-time and internship roles. I ultimately decided it would be best to interview for a full-time position at Neeva; however, by the time of my final interviews, I had decided to return for a Master's program at my university, and joining the internship cohort was a perfect fit. I'll detail my full-time application experience and discuss the differences in the internship application process further below.
To begin your interview process, Neeva takes the refreshing approach of offering a take-home project on a topic related to the role you apply for. Instead of having to work through an irrelevant coding challenge as a first step, you get to work on a more practical demonstration. After someone from the company reviews your project, your recruiter will reach out to schedule your first interview.
My first interview, with the storied Todd, was memorable -- we ended up discussing my Compilers class and prior work with database optimizers. Our conversation was the most fun and interesting interview I had throughout the year; after all, I was talking with one of the first TensorFlow compilation engineers. Soon after, I was invited to go through the onsite interview process. The onsite consisted of three interviews -- one with Asim (an encouraging sage) and Daniela (my soon-to-be teammate), another with Matt L. (my future mentor), and one with Bindu (a systems expert and fellow CMU alumna) -- and a debrief with the spirited Vivek, one of the co-founders of the company. Each interview was challenging, taking me through many design iterations, and equally rewarding. I even learned and applied new general systems knowledge and techniques from the interviews. By the end of the process, I was confident that I would have an exceptional set of mentors who would both support me through the internship and challenge me to be a "10x engineer," as those at Neeva say.
The internship application process differs slightly. Instead of working on a take-home project, you start with a more targeted coding round carefully selected by our engineers behind the scenes and finish with a shorter series of interviews.
I worked with big data, and by big data, I mean BIG data.
For context, my backup drive at home holds 1 TB, and I haven't hit that cap in over four years of using this laptop. At Neeva, I would consistently run pipelines on compressed data totaling ~310 TB, which would reach over 1.2 PB uncompressed -- in other words, 1,200 TB, and still only a fraction of the complete dataset.
Neeva, as a search engine, is working on bringing up its own search stack, from crawling, to indexing, to serving. I worked on bringing together the former two components of our search stack.
Our pipeline for crawling the web encounters a firehose of input links each day, far more than the system can reasonably fetch, and must select a subset to run the fetchers on. My starter task was writing a lightweight scorer for this URL selection step, for which I decided to implement a quantile estimator.
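To make the idea of a quantile-based scorer concrete, here is a minimal sketch. It is not the code I wrote at Neeva: the class name, the choice of reservoir sampling, and the `keep_fraction` parameter are all assumptions made for illustration. The estimator keeps a fixed-size uniform sample of the scores it has seen, and the selector keeps only URLs whose score clears the estimated cutoff.

```python
import random


class ReservoirQuantileEstimator:
    """Estimate quantiles of a stream from a fixed-size uniform sample.

    Hypothetical helper for illustration; uses reservoir sampling (Algorithm R).
    """

    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity
        self.sample = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, value):
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(value)
        else:
            # Keep the new value with probability capacity / seen.
            slot = self._rng.randrange(self.seen)
            if slot < self.capacity:
                self.sample[slot] = value

    def quantile(self, q):
        if not self.sample:
            raise ValueError("no values observed yet")
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be between 0 and 1")
        ordered = sorted(self.sample)
        index = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[index]


def select_urls(scored_urls, estimator, keep_fraction):
    """Keep URLs whose score is at or above the estimated cutoff quantile."""
    threshold = estimator.quantile(1.0 - keep_fraction)
    return [url for url, score in scored_urls if score >= threshold]
```

The appeal of this kind of design is that memory stays bounded by `capacity` no matter how many links flow through, at the cost of an approximate rather than exact cutoff.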
The North Star
Quickly, however, the focus of my work shifted to a new crawl output selection tool, as other teams needed the data produced by our backend sooner than expected. This tool would become the primary interface for retrieving the pages and documents we have crawled, selecting a subset of the stored data based on a query set of URLs or hostnames. It would also become the "north star" of my internship as I worked to develop and improve it over the rest of my time at Neeva.
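In spirit, the selection step looks something like the sketch below. This is a toy, in-memory version under assumptions of my own choosing: the record layout (a dict with a `url` key) and the function name are hypothetical, and the real tool ran as a distributed pipeline over far more data than fits in a list.

```python
from urllib.parse import urlsplit


def select_records(records, query_urls=(), query_hosts=()):
    """Return records whose URL or hostname appears in the query sets.

    `records` is assumed to be an iterable of dicts with a "url" key.
    """
    wanted_urls = set(query_urls)
    wanted_hosts = {host.lower() for host in query_hosts}
    selected = []
    for record in records:
        # urlsplit().hostname is already lowercased, or None if absent.
        host = urlsplit(record["url"]).hostname or ""
        if record["url"] in wanted_urls or host in wanted_hosts:
            selected.append(record)
    return selected
```

Matching on exact URLs and on hostnames covers both kinds of query described above: a specific list of pages, or everything crawled from a set of sites.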
My first pass at the tool was implemented and utilized very quickly -- in just a few weeks, the tool generated a new version of a tech repository used to serve tech-related queries by the search engine. This particular repository consisted of over 171 million URLs and enabled further progress on the serving side.
As Q3 came along, it became clear that the existing infrastructure could improve on two fronts: completeness and efficiency. Given the progress made at the start of my internship, I was granted the responsibility of one of my team's quarterly OKRs (Objectives and Key Results) -- building out this tool to scale to O(1 billion) URLs.
On the completeness end, we were seeing that many pages we hoped to crawl had moved. As the fetchers were independently strengthened to retrieve pages that had been relocated elsewhere, I implemented the same functionality in the selection tool to ensure we collected as much representative data as possible for the input set.
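A simplified picture of following moved pages during selection is sketched below. The `redirects` mapping, the `max_hops` limit, and the function name are assumptions for illustration, not details of the actual tool. The idea is to walk from a queried URL to wherever it finally landed, while guarding against loops and runaway chains.

```python
def resolve_redirects(url, redirects, max_hops=10):
    """Follow a chain of recorded redirects to the final URL.

    `redirects` is assumed to map a source URL to its redirect target.
    Returns None if the chain loops or exceeds `max_hops`.
    """
    seen = {url}
    current = url
    hops = 0
    while current in redirects:
        if hops == max_hops:
            return None  # chain is too long to trust
        target = redirects[current]
        if target in seen:
            return None  # redirect loop
        seen.add(target)
        current = target
        hops += 1
    return current
```

With the final URL in hand, a selection tool can return the content stored under the new location for a query that still names the old one.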
Efficiency was a much more challenging battlefront. Our retrieved crawl data was stored in the ancient WARC (Web ARChive) file format. Though at the time of its creation WARC was a wonderful format that integrated well with the (typically HTTP) data streams read by crawlers, it had long since become outdated. The format was originally chosen at Neeva for interoperability with Common Crawl, but as we built our own index and serving framework, it seemed best to move to a new file format. Apache Parquet was the clear winner. The Parquet format has all the benefits of column-oriented storage, letting tools operate on data much more efficiently, and, as a major plus, is strongly integrated with both Apache Spark and AWS S3. The last point is significant -- instead of having to inefficiently read, parse, and convert the WARC file records one at a time for use by our processing pipeline, we could immediately use Spark to operate on Parquet input data stored in S3 in a truly relational way.
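To illustrate why column-oriented storage helps, here is a small pure-Python sketch. It is only an analogy: it does not use Parquet or Spark, and the field names (`url`, `status`, `body`) are hypothetical. In a row layout, answering a question about one field means touching every full record, large page bodies included. In a column layout, the same question reads just one list.

```python
from collections import Counter


def rows_to_columns(rows, fields):
    """Pivot a list of row dicts into one list per field."""
    return {field: [row[field] for row in rows] for field in fields}


def status_counts(columns):
    """Count status codes by scanning only the "status" column."""
    return Counter(columns["status"])


def project(columns, fields, keep):
    """Return only the requested fields for rows where keep(status) is true."""
    hits = [i for i, status in enumerate(columns["status"]) if keep(status)]
    return [{field: columns[field][i] for field in fields} for i in hits]
```

Here `status_counts` never looks at the `body` column at all, which is the property that matters when the bodies are by far the largest part of the data.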
Enabling the change, though, meant our fetcher had to write a new output format and our tools needed to be able to utilize these newly generated files. I took on supporting Parquet output in our fetcher, updating our selection pipeline to operate on data more efficiently using the altered storage format, and writing validation tooling to demonstrate information preservation in our new output files. In the meantime, my teammates Daniela and Meri took on the task of updating all the previously crawled data produced by our fetcher and updating our extraction and annotation tool, respectively, while my mentor Matt took on other stability improvements in the fetcher to improve error-handling at various points of failure in the system.
Over the last few sprints of my internship, we completed all of the above, and eventually, the targeted dump was successfully retrieved!
In the absence of an explicit thank-you section, it's important to express that I owe a lot of credit to the team at Neeva for being so open to discussion and collaboration. I want to call out a few folks to especially appreciate.
Matt L.: I could not have asked for a better mentor than Matt. He was approachable from day one and has a way of guiding that makes everything just click. His leadership and faith in me made me hope to one day become as strong a mentor myself.
Daniela and Meri: An extension of that is an appreciation for my whole team! Daniela is a wonderful coworker who helped me onboard by sharing the context she had gained since joining a year earlier, on top of being a stellar example to follow as an engineer. Meri added even more technical strength and liveliness to this group when she joined as a part of the 2021 new grad class.
Varun, Sean, and Matt D.: It was awesome to work across teams with Varun, Sean, and Matt. They trusted me with building the selection tooling and generating output data for their teams to use, and I enjoyed our opportunities to collaborate closely.
Lara: Lara is a superwoman, having helped me from step one of my internship journey. She handled all my hopeful ideas and proposals, on top of her already insane responsibilities managing projects and people.
Vivek and Sridhar: Going back to my first conversations with Vivek and Sridhar, it was very clear that both co-founders are fully committed to their mission in creating Neeva, one of the main reasons I decided to accept their offer. Their excitement and visions are captivating and they see a bright future for the company, which I envision as well.
Arjun, Evan, Ibiyemi, Jed, Paul, Sakura, Savanna, and Veronica: The entire 2021 intern class was amazing, and they were a huge part of making the internship so enjoyable!
And to the many, many people who have been gracious enough to provide advice and perspective on your own journeys as an engineer -- thank you so much for being open to providing mentorship and guidance!
The Big Little Things
It's difficult to express how connected this team is; even after 1.5+ years of dealing with the pandemic, people are continuously cordial, collaborative, and supportive. Small things add up to make a significant difference in the mentality and culture of an organization, and Neeva made every effort to go above and beyond to make the workplace a positive community beyond our jobs.
We interns were introduced to the exceptional work-life balance quickly and with lots of support. The wonderful Kevin set up weekly lunches for the interns to interface with others across every aspect of the company -- we received a complete tour of how this particular startup functions, from customer experience to infrastructure design. In addition, to preserve a sense of camaraderie during these difficult times, a large proportion of the employees would set up and attend weekly virtual lunches and happy hours.
One of the highlights every month is the company-wide virtual game night. Nicole, the trivia extraordinaire, set up our first game night, a two-team scavenger hunt competition that sparked our competitive spirit but ended in a tie (a literal win-win situation with all of us earning the prize 🙂). Michelle, puzzle-hunt aficionado, followed up by organizing the second game night, an 80s throwback escape room and a very fun way to exhibit some quick thinking and teamwork.
Escape Room Team Background (edit creds: Nicole Filipelli)
Savanna, a fellow intern, set up weekly game hours for us to unwind as a group. Group favorites were Catan, Gartic Phone, Scattergories, and Codenames. One week, I went through the particularly annoying ordeal of having my wisdom teeth removed (no more soup or mashed potatoes for me for a long, **long** time), but it eventually resulted in my favorite portrait, myself chilling with my wisdom tooth:
My Wisdom Tooth and I (drawing creds: Savanna Yee)
I was also given the freedom to plan another intern event with the financial support of Neeva! As our internships began to wind down, we set up a dinner and movie night where we prepared and ate a meal as we watched Pixar's Luca together.
Neeva launched to the general public on July 29, a few weeks after most interns joined. In celebration of all the work that had gone into building and launching Neeva, a rooftop launch party was organized for those who were part of the journey (following all health and safety guidelines).
The event was a huge success, with those in the Bay Area coming together, many for the first time, having joined the organization in pandemic times. A night of mingling, wonderful city views, and quality dining was capped by a pair of powerful speeches by Sridhar and Vivek, singing the praises of this work family.
Those who know me personally can appreciate how much I enjoyed and valued this summer experience; for those who don't, I'm not one to sing unearned praise, so when I do, it really counts. To prospective candidates making a decision, I strongly encourage you to take the next step in your journey with Neeva -- you might just dive into a wonderful experience.