With all the personal identity information on the Internet that we hear so much about, why is it so hard to find? One reason is that the sheer amount of data available makes it very difficult to extract useful information. Luckily, recent technologies have made such tasks approachable.
The primary mission of this role is to design and optimize crawlers that collect this data. You will also develop and execute data mining solutions to extract personal identity information from the collected data.
Work at an industry pioneer with the best engineers and data scientists in the business to build the world’s largest database of user identity information.
• B.S. or B.A. in Computer Science or a similar degree – required
• Experience with PHP or Python – required
• Experience working with big data and writing highly scalable programs – required
• Experience working with databases including SQL and NoSQL (MySQL/MongoDB) – required
• Experience with Linux (Bash scripting) – required
• Experience with R, Hadoop, MapReduce – nice to have
• Experience building web crawlers and robots – nice to have
• Experience with message queues (RabbitMQ) – nice to have
• Experience with Git, Subversion – nice to have
• Ability to write documentation, and experience in the field – very nice to have
• Knowledge of machine learning – a BIG plus
• Proven ability to work in a fast-paced environment and to meet changing deadlines/priorities in simultaneous projects;
• Excellent organizational, communication and interpersonal skills; enjoy working in both individual and team settings;
• Very high proficiency in both spoken and written English.
Trulioo - The Identity Bureau. Trulioo (www.trulioo.com) provides global consumer information and ID verification based on aggregated cyber identity data from sources including social login providers, ad networks, mobile applications, e-commerce websites, payment processors and our customers. Trulioo specializes in scoring cyber identities as authentic, machine generated or fraudulent.