New York Institute of Technology congratulates Palash Jain of Mumbai for successfully completing the rigorous Internship Certificate Program this summer. Jain worked as a data intern at United Air...
Summer 2018
Verified by New York Institute of Technology
Data Science Intern, June 2018 - Present
Performed data cleaning operations, statistical analysis and sentiment analysis of dense and unstructured data
Developed a classification model with an efficiency of 87% that classifies textual maintenance data into various categories
Predicted the time taken by the agent to respond to the last customer email by building and validating regression models
Exported the cleaned data to DataRobot to find the best model by running 10 different models in the Hadoop cluster
Created Teradata SQL queries in Teradata SQL Assistant for ad-hoc data pull requests & generating reports
New York Institute of Technology-Manhattan Campus
Data Science Assistant, September 2017 - May 2018
Assisted Dr. Houwei Cao with her ongoing projects by performing data wrangling operations on huge datasets. Analyzed the datasets through exploratory data analysis, missing-value imputation and variable transformation
Applied Natural Language Processing methods in Python to provide meaningful insights about users' textual data. Deployed classification models with an efficiency of 80% using various machine learning algorithms in Python. Developed interactive dashboards and visualization worksheets in Tableau to analyze the data as per the requirements
One of the projects was analyzing the Yelp dataset. Performed data mining and data manipulation operations on the Yelp dataset, which has 4.1M reviews and 947K tips by 1M users for 144K businesses, and connected the dataset with Tableau to build interactive worksheets and dashboards.
Created a classifier for the reviews in Python using Natural Language Processing and machine learning algorithms such as Naive Bayes, Support Vector Machine, Random Forest and Decision Tree.
Created unigrams from the text of the reviews, grouped by the star ratings assigned to them. Created a document-term matrix (dtm) using the TF-IDF Vectorizer (term frequency-inverse document frequency) and then ran 5-fold and 10-fold cross-validation to calculate the overall accuracy.
With 5-fold cross-validation the accuracy was 83%, and with 10-fold cross-validation it was 85%. Achieved an accuracy of 80% using the Support Vector Machine classifier and 79% using the Random Forest classifier. Both classifiers were trained on the dtm created with the TF-IDF Vectorizer.
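The TF-IDF and cross-validation steps described above can be illustrated with a minimal standard-library sketch. It is not the project's actual code: the sample reviews, the function names `tfidf_matrix` and `k_fold_indices`, and the smoothed IDF formula are assumptions chosen for illustration, and no classifier is trained here.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a unigram document-term matrix weighted by TF-IDF.

    Uses one common smoothed IDF variant: ln((1 + N) / (1 + df)) + 1.
    Rows are not length-normalized in this sketch.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical reviews, standing in for the Yelp review text.
reviews = [
    "great food great service",
    "terrible food",
    "great place",
    "slow service",
    "good food",
]
vocab, dtm = tfidf_matrix(reviews)
folds = list(k_fold_indices(len(reviews), 5))
```

In a k-fold run, a classifier would be fitted on the `train` rows of `dtm` and scored on the `test` rows for each fold, and the per-fold accuracies averaged to give the overall figure.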
Used Latent Semantic Analysis (LSA), a Natural Language Processing technique, and created a bag of words to identify positive and negative reviews
Graduate Assistant, July 2017 - December 2017
Performed statistical analysis in Excel on the undergraduate and graduate student datasets collected from several heterogeneous sources for the Global Engagement department
Managed and analyzed the student database to inspect student profiles based on their performance
Drafted financial budget reports using the student database to support effective allocation of financial aid to students
Data Analyst, August 2015 - July 2016
Client: Retail Industry
Conducted Joint Application Development (JAD) sessions with the end-users, SMEs and development team throughout SDLC to gather and analyze various requirements and data.
Extracted, aggregated and manipulated large retail sales data sets from multiple sources. Applied data mining, statistical and regression techniques to the gathered data using Python packages and Excel. Examined and corrected data issues such as completeness, accuracy and redundancy using SSIS (ETL packages). Wrote complex SQL queries to simplify data analysis across competitors, improving efficiency by 10%
Designed monthly sales report dashboards in Tableau by combining heterogeneous data sources using data blending. Built reports and dashboards to identify key performance indicators (KPIs) for the business to aid strategic decision-making. Took part in daily scrum meetings and developed daily and weekly reports to present the findings to higher management
Data Science Analyst, January 2015 - August 2015
Performed exploratory data analysis, feature engineering & extracted numerical features from existing variables in Python. After applying preprocessing, feature engineering and feature selection, the following features were extracted:
Average call duration (sum of (end time - start time) / total call count), computed for inbound, outbound, international, and roaming calls in one month.
Counts of incoming, outgoing, international, and roaming calls over six months.
Billing features such as bill amount and number of times average spending was < Rs 300 in the past 6 months (binary).
User features such as age, tenure, location, total revenue.
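As an illustration of the call and billing features listed above, here is a minimal Python sketch. The record layout, the names `call_features` and `low_spend_flag`, and the sample values are hypothetical. Duration is taken as end time minus start time, and the binary billing feature is read as "average monthly spending below Rs 300", which is only one possible interpretation.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call records: (call_type, start_time, end_time).
calls = [
    ("inbound", datetime(2015, 3, 1, 10, 0, 0), datetime(2015, 3, 1, 10, 5, 0)),
    ("inbound", datetime(2015, 3, 2, 9, 0, 0), datetime(2015, 3, 2, 9, 1, 0)),
    ("roaming", datetime(2015, 3, 3, 18, 0, 0), datetime(2015, 3, 3, 18, 2, 30)),
]


def call_features(records):
    """Return call count and average duration in seconds per call type."""
    total_seconds = defaultdict(float)
    counts = defaultdict(int)
    for call_type, start, end in records:
        total_seconds[call_type] += (end - start).total_seconds()
        counts[call_type] += 1
    return {
        t: {"count": counts[t], "avg_duration_s": total_seconds[t] / counts[t]}
        for t in counts
    }


def low_spend_flag(monthly_bills, threshold=300):
    """Return 1 if average monthly spending is below the threshold, else 0."""
    return int(sum(monthly_bills) / len(monthly_bills) < threshold)


features = call_features(calls)
flag = low_spend_flag([250, 280, 310, 200, 290, 270])
```

Per-customer values like these would then be combined with the user features (age, tenure, location, total revenue) into one row per customer for the churn model.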
Deployed multiple machine learning classification modeling techniques to model likelihood of telecom customer churn
Using the model predictions, retained 30% of the prepaid customers predicted as likely to churn through churn management
Designed ad-hoc visualizations and impactful dashboards in Tableau for communicating actionable insights to clients
Summer Intern, May 2014 - June 2014
Maintained data as required by management using templates, graphs, charts, and pivot tables. Improved data quality by filtering out irrelevant information and prepared the data for use in existing processes.
Documented the technical and business requirements and worked with teams to reach the target on time. Developed interactive visualizations and dashboards as per the requirements from the stakeholders using Tableau