NLP interview Q&A:
- What do you mean by NLP? Discuss real-time use cases of NLP.
- Define structured and unstructured data?
- Discuss NLU and NLG?
- What are the tools required to perform NLP?
- Discuss the difference between NLTK & Spacy?
- What are the steps involved in NLP pipeline?
- Discuss the importance of pre-processing techniques in NLP?
- What is the difference between NLP and CI?
- What are regular expressions and their applications?
- What is Information extraction?
- What is Text similarity?
- What is Text classification?
- What is Text summarization?
- Is it necessary to convert text into numbers for model training?
- What is Tokenization?
- What do you mean by stemming?
- What is Lemmatization?
- What are the differences between stemming and lemmatization?
- What are stop words in text processing?
- Explain TF-IDF and its purpose?
- Discuss Named entity recognition?
- Explain how feature engineering is implemented in NLP?
- Explain parts of speech in text processing?
- What do you mean by Bag of words in NLP?
- Discuss N-grams. Where do we use them in real-life scenarios?
- What is Syntactic analysis?
- Explain Semantic analysis?
- Explain how parsing is done in NLP?
- What is Latent semantic indexing in NLP?
- Discuss NLP metrics?
What do you mean by NLP? Discuss real-time use cases of NLP.
Natural language processing (NLP) is a field of AI and computer science that gives machines the ability to understand human language and assist in language-related tasks.
For instance, face-to-face conversations, tweets, blogs, emails, websites, and SMS messages all come under natural language. In NLP we have to find useful information in natural language.
NLP use cases:
- Information extraction
- Text summarization
- Text classification
- Text similarity
- Voice recognition
- Language translation
- Chat bots
Define structured and unstructured data?
According to industry estimates, more than 80% of the data being generated is in unstructured format; it may be in the form of text, images, audio, video, etc. A few examples include posts/tweets on social media, chat conversations, news, blogs, product or service reviews on e-commerce sites, and patient records in the health-care sector.
Structured data: the elements of the data are organized in a pre-defined format, such as rows and columns (an Excel file).
Unstructured data: the elements of the data are not organized in a pre-defined form.
To produce significant and actionable insights from text data, we use natural language processing coupled with machine learning and deep learning.
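The contrast can be shown in a couple of lines of pure Python (the names and values here are made up for illustration):

```python
import csv
import io

# Structured data: every value sits in a pre-defined field (rows and columns),
# so it can be queried directly.
structured = io.StringIO("name,age,city\nRavi,28,Guntur\nAsha,31,Delhi\n")
rows = list(csv.DictReader(structured))
print(rows[0]["city"])  # direct field access: "Guntur"

# Unstructured data: free-form text with no pre-defined fields;
# NLP techniques are needed to pull information out of it.
unstructured = "Ravi, 28, moved to Guntur last year and loves it there."
print("Guntur" in unstructured)  # only crude substring search without NLP
```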
Discuss NLU and NLG?
NLP consists of both Natural Language Understanding (NLU) and Natural Language Generation (NLG) to achieve language-related tasks.
NLU is the ability of a machine to understand and process human speech or text, i.e. the capability to make sense of natural language.
NLG is another sub-category of NLP that constructs sentences based on the context.
What are the tools required to perform NLP?
- NLTK
- Spacy
- TextBlob
- Stanford NLP
Discuss the difference between NLTK & Spacy?
- NLTK is a Python library; the name stands for Natural Language Toolkit. NLTK is the mother of all NLP libraries, whereas Spacy is a more recently developed NLP library.
- NLTK supports a wider range of languages than Spacy.
- Spacy is an object-oriented library, while NLTK is a string-processing library.
- Spacy has built-in support for word vectors, while NLTK does not.
What are the steps involved in NLP pipeline?
Data acquisition: the procedure of collecting the data required to find insights and patterns. This data may be in the form of text, audio, chat and SMS messages, etc.
Data cleaning: the data we gather will be in different formats, structured and unstructured, so we need to clean it and extract the required parts, for example by removing null values and duplicates.
Pre-processing: in this stage we perform tokenization, stemming, lemmatization and more.
Feature engineering: in this step we create and manipulate the essential features for the model.
Model building: we choose the model that best suits our requirements.
Evaluation: in this stage we test how the model performs on new instances, checking its accuracy and trying to get the best accuracy possible.
Deployment: in this step we deploy our model on a server for the users.
Monitoring & updating: after deployment, accuracy may decrease over time, so it is essential to monitor and update the model.
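The stages above can be sketched end-to-end on toy data (pure Python; the cleaning, feature, and "model" steps are deliberately simplified, and the cue-word lists are made up for illustration):

```python
import re
from collections import Counter

# Data acquisition: a toy corpus standing in for collected text.
raw_docs = ["  Great product, loved it!! ", None, "Terrible product, broke fast."]

# Data cleaning: drop nulls and trim whitespace.
docs = [d.strip() for d in raw_docs if d]

# Pre-processing: lowercase and tokenize.
tokens = [re.findall(r"[a-z]+", d.lower()) for d in docs]

# Feature engineering: bag-of-words counts per document.
features = [Counter(t) for t in tokens]

# "Model building": trivially score sentiment with hand-picked cue words.
POS, NEG = {"great", "loved"}, {"terrible", "broke"}
def predict(bow):
    return "pos" if sum(bow[w] for w in POS) >= sum(bow[w] for w in NEG) else "neg"

# Evaluation: check predictions on the toy documents; deployment and
# monitoring would wrap this function in a serving loop.
labels = [predict(f) for f in features]
print(labels)  # → ['pos', 'neg']
```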
Discuss the importance of pre-processing techniques in NLP?
The data we gather is a combination of structured and unstructured formats, and there may be a lot of unwanted text. This unimportant text leads to low accuracy and makes the data hard to understand and analyze. So proper pre-processing must be performed on the raw data.
The pre-processing techniques in NLP are:
- Tokenization
- Stemming
- Lemmatization
- Parts of speech
- Named entity recognition
- Bag of words
- TF-IDF
- N-grams
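A few of these steps can be sketched in pure Python (the stemmer below is a toy suffix-stripper, not a real algorithm like Porter's, and the stop-word set is made up for this example):

```python
import re

text = "The runners were running faster than the other runner"

# Tokenization: split raw text into word tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Stop-word removal: drop common words that add little meaning.
STOP = {"the", "were", "than", "other"}
content = [t for t in tokens if t not in STOP]

# Toy stemming: crude suffix stripping (a real stemmer is far smarter).
def stem(word):
    for suffix in ("ing", "ers", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print([stem(t) for t in content])  # → ['runn', 'runn', 'fast', 'runn']
```

Note that stemmed forms need not be real words ("runn"); that is exactly the difference from lemmatization, which maps words to valid dictionary lemmas ("run").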
- s1 = " Hear peace "
- s2 = " See peace "
- s3 = " Speak peace "
- D1= " I am happy for your success "
- D2= " I am sorry for your loss"
- D3= " He is sorry, he cannot come "
Discuss Named entity recognition?
Named entity recognition (NER) labels the entities mentioned in text, for example:
- I ---> Person
- Master's ---> Education
- Acharya Nagarjuna university ---> Organization
- Guntur ---> Location.
- D1= " I am happy for your success "
- D2= " I am sorry for your loss"
- D3= " He is sorry, he cannot come "
Explain how feature engineering is implemented in NLP?
Feature engineering turns text into the numeric features a model can train on. Common techniques:
- One-hot encoding
- Count vectorizer
- N-grams
- Co-occurrence matrix
- Hash vectorizer
- TF-IDF
- Word embeddings
- fastText embeddings.
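The first few of these techniques can be sketched in pure Python (toy versions of what CountVectorizer-style tools do internally; the two example documents are taken from earlier in this post):

```python
from collections import Counter

docs = ["I am happy for your success", "I am sorry for your loss"]
tokenized = [d.lower().split() for d in docs]

# Shared vocabulary across the corpus, one column per term.
vocab = sorted({t for toks in tokenized for t in toks})

# One-hot encoding: 1 if the term occurs in the document, else 0.
one_hot = [[int(t in toks) for t in vocab] for toks in tokenized]

# Count vectorizer: raw term counts per document.
counts = [[Counter(toks)[t] for t in vocab] for toks in tokenized]

# N-grams: sliding windows of n consecutive tokens (here bigrams).
def ngrams(toks, n):
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

print(vocab)
print(ngrams(tokenized[0], 2))
```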
Explain Semantic analysis?
Semantic analysis is concerned with the meaning of text; tasks related to it include:
- NER, the information-retrieval process that helps identify entities such as the name of a person, organization, place, time, emotion, etc.
- WSD (word-sense disambiguation), which helps identify the sense in which a word is used in different sentences.
- NLG, the process of generating text.