Sentence Segmentation Using NLP

Introduction to Sentence Segmentation Using NLP

Before we look at how to do sentence segmentation using NLP, we need to understand what NLP is. NLP is a branch of data science that consists of systematic processes for analyzing, understanding, and deriving information from text data in a smart and efficient manner.

Natural Language Processing can be described in layman's terms as the automatic processing of natural human language by a machine. It is a specialized branch of Artificial Intelligence that primarily focuses on the interpretation of human-generated language.

But can a computer understand language?

When the first computers were developed, they could only understand binary code, i.e. data in the form of 0s and 1s. Humans have been writing things down for thousands of years, and it would be really helpful if a computer could read and understand all that data.

But what about a text of a hundred or a million sentences? Can it be handled in binary code?
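
Storing text in binary is the easy part. As a small illustration (the variable names are only for this sketch), Python can show the bits behind a short sentence using the UTF-8 encoding:

```python
# Illustrative sketch: how a short piece of text looks as binary data.
sentence = "Mumbai is a city."

# Encode the text as UTF-8 bytes, then render each byte as 8 bits.
encoded = sentence.encode("utf-8")
bits = " ".join(format(byte, "08b") for byte in encoded)

print(len(encoded))  # number of bytes used to store the sentence
print(bits)          # one group of 8 bits per byte
```

Representing text as bits says nothing about what the text means, and that gap is what NLP tries to close.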

Computers can’t yet truly understand English in the way that humans do — but they can already do a lot! In certain limited areas, what you can do with NLP already seems like magic. You might be able to save a lot of time by applying NLP techniques to your own projects.

And even better, the latest advances in NLP are easily accessible through open-source Python libraries like spaCy and textacy. What you can do with just a few lines of Python is amazing.

Building the NLP Pipeline Step by Step

Let’s look at a sample text:

Mumbai or Bombay is the capital city of the Indian State of Maharashtra. According to the United Nations, as of 2018, Mumbai was the second most populated city in India after Delhi. In the world with a population of roughly 20 million. As per the Indian government population census of 2011, Mumbai was the most populated city in India.An estimated city-proper population of 12.5 million living under Municipal Corporation of Greater Mumbai.

This paragraph contains several useful facts. It would be great if a computer could read this text and understand that Mumbai is a city, Mumbai is located in India, and so on. But to get there, we have to first teach our computer the most basic concepts of written language and then move up from there.

Sentence Segmentation Using NLP

The first step in the pipeline is to break the text apart into separate sentences. For the opening of our sample text, that gives us this:

  1. “Mumbai or Bombay is the capital city of the Indian State of Maharashtra.”
  2. “According to the United Nations, as of 2018, Mumbai was the second most populated city in India after Delhi.”
  3. “In the world with a population of roughly 20 million.”

We can assume that each sentence in English is a separate thought or idea. It will be a lot easier to write a program to understand a single sentence than to understand a whole paragraph.

Coding a sentence segmentation model can be as simple as splitting the text whenever you see a sentence-ending punctuation mark. But modern NLP pipelines often use more complex techniques that work even when a document isn’t formatted cleanly.
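
To make the simple approach concrete, here is a minimal sketch in plain Python. The function name naive_sentence_split and its regular expression are assumptions made for this illustration, not part of any library. It splits after ".", "!" or "?" whenever whitespace follows, and the second example hints at why this is not enough for real text.

```python
import re

def naive_sentence_split(text):
    """Split text after '.', '!' or '?' when followed by whitespace.

    A deliberately simple heuristic, for illustration only.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

clean = "Mumbai is a city. It is located in India."
tricky = "Dr. Rao lives in Mumbai. He works there."

print(naive_sentence_split(clean))
print(naive_sentence_split(tricky))
```

The first call returns the two sentences as expected. The second breaks "Dr." off as a sentence of its own, which is the kind of mistake that more sophisticated tokenizers are designed to avoid.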

Tutorial


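Import the nltk library. NLTK is a popular Python library that covers many NLP tasks, including sentence tokenization.

import nltk  # NLTK, a widely used Python library for NLP

# If NLTK reports a missing tokenizer resource, download it once with
# nltk.download("punkt") (recent releases ask for "punkt_tab" instead).

# text is a variable that stores the whole sample paragraph
text = "Mumbai or Bombay is the capital city of the Indian State of Maharashtra. According to the United Nations, as of 2018, Mumbai was the second most populated city in India after Delhi. In the world with a population of roughly 20 million. As per the Indian government population census of 2011, Mumbai was the most populated city in India.An estimated city-proper population of 12.5 million living under Municipal Corporation of Greater Mumbai."

sentences = nltk.sent_tokenize(text)  # break the whole paragraph into sentences
for sentence in sentences:
    print(sentence)
    print()  # blank line between sentences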
Output: 
Mumbai or Bombay is the capital city of the Indian State of Maharashtra.

According to the United Nations, as of 2018, Mumbai was the second most populated city in India after Delhi.

In the world with a population of roughly 20 million.

As per the Indian government population census of 2011, Mumbai was the most populated city in India.An estimated city-proper population of 12.5 million living under Municipal Corporation of Greater Mumbai.
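
Notice that the last line of the output still holds two sentences: the sample text has no space after "India.", so no split happens at that full stop. One possible workaround is to repair such spots before tokenizing. The sketch below uses only Python's re module. The helper name add_missing_spaces and its rule (a lowercase letter, a full stop, then an uppercase letter) are assumptions for illustration, and the rule will misfire on some inputs, for example names written with an internal dot.

```python
import re

def add_missing_spaces(text):
    """Insert a space after a full stop squeezed between a lowercase
    letter and an uppercase letter, e.g. 'India.An' -> 'India. An'.
    Decimal numbers such as '12.5' are left untouched.
    """
    return re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", text)

raw = (
    "As per the Indian government population census of 2011, Mumbai was "
    "the most populated city in India.An estimated city-proper population "
    "of 12.5 million living under Municipal Corporation of Greater Mumbai."
)

print(add_missing_spaces(raw))
```

Passing the repaired text to nltk.sent_tokenize should then give the tokenizer a chance to treat the last two sentences separately.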

Conclusion

In this article, we learned how to break a paragraph into separate sentences with the help of the NLTK library. Generally, a sentence is broken after a full stop that is followed by a space.

