An Essential Guide to Python Text Editing & Processing

As a programmer, a large part of my work involves text and Strings. That’s why it is worth knowing some of the fun parts of Python text editing & processing using the standard library. 

As a result, I thought about writing this tutorial for those who want to get their hands dirty with Python text editing & processing. 

Here’s what we will cover:

  • Work on three basic String operations through the standard library. 
  • Learn how to search Strings for specific information. 
  • Understand how to parse and manipulate Strings. 
  • Learn how to format Strings. 

Before we go further, note that this tutorial assumes you are already familiar with Python. I highly recommend having the basics down before jumping into Python text editing & processing.

So without further ado, let’s get started.

Basic String Operations

In the first part of the Python text editing & processing article, we will learn how to perform basic String operations.

One of the powerful features of the Python standard library is that it provides essential tools for testing and filtering String content.

Go ahead and open your IDE to import the string module.

import string

Let’s look at some of the pre-defined String constants that the string module provides:

print(string.ascii_letters)
print(string.digits)
print(string.hexdigits)
print(string.punctuation)

Output:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
0123456789abcdefABCDEF
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Here we used the string module to print some of the constants that the standard library provides. As you can see, they hold values such as letters, digits, hex digits, and punctuation characters.
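
These constants become handy once you use them for filtering. Here is a minimal sketch of that idea. The helper names keep_letters_and_digits and count_punctuation are my own, made up for this illustration, and are not part of the string module:

```python
import string

def keep_letters_and_digits(text):
    """Return text without any character outside ascii_letters and digits."""
    allowed = set(string.ascii_letters + string.digits)
    return ''.join(ch for ch in text if ch in allowed)

def count_punctuation(text):
    """Count how many characters of text appear in string.punctuation."""
    return sum(1 for ch in text if ch in string.punctuation)

sample = 'Hello, World! 123'
print(keep_letters_and_digits(sample))  # HelloWorld123
print(count_punctuation(sample))        # 2
```

Note that these constants only cover ASCII characters, so accented letters would be dropped by this filter.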

Moving on, let’s create three variables and assign some values or Strings to them.

stringOne = 'It is raining cats and dogs.'
stringTwo = 'Labratory'
stringThree = '01234567'

Next, I want to check whether a String is alphanumeric. For that, we can use the isalnum() method. Here’s how it works:

print(stringOne.isalnum())
print(stringTwo.isalnum())
print(stringThree.isalnum())

Output:

False
True
True

The outputs are all boolean values. The first result shows that stringOne is not alphanumeric, since it contains spaces and a period. On the other hand, stringTwo and stringThree are alphanumeric because they contain only letters or digits.

Similarly, we can check whether a String contains only letters. To do that, we can use the isalpha() method. Let’s try it on stringOne. 

print(stringOne.isalpha())

Output:

False

The output is False because stringOne has spaces between the words and a period at the end. 

Python does not count spaces or punctuation as letters. As a result, when we call isalpha() on stringOne, we get False.

Next, we can also verify if a String is numeric or not using the isnumeric() function. 

And this is how it works:

print(stringThree.isnumeric())

Output:

True

Let’s run this function on stringTwo:

print(stringTwo.isnumeric())

Output:

False

As you can see, the outputs are True and False.

We get True because stringThree contains only numeric characters, and False because stringTwo contains no numeric characters at all. 
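
To compare the three checks side by side, here is a small sketch. The describe() helper is a name I made up for this example. It simply runs isalnum(), isalpha(), and isnumeric() on whatever String you pass in:

```python
def describe(text):
    """Return the result of the three checks covered above for text."""
    return {
        'isalnum': text.isalnum(),
        'isalpha': text.isalpha(),
        'isnumeric': text.isnumeric(),
    }

print(describe('Python3'))  # {'isalnum': True, 'isalpha': False, 'isnumeric': False}
print(describe('2024'))     # {'isalnum': True, 'isalpha': False, 'isnumeric': True}
print(describe('3.14'))     # all False, because of the '.'
print(describe(''))         # all False, an empty String never passes
```

The last two lines are the ones that tend to surprise people: a decimal point or an empty String makes every check return False.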

Searching Strings

We can also search Strings for specific data. Fortunately, the Python String object comes with built-in methods to do this.

First, let’s see how we can use the startswith() and endswith() methods to check whether a String starts or ends with a specific sequence of characters. 

Go ahead and type the following:

sampleString = 'The sky is blue'

Now I want to check whether our sampleString starts with ‘The’. Thus, I can type:

print(sampleString.startswith('The'))

Output:

True

We get True since the String begins with ‘The’.

Okay, now what if we pass the lowercase version of that word?

Type the following:

print(sampleString.startswith('the'))

Output:

False

So, these methods are case-sensitive. Since the String does not start with lowercase ‘the’, we get False.

Lastly, let’s try the endswith() function.

print(sampleString.endswith('blue'))

Output:

True

We get True because we can see that the word ‘blue’ is at the end of our phrase. 
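
If the case-sensitivity gets in your way, one common workaround is to lowercase both sides before comparing. The sketch below shows that, plus the fact that startswith() and endswith() also accept a tuple of candidates. The helper name starts_with_ignore_case is my own invention for this example:

```python
sample = 'The sky is blue'

def starts_with_ignore_case(text, prefix):
    """Case-insensitive startswith: lowercase both sides before comparing."""
    return text.lower().startswith(prefix.lower())

print(starts_with_ignore_case(sample, 'the'))  # True

# A tuple matches if the String starts (or ends) with any one of its items.
print(sample.startswith(('A', 'An', 'The')))   # True
print(sample.endswith(('red', 'green')))       # False
```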

Apart from the startswith() and endswith() methods, there are also more general methods such as find() and rfind().

You can use these two functions for finding data anywhere in the String. 

First, let’s see the find() function in action. We can try it on the sampleString that we created earlier.

print(sampleString.find('sky'))

Output:

4

Here, we passed ‘sky’. The find() method returns the index where the match begins, which is position 4 (counting from 0).
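
Here is a short sketch of how find() and rfind() differ, using a longer sentence that I made up so the word appears twice. It also shows what happens when the text is not found at all:

```python
text = 'The sky is blue and the sea is blue'

first = text.find('blue')     # search from the left
last = text.rfind('blue')     # search from the right
missing = text.find('green')  # not present in text

print(first)    # 11
print(last)     # 31
print(missing)  # -1

# find() also takes an optional start index, so you can continue
# searching after the first match.
print(text.find('blue', first + 1))  # 31
```

Because a missing match gives -1 rather than an error, it is worth checking for -1 before using the result as an index.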

Additionally, you can also check for sub-String using the in operator:

print('blue' in sampleString)

Output:

True

As a result, we get True since the word blue is present within the sampleString.

Moving on, let’s talk about the replace() function to replace contents within a String. 

Let’s try it on our sampleString. For example:

print(sampleString.replace('blue','black'))

Output:

The sky is black

So, I used the function to replace the word blue with black.

Before I end this section, I also want to show you the count() function. 

This function allows us to count the instances of a sub-String and returns the number of times it occurs. 

I am going to create a new String and use the count() function on that. 

newString = 'I can see a car parked outside our house. It is a red car.'

Now let’s see the count() function in action:

print(newString.count('car'))

Output:

2

As you can see, we get 2 since the word car occurs twice. Also, remember that these methods are all case-sensitive. If I pass the word Car with an upper case C, the result is 0, since our String contains no occurrence of Car with an upper case C.
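
Two things are worth trying yourself here. The sketch below uses sentences I made up for the purpose. It shows a case-insensitive count by lowercasing first, and it shows that count() matches sub-Strings, not whole words:

```python
sentence = 'Car keys are in the car. The other car is red.'

print(sentence.count('car'))          # 2, the leading 'Car' is skipped
print(sentence.lower().count('car'))  # 3, lowercasing first catches it

# count() looks for sub-Strings, so 'carpet' also contains a match.
print('A car on the carpet'.count('car'))  # 2
```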

String Manipulation

An essential part of Python text editing & processing is the ability to manipulate Strings.

First, let’s start by defining a new sample String. So, I will type:

newString = 'It is a sunny day in Southern California.'

To convert all the letters into upper case, we can use the upper() method. 

print(newString.upper())

Output:

IT IS A SUNNY DAY IN SOUTHERN CALIFORNIA.

The same goes with converting all the letters to lower case. We will use the lower() method. 

print(newString.lower())

Output:

it is a sunny day in southern california.

Now let’s talk about the split() method, which you can use to break up a single text String into multiple Strings. For instance, suppose we want to split newString into a list of words based on the spaces between them. 

Here’s what I mean. Type the following:

splitString = newString.split(" ")
print(splitString)

Output:

['It', 'is', 'a', 'sunny', 'day', 'in', 'Southern', 'California.']

After running that, you can see that we have split the String into a list.

What if you want to join all the words inside your list? For that, we can use the join() method. It does the exact opposite. 

The join() function will take the individual Strings and convert them into one single String. 

For example:

joinString = ' '.join(splitString)
print(joinString)

Output:

It is a sunny day in Southern California.

Now we have successfully taken the Strings in the splitString list and joined them together again.

We joined each String into one single String, with the spaces between them being the separators.

On the other hand, if you don’t want any space between the words, then type this:

joinString = ''.join(splitString)
print(joinString)

Output:

ItisasunnydayinSouthernCalifornia.

As a result, the method attached all the words directly to each other, since we passed an empty String as the separator.
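
One detail that is easy to miss: split(" ") and split() with no argument behave differently when the text has extra spaces. Here is a minimal sketch with a messy String that I made up:

```python
messy = '  It   is a  sunny day  '

# Splitting on a single space keeps empty Strings for every extra space.
print(messy.split(' '))

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading and trailing whitespace.
words = messy.split()
print(words)            # ['It', 'is', 'a', 'sunny', 'day']

# Combining split() and join() is a quick way to normalize spacing.
print(' '.join(words))  # It is a sunny day

# Any other separator works the same way.
print('2024-01-15'.split('-'))  # ['2024', '01', '15']
```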

Python has a feature that allows us to perform batch replacement of characters in a String through a translation table. 

Here’s what I mean. 

First, let’s create a translation table with the maketrans() function. 

transTable = str.maketrans('abcdef','123456')

So now we have our translation table defined as transTable, which maps the characters we want to replace to digits. For instance, the letter a will become 1, b will become 2, and so on.

Then go ahead and define a new variable with the following:

myString = 'This is a big deal for California.'
print(myString)

Output:

This is a big deal for California.

Now let’s go ahead and use the translation table transTable to perform the translate operation. To do that, we’ll use the translate() function.

So type:

print(myString.translate(transTable))

Output:

This is 1 2ig 451l 6or C1li6orni1.

You can see that some of the characters have now become numbers we defined in our translation table transTable earlier. 

To sum up, batch replacement in Python allows us to define translation tables that we can use to replace characters with some other characters that we specify.
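
Translation tables can also delete characters. As a rough sketch: str.maketrans() accepts an optional third argument listing characters to remove, and it also accepts a dictionary if you want to map one character to a longer String. The sample sentences below are my own:

```python
import string

# Third argument: every character listed here is deleted.
strip_punct = str.maketrans('', '', string.punctuation)
print('Hello, World! (again)'.translate(strip_punct))  # Hello World again

# Dictionary form: keys are single characters, values can be longer
# Strings, or None to delete the character.
table = str.maketrans({'&': 'and', '!': None})
print('Cats & dogs!'.translate(table))  # Cats and dogs
```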

In the next part of the Python text editing & processing tutorial, we will learn how to work with String formatting.

String Formatting

In this part of the Python text editing & processing tutorial, we will take a good look into String formatting.

Specifically, we will focus on formatting Strings using template Strings and the format() method. 

Now, let me create and define a new variable. We can call it newString2.

newString2 = 'It is a $weather day in the $direction Coast.'

I will explain in a bit why I am using the dollar signs.

Okay, the first concept in this part is how to use template Strings to perform String formatting.

I have to import the Template class from the string module. Here’s how you do it:

from string import Template

Notice that newString2 contains two names prefixed with a dollar sign, $weather and $direction. The dollar sign marks them as placeholders.

We can substitute these placeholders with Strings or values. There are usually two steps involved to do this. 

First, we create an instance of the Template class:

theInstance = Template(newString2)

Then you call the substitute() method on the template and pass a value for each placeholder. 

Here’s how it works: 

outputString = theInstance.substitute(weather='Rainy',direction='East')
print(outputString)

Output:

It is a Rainy day in the East Coast.

The output shows the actual values Rainy and East replacing the placeholders that we marked earlier with the dollar sign. 
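
What happens if you forget one of the placeholders? Here is a short sketch of that case. substitute() raises a KeyError, while the related safe_substitute() method leaves the missing placeholder untouched. The variable names are my own:

```python
from string import Template

template = Template('It is a $weather day in the $direction Coast.')

# substitute() insists on a value for every placeholder.
try:
    template.substitute(weather='Sunny')
except KeyError as missing:
    print('Missing placeholder:', missing)

# safe_substitute() fills in what it can and keeps the rest as-is.
partial = template.safe_substitute(weather='Sunny')
print(partial)  # It is a Sunny day in the $direction Coast.

# Both methods also accept a dictionary instead of keyword arguments.
values = {'weather': 'Foggy', 'direction': 'West'}
print(template.substitute(values))  # It is a Foggy day in the West Coast.
```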

Let’s move on to learning how to use the format() function. First, I will start by creating two variables as follows:

fruit = 'apples'
color = 'red'

So what we are going to do is substitute these two values into a new String using the format() function.

And this is how it works:

newStr = 'Those {} are {}.'.format(fruit,color)
print(newStr)

Output:

Those apples are red.

We can see that the curly braces acted as placeholders. After that, we used the format() function and passed the names of the variables fruit and color.

In short, it took the two variables in order and put their values into the String where we have the curly braces. 

In addition, you can also substitute named variables. Here’s how it works:

print('Those {var1} are {var2}.'.format(var1=fruit,var2=color))

Output:

Those apples are red.

The variables var1 and var2 are substituted.
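
The curly braces can do a little more than hold a spot. As a small sketch, with a price value that I made up: you can number the placeholders to reuse an argument, and you can add a format spec after a colon to control decimals and alignment.

```python
fruit = 'apples'
price = 3.5

# A numbered placeholder can be used more than once.
print('{0} and more {0}'.format(fruit))               # apples and more apples

# :.2f formats a number with two decimal places.
print('Those {} cost ${:.2f}.'.format(fruit, price))  # Those apples cost $3.50.

# :<10 left-aligns in 10 characters, :>6 right-aligns in 6.
print('|{:<10}|{:>6}|'.format(fruit, 7))              # |apples    |     7|
```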

Replacing & Removing Characters from a String

One of the simplest ways to replace a character with a new character in Python is by using the replace() method. 

For example, we have the following String:

myStr = 'abcdefghijk12345'

I want to replace the first character a with the letter z, and here’s how I can do it:

print(myStr.replace('a','z'))

Output:

zbcdefghijk12345

The replace() function returns a new String, while the original String remains unchanged. If you want the change to be permanent, then you can do this:

myStr = myStr.replace('a','z')
print(myStr)

Output:

zbcdefghijk12345

Now the change is permanent for myStr.

The replace() method can also remove characters if we pass an empty String as the second argument. 

For example:

print(myStr.replace('z',''))

Output:

bcdefghijk12345

As a result, we can now see that the letter z is gone from our String.

Lastly, replace() can also remove spaces from a String. 

Let’s say I have the following String:

newStr = 'a b c d e f g h'

You can see that we have spaces between the characters, and I want to get rid of them.

For that I can type:

print(newStr.replace(' ',''))

Output:

abcdefgh

Moreover, to make the change permanent:

newStr = newStr.replace(' ','')
print(newStr)

Output:

abcdefgh

As a result, the change is now permanent for newStr.
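
Keep in mind that replace() changes every occurrence by default. Here is a brief sketch, using a sample word of my own, that shows this along with the optional third argument that limits how many replacements are made:

```python
text = 'banana'

# By default, every occurrence is replaced.
print(text.replace('a', 'o'))     # bonono

# The optional third argument caps the number of replacements.
print(text.replace('a', 'o', 1))  # bonana

# The search text can be longer than one character.
print(text.replace('an', ''))     # ba

# The original String is never modified.
print(text)                       # banana
```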

Conclusion

This section brings us to the end of Python text editing and processing. We went over some of the crucial concepts that you will need to know if you want to edit, process, and manipulate text with Python.

Depending on your situation, you may have to search Strings, format Strings, remove characters from a String, or even replace them with new values. When it comes to text editing & processing, the Python standard library covers a lot of ground.

Most importantly, if you work with large amounts of text or need to parse specific characters or sub-Strings, these built-in tools are a practical way to handle that data.

The text editing & processing functionality of Python is pretty cool. What other programming languages do you think are also suited for working with text?
