Introduction

In regular expressions (regex), a character set matches exactly one character out of a defined group of characters. This tutorial is for anyone who wants to learn about character sets in regex and how to use them in Python. We will start from the basics and work through simple, practical examples. By the end of this tutorial, you will be able to create your own character sets and use them effectively in your Python code. So, let's get started!

Table of Contents:

  • Python Regex Character Sets
  • \d: Digit Character Set
  • \w: The Word Character Set
  • \s: Whitespace Character Set
  • Inverse Character Sets
  • The Dot(.) Character Set

Python Regex Character Sets

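  • A character set, also known as a character class, matches exactly one character out of a defined set of characters.
  • Custom character sets are written in square brackets, for example [abc], and match any single character listed inside the brackets. Shorthand sequences such as \d, \w, and \s are predefined sets that need no brackets.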
  • Let's look at some of the commonly used character sets in regex.
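
Before moving on to the shorthand sets, here is a minimal sketch of the bracket syntax described above. The sample string and variable names are made up for illustration.

```python
import re

text = "gray or grey, version 2b"

# [ae] matches exactly one character: either "a" or "e".
spellings = re.findall(r"gr[ae]y", text)
print(spellings)  # ['gray', 'grey']

# A hyphen inside brackets defines a range:
# [0-9] is any ASCII digit, [a-c] is "a", "b", or "c".
tags = re.findall(r"[0-9][a-c]", text)
print(tags)  # ['2b']
```

Each bracketed set consumes a single character, so matching longer runs requires a quantifier such as + after the closing bracket.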

\d: Digit Character Set

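  • \d matches any decimal digit. For str patterns in Python 3 this covers 0-9 as well as digits from other scripts; with the re.ASCII flag it matches only 0-9.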
  • Here's an example of using  \d  in a regular expression:
  • Code Sample : 

import re

pattern = r"\d"
string = "There are 123 apples"
result = re.findall(pattern, string)
print(result)

# The above example matches each digit character separately and returns them as a list: ['1', '2', '3'].
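
As a small follow-up sketch, the snippet below shows two common variations on \d: adding + to capture whole numbers, and passing re.ASCII to restrict matching to 0-9. The sample strings are illustrative only.

```python
import re

# \d+ matches a run of one or more digits, so the number stays together.
numbers = re.findall(r"\d+", "There are 123 apples and 45 pears")
print(numbers)  # ['123', '45']

# "\u0664\u0662" are the Arabic-Indic digits for 4 and 2.
text = "Room 42, \u0664\u0662"

# By default, \d in a str pattern also matches non-ASCII decimal digits.
unicode_digits = re.findall(r"\d", text)
print(len(unicode_digits))  # 4

# With re.ASCII, \d is limited to the characters 0-9.
ascii_digits = re.findall(r"\d", text, flags=re.ASCII)
print(ascii_digits)  # ['4', '2']
```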



\w: The Word Character Set

  • \w matches any word character: letters, digits, and the underscore. For str patterns this covers Unicode letters and digits; with re.ASCII it is equivalent to [a-zA-Z0-9_].
  • Here's an example of using \w in a regular expression:
  • Code Sample : 

import re

pattern = r"\w+"
string = "Hello, my name is John. I am 28 years old."
result = re.findall(pattern, string)
print(result)

# Because of the + quantifier, the above example matches each run of word characters (each word or number) and returns them as a list, leaving out spaces and punctuation.



\s: Whitespace Character Set

  • \s matches any whitespace character, including space, tab, newline, carriage return, form feed, and vertical tab.
  • Here's an example of using  \s  in a regular expression:
  • Code Sample : 

import re

pattern = r"\s+"
string = "Hello,\n\t my name is John. I am 28 years old."
result = re.sub(pattern, " ", string)
print(result)

# The above example matches each run of whitespace characters and replaces the whole run with a single space, so the output is: Hello, my name is John. I am 28 years old.



Inverse Character Sets

  • An inverse (or negated) character set matches any single character that is not in the set.
  • The inverse character set is indicated by placing a caret (^) immediately after the opening bracket, as in [^abc].
  • Here's an example of using an inverse character set:
  • Code Sample : 

import re

pattern = r"[^aeiou]"
string = "Python is easy"
result = re.findall(pattern, string)
print(result)

# The above example matches every character that is not a lowercase vowel, including the spaces and the uppercase "P", and returns them as a list.
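
The shorthand sets covered earlier also have built-in inverses: \D, \W, and \S match anything that \d, \w, and \s do not. A minimal sketch, using a made-up sample string:

```python
import re

text = "Order #42: 3 items"

# \D is the inverse of \d: remove everything that is not a digit.
digits_only = re.sub(r"\D", "", text)
print(digits_only)  # 423

# \W is the inverse of \w: spaces and punctuation.
non_word = re.findall(r"\W", text)
print(non_word)  # [' ', '#', ':', ' ', ' ']

# \S is the inverse of \s: runs of non-whitespace characters.
tokens = re.findall(r"\S+", text)
print(tokens)  # ['Order', '#42:', '3', 'items']
```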



The Dot(.) Character Set

  • The dot character, ., matches any single character except a newline.
  • It's often used as a wildcard to stand in for any character.
  • Here's an example of using . in a regular expression:
  • Code Sample : 

import re

pattern = r".at"
string = "The cat in the hat sat on the flat mat."
result = re.findall(pattern, string)
print(result)

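# The above example matches any single character followed by "at" and returns the matches as a list: ['cat', 'hat', 'sat', 'lat', 'mat']. Note that "flat" contributes "lat", because the pattern matches three-character substrings, not whole words.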
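
Two details about the dot are worth a quick sketch: the re.DOTALL flag lets it match newlines too, and a literal period has to be escaped or placed inside brackets. The sample strings below are illustrative only.

```python
import re

text = "a\nb"

# By default the dot does not match a newline, so nothing is found.
default_match = re.findall(r"a.b", text)
print(default_match)  # []

# With re.DOTALL the dot matches every character, newline included.
dotall_match = re.findall(r"a.b", text, flags=re.DOTALL)
print(dotall_match)  # ['a\nb']

# To match a literal period, escape it or put it inside brackets.
escaped = re.findall(r"\.", "v1.2.3")
bracketed = re.findall(r"[.]", "v1.2.3")
print(escaped, bracketed)  # ['.', '.'] ['.', '.']
```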

