Strings are among the most fundamental data types you will work with in any programming language. Almost every application uses them.
Python has a rich set of string methods. This post covers the most important ones, but first we need to understand how strings work in Python at a foundational level.
Strings in Python
Python 3 stores strings as a sequence of Unicode code points. This means you can represent any Unicode character -- Arabic, Hebrew, Danish, and yes, even emojis.
Strings are immutable. Once you create a string, you cannot change it.
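Here is a minimal sketch of what immutability means in practice. The variable names are illustrative only. Item assignment fails, and "changing" a string really means building a new one:

```python
# Illustrative names only.
greeting = "Hello World"

# Item assignment is not supported on str objects.
try:
    greeting[0] = "J"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead of modifying the old one.
changed = "J" + greeting[1:]

print(assignment_failed)  # True
print(greeting)           # Hello World (unchanged)
print(changed)            # Jello World
```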
You can create a string by enclosing it in double or single quotes, or by calling the str() constructor. Both approaches produce a str object, because everything in Python is an object:
# Double quotes
hello = "Hello World I'm using Python!"
# Single quotes
hello = 'Hello World I\'m using Python!'
# str object
hello = str("Hello World I'm using Python!")
print(type(hello)) # This should always return "<class 'str'>"
Sometimes you need a raw string -- one that treats escape characters like \n, \r, and \t literally. Use the r prefix:
print(r"Hello World\nI'm using Python!\t\t\t Amazing")
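To see the difference the r prefix makes, you can compare the lengths of a normal and a raw string. This is a small sketch with illustrative variable names:

```python
# Illustrative names only.
normal = "Hello\nWorld"
raw = r"Hello\nWorld"

print(len(normal))  # 11: \n is one newline character
print(len(raw))     # 12: the backslash and the n are two separate characters

# A raw string is the same as escaping the backslash by hand.
same_as_escaped = raw == "Hello\\nWorld"
print(same_as_escaped)  # True
```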
Triple quotes """ handle multi-line strings:
python = """
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.[26] Van Rossum led the language community until stepping down as leader in July 2018
"""
print(python)
Triple-quoted strings are also used as docstrings and, informally, as multi-line comments. Strictly speaking they are string literals, not comments.
Concatenation uses the plus (+) sign:
hello = "Hello" + " " + "World"
Python treats strings as sequences, meaning each character has its own index, just like the items of a list or tuple. Here is what that looks like in practice:
hello = "Hello World! I'm using Python!"
h = hello[0]
print(h) # Output: H
for char in hello:
    print(char, " ") # Output: each character on its own line (H, e, l, ...)
The slice operator lets you extract portions of a string:
hello = "Hello World"
print( hello[:5] ) # Output: Hello
print( hello[6:] ) # Output: World
print( hello[-5:] ) # Output: World
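Slices also accept negative indices and an optional step. The sketch below uses illustrative variable names and shows a few common patterns, including a slice that comes back empty because its start lies to the right of its stop:

```python
# Illustrative names only.
hello = "Hello World"

last_five = hello[-5:]       # from the fifth-to-last character to the end
without_last = hello[:-1]    # everything except the final character
every_second = hello[::2]    # step of 2
reversed_text = hello[::-1]  # a negative step walks backwards
empty = hello[-1:-5]         # start is right of stop with a positive step

print(last_five)      # World
print(without_last)   # Hello Worl
print(every_second)   # HloWrd
print(reversed_text)  # dlroW olleH
print(repr(empty))    # ''
```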
The multiplication operator repeats a string:
print("Hi" * 2) # Output: HiHi
Since Python 3 works with Unicode code points, you can print emojis just like any other string:
# The first statement prints a heart; the last two both print the grinning face emoji
print("I love Python ♥️")
print("I love Python \U0001F600") # Emoji using the Unicode code point
print("I love Python \N{grinning face}") # Emoji using its Unicode character name
- Visit the full emoji list page from the Unicode website for all emoji codes.
- For emoji representation, use the Unicode character name (\N{...}) or the code point escape (\U...). Avoid embedding direct emoji glyphs in your code.
- You might also find the emoji module useful.
Now for non-Latin characters. Since Python 3 uses Unicode by default, you can print any Unicode string directly:
print("أنا سَعيدٌ جِدّاً بلقاءك!") # Arabic
print("Jeg er så glad for at møde dig!") # Danish
print("Jeg er s\u00e5 glad for at m\xf8de dig!") # Danish with Unicode escapes
Hard-coding translated, user-facing strings directly in your code is generally considered bad practice for applications that need translation. Use something like GNU gettext instead. See Python Multilingual internationalization Services.
How about bytes?
The bytes type is similar to str except it stores a sequence of bytes instead of Unicode code points. It is used for binary data and fixed single-byte character encodings.
You can create bytes with the b prefix or the bytes() constructor:
hello = bytes(source="Hello World", encoding="utf8")
print(hello) #Output: b'Hello World'
The source parameter accepts different data types:
- String: converts the string to bytes (as shown above).
- Integer: creates an array of zero values with the provided size.
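- Object: any object that implements the buffer protocol (for example a bytearray or memoryview); its contents are copied into the new bytes object.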
- Iterable: creates a byte array from the iterable; each element must be between 0 and 255.
Here is how these types work with bytes:
n = bytes(5)
print(n) # Output: b'\x00\x00\x00\x00\x00'
print(list(n)) # Output: [0, 0, 0, 0, 0]
items = [1, 2, 4, 8, 16, 32]
arr = bytes(items)
print(arr) # Output: b'\x01\x02\x04\x08\x10 '
print(list(arr)) # Output: [1, 2, 4, 8, 16, 32]
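The b prefix mentioned earlier is the shortest way to write a bytes literal. The following sketch, with illustrative names, also shows that indexing a bytes object gives you an integer rather than a one-character string, and that values outside 0-255 are rejected:

```python
# Illustrative names only.
data = b"Hello"

first = data[0]               # indexing returns an int
first_slice = data[:1]        # slicing returns bytes
from_list = bytes([72, 105])  # 72 is "H", 105 is "i"

# Every element must fit in a single byte.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

print(first)            # 72
print(first_slice)      # b'H'
print(from_list)        # b'Hi'
print(out_of_range_ok)  # False
```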
Use encode() and decode() to convert between strings and bytes:
my_string = "Jeg er så glad for at møde dig!"
my_string_encode = my_string.encode()
my_string_decode = my_string_encode.decode('utf8')
print(my_string_encode) # Output: b'Jeg er s\xc3\xa5 glad for at m\xc3\xb8de dig!'
print(my_string_decode) # Output: Jeg er så glad for at møde dig!
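The encoding you pick matters. Here is a short sketch (variable names are illustrative) of what happens when a codec cannot represent a character, and when you decode with a different codec than the one used to encode:

```python
# Illustrative names only.
text = "møde"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# ASCII cannot represent "ø", so strict encoding raises an error.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# errors="replace" substitutes "?" for characters that cannot be encoded.
ascii_replaced = text.encode("ascii", errors="replace")

# Decoding with the wrong codec does not fail here; it silently gives wrong text.
mojibake = utf8_bytes.decode("latin-1")

print(utf8_bytes)      # b'm\xc3\xb8de'
print(latin1_bytes)    # b'm\xf8de'
print(ascii_ok)        # False
print(ascii_replaced)  # b'm?de'
print(mojibake)        # mÃ¸de
```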
String operators
We have already seen concatenation (+), repetition (*), indexing ([n]), and range slice ([from:to]). Here is the full list:
| Operator | Description |
|---|---|
| + | String concatenation |
| * | String repetition |
| [n] | Indexing: gets the character at position n |
| [from:to] | Range slice: gets a portion of a string |
| in | Returns True if the substring exists in the string |
| not in | Returns True if the substring doesn't exist in the string |
| % | String formatting |
string = "Hello World I love Python"
print("Python" in string) # True
string = "Hello World I love Python"
print("Python" not in string) # False
The % formatting operator lets you insert placeholders into a string:
string = "My name is %s and I love to use %s"
print(string % ("Ahmad", "Python")) # Output: My name is Ahmad and I love to use Python
Here, %s is a string placeholder. Python provides other placeholders: %c for character, %d for decimal integer, %f for floating point, and more.
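A few of these placeholders in action, as a quick sketch with illustrative variable names:

```python
# Illustrative names only.
as_char = "%c" % 65                 # integer code point to character
as_int = "%d items" % 3
as_float = "%.2f" % 3.14159         # two digits after the decimal point
padded = "%5d|" % 42                # right-aligned in a field of width 5
mixed = "%s is %d years old" % ("Ahmad", 32)
literal_percent = "%d%%" % 50       # %% produces a literal percent sign

print(as_char)          # A
print(as_int)           # 3 items
print(as_float)         # 3.14
print(padded)           #    42|
print(mixed)            # Ahmad is 32 years old
print(literal_percent)  # 50%
```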
The format() method is more readable and should be preferred over % formatting. We cover it in the string methods section below.
String methods
Python has many string methods. Some are self-explanatory, others deserve a closer look.
- Capitalize the first letter:
print( "hello".capitalize() ) # Output: Hello
- Title-case all words:
print( "hello world".title() ) # Output: Hello World
- Convert case or swap case:
print( "hello world".upper() ) # Output: HELLO WORLD
print( "HELLO WORLD".lower() ) # Output: hello world
print( "Hello WORLD".swapcase() ) # Output: hELLO world
- Get the length of a string:
print (len("Hello World")) # Output: 11
len is not a string method. It works on lists, dictionaries, tuples, and other types as well.
- Center a string with padding:
# Output: ----Hello World-----
print ( "Hello World".center(20, "-") )
The string is 11 characters long, so the dash is repeated nine times in total: four on the left and five on the right.
- Count occurrences of a substring:
string = """
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.[26] Van Rossum led the language community until stepping down as leader in July 2018
"""
print( string.count("i") ) # Output: 23
#Starting from 10->20 char
print( string.count("i", 10, 20)) #Output: 1
- Check if a string starts or ends with a substring:
string = "Python is one of the most popular programming languages"
print( string.startswith("Python") ) # Output: True
print( string.endswith("languages") ) # Output: True
#Start from 0 position to 6th position.
print( string.startswith("Python", 0, 6) ) # Output: True
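Both methods also accept a tuple of candidates, which saves you from chaining several checks with or. A small sketch, using made-up file names:

```python
# Hypothetical file names, for illustration only.
filenames = ["report.csv", "notes.txt", "data.tsv"]

# endswith returns True if the string ends with ANY item of the tuple.
data_files = [name for name in filenames if name.endswith((".csv", ".tsv"))]

text = "Python is one of the most popular programming languages"
# With a start position, the check begins at that index.
is_at_seven = text.startswith("is", 7)

print(data_files)   # ['report.csv', 'data.tsv']
print(is_at_seven)  # True
```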
is methods
Every method starting with is returns a boolean and checks whether the string has a particular property:
| Method | Description |
|---|---|
| isalnum() | Checks if all characters are alphanumeric. |
| isalpha() | Checks if all characters are alphabetic. |
| isascii() | Checks if all characters in the string are ASCII. |
| isdecimal() | Checks if all characters are decimal digits. |
| isdigit() | Checks if all characters are digits, including forms such as superscripts. |
| isidentifier() | Checks if the string is a valid Python identifier. |
| islower() | Checks if all cased characters are lowercase. |
| isnumeric() | Checks if all characters are numeric, including fractions and numerals from other scripts. |
| isprintable() | Checks if all characters are printable. |
| isspace() | Checks if the string contains only whitespace. |
| istitle() | Checks if the string is title-cased. |
| isupper() | Checks if all cased characters are uppercase. |
Some are obvious. Others need examples:
#isalnum
print( "Copenhagen 2000".isalnum() ) # Output: False because of the space
print( "2000i".isalnum() ) # True
#isalpha()
print( "Iraq".isalpha() ) # Output: True
print( "ar_IQ".isalpha() ) # Output: False because of the underscore character
#isascii
print( "Hello World \xb6".isascii() ) # Output: False because \xb6 (the ¶ sign) is outside the ASCII range
print( "Hello World".isascii() ) # Output: True
#islower
print( "hello world".islower() ) # Output: True
#isupper
print( "HELLO WORLD".isupper() ) # Output: True
#isspace
print( " ".isspace() ) # Output: True
print( "".isspace() ) # Output: False
print( "\t \n".isspace() ) # Output: True
#istitle
print( "hello world".istitle() ) # Output: False
print( "Hello World".istitle() ) # Output: True
The methods isdigit, isnumeric, and isdecimal all deal with numbers but differ in their Unicode classification. The distinction matters.
isdecimal checks that every character is a decimal digit (Unicode category Nd). This covers 0-9 as well as decimal digits from other scripts:
print( "123".isdecimal() ) # Output: True
print( "-123".isdecimal() ) # Output: False
print( "¼".isdecimal() ) # Output: False
print( "١٢٣".isdecimal() ) # Output: True (١٢٣ is 123 written in Arabic-Indic digits)
isnumeric checks for numbers in any Unicode numeric system:
#isnumeric
print( "123".isnumeric() ) # Output: True (Arabic numerals)
print( "١٢٣".isnumeric() ) # Output: True (Arabic-Indic digits)
print( "۴۵۶".isnumeric() ) # Output: True (Farsi numerals)
print( "¼½".isnumeric() ) # Output: True
print( "四五六".isnumeric() ) # Output: True (Chinese numerals)
If you are building a multilingual application that handles different numeric systems, isnumeric is what you want.
isdigit checks for decimal digits including typographic variants:
print( "①".isdigit() ) # Output: True
print( "⒈".isdigit() ) # Output: True
print( "¹".isdigit() ) # Output: True
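To see the three methods side by side, here is a small sketch. The classify helper is hypothetical and exists only for this comparison; it returns the three results as a tuple:

```python
def classify(text):
    """Hypothetical helper: return (isdecimal, isdigit, isnumeric) for text."""
    return (text.isdecimal(), text.isdigit(), text.isnumeric())

samples = ["123", "١٢٣", "¹", "①", "¼", "四", "-1", "1.5"]
for sample in samples:
    print(repr(sample), classify(sample))

# '123'  (True, True, True)
# '١٢٣'  (True, True, True)
# '¹'    (False, True, True)
# '①'    (False, True, True)
# '¼'    (False, False, True)
# '四'   (False, False, True)
# '-1'   (False, False, False)
# '1.5'  (False, False, False)
```

Each method accepts everything the one before it accepts, plus more. None of them accept a sign or a decimal point, so they are not a substitute for parsing with int() or float().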
isidentifier checks whether a string is a valid Python identifier. This is useful when you need to validate a variable name programmatically:
print( "hello".isidentifier() ) # Output: True
print( "123hello".isidentifier() ) # Output: False
print( "\t".isidentifier() ) # Output: False
print( "hello123".isidentifier() ) # Output: True
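isprintable checks if every character in the string is printable. Python treats a character as non-printable when Unicode classifies it as "Other" or "Separator", with the ASCII space as the one exception. Within ASCII, the printable characters are:
- Letters from A-Z (Uppercase)
- Letters from a-z (Lowercase)
- Digits from 0-9
- Punctuation characters ( !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ )
- Space.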
print( "Hello World".isprintable() ) # Output: True
print( "أنا أتحدّث العربيّة".isprintable() ) # Output: True
print( "123".isprintable() ) #Output: True
print("Hello\nWorld".isprintable() ) # Output: False
print("Hello\r\tWorld".isprintable() ) # Output: False
Join a sequence of elements by a separator
join concatenates all items of an iterable of strings, such as a list, tuple, or set, using a separator:
# The list and the tuple produce the string A,B,C; a set is unordered, so its items may come out in any order
print( ",".join(["A", "B", "C"]) ) # Using lists
print( ",".join(("A", "B", "C")) ) # Using tuples
print( ",".join({"A", "B", "C"}) ) # Using sets
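One thing to watch for: join only works when every item is already a string. The sketch below (illustrative names) shows the error you get with integers and one way around it, plus sorting a set first when you need a predictable order:

```python
# Illustrative names only.
numbers = [1, 2, 3]

# join raises TypeError when an item is not a string.
try:
    ",".join(numbers)
    joined_directly = True
except TypeError:
    joined_directly = False

# Convert each item to str first.
joined = ",".join(str(n) for n in numbers)

# Sets have no guaranteed order, so sort them for a stable result.
sorted_join = ",".join(sorted({"C", "A", "B"}))

print(joined_directly)  # False
print(joined)           # 1,2,3
print(sorted_join)      # A,B,C
```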
Create a list from a given string using a separator
split breaks a string into a list using a separator:
# Output: ['Baghdad', ' Basra', ' Anbar', ' Erbil']
print( "Baghdad, Basra, Anbar, Erbil".split(",") )
By default, split splits on every occurrence. Pass a second argument to limit the number of splits:
# ['Baghdad', ' Basra', ' Anbar, Erbil']
print( "Baghdad, Basra, Anbar, Erbil".split(",", 2) )
Use rsplit() to split from the right side.
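Two details are easy to miss here. Calling split() with no separator behaves differently from split(" "), and rsplit only differs from split when you limit the number of splits. A short sketch with illustrative names:

```python
# Illustrative names only.
text = "  Baghdad   Basra  "

# No argument: split on runs of whitespace and drop empty strings.
on_whitespace = text.split()

# Explicit " ": every single space is a separator, so empty strings appear.
on_single_space = text.split(" ")

# With a limit, rsplit starts counting from the right.
from_right = "a,b,c".rsplit(",", 1)
from_left = "a,b,c".split(",", 1)

print(on_whitespace)    # ['Baghdad', 'Basra']
print(on_single_space)  # ['', '', 'Baghdad', '', '', 'Basra', '', '']
print(from_right)       # ['a,b', 'c']
print(from_left)        # ['a', 'b,c']
```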
splitlines converts line breaks into a list:
# Output: ['Hello World', 'I love Python!']
print("Hello World\nI love Python!".splitlines())
Formatting
String formatting makes your code more readable. Compare this:
name = input("Enter your name? ")
age = input("How old are you? ")
print("Hello " + name + " , you are " + age + " years old")
With the format() method:
print("Hello {name}, you are {age} years old.".format(name=name, age=age))
You can omit the named placeholders:
print("Hello {}, you are {} years old.".format(name,age))
You can specify a conversion type with a colon:
# Format number 123456 to hex.
print("{name:x}".format(name=123456)) # Output: 1e240
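Available conversion types. This table comes from printf-style % formatting; format() accepts most of the same letters after the colon, but it rejects i and u, and it applies repr() and str() through !r and !s (for example {name!r}) rather than through a type letter: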
| Conversion | Meaning |
|---|---|
| d | Signed integer decimal. |
| i | Signed integer decimal. |
| o | Unsigned octal. |
| u | Unsigned decimal. |
| x | Unsigned hexadecimal (lowercase). |
| X | Unsigned hexadecimal (uppercase). |
| e | Floating point exponential format (lowercase). |
| E | Floating point exponential format (uppercase). |
| f | Floating point decimal format. |
| F | Floating point decimal format. |
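| g | Floating point format. Uses lowercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise. |
| G | Floating point format. Uses uppercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise. |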
| c | Single character (accepts integer or single character string). |
| r | String (converts any python object using repr()). |
| s | String (converts any python object using str()). |
| % | No argument is converted, results in a "%" character in the result. |
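Here is a sketch of how some of these types look with format(), combined with width, precision, and alignment options from the format specification mini-language. Variable names are illustrative. It also shows that the printf-style letter i is not accepted by format():

```python
# Illustrative names only.
padded = "{:>8}|".format("hi")            # right-align in a field of width 8
zero_padded = "{:08.3f}".format(3.14159)  # zero-pad to width 8, 3 decimals
thousands = "{:,}".format(1234567)        # thousands separator
hex_upper = "{:X}".format(255)
scientific = "{:e}".format(1234.5)
quoted = "{!r}".format("hi")              # !r applies repr()

# The printf-style letter i is not a valid format() type.
try:
    "{:i}".format(1)
    i_accepted = True
except ValueError:
    i_accepted = False

print(padded)       #       hi|
print(zero_padded)  # 0003.142
print(thousands)    # 1,234,567
print(hex_upper)    # FF
print(scientific)   # 1.234500e+03
print(quoted)       # 'hi'
print(i_accepted)   # False
```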
The r and s conversions deserve special attention.
Python has dunder methods (double underscore methods) like __str__ and __repr__ that extend classes with custom behavior.
By default, printing a class shows its qualified name:
class Person(object): pass
print( Person ) # Output: <class '__main__.Person'>
Using __str__ or __repr__, you control what gets returned:
class Person(object):
    def __init__(self, name, age):
        self.name, self.age = name, age
    def __str__(self):
        return "Your name is {name}, and you are {age} years old".format(name=self.name, age=self.age)
print( Person("Ahmad", "32") ) # Output: Your name is Ahmad, and you are 32 years old
With the !r and !s conversions, you can format any object that implements __repr__ or __str__:
class Ahmad():
    def __str__(self):
        return "Ahmad"
print( "My name is {name!s}".format(name=Ahmad()) ) # Output: My name is Ahmad
- If you pass the class itself (the name without parentheses) instead of an instance, Python prints the class representation instead of calling your __str__.
- For built-in types, repr() aims at an unambiguous representation (for example, it adds quotes around strings), while str() aims at a readable one.
- __str__ is used far more often for display. Prefer it whenever you need a human-readable string representation of an object.
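To make the difference concrete, here is a sketch of a class that defines both methods. The Person class and its output formats are only an illustration, not a required convention:

```python
class Person:
    """Illustrative class that defines both string representations."""

    def __init__(self, name, age):
        self.name, self.age = name, age

    def __str__(self):
        # Readable form, meant for end users.
        return "{} ({})".format(self.name, self.age)

    def __repr__(self):
        # Unambiguous form, meant for developers and debugging.
        return "Person(name={!r}, age={!r})".format(self.name, self.age)

person = Person("Ahmad", 32)

str_form = "{!s}".format(person)
repr_form = "{!r}".format(person)
default_form = "{}".format(person)  # format() falls back to str() here
in_list = str([person])             # containers show their items with repr()

print(str_form)      # Ahmad (32)
print(repr_form)     # Person(name='Ahmad', age=32)
print(default_form)  # Ahmad (32)
print(in_list)       # [Person(name='Ahmad', age=32)]
```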
Tab expansion
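The expandtabs method replaces each tab character (\t) with spaces up to the next tab stop. Its argument sets the tab size (8 by default):
print( "Hello\tWorld".expandtabs(16) ) # Output: Hello           World (World starts at column 16)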
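Note that a tab does not expand to a fixed number of spaces. It pads up to the next multiple of the tab size, so the number of spaces depends on where the tab sits. A small sketch with illustrative names:

```python
# Illustrative names only.
wide = "Hello\tWorld".expandtabs(16)
narrow = "a\tbc\tdef".expandtabs(4)

print(repr(wide))           # 'Hello' followed by 11 spaces, then 'World'
print(wide.index("World"))  # 16
print(repr(narrow))         # 'a   bc  def'
```

In the second string the first tab becomes three spaces and the second becomes two, because both pad to the next column that is a multiple of 4.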
Partitioning strings
partition splits a string into three parts: the portion before the separator, the separator itself, and the portion after it.
# Separate Hello World by a space
greeting = "Hello World".partition(" ")
# This outputs a tuple containing three parts
print(greeting)
# Since it's a tuple we can use the unpacking feature
hello, _, world = greeting
print(hello) # Output: Hello
print(world) # Output: World
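partition only splits at the first occurrence of the separator, and it always returns three items, even when the separator is missing. The sketch below uses made-up input strings and also shows rpartition, which splits at the last occurrence:

```python
# Made-up input strings, for illustration only.
first_split = "key=value=more".partition("=")
last_split = "key=value=more".rpartition("=")
no_separator = "no separator".partition("=")

print(first_split)   # ('key', '=', 'value=more')
print(last_split)    # ('key=value', '=', 'more')
print(no_separator)  # ('no separator', '', '')
```

Because the result always has three parts, tuple unpacking never fails, which makes partition a convenient alternative to split when you expect at most one separator.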
Finding substring
find and index locate the first occurrence of a substring and return its position as an integer:
python = """
Python is powerful... and fast;
plays well with others;
runs everywhere;
is friendly & easy to learn;
is Open.
"""
print( python.find("Python") ) # Output: 1 (index 0 is the newline right after the opening triple quote)
print( python.find("like") ) # Output: -1 (not found)
print( python.find("is", 31) ) # Output: 74 (search starts at index 31)
print( python.find("is", 79, len(python)) ) # Output: 103 (search between start and end)
index behaves identically to find except it raises a ValueError when the substring is not found.
Use rfind() or rindex() to search from the right side of the string.
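The choice between find and index mostly comes down to how you want to handle a miss. Here is a sketch; safe_index is a hypothetical helper written for this example, not part of Python:

```python
def safe_index(haystack, needle):
    """Hypothetical helper: like index(), but returns None instead of raising."""
    try:
        return haystack.index(needle)
    except ValueError:
        return None

text = "Python is powerful"

missing = text.find("like")
# Pitfall: -1 is a valid index, so using it unchecked picks the last character.
wrong_char = text[missing]

first_o = text.find("o")
last_o = text.rfind("o")

print(missing)                   # -1
print(wrong_char)                # l
print(safe_index(text, "like"))  # None
print(first_o, last_o)           # 4 11
```

If you only need to know whether the substring exists, the in operator is simpler than either method.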
Replacing
replace() performs string replacement:
print( "Hello World".replace("Hello", "Hi") ) # Output: Hi World
The third argument limits the number of replacements:
print( "Hello World, Hello, Hello".replace("Hello", "Hi", 1) ) # Output: Hi World, Hello, Hello
Summary
- Python 3 strings are Unicode sequences -- they support any script and are always immutable.
- The bytes type stores raw binary data as a sequence of bytes, not Unicode code points. Use encode() and decode() to convert between the two.
- String operators include concatenation (+), repetition (*), slicing ([], [from:to]), membership (in, not in), and formatting (%).
- The is* methods are boolean checks for string properties; isdecimal, isdigit, and isnumeric differ by their Unicode classification scope.
- format() is the preferred formatting approach over % placeholders, and it integrates with __str__ and __repr__ dunder methods.
- find vs index -- both locate substrings, but index raises an exception on failure while find returns -1.
- Full documentation for all string methods is available on the Python documentation page.