Comparing Words in Python

Python String Comparison: A Step-by-Step Guide (with Examples)

You commonly see comparisons made between numeric types in Python. But you can compare strings just as well. As it turns out, comparing strings translates to comparing numbers under the hood.

Before jumping into the details, let’s briefly see how to compare strings in Python.

Comparing Strings with == and !=

Comparing strings with equal to and not equal to operators is easy to understand. You can check if a string is or is not equal to another string.

name = "Jack"
print(name == "John")  # False
print(name != "John")  # True

Comparing Strings with <, >, <=, >=

To compare strings alphabetically, you can use the operators <, >, <=, >=.

For instance, let’s compare the names “Alice” and “Bob”. This comparison corresponds to checking if Alice is before Bob in alphabetical order.
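As a minimal sketch of that comparison, the snippet below applies all four ordering operators to the two names. The variable names are only illustrative.

```python
# Ordering operators compare strings lexicographically (dictionary-style).
first = "Alice"
second = "Bob"

is_before = first < second            # True: "Alice" sorts before "Bob"
is_after = first > second             # False
is_before_or_same = first <= second   # True
is_after_or_same = first >= second    # False

print(is_before, is_after, is_before_or_same, is_after_or_same)
# True False True False
```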

Now you have the tools for comparing strings in Python. Next, let’s understand how the string comparison works behind the scenes.

String Unicodes in Python

In reality, comparing Python strings means comparing integers under the hood.

To understand how it works, you first need to understand the concept of Unicode.

Python strings use the Unicode Standard for representing characters. This means each character has a unique integer code, called a code point, assigned to it. It is this integer value that is compared when comparing strings in Python.

Here is the part of the Unicode table that covers the English letters. In this range, the Unicode code points are identical to the ASCII values.

Unicode  Character    Unicode  Character    Unicode  Character    Unicode  Character
64       @            80       P            96       `            112      p
65       A            81       Q            97       a            113      q
66       B            82       R            98       b            114      r
67       C            83       S            99       c            115      s
68       D            84       T            100      d            116      t
69       E            85       U            101      e            117      u
70       F            86       V            102      f            118      v
71       G            87       W            103      g            119      w
72       H            88       X            104      h            120      x
73       I            89       Y            105      i            121      y
74       J            90       Z            106      j            122      z
75       K            91       [            107      k            123      {
76       L            92       \            108      l            124      |
77       M            93       ]            109      m            125      }
78       N            94       ^            110      n            126      ~
79       O            95       _            111      o

When a Python program compares strings, it compares the Unicode values of the characters.
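The table can also be generated rather than memorized. Here is a small sketch using the built-in chr(), which turns an integer code point into its character. The range 64 to 126 is chosen only to match the table above.

```python
# Build (code point, character) pairs for the range shown in the table.
# chr() maps an integer code point to the corresponding character.
rows = [(code, chr(code)) for code in range(64, 127)]

for code, char in rows:
    print(code, char)
```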

By the way, to check the Unicode of a character, you do not have to look it up from this table. Instead, you can use the built-in ord() function.

>>> ord('a')
97
>>> ord('b')
98
>>> ord('c')
99
>>> ord('d')
100

Now, let’s check the Unicode values for the capitalized versions of the above four characters:

>>> ord('A')
65
>>> ord('B')
66
>>> ord('C')
67
>>> ord('D')
68

As you can see, the Unicode values for capital characters differ from their lowercase counterparts. This highlights an important point: Python is case-sensitive with characters and strings.

For example, the comparison 'A' < 'a' yields True, because 65 is less than 97.
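A practical consequence of case sensitivity is that every capital letter sorts before every lowercase letter. The sketch below illustrates this with a few made-up example strings.

```python
# Uppercase letters have smaller code points than lowercase ones.
print(ord("A"), ord("a"))   # 65 97
print("A" < "a")            # True
print("a" == "A")           # False

# "Z" (90) is smaller than "a" (97), so "Zebra" sorts before "apple".
print("Zebra" < "apple")    # True

names = ["bob", "Alice", "alice", "Bob"]
sorted_names = sorted(names)
print(sorted_names)         # ['Alice', 'Bob', 'alice', 'bob']
```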

How Python String Comparison Works Under the Hood

When you compare strings in Python, the strings are compared character by character using their Unicode values.

When you compare two characters, the process is rather simple. But what happens when you compare strings, that is, sequences of characters?

Let’s demonstrate the process with examples.

Example 1—Which String Comes First in Alphabetic Order

Let’s compare the two names “Alice” and “Bob” to see if “Alice” is less than “Bob”. The expression "Alice" < "Bob" evaluates to True.

This states that “Alice” is less than “Bob”. In real life, this means that Alice comes before Bob in alphabetical order, which totally makes sense.

But how does Python know it?

Python starts by comparing the first characters of the strings. In the case of “Alice” and “Bob”, it starts by checking whether ‘A’ is less than ‘B’ in Unicode, that is, whether ord('A') < ord('B').

As ord('A') returns 65 and ord('B') returns 66, the comparison evaluates to True.

This means Python does not need to continue any further. Based on the first letters it is already able to determine that “Alice” is less than “Bob” because ‘A’ is less than ‘B’ in Unicode.
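To make this concrete, the following sketch prints the code points of both names and checks that the first characters alone decide the result. The variable names are illustrative.

```python
first, second = "Alice", "Bob"

# Code points of every character, to see what Python actually compares.
print([ord(ch) for ch in first])    # [65, 108, 105, 99, 101]
print([ord(ch) for ch in second])   # [66, 111, 98]

# The first characters already differ, so they decide the outcome.
first_chars_decide = ord(first[0]) < ord(second[0])
print(first_chars_decide)                       # True (65 < 66)
print((first < second) == first_chars_decide)   # True
```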

This is the simplest way to understand how Python compares strings.

Let’s look at a slightly trickier example, where the compared strings start with the same letter.

Example 2—How to Compare Strings with Equal First Letters

What if the first letters are equal when comparing two strings? No problem: Python then compares the second letters.

For instance, let’s check if “Axel” comes before “Alex” in alphabetical order. The expression "Axel" < "Alex" evaluates to False.

This suggests that Alex comes before Axel, which is indeed the case.

Let’s see how Python was able to determine this:

  1. The first letters are compared. Both are ‘A’, so there is a “tie”. The comparison continues to the next characters.
  2. The second characters are ‘x’ and ‘l’. The Unicode value for ‘x’ is 120 and for ‘l’ it is 108. And 120 < 108 returns False. Thus the whole string comparison returns False.
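The two steps above can be sketched as a small helper that walks both strings in parallel and reports the first position where they differ. The function first_difference is a hypothetical name used only for illustration. It is not part of Python.

```python
def first_difference(a: str, b: str):
    """Return (index, char_a, char_b) for the first position where the
    strings differ, or None if they match over their shared length."""
    for index, (char_a, char_b) in enumerate(zip(a, b)):
        if char_a != char_b:
            return index, char_a, char_b
    return None

index, char_a, char_b = first_difference("Axel", "Alex")
print(index, char_a, char_b)       # 1 x l
print(ord(char_a), ord(char_b))    # 120 108
print(ord(char_a) < ord(char_b))   # False
print("Axel" < "Alex")             # False
```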

Example 3—How to Compare Strings with Identical Beginning

What if the strings are otherwise equal, but one of them has additional characters at the end?

For instance, can you determine if “Alex” comes before “Alexis” in alphabetical order?

Let’s check this using Python. The expression "Alex" < "Alexis" evaluates to True.

In this case, the Python interpreter simply treats the longer string as the greater one. In other words, “Alex” is before “Alexis” in alphabetical order.
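Putting the rules from all three examples together, here is a hand-written sketch of what the < operator does for strings. The function less_than is hypothetical and only for illustration. In real code you would simply use the built-in operator.

```python
def less_than(a: str, b: str) -> bool:
    """Hand-written equivalent of a < b for strings (illustration only)."""
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            # The first differing position decides the result.
            return ord(char_a) < ord(char_b)
    # No difference within the shared length: the shorter string is smaller.
    return len(a) < len(b)

pairs = [("Alex", "Alexis"), ("Alexis", "Alex"), ("Alex", "Alex"), ("Axel", "Alex")]
for a, b in pairs:
    assert less_than(a, b) == (a < b)   # agrees with the built-in operator
    print(a, b, less_than(a, b))
```

The loop at the end checks the sketch against the built-in operator for each pair, including the case where both strings are identical.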

Now you understand how the string comparison works under the hood in Python.

Finally, let’s take a look at an interesting application of string comparison by comparing timestamps.

Compare Timestamps in Python with String Comparison

In this guide, you have learned that each character in Python has a Unicode value, which is an integer. Digit characters are no exception.

For example, ord("1") returns 49, ord("2") returns 50, and so on up to ord("9"), which returns 57.

The Unicode value of a digit character grows as the digit grows.

This means that comparing digit strings of equal length, such as "1" < "2" or "07" < "12", gives the numerically correct result. Strings of different lengths do not: "10" < "9" is True, because the comparison is decided at the first character.
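Here is a short sketch of how digit strings compare, including a caveat worth knowing: the string order matches the numeric order only when the strings have the same length. The list of values is made up for the example.

```python
print("1" < "2")     # True: ord("1") == 49, ord("2") == 50
print("10" < "9")    # True as strings, because "1" < "9" decides it
print(10 < 9)        # False as integers

# Zero-padding to a common width makes string order match numeric order.
values = ["10", "9", "100", "25"]
padded = sorted(v.zfill(3) for v in values)
print(padded)        # ['009', '010', '025', '100']

# Converting to int is usually the simpler choice.
by_number = sorted(values, key=int)
print(by_number)     # ['9', '10', '25', '100']
```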

But why would you ever compare numbers as strings?

Comparing numeric strings is useful when working with ISO 8601 timestamps in the format 2021-12-14T09:30:16+00:00.

For example, let’s check if “2021-12-14T09:30:16+00:00” comes before “2022-01-01T00:00:00+00:00”. The expression "2021-12-14T09:30:16+00:00" < "2022-01-01T00:00:00+00:00" evaluates to True.

But wait a minute! Does the comparison operator have any idea about dates and their precedence?

It does not. It only knows how to perform string comparison character by character.

As you learned in the previous examples, the comparison starts with the 1st character. If they are the same, the comparison continues from the 2nd character and so on.

When comparing ISO 8601 timestamps in Python, the procedure is the same as comparing any other strings. (Notice that this works because of the ordering of the time components: the year comes before the month, the month before the day, and so on, and every component is zero-padded to a fixed width. Thus if the years of two timestamps differ, you can draw the conclusion without looking at the rest of the timestamps. It also assumes both timestamps use the same format and the same UTC offset, as in the example above.)

This is exactly how you would compare the timestamps in real life. You would start with the year and notice that 2021 comes before 2022, so no matter what the rest of the timestamps say, the 2021 one must precede 2022.
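The sketch below compares the two timestamps as plain strings and cross-checks the result by parsing them with datetime.fromisoformat (available in Python 3.7 and later). The extra timestamps are made up for the example. The last part shows where the shortcut can fail: when the UTC offsets differ, string order and chronological order may disagree.

```python
from datetime import datetime

earlier = "2021-12-14T09:30:16+00:00"
later = "2022-01-01T00:00:00+00:00"

# Plain string comparison: decided at index 3, where "1" < "2".
print(earlier < later)   # True

# Parsing both strings gives the same answer here.
print(datetime.fromisoformat(earlier) < datetime.fromisoformat(later))   # True

# Sorting same-format, same-offset timestamps sorts them chronologically.
stamps = [later, "2021-01-05T12:00:00+00:00", earlier]
print(sorted(stamps))

# Caveat: with different UTC offsets the string order can be misleading.
a = "2021-12-14T09:30:16+02:00"   # 07:30:16 in UTC
b = "2021-12-14T08:00:00+00:00"   # 08:00:00 in UTC
print(a < b)                                                   # False
print(datetime.fromisoformat(a) < datetime.fromisoformat(b))   # True
```

When the offsets may differ, parsing the timestamps first is the safer choice.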

Conclusion

Comparing strings is an important feature in Python.

Python’s built-in comparison operators can be used in string comparison. These built-in operators are ==, !=, <, >, <=, and >=.

Under the hood, a string comparison is a sequence of integer comparisons: the Unicode values of the characters are compared with one another. When two strings have equal first letters, the second letters are compared. If they are equal too, the third ones are compared, and so on. If one string runs out of characters first, the shorter string is the smaller one.

Thanks for reading. I hope you find it useful.
