9. Working with Text Data
Working with Text Data
Series and Index are equipped with a set of string processing methods that make it easy to operate on each element of the array. Perhaps most importantly, these methods exclude missing/NA values automatically. These are accessed via the str
attribute and generally have names matching the equivalent (scalar) built-in string methods:
In [1]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat']) In [2]: s.str.lower() Out[2]: 0 a 1 b 2 c 3 aaba 4 baca 5 NaN 6 caba 7 dog 8 cat dtype: object In [3]: s.str.upper() Out[3]: 0 A 1 B 2 C 3 AABA 4 BACA 5 NaN 6 CABA 7 DOG 8 CAT dtype: object In [4]: s.str.len() Out[4]: 0 1.0 1 1.0 2 1.0 3 4.0 4 4.0 5 NaN 6 4.0 7 3.0 8 3.0 dtype: float64
In [5]: idx = pd.Index([' jack