[Python] Replacing strings by partial match in a pandas DataFrame

Replacing strings by partial match in a DataFrame

Note that when replace() is used as shown below, it only replaces cells whose entire value matches the search string (an exact match).

dataReplace = data.replace('before', 'after')
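The exact-match behavior can be checked with a small sketch. The DataFrame here is made up for illustration and is not the in.txt data used later in this article.

```python
import pandas as pd

# Illustrative data: one cell equals 'before', the other only contains it.
data = pd.DataFrame({'col1': ['before', '1before1']})

# Without regex=True, only the cell whose whole value is 'before' changes.
replaced = data.replace('before', 'after')
print(replaced)
```

The second cell keeps its value '1before1' because 'before' is only part of it.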

Passing regex=True makes replace() treat the search string as a regular expression, which allows replacement by partial match.

dataReplace = data.replace('before', 'after', regex=True)
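One consequence of regex=True is that characters such as '.' are interpreted as regular expression metacharacters. The following is a minimal sketch, using made-up data, of escaping the search string with re.escape() from the standard library when a literal match is intended.

```python
import re

import pandas as pd

# Illustrative data, not from the article's in.txt.
data = pd.DataFrame({'col1': ['a.b', 'axb']})

# As a regular expression, '.' matches any character, so 'axb' is replaced too.
as_regex = data.replace('a.b', 'X', regex=True)

# re.escape() turns 'a.b' into a pattern that matches only the literal text.
as_literal = data.replace(re.escape('a.b'), 'X', regex=True)

print(as_regex)
print(as_literal)
```

Escaping is only needed when the search string may contain metacharacters. A plain word such as 'before' behaves the same either way.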

To apply the regular expression replacement to a specific column only, write it as follows.
This example replaces 'before' with 'after' in the col1 column.

data.col1 = data.col1.replace('before', 'after', regex=True)
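As an alternative that this article does not otherwise cover, a single string column can also be handled with Series.str.replace(), which replaces substrings directly. The sketch below uses made-up data and assumes every value in the column is a string.

```python
import pandas as pd

# Illustrative column in which every value is a string.
data = pd.DataFrame({'col1': ['1before1', '2', '3']})

# regex=False treats 'before' as plain text rather than as a pattern.
data['col1'] = data['col1'].str.replace('before', 'after', regex=False)
print(data)
```

The bracket form data['col1'] is used here for the assignment. It refers to the same column as data.col1.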

Sample code

For example, suppose you have the following in.txt.

 $ cat in.txt 
1before1,1,1
2,2before2,2
3,3,3before3

The sample code is as follows.

 $ cat sample.py 
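#!/usr/bin/env python3
# coding: UTF-8

import pandas as pd

# Read the headerless CSV and assign column names.
data = pd.read_csv('in.txt', names=('col1', 'col2', 'col3'))
print(data)

# Without regex: only cells equal to 'before' are replaced, so nothing changes.
dataReplace = data.replace('before', 'after')
print(dataReplace)

# With regex=True: 'before' is replaced wherever it appears inside a cell.
dataReplace = data.replace('before', 'after', regex=True)
print(dataReplace)

# Partial-match replacement in col1 only.
data.col1 = data.col1.replace('before', 'after', regex=True)
print(data)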

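The output is as follows. The four tables are, in order: the original data, the result of replace() without regex (unchanged), the result with regex=True (replaced in every column), and the result of replacing in col1 only.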

 $ ./sample.py 
       col1      col2      col3
0  1before1         1         1
1         2  2before2         2
2         3         3  3before3
       col1      col2      col3
0  1before1         1         1
1         2  2before2         2
2         3         3  3before3
      col1     col2     col3
0  1after1        1        1
1        2  2after2        2
2        3        3  3after3
      col1      col2      col3
0  1after1         1         1
1        2  2before2         2
2        3         3  3before3
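The last step of the sample overwrites data.col1. If the original DataFrame should stay unchanged, one option is to pass per-column dictionaries to replace(), a form described in the pandas documentation. The sketch below builds the data inline instead of reading in.txt, which is an assumption made to keep it self-contained.

```python
import pandas as pd

# Inline stand-in for the first two columns of in.txt.
data = pd.DataFrame({
    'col1': ['1before1', '2', '3'],
    'col2': ['1', '2before2', '3'],
})

# Replace 'before' with 'after' in col1 only; other columns are not touched.
result = data.replace({'col1': 'before'}, {'col1': 'after'}, regex=True)
print(result)
```

Because replace() returns a new DataFrame by default, data keeps its original values and only result contains the replacement.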