Regex in pandas filter the columns with ^

Question

I am working with pandas and want to filter the columns with a regex. The filter returns something when I change the regex to rf"{c}(\.)?(\d)*", but when I anchor the pattern so that the column name has to start with a certain letter, it breaks and the filtered DataFrame is empty.

for c in self.variables.split():
    reg = rf"^{c}(\.)?(\d)*$"
    print(reg)
    filtered = self.raw_data.filter(regex=reg)

What did I do wrong, and how can I fix it?

PS: This is a sample of the data:

variable      T    T.1    T.2    T.3    T.4  ...   T.8    T.9      l       phi     dl
0         29.63  27.87  26.95  26.64  26.25  ...  23.3  22.42  2.141  0.093551  0.002
1         29.70    NaN    NaN    NaN    NaN  ...   NaN    NaN  2.043  0.098052  0.002
2         29.62    NaN    NaN    NaN    NaN  ...   NaN    NaN  1.892  0.089973  0.002
3         29.65    NaN    NaN    NaN    NaN  ...   NaN    NaN  1.828  0.093132  0.002

I would like it to return 4 DataFrames, each containing only the data of one specific variable, e.g.

variable      T    T.1    T.2    T.3    T.4    T.5    T.6    T.7   T.8    T.9
0         29.63  27.87  26.95  26.64  26.25  25.62  24.99  23.85  23.3  22.42
1         29.70    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN
2         29.62    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN
3         29.65    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN
4         29.38    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN

or only l without dl (this is why I thought I needed to use ^ in my regex):

variable      l   
0         2.141  
1         2.043  
2         1.892  
3         1.828

Thanks in advance, dear community.

Answer 1

Details

variable - matches the literal string variable
| - logical or, since you want the variable column in every resulting dataframe
^ - start of the string
{c} - the desired variable name, interpolated by the f-string
(\.\d+)? - an optional sequence of a literal . followed by one or more digits
$ - end of the string
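
To see how the pieces of the pattern behave before running it on a dataframe, here is a minimal sketch that checks it against a plain list of column names. It uses re.search because DataFrame.filter(regex=...) tests each label with re.search. The helper name matching_columns and the shortened column list are made up for illustration, and re.escape is an addition not present in the answer; it only matters if a variable name contains regex metacharacters.

```python
import re

# Hypothetical, shortened list of column names modelled on the sample data.
columns = ["variable", "T", "T.1", "T.9", "l", "phi", "dl"]


def matching_columns(name, columns):
    """Return the column names that the answer's pattern would keep for `name`."""
    # re.escape is an assumption added here so that names containing
    # regex metacharacters are treated literally.
    pattern = re.compile(rf"variable|^{re.escape(name)}(\.\d+)?$")
    return [col for col in columns if pattern.search(col)]


print(matching_columns("T", columns))  # ['variable', 'T', 'T.1', 'T.9']
print(matching_columns("l", columns))  # ['variable', 'l']
```

Because of the ^ anchor, dl is not picked up when asking for l, and because of the $ anchor, nothing beyond the optional numeric suffix is allowed after the name.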

import pandas as pd

df = pd.read_csv("sample.csv", sep=r'\s+')  # raw string avoids an invalid-escape warning for \s
print(df)

variables = ['T', 'l', 'phi', 'dl']

for c in variables:
    ds = df.filter(regex=rf"variable|^{c}(\.\d+)?$")
    print(f'\n---Variable: [{c}] ---')
    print(ds)

---Variable: [T] ---
   variable      T    T.1    T.2    T.3    T.4    T.5    T.6    T.7   T.8    T.9
0         0  29.63  27.87  26.95  26.64  26.25  25.62  24.99  23.85  23.3  22.42
1         1  29.70    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN
2         2  29.62    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN
...

---Variable: [l] ---
   variable      l
0         0  2.141
1         1  2.043
2         2  1.892
...

---Variable: [phi] ---
   variable       phi
0         0  0.093551
1         1  0.098052
2         2  0.089973
...

---Variable: [dl] ---
   variable     dl
0         0  0.002
1         1  0.002
2         2  0.002
...
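
Since the question asks for one dataframe per variable, the loop can also collect the results instead of printing them. The following is only a sketch under stated assumptions: the small inline dataframe stands in for sample.csv, the dictionary name frames is invented for illustration, and the variable alternative is left out of the pattern because this stand-in data has no such column.

```python
import re

import pandas as pd

# Small stand-in for the sample data (assumed values, two rows only).
df = pd.DataFrame(
    {
        "T": [29.63, 29.70],
        "T.1": [27.87, None],
        "l": [2.141, 2.043],
        "phi": [0.093551, 0.098052],
        "dl": [0.002, 0.002],
    }
)

variables = ["T", "l", "phi", "dl"]

# One filtered dataframe per variable, keyed by the variable name.
# re.escape is an addition that guards against regex metacharacters in names.
frames = {
    name: df.filter(regex=rf"^{re.escape(name)}(\.\d+)?$")
    for name in variables
}

print(list(frames["T"].columns))  # ['T', 'T.1']
print(list(frames["l"].columns))  # ['l']
```

Each entry can then be used on its own, for example frames["T"] for all temperature columns.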

1 answer

solution1
0 ACCEPTED 2021-04-06 22:10:58
