Regex in pandas filter the columns with ^
I am working with Pandas and want to filter the columns with a regex. It returns something when I change the regex to
rf"{c}(\.)?(\d)*"
but if I want it to start with a certain letter, it breaks and the filtered dataframe is empty.
for c in self.variables.split():
    reg = rf"^{c}(\.)?(\d)*$"
    print(reg)
    filtered = self.raw_data.filter(regex=reg)
What did I do wrong and how can I fix it?
PS: This is a sample of the data:
variable T T.1 T.2 T.3 T.4 ... T.8 T.9 l phi dl
0 29.63 27.87 26.95 26.64 26.25 ... 23.3 22.42 2.141 0.093551 0.002
1 29.70 NaN NaN NaN NaN ... NaN NaN 2.043 0.098052 0.002
2 29.62 NaN NaN NaN NaN ... NaN NaN 1.892 0.089973 0.002
3 29.65 NaN NaN NaN NaN ... NaN NaN 1.828 0.093132 0.002
And I would like it to return 4 dfs, each containing only the data of a specific variable, e.g.
variable T T.1 T.2 T.3 T.4 T.5 T.6 T.7 T.8 T.9
0 29.63 27.87 26.95 26.64 26.25 25.62 24.99 23.85 23.3 22.42
1 29.70 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 29.62 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 29.65 NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 29.38 NaN NaN NaN NaN NaN NaN NaN NaN NaN
or only l without the dl (this is why I thought I needed to use ^ in my regex)
variable l
0 2.141
1 2.043
2 1.892
3 1.828
Thanks in advance, dear community.
Details
variable - matches the literal string variable
| - logical or, since you want the column variable included in every filtered dataframe
^ - start of the string
{c} - followed by the desired variable, interpolated by the f-string
(\.\d+)? - an optional sequence of a literal . followed by one or more digits
$ - end of the string
import pandas as pd
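df = pd.read_csv("sample.csv", sep=r'\s+')  # raw string so \s is not treated as a string escape
print(df)
variables = ['T', 'l', 'phi', 'dl']
for c in variables:
    # keep the "variable" column plus every column named c or c.<digits>
    ds = df.filter(regex=rf"variable|^{c}(\.\d+)?$")
    print(f'\n---Variable: [{c}] ---')
    print(ds)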
---Variable: [T] ---
variable T T.1 T.2 T.3 T.4 T.5 T.6 T.7 T.8 T.9
0 0 29.63 27.87 26.95 26.64 26.25 25.62 24.99 23.85 23.3 22.42
1 1 29.70 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2 29.62 NaN NaN NaN NaN NaN NaN NaN NaN NaN
...
---Variable: [l] ---
variable l
0 0 2.141
1 1 2.043
2 2 1.892
...
---Variable: [phi] ---
variable phi
0 0 0.093551
1 1 0.098052
2 2 0.089973
...
---Variable: [dl] ---
variable dl
0 0 0.002
1 1 0.002
2 2 0.002
...
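To see why the anchors matter, here is a minimal sketch that applies the same pattern to the column labels alone, using only the standard `re` module. It relies on the documented behaviour that `DataFrame.filter(regex=...)` keeps the labels for which `re.search` finds a match. The helper name `matching_columns` and its `anchored` flag are hypothetical, introduced only for this illustration, and `re.escape` is an added precaution (not part of the code above) in case a variable name contains regex metacharacters.

```python
import re

# Column labels copied from the sample data in the question.
columns = ["variable", "T", "T.1", "T.2", "T.3", "T.4", "T.5",
           "T.6", "T.7", "T.8", "T.9", "l", "phi", "dl"]


def matching_columns(columns, var, anchored=True):
    """Hypothetical helper: return the labels that re.search would keep."""
    name = re.escape(var)  # treat the variable name as literal text
    if anchored:
        # same shape as the pattern used in the answer
        pattern = re.compile(rf"variable|^{name}(\.\d+)?$")
    else:
        # no ^ or $: the name may match anywhere inside a label
        pattern = re.compile(rf"{name}(\.\d+)?")
    return [col for col in columns if pattern.search(col)]


for var in ["T", "l", "phi", "dl"]:
    print(var, matching_columns(columns, var))

# Without anchors, "l" also matches inside "variable" and "dl".
print(matching_columns(columns, "l", anchored=False))
```

With the anchors, `l` selects only `variable` and `l`; without them, `dl` is picked up as well, which is the behaviour the question was trying to avoid.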