Pandas find rows with value in any column
I am reading data from many CSV files into pandas DataFrames, but the format of the CSV files is not consistent. An example:
Unnamed:1 Unnamed:2 .... Unnamed:20
Data NaN .... NaN
NaN Temp .... NaN
id name .... year
.
.
Now I want to find the first row that contains id, ID, or Id, make that row the column names, and drop any rows above it. So finally I will get:
id name .... year
.
.
The id column may not always be the first column (i.e., the Unnamed:1 column), so I am checking entire rows like so:
df.isin(["id"]).any(axis=1)
The issue with the above code is that I am not sure how to check for all the different ways id may be written, i.e., ID/Id/id. Ideally, I would like to use regex here, but I know it can be done without regex for a particular column like so:
df['Unnamed:1'].str.lower().str.contains('id')
I just cannot work out how to do both at the same time, i.e., check for every way id may be written across all the columns.
You can match the first ID/id/Id substring across all columns, filter out the rows before it, and then convert the first remaining row to column names:
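mask = (df.select_dtypes(object)
          # case-insensitive substring test; na=False treats missing cells as non-matches
          .apply(lambda x: x.str.contains('id', case=False, na=False))
          .any(axis=1)    # True for rows with a match in any column
          .cumsum()       # running count of matching rows so far
          .gt(0))         # True from the first matching row onward
df = df[mask].copy()
df.columns = df.iloc[0].rename(None)       # promote the first kept row to the header
df = df.iloc[1:].reset_index(drop=True)    # drop that row from the data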
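To see the steps end to end, here is a minimal sketch on made-up data. The sample frame and the helper name promote_header_row are assumptions for illustration and do not come from the question. The frame is built with dtype=object so that select_dtypes(object) picks up every column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample imitating the inconsistent layout from the question.
raw = pd.DataFrame(
    [
        ["Data", np.nan, np.nan],
        [np.nan, "Temp", np.nan],
        ["name", "Id", "year"],
        ["alice", "1", "2020"],
        ["bob", "2", "2021"],
    ],
    columns=["Unnamed:1", "Unnamed:2", "Unnamed:3"],
    dtype=object,
)


def promote_header_row(df, pattern="id"):
    """Drop rows above the first row containing `pattern`, then use that row as header."""
    text = df.select_dtypes(object)
    # Case-insensitive substring test per column; missing cells count as non-matches.
    hits = text.apply(lambda col: col.str.contains(pattern, case=False, na=False))
    # True from the first matching row onward.
    mask = hits.any(axis=1).cumsum().gt(0)
    out = df[mask].copy()
    out.columns = out.iloc[0].rename(None)
    return out.iloc[1:].reset_index(drop=True)


cleaned = promote_header_row(raw)
print(cleaned)
```

Note that the id cell sits in the second column here, so the check does not depend on its position.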
Another idea, testing for exact matches rather than substrings:
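# exact cell match against the listed spellings, then keep rows from the first match onward
mask = df.isin(['id','ID','Id']).any(axis=1).cumsum().gt(0)
df = df[mask].copy()
df.columns = df.iloc[0].rename(None)       # promote the first kept row to the header
df = df.iloc[1:].reset_index(drop=True)    # drop that row from the data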
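Since the question asks about regex, one possible middle ground is an exact but case-insensitive match with Series.str.fullmatch. This is only a sketch on made-up data, not part of the original answer. It illustrates that a substring test can fire on a cell such as "Humidity", while fullmatch requires the whole cell to be id in any casing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "Humidity" contains "id" only as a substring.
raw = pd.DataFrame(
    [
        ["Humidity", np.nan],
        ["name", "ID"],
        ["alice", "1"],
    ],
    columns=["Unnamed:1", "Unnamed:2"],
    dtype=object,
)

text = raw.select_dtypes(object)

# Substring test: also flags the "Humidity" row.
substring_hits = text.apply(
    lambda col: col.str.contains("id", case=False, na=False)
).any(axis=1)

# Whole-cell regex test: only flags rows with a cell equal to id/ID/Id/iD.
exact_hits = text.apply(
    lambda col: col.str.fullmatch("id", case=False, na=False)
).any(axis=1)

mask = exact_hits.cumsum().gt(0)
cleaned = raw[mask].copy()
cleaned.columns = cleaned.iloc[0].rename(None)
cleaned = cleaned.iloc[1:].reset_index(drop=True)
print(cleaned)
```

Unlike the isin list, this also covers spellings such as iD without enumerating them.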