Pandas: find the row that has a value in any column

Question

I am reading data from many CSV files into pandas DataFrames, but the format of the CSV files is not consistent. An example:

Unnamed:1 Unnamed:2 .... Unnamed:20
Data      NaN       .... NaN
NaN       Temp      .... NaN
id        name      .... year
.
.

Now I want to find the first row that contains id, ID, or Id, use that row as the column names, and drop every row above it. So in the end I will get:

id        name      .... year
.
.

The id column may not always be the first column (i.e., the Unnamed:1 column), so I am checking entire rows like so:

df.isin(["id"]).any(axis=1)

The issue with the above code is that I am not sure how to check for all the different ways id may be written, i.e., ID/Id/id. Ideally, I would like to use a regex here, but I know it can be done without a regex for a particular column like so:

df['Unnamed:1'].str.lower().str.contains('id')

I just can't work out how to do both at the same time, i.e., check every column for every way id may be written.

Answer 1

You can match the substring id case-insensitively (ID/id/Id) across all string columns, keep only the rows from the first match onward, and then turn the first remaining row into the column names:

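# Flag the first row containing "id" (any case) in any string column, and every row after it.
# na=False treats missing cells as non-matches.
mask = (df.select_dtypes(object)
          .apply(lambda x: x.str.contains('id', case=False, na=False))
          .any(axis=1)
          .cumsum()
          .gt(0))

# Drop the rows above the header row, promote it to column names, then remove it from the data.
df = df[mask].copy()
df.columns = df.iloc[0].rename(None)
df = df.iloc[1:].reset_index(drop=True)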
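
To try the substring approach end to end, here is a minimal sketch on made-up data. The sample frame, its values, and the names raw and out are assumptions for illustration only. It also swaps select_dtypes(object) for astype(str), a variation that scans every column as text, so treat it as one possible way to write the same idea rather than the exact code above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the layout in the question;
# note that the id cell is deliberately not in the first column.
raw = pd.DataFrame(
    [
        ["Data", np.nan, np.nan],
        [np.nan, "Temp", np.nan],
        ["name", "ID", "year"],
        ["alice", "1", "2020"],
        ["bob", "2", "2021"],
    ],
    columns=["Unnamed:1", "Unnamed:2", "Unnamed:3"],
)

# True for the first row where any cell contains "id" (any case), and for all later rows.
mask = (
    raw.astype(str)
    .apply(lambda col: col.str.contains("id", case=False, na=False))
    .any(axis=1)
    .cumsum()
    .gt(0)
)

out = raw[mask].copy()                     # drop everything above the header row
out.columns = out.iloc[0].rename(None)     # promote the header row to column names
out = out.iloc[1:].reset_index(drop=True)  # remove the header row from the data
print(out)
```

The cumsum().gt(0) step is what turns "this row matches" into "this row is at or below the first match", so later data rows are kept even though they do not contain id themselves.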

Another idea, if you want to match whole cell values rather than substrings:

mask = df.isin(['id','ID','Id']).any(axis=1).cumsum().gt(0)

df = df[mask].copy()
df.columns = df.iloc[0].rename(None)
df = df.iloc[1:].reset_index(drop=True)
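
Since the question asks for a regex, the two ideas can be combined: a case-insensitive match on the whole cell, which covers ID/Id/id (and iD) without listing them and does not fire on substrings such as "Width". The following is only a sketch under assumptions: the sample data is invented, and the use of astype(str), str.strip, and str.fullmatch is my choice rather than something taken from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "Width" contains "id" as a substring but is not a header cell.
raw = pd.DataFrame(
    [
        ["Width", np.nan, np.nan],
        [np.nan, "Temp", np.nan],
        ["Id", "name", "year"],
        ["1", "alice", "2020"],
    ],
    columns=["Unnamed:1", "Unnamed:2", "Unnamed:3"],
)

# Compare every cell as text; a cell counts only if, after trimming
# whitespace, it is exactly "id" in any letter case.
is_id = raw.astype(str).apply(
    lambda col: col.str.strip().str.fullmatch("id", case=False, na=False)
)
mask = is_id.any(axis=1).cumsum().gt(0)

out = raw[mask].copy()
out.columns = out.iloc[0].rename(None)
out = out.iloc[1:].reset_index(drop=True)
print(out.columns.tolist())
```

Whether a substring or whole-cell match is the right choice depends on the files: the substring version tolerates headers like "user_id", while the whole-cell version avoids false hits on unrelated words.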

Answer 1: accepted, 2021-01-18 12:24:14
