
Pandas find rows with value in any column

I am reading data from many CSV files into pandas DataFrames, but the format of the CSV files is not consistent. An example:

Unnamed:1 Unnamed:2 .... Unnamed:20
Data      NaN       .... NaN
NaN       Temp      .... NaN
id        name      .... year
.
.

Now I want to find the first row that contains id, ID, or Id, make that row the column names, and drop any rows above it. So finally I will get:

id        name      .... year
.
.

The id column may not always be the first column (i.e., the Unnamed:1 column), so I am checking entire rows like so:

df.isin(["id"]).any(axis=1)

The issue with the above code is that I am not sure how to check for all the different ways id may be written, i.e., ID/Id/id. Ideally, I would like to use regex here, but I know it can be done without regex for a particular column like so:

df['Unnamed:1'].str.lower().str.contains('id')

I just can't work out how to do both at the same time, i.e., check every column for all the ways id may be written.

You can match the first ID/id/Id substring across all columns, filter out the rows before it, and then convert the first remaining row to column names:

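# True for rows where any text cell contains "id" in any letter case;
# na=False treats missing cells as non-matches.
mask = (df.select_dtypes(object)
          .apply(lambda x: x.str.contains('id', case=False, na=False))
          .any(axis=1)
          .cumsum()   # running count of matches: the first hit and all later rows are > 0
          .gt(0))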

df = df[mask].copy()
df.columns = df.iloc[0].rename(None)
df = df.iloc[1:].reset_index(drop=True)
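
To see these steps on concrete data, here is a minimal sketch on a made-up frame shaped like the one in the question. The name raw and its values are assumptions for illustration only, and dtype=object is passed explicitly so that select_dtypes(object) picks up the text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame mimicking the layout in the question.
# dtype=object keeps the text columns as object dtype, which is what
# select_dtypes(object) looks for.
raw = pd.DataFrame(
    {
        "Unnamed:1": ["Data", np.nan, "id", "1", "2"],
        "Unnamed:2": [np.nan, "Temp", "name", "a", "b"],
        "Unnamed:3": [np.nan, np.nan, "year", "2020", "2021"],
    },
    dtype=object,
)

# True where any text cell contains "id" in any letter case;
# na=False makes missing cells count as "no match".
hits = (raw.select_dtypes(object)
           .apply(lambda col: col.str.contains("id", case=False, na=False))
           .any(axis=1))

# cumsum() makes the first hit and every later row greater than 0.
mask = hits.cumsum().gt(0)

out = raw[mask].copy()
out.columns = out.iloc[0].rename(None)     # promote the first kept row to header
out = out.iloc[1:].reset_index(drop=True)  # drop that row from the data
print(out)
```

After this, out has the columns id, name and year and keeps only the two data rows below the header row.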

Another idea, for testing exact values rather than substrings:

mask = df.isin(['id','ID','Id']).any(axis=1).cumsum().gt(0)

df = df[mask].copy()
df.columns = df.iloc[0].rename(None)
df = df.iloc[1:].reset_index(drop=True)
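
Note that a substring test also matches cells such as Provider, while isin needs every spelling listed by hand. If you want a regex that has to match the whole cell in any letter case, something like the following might work. It is only a sketch, assuming pandas 1.1 or later for Series.str.fullmatch; promote_id_row, pattern and the sample data are hypothetical names, not part of pandas.

```python
import numpy as np
import pandas as pd


def promote_id_row(df, pattern="id"):
    """Use the first row with a cell fully matching `pattern` as the header.

    `promote_id_row` and `pattern` are hypothetical names for this sketch.
    """
    # Cast cells to str so non-text columns can be tested too;
    # na=False treats any missing result as "no match".
    hits = df.astype(str).apply(
        lambda col: col.str.fullmatch(pattern, case=False, na=False)
    ).any(axis=1)

    if not hits.any():
        return df  # no header row found; return the frame unchanged

    first = int(hits.to_numpy().argmax())   # position of the first True
    out = df.iloc[first + 1:].reset_index(drop=True)
    out.columns = df.iloc[first].rename(None)
    return out


# Made-up data: "Provider" contains "id" as a substring but is not a full match.
raw = pd.DataFrame(
    {
        "Unnamed:1": ["Provider", np.nan, "ID", "1"],
        "Unnamed:2": [np.nan, "Temp", "name", "x"],
    },
    dtype=object,
)
result = promote_id_row(raw)
print(result)
```

Here the row holding ID becomes the header even though Provider appears earlier, because fullmatch requires the entire cell to match the pattern.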

