How to filter elements containing only specific repeated characters in a dataframe

I want to create a new dataframe that filters redundant information out of an existing one. The original dataframe is built by walking through many file folders; its single column holds, for each file, a string with the full path to that file. Each file is named by trial number and score and sits in the corresponding test folder. I need to remove every repeated score of 100 for each trial, but the first score of 100 for each trial must remain.

With pandas, I know I can use df[df[col_header].str.contains('text')] to keep only the rows that match, and '~' as a boolean NOT to invert the mask.
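As a minimal sketch of that masking idiom (the column name "path" and the three sample rows are assumptions made for illustration, not part of the original data):

```python
import pandas as pd

# Hypothetical sample data; the column name "path" is an assumption.
df = pd.DataFrame({"path": [
    r"\\desktop\Test_Scores\test1\trial1-98",
    r"\\desktop\Test_Scores\test1\trial2-100",
    r"\\desktop\Test_Scores\test2\trial1-95",
]})

# Boolean mask: True where the path contains the literal text "-100".
# regex=False makes pandas treat the pattern as a plain substring.
is_100 = df["path"].str.contains("-100", regex=False)

only_100 = df[is_100]   # rows that match
not_100 = df[~is_100]   # "~" inverts the mask, keeping the rows that do not match
```

On its own this mask cannot tell the first 100 from a repeated one, which is why the question needs something more than str.contains.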

The unfiltered dataframe column, with its redundant scores, looks like this:

\\desktop\Test_Scores\test1\trial1-98
\\desktop\Test_Scores\test1\trial2-100
\\desktop\Test_Scores\test1\trial3-100       #<- must remove
\\desktop\Test_Scores\test2\trial1-95
\\desktop\Test_Scores\test2\trial2-100
\\desktop\Test_Scores\test2\trial3-100       #<- must remove
\\desktop\Test_Scores\test2\trial3-100       #<- must remove
.
.
.
n

The expected result after applying a filter would be a dataframe that looks like this:

\\desktop\Test_Scores\test1\trial1-98
\\desktop\Test_Scores\test1\trial2-100
\\desktop\Test_Scores\test2\trial1-95
\\desktop\Test_Scores\test2\trial2-100
.
.
.
.
n

This one-liner should solve your problem: it keeps a row only when its "-100" match differs from that of the previous row.

df = df.loc[df["col"].shift().str.contains("-100") != df["col"].str.contains("-100")]
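Because the one-liner above compares each row's flag with the previous row's, it would also drop a non-100 row that directly follows another non-100 row. A hedged variant, sketched below on hypothetical data, drops a row only when both it and the row before it end in "-100"; the sample rows and the names is_100 and prev_is_100 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: two non-100 rows in a row, followed by two 100s.
df = pd.DataFrame({"col": [
    r"\\desktop\Test_Scores\test1\trial1-98",
    r"\\desktop\Test_Scores\test1\trial2-97",
    r"\\desktop\Test_Scores\test1\trial3-100",
    r"\\desktop\Test_Scores\test1\trial4-100",
]})

# True where the path ends with a score of 100.
is_100 = df["col"].str.endswith("-100")

# The previous row's flag; the first row has no predecessor, so fill with False.
prev_is_100 = is_100.shift(fill_value=False)

# Drop a row only if it is a 100 that immediately follows another 100.
filtered = df.loc[~(is_100 & prev_is_100)]
```

This is still an adjacency check: a 100 at the start of one test folder would be dropped if the previous folder happened to end with a 100, so it only suits data where that cannot occur.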

Update:

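# Restore the backslash if "\t" in a path was read as a tab character
df["col"] = df["col"].str.replace('\t', '\\t', regex=False)
# The test folder (e.g. "test1") is the second-to-last path component
df['test_number'] = df["col"].str.split('-').str[0].str.split('\\').str[-2]
# The score is the text after the hyphen
df['score'] = df["col"].str.split('-').str[1]
# Keep only the first row for each (test folder, score) pair
df.drop_duplicates(["test_number", "score"], inplace=True)
# Remove the helper columns (the positional axis argument is gone in pandas 2.0)
df.drop(columns=["test_number", "score"], inplace=True)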

Check this solution out. The replace on the first line is there because your paths contain \t (as in \test1), which Python reads as a tab character when the path is written as an ordinary, non-raw string literal.
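Note that dropping duplicates on (test folder, score) also removes repeated scores other than 100 within the same folder. If only repeated 100s should go, one possible sketch is shown below. It assumes, based on the sample data, that "first 100" means the first within each test folder; the names folder, score, keys and repeat_100 are illustrative, not from the original answer.

```python
import pandas as pd

# The sample rows from the question, written as raw strings so "\t" stays literal.
df = pd.DataFrame({"col": [
    r"\\desktop\Test_Scores\test1\trial1-98",
    r"\\desktop\Test_Scores\test1\trial2-100",
    r"\\desktop\Test_Scores\test1\trial3-100",
    r"\\desktop\Test_Scores\test2\trial1-95",
    r"\\desktop\Test_Scores\test2\trial2-100",
    r"\\desktop\Test_Scores\test2\trial3-100",
    r"\\desktop\Test_Scores\test2\trial3-100",
]})

# Test folder: second-to-last component when splitting on a single backslash.
folder = df["col"].str.split("\\").str[-2]
# Score: everything after the last hyphen.
score = df["col"].str.rsplit("-", n=1).str[-1]
is_100 = score == "100"

# duplicated() flags every repeat of a (folder, is_100) pair after its first occurrence;
# restricting to is_100 leaves repeated non-100 scores untouched.
keys = pd.DataFrame({"folder": folder, "is_100": is_100})
repeat_100 = is_100 & keys.duplicated()

filtered = df[~repeat_100]
```

Unlike the shift-based comparison, this does not depend on row order between folders, and it leaves the original dataframe unmodified.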
