How to iterate through a dataframe, find a specific part of a string, and add it to a new column?
I have a dataframe that contains a specific string I want to pull out and delete part of. The string repeats throughout the file with different endings. I want to find that string, delete part of it, and add the part I want to keep to several columns. I have an empty dataframe column that I want to add the kept part to. I have included a picture of the current dataframe with the empty column where I want the data to go. I will also add a screenshot of what I want the data to look like. I want it to repeat this until there are no more occurrences of that string.
As long as you have a way of identifying the values you want to turn into the group data, and a way of manipulating those values into the form you want, you can do something like this:
import pandas as pd
data = [
[None, 'Group: X', None, None],
[None, 1, 'A1', 20],
[None, 1, 'A1', None],
[None, 2, 'B1', 40],
[None, 2, 'B1', None],
[None, 'Group: Y', None, None],
[None, 1, 'A1', 30],
[None, 1, 'A1', None],
[None, 2, 'B1', 60],
[None, 2, 'B1', None],
]
columns = ['Group', 'Sample', 'Well', 'DiluationFactor']
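# True for header rows such as 'Group: X'; False for the integer samples
def identifying_function(value):
    return isinstance(value, str) and 'Group: ' in value
# strip the 'Group: ' prefix, keeping only the group name
def manipulating_function(value):
    return value.replace('Group: ', '')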
df = pd.DataFrame(data=data, columns=columns)
print(df)
# identify which rows contain the group data
mask = df['Sample'].apply(identifying_function)
# manipulate the data from those rows and write them to the Group column
df.loc[mask, 'Group'] = df.loc[mask, 'Sample'].apply(manipulating_function)
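# forward fill the Group column (assign the result back: calling
# ffill(inplace=True) on df['Group'] is chained assignment, which pandas 2.x
# warns about and which no longer updates df under Copy-on-Write)
df['Group'] = df['Group'].ffill()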
# eliminate the no longer needed rows
df = df.loc[~mask]
print(df)
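A vectorized alternative, assuming the group labels always appear as `Group: <name>` in the `Sample` column, is to let a regular expression do both the identifying and the manipulating in one step with `Series.str.extract`. This is a swapped-in technique shown as a sketch, not the method above; the pattern `^Group: (.*)$` is an assumption about the data.

```python
import pandas as pd

# Same toy data as above ('DiluationFactor' spelled as in the original).
data = [
    [None, 'Group: X', None, None],
    [None, 1, 'A1', 20],
    [None, 1, 'A1', None],
    [None, 2, 'B1', 40],
    [None, 2, 'B1', None],
    [None, 'Group: Y', None, None],
    [None, 1, 'A1', 30],
    [None, 1, 'A1', None],
    [None, 2, 'B1', 60],
    [None, 2, 'B1', None],
]
columns = ['Group', 'Sample', 'Well', 'DiluationFactor']
df = pd.DataFrame(data=data, columns=columns)

# Capture whatever follows "Group: "; rows that don't match become NaN.
# astype(str) makes the .str accessor safe on the mixed int/str column.
labels = df['Sample'].astype(str).str.extract(r'^Group: (.*)$', expand=False)

# Header rows are exactly the rows where the pattern matched.
mask = labels.notna()

# Forward-fill each label down to the data rows that follow it.
df['Group'] = labels.ffill()

# Drop the header rows, which carry no sample data.
df = df.loc[~mask]
print(df)
```

For this data it should produce the same "after" frame; the regex version just avoids the two helper functions and the `apply` calls.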
DataFrame before:
Group Sample Well DiluationFactor
0 None Group: X None NaN
1 None 1 A1 20.0
2 None 1 A1 NaN
3 None 2 B1 40.0
4 None 2 B1 NaN
5 None Group: Y None NaN
6 None 1 A1 30.0
7 None 1 A1 NaN
8 None 2 B1 60.0
9 None 2 B1 NaN
DataFrame after:
Group Sample Well DiluationFactor
1 X 1 A1 20.0
2 X 1 A1 NaN
3 X 2 B1 40.0
4 X 2 B1 NaN
6 Y 1 A1 30.0
7 Y 1 A1 NaN
8 Y 2 B1 60.0
9 Y 2 B1 NaN
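If you would rather iterate row by row, as the question title phrases it, a plain loop that remembers the most recent group header should give the same result. This is a sketch for comparison, assuming the same `Group: ` prefix; the names `prefix`, `current_group`, `groups` and `is_header` are illustrative, and on large frames this is typically slower than the mask-based code above.

```python
import pandas as pd

data = [
    [None, 'Group: X', None, None],
    [None, 1, 'A1', 20],
    [None, 1, 'A1', None],
    [None, 2, 'B1', 40],
    [None, 2, 'B1', None],
    [None, 'Group: Y', None, None],
    [None, 1, 'A1', 30],
    [None, 1, 'A1', None],
    [None, 2, 'B1', 60],
    [None, 2, 'B1', None],
]
columns = ['Group', 'Sample', 'Well', 'DiluationFactor']
df = pd.DataFrame(data=data, columns=columns)

prefix = 'Group: '   # assumed header prefix
current_group = None # label of the most recent header row seen so far
groups = []          # value to write into each row's Group column
is_header = []       # True for rows that only carry a group header

for sample in df['Sample']:
    if isinstance(sample, str) and sample.startswith(prefix):
        # Keep only the part after the prefix and remember it.
        current_group = sample[len(prefix):]
        is_header.append(True)
    else:
        is_header.append(False)
    groups.append(current_group)

df['Group'] = groups
# Drop the header rows; the original index is kept, as in the answer.
df = df.loc[[not h for h in is_header]]
print(df)
```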