How do I iterate over a DataFrame, find a specific part of a string, and add it as a new column?

Question

I have a dataframe that contains a specific string, and I want to pull that string out and delete a part of it. The string repeats throughout the file with different endings. I want to find the string, delete some of it, and add the part I want to keep to several columns. I have an empty dataframe column that I want to add the kept part to. I have included a picture of the current dataframe with the empty column where I want the data to go. I will also add a screenshot of what I want the data to look like. I want this to repeat until that specific string no longer appears.

Answer 1

As long as you have a way to identify the values that should become the group data, and a way to transform those values into the form you want, you can do something like this.

import pandas as pd
data = [
    [None, 'Group: X', None, None],
    [None, 1, 'A1', 20],
    [None, 1, 'A1', None],
    [None, 2, 'B1', 40],
    [None, 2, 'B1', None],
    [None, 'Group: Y', None, None],
    [None, 1, 'A1', 30],
    [None, 1, 'A1', None],
    [None, 2, 'B1', 60],
    [None, 2, 'B1', None],
]
columns = ['Group', 'Sample', 'Well', 'DiluationFactor']

def identifying_function(value):
    return isinstance(value, str) and 'Group: ' in value

def manipulating_function(value):
    return value.replace('Group: ', '')

df = pd.DataFrame(data=data, columns=columns)
print(df)

# identify which rows contain the group data
mask = df['Sample'].apply(identifying_function)

# manipulate the data from those rows and write them to the Group column
df.loc[mask, 'Group'] = df.loc[mask, 'Sample'].apply(manipulating_function)

# forward fill the Group column
df['Group'] = df['Group'].ffill()

# eliminate the no longer needed rows
df = df.loc[~mask]

print(df)

DataFrame Before:

  Group    Sample  Well  DiluationFactor
0  None  Group: X  None              NaN
1  None         1    A1             20.0
2  None         1    A1              NaN
3  None         2    B1             40.0
4  None         2    B1              NaN
5  None  Group: Y  None              NaN
6  None         1    A1             30.0
7  None         1    A1              NaN
8  None         2    B1             60.0
9  None         2    B1              NaN

DataFrame After:

  Group Sample Well  DiluationFactor
1     X      1   A1             20.0
2     X      1   A1              NaN
3     X      2   B1             40.0
4     X      2   B1              NaN
6     Y      1   A1             30.0
7     Y      1   A1              NaN
8     Y      2   B1             60.0
9     Y      2   B1              NaN
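The same three steps (identify, extract, forward fill) can also be sketched without `apply`, using pandas string methods. This is only a minimal variant, not the answer's original method. It assumes the group headers always look like `Group: <name>` and live in the `Sample` column, as in the sample data above. The shortened sample frame, the regular expression, and the name `result` are illustrative choices.

```python
import pandas as pd

# Shortened version of the sample layout above: group headers sit in the Sample column.
df = pd.DataFrame(
    {
        "Group": [None] * 6,
        "Sample": ["Group: X", 1, 2, "Group: Y", 1, 2],
        "Well": [None, "A1", "B1", None, "A1", "B1"],
    }
)

# Cast to str so the .str accessor works on the mixed int/str column,
# then capture whatever follows the "Group: " prefix (NaN where there is no match).
extracted = df["Sample"].astype(str).str.extract(r"^Group: (.+)$")[0]

# Header rows are exactly the rows where the pattern matched.
mask = extracted.notna()

# Carry each group name down to the rows beneath it, then drop the header rows.
df["Group"] = extracted.ffill()
result = df.loc[~mask].reset_index(drop=True)
print(result)
```

The `reset_index(drop=True)` call is optional. Leave it out if you want to keep the original row labels, as the answer's output does.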

Answer 1
1 Accepted 2022-07-08 16:36:11
