
How to iterate through a dataframe, find a specific part of a string, and add it to a new column?

I have a dataframe containing a specific string that I want to pull out and partly delete. The string repeats throughout the file with different endings. I want to find the string, delete part of it, and add the part I want to keep to several columns. I have an empty dataframe column that I want to add the kept part to. I have included a picture of the current dataframe with the empty column where I want the data to go, and a screenshot of what I want the data to look like. I want this to repeat until that specific string no longer occurs.

What I have

What I want

As long as you have a way of identifying the values you want to turn into the group data, and a way of manipulating those values into the form you want, you can do something like this:

import pandas as pd
data = [
    [None, 'Group: X', None, None],
    [None, 1, 'A1', 20],
    [None, 1, 'A1', None],
    [None, 2, 'B1', 40],
    [None, 2, 'B1', None],
    [None, 'Group: Y', None, None],
    [None, 1, 'A1', 30],
    [None, 1, 'A1', None],
    [None, 2, 'B1', 60],
    [None, 2, 'B1', None],
]
columns = ['Group', 'Sample', 'Well', 'DiluationFactor']

def identifying_function(value):
    return isinstance(value, str) and 'Group: ' in value

def manipulating_function(value):
    return value.replace('Group: ', '')

df = pd.DataFrame(data=data, columns=columns)
print(df)

# identify which rows contain the group data
mask = df['Sample'].apply(identifying_function)

# manipulate the data from those rows and write them to the Group column
df.loc[mask, 'Group'] = df.loc[mask, 'Sample'].apply(manipulating_function)

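# forward fill the Group column (assign the result back; calling
# ffill(inplace=True) on a column selection may not update df in recent pandas)
df['Group'] = df['Group'].ffill()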

# eliminate the no longer needed rows
df = df.loc[~mask]

print(df)

DataFrame before:

  Group    Sample  Well  DiluationFactor
0  None  Group: X  None              NaN
1  None         1    A1             20.0
2  None         1    A1              NaN
3  None         2    B1             40.0
4  None         2    B1              NaN
5  None  Group: Y  None              NaN
6  None         1    A1             30.0
7  None         1    A1              NaN
8  None         2    B1             60.0
9  None         2    B1              NaN

DataFrame after:

  Group Sample Well  DiluationFactor
1     X      1   A1             20.0
2     X      1   A1              NaN
3     X      2   B1             40.0
4     X      2   B1              NaN
6     Y      1   A1             30.0
7     Y      1   A1              NaN
8     Y      2   B1             60.0
9     Y      2   B1              NaN
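
As a possible variation, the identify / manipulate / forward-fill steps can also be sketched with pandas' vectorized string methods instead of apply. This is only a minimal sketch under assumptions: the shortened sample data is made up for illustration, the header rows are assumed to look exactly like "Group: <name>", and the names extracted, is_header and result are hypothetical.

```python
import pandas as pd

# Illustrative layout (assumed): a "Group: <name>" header row
# followed by the sample rows that belong to that group.
df = pd.DataFrame(
    {
        "Sample": ["Group: X", 1, 2, "Group: Y", 1, 2],
        "Well": [None, "A1", "B1", None, "A1", "B1"],
    }
)

# Cast to str so the .str accessor works on the mixed int/str column,
# then capture whatever follows the "Group: " prefix (NaN where no match).
extracted = df["Sample"].astype(str).str.extract(r"^Group: (.+)$")[0]

# Header rows are exactly the rows where the pattern matched.
is_header = extracted.notna()

# Carry each group name down to the rows beneath it, then drop the headers.
result = df.assign(Group=extracted.ffill()).loc[~is_header]

print(result)
```

The regular expression does the identifying and the manipulating in one step, so no helper functions are needed; if the header format differs in the real data, the pattern would have to be adjusted accordingly.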


