
Slice Dataframe in sub-dataframes when specific string in column is found

Assume I have the dataframe df and I want to slice it into multiple dataframes and store each one in a list (list_of_dfs).

Each sub-dataframe should contain only the "Result" rows. A new sub-dataframe starts wherever column "Point" has the value "P1" and column "X_Y" has the value "X".

I tried this by first finding the indices of each "P1" and then slicing the overall dataframe in a list comprehension using those indices. But I get a list with two empty dataframes. Can someone advise? Thanks!

import pandas as pd

df = pd.DataFrame(
    {
        "Step": (
            "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "Result", "Result", "Result", "Result", "Result",
            "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "Result", "Result", "Result", "Result", "Result"
        ),
        "Point": (
            "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3",
            "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3",
        ),
        "X_Y": (
            "X", "X", "Y", "X", "Y",  "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", 
            "X", "X", "Y", "X", "Y",  "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y",
        ),
        "Value A": (
            70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
            70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 
        ),
        "Value B": (
            70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
            70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
        ),
    }
)

dff = df.loc[df["Step"] == "Result"]

value = "P1"
tuple_of_positions = list()

result = dff.isin([value])

seriesObj = result.any()
columnNames = list(seriesObj[seriesObj == True].index)

for col in columnNames:
    rows = list(result[col][result[col] == True].index)
    for row in rows:
        tuple_of_positions.append((row, col))

length_of_one_df = (len(dff["Point"].unique().tolist()) * 2 ) - 1

list_of_dfs = [dff.iloc[x : x + length_of_one_df] for x in rows]

print(list_of_dfs)
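
# Keep the "Result" rows, flag each row that starts a group, then group by the running count of flags
sub    = df.query("Step == \"Result\"")
pivots = sub[["Point", "X_Y"]].eq(["P1", "X"]).all(axis=1)
out    = [fr for _, fr in sub.groupby(pivots.cumsum())]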
  • get the subset of the frame where Step is equal to "Result"
  • check which rows contain the "P1" and "X" combination
    • that gives a True/False series
    • its cumulative sum labels the groups: each "pivoting" (turning) point is True, which counts as 1 in a numeric context, while False counts as 0, so the running total goes up by one at every pivot
    • iterating over a GroupBy object yields "group_label, sub_frame" pairs, out of which we pull the sub_frames
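
One thing to watch with the cumulative-sum trick: rows that come before the first pivot get the group label 0 and end up in a chunk of their own. The frame below is made up for illustration (it is not the data from the question, and `frame`, `starts`, `group_ids`, `chunks` and `kept` are hypothetical names), but it shows a way to drop such a leading chunk if it is not wanted:

```python
import pandas as pd

# Hypothetical frame whose first row comes before any ("P1", "X") pivot
frame = pd.DataFrame(
    {
        "Point": ["P3", "P1", "P2", "P1", "P2"],
        "X_Y": ["Y", "X", "X", "X", "Y"],
    }
)

# True where a new group starts
starts = frame[["Point", "X_Y"]].eq(["P1", "X"]).all(axis=1)

# Running count of pivots seen so far; rows before the first pivot stay at 0
group_ids = starts.cumsum()

# All chunks, including the leading one labelled 0
chunks = [chunk for _, chunk in frame.groupby(group_ids)]

# Only the chunks that actually begin with a pivot row
kept = [chunk for label, chunk in frame.groupby(group_ids) if label > 0]

print(group_ids.tolist())
print(len(chunks), len(kept))
```

With the data from the question this does not matter, because the filtered frame already begins with a ("P1", "X") row.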

to get

>>> out

[      Step Point X_Y  Value A  Value B
 10  Result    P1   X    70.00    70.00
 11  Result    P2   X    68.00    68.00
 12  Result    P2   Y    66.75    66.75
 13  Result    P3   X    68.08    68.08
 14  Result    P3   Y    66.72    66.72,
       Step Point X_Y  Value A  Value B
 25  Result    P1   X    70.00    70.00
 26  Result    P2   X    68.00    68.00
 27  Result    P2   Y    66.75    66.75
 28  Result    P3   X    68.08    68.08
 29  Result    P3   Y    66.72    66.72]

where the intermediates were

>>> sub

      Step Point X_Y  Value A  Value B
10  Result    P1   X    70.00    70.00
11  Result    P2   X    68.00    68.00
12  Result    P2   Y    66.75    66.75
13  Result    P3   X    68.08    68.08
14  Result    P3   Y    66.72    66.72
25  Result    P1   X    70.00    70.00
26  Result    P2   X    68.00    68.00
27  Result    P2   Y    66.75    66.75
28  Result    P3   X    68.08    68.08
29  Result    P3   Y    66.72    66.72
>>> pivots 

10     True
11    False
12    False
13    False
14    False
25     True
26    False
27    False
28    False
29    False
dtype: bool
# groups
>>> pivots.cumsum()

10    1
11    1
12    1
13    1
14    1
25    2
26    2
27    2
28    2
29    2
dtype: int32
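
As for why the attempt in the question returns empty frames: as far as I can tell, `rows` holds index labels (10 and 25), while `iloc` slices by position, and the filtered frame has only 10 rows, so both slices start past the end. Here is a minimal sketch of the same effect on a smaller, made-up frame, with one possible fix that converts to positions first (`np.flatnonzero` is my choice here, not something from the original post):

```python
import numpy as np
import pandas as pd

# Hypothetical data: two blocks of four "1" rows followed by two "Result" rows
df = pd.DataFrame(
    {
        "Step": (["1"] * 4 + ["Result"] * 2) * 2,
        "Point": ["P1", "P2"] * 6,
    }
)
dff = df.loc[df["Step"] == "Result"]  # keeps the original labels 4, 5, 10, 11

is_start = dff["Point"].eq("P1")

# Index labels of the start rows: 4 and 10
labels = list(dff.index[is_start])

# Using labels as positions: dff has only 4 rows, so both slices are empty
by_label = [dff.iloc[x : x + 2] for x in labels]

# Integer positions of the start rows within dff: 0 and 2
positions = np.flatnonzero(is_start.to_numpy())
by_position = [dff.iloc[p : p + 2] for p in positions]

print([len(f) for f in by_label])
print([list(f.index) for f in by_position])
```

The groupby approach above sidesteps this entirely, since it never needs a fixed chunk length or explicit positions.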

