How to split a dataframe each time a string value changes in a column?

I've got a dataframe of the form:

         time     value   label
0  2020-01-01 -0.556014    high
1  2020-01-02  0.185451    high
2  2020-01-03 -0.401111  medium
3  2020-01-04  0.436111  medium
4  2020-01-05  0.412933    high
5  2020-01-06  0.636421    high
6  2020-01-07  1.168237    high
7  2020-01-08  1.205073    high
8  2020-01-09  0.798674    high
9  2020-01-10  0.174116    high

I'd like to populate a list of dataframes, starting a new dataframe each time the string in the label column changes. So the first dataframe would be:

         time     value   label
0  2020-01-01 -0.556014    high
1  2020-01-02  0.185451    high

The second dataframe would be:

         time     value   label
2  2020-01-03 -0.401111  medium
3  2020-01-04  0.436111  medium

And so on. The desired result would be a list [df, df, ...]. If you think a dict would be a more appropriate container, I wouldn't mind that at all.

There's a similar post, "split data frame pandas if sequence of column value change", but it only handles changes in numeric values. I've made a few attempts but keep running into indexing problems when comparing a row's label value with the previous one, so any suggestions would be great!
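One common source of that kind of indexing trouble is mixing index labels with positions: on a Series with an integer index, df['label'][i - 1] looks up the index label i - 1, which is not necessarily the row at position i - 1. Below is a minimal sketch of the row-by-row comparison that sidesteps this by comparing a plain Python list and slicing with iloc. The helper name split_on_change is hypothetical, and the data is only the 10-row sample above (time column omitted).

```python
import pandas as pd

# 10-row sample copied from the table above (time column omitted)
df = pd.DataFrame({
    'value': [-0.556014, 0.185451, -0.401111, 0.436111, 0.412933,
              0.636421, 1.168237, 1.205073, 0.798674, 0.174116],
    'label': ['high', 'high', 'medium', 'medium'] + ['high'] * 6,
})

def split_on_change(frame, column):
    """Hypothetical helper: split frame into consecutive runs of equal values in column."""
    labels = frame[column].tolist()  # plain list, so comparisons use positions, not index labels
    chunks = []
    start = 0
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            chunks.append(frame.iloc[start:i])  # iloc slices by position
            start = i
    if labels:
        chunks.append(frame.iloc[start:])  # the final run
    return chunks

parts = split_on_change(df, 'label')
for part in parts:
    print(part, end='\n\n')
```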

Here's a reproducible snippet:

# imports
import pandas as pd
import numpy as np

# settings
observations = 100
np.random.seed(5)
value = np.random.uniform(low=-1, high=1, size=observations).tolist()
# Index.format() is deprecated in recent pandas; strftime yields the same 'YYYY-MM-DD' strings
time = pd.date_range('2020', freq='D', periods=observations).strftime('%Y-%m-%d').tolist()

df = pd.DataFrame({'time': time,
                   'value': value})
df['value'] = df['value'].cumsum()

def classify(e):
    # e is the min-max normalized value, so it lies in [0, 1]
    if e > 0.75: return 'high'
    if e > 0.25: return 'medium'
    if e >= 0: return 'low'

# Min-max normalize 'value' to [0, 1], then map each normalized value to a label
df['label1'] = (df['value'] - df['value'].min()) / (df['value'].max() - df['value'].min())
df['label'] = [classify(elem) for elem in df['label1']]
# df.drop('label1', 1) fails in pandas 2.0+, where axis must be passed by keyword
df = df.drop(columns='label1')
print(df.head(10))

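I would create a column that increments on each change, then group by that column. df['label'].ne(df['label'].shift()) is True on every row whose label differs from the row before it (including the first row, because shift() leaves NaN there), and the cumsum() of those booleans gives each run of identical labels its own number. If you need separate dataframes, you can collect them in a loop.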
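To see what the two steps produce, here is a minimal sketch on the 10-row sample from the question (values copied from the table; the rest of the 100-row data set is left out). It prints the change flag next to the resulting group number.

```python
import pandas as pd

# 10-row sample copied from the question's table
df = pd.DataFrame({
    'time': pd.date_range('2020-01-01', periods=10, freq='D').strftime('%Y-%m-%d'),
    'value': [-0.556014, 0.185451, -0.401111, 0.436111, 0.412933,
              0.636421, 1.168237, 1.205073, 0.798674, 0.174116],
    'label': ['high', 'high', 'medium', 'medium'] + ['high'] * 6,
})

# True on every row whose label differs from the previous row;
# shift() puts NaN in the first row, so row 0 always counts as a change
changed = df['label'].ne(df['label'].shift())

# Cumulative sum of the booleans: each run of equal labels gets its own id 1, 2, 3, ...
df['group'] = changed.cumsum()

print(pd.DataFrame({'label': df['label'], 'changed': changed, 'group': df['group']}))
```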

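# Run id: increments each time 'label' differs from the previous row
df['group'] = df['label'].ne(df['label'].shift()).cumsum()

# Iterate over the groups directly instead of overwriting df with the GroupBy object
dfs = []
for name, data in df.groupby('group'):
    dfs.append(data)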

For the 10-row sample shown in the question, dfs will be a list of dataframes like so:

[         time     value label  group
 0  2020-01-01 -0.556014  high      1
 1  2020-01-02  0.185451  high      1,
          time     value   label  group
 2  2020-01-03 -0.401111  medium      2
 3  2020-01-04  0.436111  medium      2,
          time     value label  group
 4  2020-01-05  0.412933  high      3
 5  2020-01-06  0.636421  high      3
 6  2020-01-07  1.168237  high      3
 7  2020-01-08  1.205073  high      3
 8  2020-01-09  0.798674  high      3
 9  2020-01-10  0.174116  high      3]
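Since the question mentions that a dict would also be fine, here is a sketch of both containers. It passes the run-id Series straight to groupby instead of storing it as a column, so df itself is not modified; the names run_id and dfs_by_run are only illustrative, and the data is again the 10-row sample.

```python
import pandas as pd

# 10-row sample from the question (time column omitted)
df = pd.DataFrame({
    'value': [-0.556014, 0.185451, -0.401111, 0.436111, 0.412933,
              0.636421, 1.168237, 1.205073, 0.798674, 0.174116],
    'label': ['high', 'high', 'medium', 'medium'] + ['high'] * 6,
})

# Run id as a standalone Series (renamed so it is not confused with the 'label' column)
run_id = df['label'].ne(df['label'].shift()).cumsum().rename('run')

# List of dataframes, one per run of identical labels
dfs = [group for _, group in df.groupby(run_id)]

# Dict keyed by run id, if a dict is the more convenient container
dfs_by_run = {key: group for key, group in df.groupby(run_id)}

for key, group in dfs_by_run.items():
    print(key, group['label'].iloc[0], len(group))
```

groupby sorts its keys by default; because the run ids increase along the rows, the chunks still come out in their original order.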
