
DataFrame: sum elements of one column based on the contiguity of values in another column

I have a DataFrame like this:

import pandas as pd

a = [1, 2, 10, 11, 15, 16, 17, 18, 30]
b = [5, 6, 7, 8, 9, 1, 2, 3, 4]
df = pd.DataFrame(list(zip(a, b)), columns=['s', 'i'])

Using a (column s), I need to sum the elements of b (column i) over each run of consecutive integers in a.

The result I would like:

(1-2) = 5+6 = 11

(10-11) = 7+8 = 15

(15-18) = 9+1+2+3 = 15

(30) = 4

My idea was to build a list of the runs of consecutive values, take the difference between the end and start of each run (+1) to get its length, and use that length to sum the corresponding elements of b.

import numpy as np

# Find runs of consecutive integers, returned as (start, end) pairs
def r(nums):
    nums = list(nums)
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))

# Length of each run: end - start + 1
ranges = r(df['s'])
print(ranges)
for i in range(len(ranges)):
    diff = np.diff(ranges[i]) + 1

I am trying to use diff as a counter to sum the values of b, but obviously every time the addition starts again from the first value. Is there a simple way to sum these numbers without modifying b?
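One way the approach described above could be completed is sketched below. Instead of counting elements, it selects the rows whose s value falls inside each (start, end) pair and sums i for those rows, so b is never modified. The names find_ranges and sum_by_ranges are hypothetical, and the sketch assumes s is sorted in ascending order with no duplicates.

```python
import pandas as pd

a = [1, 2, 10, 11, 15, 16, 17, 18, 30]
b = [5, 6, 7, 8, 9, 1, 2, 3, 4]
df = pd.DataFrame({'s': a, 'i': b})

def find_ranges(nums):
    """Return (start, end) pairs for each run of consecutive integers.

    Assumes nums is non-empty and sorted in ascending order.
    """
    nums = list(nums)
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))

def sum_by_ranges(frame, ranges):
    """Sum column 'i' over the rows whose 's' lies in each inclusive range."""
    return {
        (start, end): int(frame.loc[frame['s'].between(start, end), 'i'].sum())
        for start, end in ranges
    }

totals = sum_by_ranges(df, find_ranges(df['s']))
print(totals)
```

Series.between is inclusive on both ends by default, which is why a single-element run such as (30, 30) still picks up its one row.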

Using groupby + diff:

df['i'].groupby(df['s'].diff().ne(1).cumsum()).sum()

1    11
2    15
3    15
4     4
Name: i, dtype: int64
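To make the one-liner easier to follow, here is a sketch that splits it into its intermediate steps on the same sample data. The variable names step, new_run and run_id are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'s': [1, 2, 10, 11, 15, 16, 17, 18, 30],
                   'i': [5, 6, 7, 8, 9, 1, 2, 3, 4]})

# Difference between each value of s and the previous one (NaN for the first row)
step = df['s'].diff()

# True wherever a new run starts, i.e. the step is not exactly 1
# (NaN != 1 is True, so the first row always starts a run)
new_run = step.ne(1)

# Cumulative sum of the flags gives each run its own integer label
run_id = new_run.cumsum()

# Sum i within each run label
result = df['i'].groupby(run_id).sum()

print(run_id.tolist())
print(result.tolist())
```

Rows that share a label in run_id belong to the same run of consecutive integers, so grouping on it sums exactly the elements the question asks for.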

Another solution:

df.groupby( ((df.s-df.s.shift(1))!=1).cumsum() ).i.sum()

Result:

1    11
2    15
3    15
4     4
Name: i, dtype: int64
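If the start and end of each run are wanted alongside the sum, as in the (1-2), (10-11), (15-18), (30) layout from the question, the same grouping key can be combined with named aggregation. This is a sketch rather than part of the answers above; the label run and the output column names start, end and total are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'s': [1, 2, 10, 11, 15, 16, 17, 18, 30],
                   'i': [5, 6, 7, 8, 9, 1, 2, 3, 4]})

# Same run label as in the answers above, renamed so it does not
# share a name with the existing column 's'
run_id = df['s'].diff().ne(1).cumsum().rename('run')

# Named aggregation: first and last s of each run, plus the sum of i
summary = df.groupby(run_id).agg(
    start=('s', 'first'),
    end=('s', 'last'),
    total=('i', 'sum'),
)
print(summary)
```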

You could use NumPy as follows:

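res = []
arr = df.values.copy()  # work on a copy so df is left unchanged
for i in range(1, arr.shape[0]):
    if arr[i, 0] == arr[i-1, 0] + 1:
        # Same run: carry the running sum forward
        arr[i, 1] = arr[i, 1] + arr[i-1, 1]
    else:
        # Run ended at the previous row: store its total
        res.append(arr[i-1, 1])
res.append(arr[-1, 1])  # total of the last run
res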

This will give:

[11, 15, 15, 4]
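The loop above can also be written without explicit iteration. The following is a sketch of an alternative, not the answerer's method: it locates the first index of every run with np.diff and then sums each segment with np.add.reduceat. It assumes the input arrays are non-empty.

```python
import numpy as np

s = np.array([1, 2, 10, 11, 15, 16, 17, 18, 30])
i = np.array([5, 6, 7, 8, 9, 1, 2, 3, 4])

# Index of the first element of every run: position 0, plus each position
# whose value is not exactly one more than its predecessor
starts = np.flatnonzero(np.r_[True, np.diff(s) != 1])

# Sum i over the segments [starts[k], starts[k+1]) and the final tail
res = np.add.reduceat(i, starts).tolist()
print(res)
```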


