Add nearly identical copies of some rows to a dataframe

Question

I have the following dataframe:

import pandas as pd
tags = ['3 5', '6', '3 4 8', '5']
row_weight = [1,1,1,1]
all_other_columns = [2.35, 3.37, 7.44, 2.41]
df = pd.DataFrame({'tags' : tags, 'row_weight' : row_weight, 'all_other_columns' : all_other_columns })
df

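As you can see, some rows have multiple tags, which is not desired. So I want to build a new dataframe that contains nearly identical copies of those rows, with only one tag per row, and keeps track of the split through the column row_weight: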

tags1 = ['3', '5', '6', '3', '4', '8', '5']
row_weight1 = [1/2, 1/2 ,1, 1/3, 1/3, 1/3 ,1]
all_other_columns1 = [2.35,2.35, 3.37, 7.44,7.44,7.44, 2.41]
df1 = pd.DataFrame({'tags' : tags1, 'row_weight' : row_weight1, 'all_other_columns' : all_other_columns1 })
df1

Answer 1

import pandas as pd
tags = ['3 5', '6', '3 4 8', '5']
row_weight = [1,1,1,1]
all_other_columns = [2.35, 3.37, 7.44, 2.41]
df = pd.DataFrame({'tags' : tags, 'row_weight' : row_weight, 'all_other_columns' : all_other_columns })

# Split each tags string into a list of tag strings
df['tags']  = df['tags'].str.split()
# Divide row_weight by number of tags
df['row_weight'] = df['row_weight'] / df['tags'].apply(len)
# Explode tags so each tag is its own row
df.explode('tags')

Output

  tags  row_weight  all_other_columns
0   3   0.500000    2.35
0   5   0.500000    2.35
1   6   1.000000    3.37
2   3   0.333333    7.44
2   4   0.333333    7.44
2   8   0.333333    7.44
3   5   1.000000    2.41
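
Note that the snippet above overwrites the tags and row_weight columns of df, and the exploded result keeps the repeated index values 0, 0, 1, 2, ... If you would rather leave df untouched and get a fresh 0..n-1 index, here is a minimal sketch of the same split/divide/explode idea wrapped in a function. The name split_tags and its parameters are my own, not something from pandas or from the answer, and it assumes the tags are whitespace-separated strings:

```python
import pandas as pd

def split_tags(df, tag_col="tags", weight_col="row_weight"):
    # Hypothetical helper (not part of pandas or the original answer).
    # Split each whitespace-separated tag string into a list of strings.
    tag_lists = df[tag_col].str.split()
    # assign() returns a new dataframe, so the input df is not modified.
    out = df.assign(**{
        tag_col: tag_lists,
        # Share each row's weight equally among its tags.
        weight_col: df[weight_col] / tag_lists.str.len(),
    })
    # One row per tag, then renumber the rows from 0.
    return out.explode(tag_col).reset_index(drop=True)

df = pd.DataFrame({
    "tags": ["3 5", "6", "3 4 8", "5"],
    "row_weight": [1, 1, 1, 1],
    "all_other_columns": [2.35, 3.37, 7.44, 2.41],
})
df1 = split_tags(df)
print(df1)
```

The tags stay strings after the explode, which matches tags1 in the question. The weights of the copies of each original row still add up to that row's original weight.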

Solution 1
2 Accepted 2020-12-10 17:19:14
