I have two data frames:
Support data:
import pandas as pd

support_data = {
    'index_value': [100, 250, 500, 30, 10]
}
support_df = pd.DataFrame(support_data)
index_value
0 100
1 250
2 500
3 30
4 10
Main data:
data = {
    'link_index': ['0', '0', '0', '1', '2', '3', '3', '4', '4', '4'],
    'value_1':    ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0'],
    'value_2':    ['11', '28', '33', '40', '50', '60', '70', '80', '90', '100']
}
df = pd.DataFrame(data)
link_index value_1 value_2
0 0 1 11
1 0 2 28
2 0 3 33
3 1 4 40
4 2 5 50
5 3 6 60
6 3 7 70
7 4 8 80
8 4 9 90
9 4 0 100
I need to slice the data frame, zip value_1 with value_2, and append a value from the support data frame looked up by link_index.
I have a working solution, but it is slow. Maybe a faster approach exists.
My solution and result:
This function zips the values and appends the looked-up value from the support data frame:
def write(group):
    value_1 = group.value_1.tolist()
    value_2 = group.value_2.tolist()
    # interleave value_1 and value_2: [v1[0], v2[0], v1[1], v2[1], ...]
    result = [b for a in zip(value_1, value_2) for b in a]
    index = group.link_index.astype(int).iloc[0]
    result.append(support_df.index_value.iloc[index])
    return ','.join(str(e) for e in result)
A loop splits the data frame into slices of length nrows with step nrows - overlap:
overlap = 1
nrows = 2
result = pd.DataFrame()  # accumulator for the output rows
for i in range(0, len(df) - overlap, nrows - overlap):
    row = write(df.iloc[i : i + nrows])
    result = result.append(pd.DataFrame({'seq': [row]}), ignore_index=True)
Result:
seq
0 1,11,2,28,100
1 2,28,3,33,100
2 3,33,4,40,100
3 4,40,5,50,250
4 5,50,6,60,500
5 6,60,7,70,30
6 7,70,8,80,30
7 8,80,9,90,10
8 9,90,0,100,10
I am looking for a faster solution.
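For reference, the pieces above can be assembled into a self-contained repro of the slow approach (collecting rows in a list rather than calling DataFrame.append repeatedly, since newer pandas versions have removed that method):

```python
import pandas as pd

support_df = pd.DataFrame({'index_value': [100, 250, 500, 30, 10]})
df = pd.DataFrame({
    'link_index': ['0', '0', '0', '1', '2', '3', '3', '4', '4', '4'],
    'value_1':    ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0'],
    'value_2':    ['11', '28', '33', '40', '50', '60', '70', '80', '90', '100'],
})

def write(group):
    value_1 = group.value_1.tolist()
    value_2 = group.value_2.tolist()
    # interleave value_1 and value_2, then append the looked-up support value
    out = [b for a in zip(value_1, value_2) for b in a]
    index = group.link_index.astype(int).iloc[0]
    out.append(support_df.index_value.iloc[index])
    return ','.join(str(e) for e in out)

overlap, nrows = 1, 2
rows = [write(df.iloc[i : i + nrows])
        for i in range(0, len(df) - overlap, nrows - overlap)]
result = pd.DataFrame({'seq': rows})
print(result.seq.iloc[0])  # 1,11,2,28,100
```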
You can try this (I haven't compared the speed, but it avoids explicit for loops):
# turn support_df's index into a string link_index column so it can be merged with df
support_df = support_df.reset_index().rename(columns={'index':'link_index'})
support_df['link_index'] = support_df['link_index'].astype(str)
merged = pd.merge(df, support_df, on="link_index")
# create two views of the merged data offset by one row (rows i and i+1)
left = merged[['value_1', 'value_2', 'index_value']].iloc[:-1].reset_index(drop=True)
right = merged[['value_1', 'value_2']].iloc[1:].reset_index(drop=True)
# rename duplicate columns before concatenating them
left = left.rename(columns={'value_1':'left_1', 'value_2':'left_2'})
right = right.rename(columns={'value_1':'right_1', 'value_2':'right_2'})
# rejoin data and convert to Series
result = pd.concat([left, right], axis=1)
result = result[['left_1', 'left_2', 'right_1', 'right_2', 'index_value']]
seq = pd.Series(result.values.tolist())
print(seq)
Output:
0 [1, 11, 2, 28, 100]
1 [2, 28, 3, 33, 100]
2 [3, 33, 4, 40, 100]
3 [4, 40, 5, 50, 250]
4 [5, 50, 6, 60, 500]
5 [6, 60, 7, 70, 30]
6 [7, 70, 8, 80, 30]
7 [8, 80, 9, 90, 10]
8 [9, 90, 0, 100, 10]
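If you need the comma-separated strings from the question rather than lists, one way (a sketch using astype(str) plus a row-wise join; the index_value column is numeric after the merge, so it must be cast first) is:

```python
import pandas as pd

support_df = pd.DataFrame({'index_value': [100, 250, 500, 30, 10]})
df = pd.DataFrame({
    'link_index': ['0', '0', '0', '1', '2', '3', '3', '4', '4', '4'],
    'value_1':    ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0'],
    'value_2':    ['11', '28', '33', '40', '50', '60', '70', '80', '90', '100'],
})

# same merge-and-offset steps as above
support_df = support_df.reset_index().rename(columns={'index': 'link_index'})
support_df['link_index'] = support_df['link_index'].astype(str)
merged = pd.merge(df, support_df, on='link_index')
left = merged[['value_1', 'value_2', 'index_value']].iloc[:-1].reset_index(drop=True)
right = merged[['value_1', 'value_2']].iloc[1:].reset_index(drop=True)
left = left.rename(columns={'value_1': 'left_1', 'value_2': 'left_2'})
right = right.rename(columns={'value_1': 'right_1', 'value_2': 'right_2'})
result = pd.concat([left, right], axis=1)
result = result[['left_1', 'left_2', 'right_1', 'right_2', 'index_value']]

# cast every cell to str, then join each row into one comma-separated string
seq = result.astype(str).apply(','.join, axis=1)
print(seq.iloc[0])  # 1,11,2,28,100
```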