I have a data set in which each header row is followed by several associated sub lines, like this:
Step status
0 010000000409139
1 00001
2 00002
3 00003
4 00004
5 00007
6 00005
7 00006
8 00008
9 010000000473498
10 00001
11 00002
What I want is the header value repeated on each of its sub lines:
Step status
0 010000000409139
1 010000000409139
2 010000000409139
3 010000000409139
4 010000000409139
5 010000000409139
6 010000000409139
7 010000000409139
8 010000000409139
9 010000000473498
10 010000000473498
11 010000000473498
I tried to create a lambda function like this:
def logic(step):
    if len(step) == 15:
        return step
    else:
        return step.shift()

pm2['StepLogic'] = pm2.apply(lambda x: logic(x['Step status']), axis=1)
I'm getting an error: AttributeError: ("'str' object has no attribute 'shift'", 'occurred at index 1')
Is there a smarter way to get what I'm after?
You can create a boolean Series by checking the len of status, use cumsum to create a group number, then groupby on it and finally transform:
df["status"] = df.groupby(df["status"].str.len().eq(15).cumsum())["status"].transform("first")
print (df)
Step status
0 0 010000000409139
1 1 010000000409139
2 2 010000000409139
3 3 010000000409139
4 4 010000000409139
5 5 010000000409139
6 6 010000000409139
7 7 010000000409139
8 8 010000000409139
9 9 010000000473498
10 10 010000000473498
11 11 010000000473498
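To see why this works, the one-liner above can be unpacked into its intermediate steps. This is a minimal sketch, assuming a small hypothetical sample of the data stored as strings and a new column named header (both are illustrative assumptions, not part of the original post):

```python
import pandas as pd

# Hypothetical sample mirroring the question's data; values are kept as
# strings so the leading zeros survive.
df = pd.DataFrame({
    "status": [
        "010000000409139", "00001", "00002", "00003",
        "010000000473498", "00001", "00002",
    ]
})

# True on header rows (15-character values), False on sub lines.
is_header = df["status"].str.len().eq(15)

# The running sum goes up by 1 at every header, so each header and the
# sub lines that follow it share one group number.
group_id = is_header.cumsum()

# Broadcast the first value of each group (the header) to all of its rows.
# "header" is an illustrative column name; the answer overwrites "status".
df["header"] = df.groupby(group_id)["status"].transform("first")

# Show the intermediate columns next to the result.
print(df.assign(is_header=is_header, group_id=group_id))
```

In this sample the first four rows get group number 1 and the last three get group number 2, which is what lets transform("first") pick the right header for each block.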
Try this:
df['status'] = df['status'].where(df['status'].str.len().gt(5)).ffill()
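The same idea can be written out in two steps: blank out every row that is not a header, then forward-fill the gaps. The sketch below assumes a hypothetical sample frame with a string column named status and writes the result to an illustrative header column. It calls Series.ffill() directly, which recent pandas versions prefer over fillna(method='ffill').

```python
import pandas as pd

# Hypothetical sample; values are strings so the leading zeros are preserved.
df = pd.DataFrame({
    "status": ["010000000409139", "00001", "00002", "010000000473498", "00001"]
})

# Keep the value only on header rows (15 characters); sub lines become NaN.
headers_only = df["status"].where(df["status"].str.len().eq(15))

# Forward-fill so each NaN takes the most recent header above it.
# "header" is an illustrative column name, not from the original post.
df["header"] = headers_only.ffill()

print(df)
```

This variant relies on every block starting with its header row. If the frame began with a sub line, that row would stay NaN because there is nothing above it to fill from.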