I have a dataframe that looks like this (it has many more rows and columns, but this is how it is set up):
col1 col2 col3 col4 col5 col6 col7 col8
MSH a b e e r a d
PID c 6002324^^^WAMT d s PickB x
OBR e pickC PickD v z q
OBX g h e s y
ORC i j p p p m y
\n none none none none none none none
MSH a b e e r a d
PID c ^^^WAMT d s PickF x o
OBX g h e z s y p
ORC i j p p m y
OBR e pickE PickG v z q
OBX g h e s t
OBX i j p p p m t
OBX g h e s y
OBX i j p p p m y
\n none none none none none none none
MSH a b e e r a d
PID c 43222346^^^WAMT d s PickH x e
OBX g h e z s y p
ORC i j p p m y
OBR e pickI PickJ v z q
\n none none none none none none none
MSH a b e e r a d
PID c ^^^WAMT d s PickK x o
OBR e pickL PickM v z q
OBX g h e s y
The expected output dataframe would look like this:
col1 col2 col3 col4
^^^WAMT PickF PickE PickG
^^^WAMT PickK PickL PickM
Here is the Data as a DataFrame Constructor:
import pandas as pd

d = {'col1': ['MSH', 'PID', 'OBR', 'OBX', 'ORC', '\n', 'MSH', 'PID', 'OBX', 'ORC', 'OBR', 'OBX', 'OBX', 'OBX', 'OBX', '\n', 'MSH', 'PID', 'OBX', 'ORC', 'OBR', '\n', 'MSH', 'PID', 'OBR', 'OBX'], 'col2': ['a', 'b', 'e', 'g', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'g', 'i', 'g', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'none', 'a', 'c', 'e', 'g'], 'col3': ['b', '6002324^^^WAMT', 'pickC', 'h', 'j', 'none', 'b', '^^^WAMT', 'h', 'j', 'PickE', 'h', 'j', 'h', 'j', 'none', 'b', '43222346^^^WAMT', 'h', 'j', 'PickI', 'none', 'b', '^^^WAMT', 'PickL', 'h'], 'col4': ['e', 'd', '', 'e', 'p', 'none', 'e', 'd', 'e', '', '', 'e', 'p', 'e', 'p', 'none', 'e', 'd', 'e', '', '', 'none', 'e', 'd', '', 'e'], 'col5': ['e', 's', 'PickD', '', 'p', 'none', 'e', 's', 'z', 'p', 'PickG', '', 'p', '', 'p', 'none', 'e', 's', 'z', 'p', 'PickJ', 'none', 'e', 's', 'PickM', ''], 'col6': ['r', 'PickB', 'v', 's', 'p', 'none', 'r', 'PickF', 's', 'p', 'v', 's', 'p', 's', 'p', 'none', 'r', 'PickH', 's', 'p', 'v', 'none', 'r', 'PickK', 'v', 's'], 'col7': ['a', 'b', 'e', '', 'i', 'none', 'a', 'c', 'g', 'i', '', 'g', 'i', 'g', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'none', 'a', 'c', 'e', 'g'], 'col8': ['a', 'b', 'e', 'g', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'g', 'i', '', 'i', 'none', 'a', 'c', 'g', 'i', 'e', 'none', 'a', 'c', 'e', '']}
df = pd.DataFrame(d)
I need to iterate through each row in the df and check whether the first column is equal to PID. If it is, I need to check whether the field containing ^^^WAMT has any numbers in front of the ^^^WAMT. If there are none, I want to take the ^^^WAMT and PickF from PID, and PickE and PickG from OBR, and put them in a new df. However, if PID column 3 has a number in front of the ^^^WAMT, then I do not want to add the PID or the OBR fields to the new df.
I don't know whether it would be easier to pull out all of the PID and OBR rows first and then iterate through them afterwards to do the check and see if PickA has a value, or whether it could all be done together. I also do not know the best way to iterate through the rows and columns the way I would like.
So far I have tried to iterate through the df using this code, but it does not seem to work:
for row, index in range(len(df)):
    if df.loc[df[row] == 'MSH']:
        if df.loc[df[index] == 0]:
            # this is where i would have the pick this column value but I am not sure how to write this
Any help would be appreciated.
You can build a group to split on PID, then use a list comprehension to extract the data and feed it to a DataFrame constructor:
group = df['col1'].eq('PID').cumsum().values

out = pd.DataFrame([
    (g.loc['PID', 'col3'], g.loc['OBR', 'col3'], g.loc['PID', 'col6'], g.loc['OBR', 'col5'])
    for i, g in df.set_index('col1').groupby(group)
    if i and g.loc['PID', 'col3'] == '^^^WAMT'],
    columns=['A', 'B', 'C', 'D']
)

print(out)
Output:
         A      B      C      D
0  ^^^WAMT  PickE  PickF  PickG
1  ^^^WAMT  PickL  PickK  PickM
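The question also asks whether it would be easier to pull out the PID and OBR rows first and do the check afterwards. A minimal sketch of that alternative follows. It assumes every message starts with an MSH row and contains one PID and one OBR row. The `sample` frame, the `extract_unnumbered` helper, and the output column names are hypothetical, made up for illustration only.

```python
import pandas as pd

# Hypothetical two-message sample in the same layout as the question's data.
sample = pd.DataFrame({
    'col1': ['MSH', 'PID', 'OBR', 'OBX', 'MSH', 'PID', 'OBX', 'OBR'],
    'col3': ['b', '6002324^^^WAMT', 'pickC', 'h', 'b', '^^^WAMT', 'h', 'PickE'],
    'col5': ['e', 's', 'PickD', '', 'e', 's', 'z', 'PickG'],
    'col6': ['r', 'PickB', 'v', 's', 'r', 'PickF', 's', 'v'],
})


def extract_unnumbered(df):
    """Return PID/OBR picks for messages whose PID col3 has no leading number."""
    # Number the messages: the counter goes up by one at every MSH row.
    tagged = df.assign(msg=df['col1'].eq('MSH').cumsum())

    # Pull out the PID and OBR rows, keeping only the columns of interest.
    pid = tagged.loc[tagged['col1'].eq('PID'), ['msg', 'col3', 'col6']]
    obr = tagged.loc[tagged['col1'].eq('OBR'), ['msg', 'col3', 'col5']]

    # Drop PID rows whose col3 starts with a digit (a number before ^^^WAMT).
    pid = pid[~pid['col3'].str.match(r'\d')]

    # Pair each remaining PID row with the OBR row of the same message.
    merged = pid.merge(obr, on='msg', suffixes=('_pid', '_obr'))
    return merged[['col3_pid', 'col6', 'col3_obr', 'col5']].rename(columns={
        'col3_pid': 'pid_id',
        'col6': 'pid_pick',
        'col3_obr': 'obr_pick1',
        'col5': 'obr_pick2',
    })


result = extract_unnumbered(sample)
print(result)
```

Because the merge is an inner join, a message without an OBR row is dropped, and a message with several OBR rows yields one output row per OBR. If that is not what you want, the merge step would need adjusting.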