Using Boolean Logic to clean DF in pandas

df

shape   square
shape   circle
animal  NaN
NaN     dog
NaN     cat
NaN     fish
color   red
color   blue

desired_df

shape   square
shape   circle
animal  dog
animal  cat
animal  fish
color   red
color   blue

I have a df containing information that needs to be normalized.

I have noticed a pattern that indicates how to join the columns and normalize the data.

If a row has Col1 != NaN and Col2 == NaN, and the rows directly after it have Col1 == NaN and Col2 != NaN, then the Col1 value from the first row should be joined with each of those Col2 values. This continues until reaching a row where Col1 != NaN and Col2 != NaN.

Is there a way to solve this in pandas?

The first step I am thinking of is to create an additional column of True/False values to determine which rows to join. However, once I have done that, I am not sure how to assign the value in Col1 to all of the relevant values in Col2.

Any suggestions for arriving at the desired result?
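
For reference, the flag column described above might be built along these lines. This is only a sketch: the integer column labels 0 and 1 and the names is_header and needs_label are assumptions made for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the integer column labels 0 and 1 are an assumption.
df = pd.DataFrame(
    [
        ["shape", "square"],
        ["shape", "circle"],
        ["animal", np.nan],
        [np.nan, "dog"],
        [np.nan, "cat"],
        [np.nan, "fish"],
        ["color", "red"],
        ["color", "blue"],
    ]
)

# True where a row carries only a label (Col1 present, Col2 missing).
is_header = df[0].notna() & df[1].isna()

# True where a row carries only a value and must inherit the label above it.
needs_label = df[0].isna() & df[1].notna()

print(pd.DataFrame({"is_header": is_header, "needs_label": needs_label}))
```

Here only the "animal" row is flagged as a header, and the three rows after it are flagged as needing a label.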

I struggle to follow the pattern you have identified, but if it is only a heuristic, you can instead use pd.Series.ffill on the first column and pd.Series.bfill on the second to reach your desired result:

df[0] = df[0].ffill()
df[1] = df[1].bfill()

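After filling, the label-only row (animal, NaN) becomes identical to the row below it (animal, dog), so dropping duplicates removes it: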

df = df.drop_duplicates()

print(df)

        0       1
0   shape  square
1   shape  circle
2  animal     dog
4  animal     cat
5  animal    fish
6   color     red
7   color    blue
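
One caveat with drop_duplicates is that it would also remove genuinely repeated rows elsewhere in the frame (for example, two separate "color red" entries). A possible alternative, closer to the True/False column idea in the question, is to forward-fill the first column and then drop only the label-only rows. The sketch below assumes integer column labels 0 and 1 as in the output above; the names is_header and out are illustrative.

```python
import numpy as np
import pandas as pd

# Example frame from the question; column labels 0 and 1 are assumed.
df = pd.DataFrame(
    [
        ["shape", "square"],
        ["shape", "circle"],
        ["animal", np.nan],
        [np.nan, "dog"],
        [np.nan, "cat"],
        [np.nan, "fish"],
        ["color", "red"],
        ["color", "blue"],
    ]
)

# Label-only rows: first column is set, second column is missing.
is_header = df[0].notna() & df[1].isna()

out = df.copy()
# Carry each label down to the value-only rows beneath it.
out[0] = out[0].ffill()
# Drop the label-only rows and renumber the index from 0.
out = out[~is_header].reset_index(drop=True)

print(out)
```

This leaves the second column untouched, so no back-fill is needed, and rows are removed only because they lacked a value, not because they happen to match another row.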
