I want to check if the value of a column in a dataframe is equal to 0 and if it is 0, I want to substitute the previous row's value of that column.
Suppose my data is as follows,
X
683 24.48
684 24.48
685 24.48
686 24.48
687 24.48
688 24.48
689 25.02
690 25.02
691 25.02
692 25.02
693 25.02
694 25.02
695 25.02
696 25.02
697 25.32
698 25.32
699 25.32
700 0.00
701 0.00
702 0.00
703 0.00
704 0.00
705 0.00
706 0.00
707 0.00
708 0.00
709 0.00
710 0.00
711 0.00
712 0.00
713 0.00
714 0.00
715 0.00
716 0.00
717 0.00
718 0.00
719 0.00
720 0.00
721 0.00
722 0.00
723 0.00
724 0.00
725 0.00
726 0.00
727 0.00
728 0.00
729 0.00
730 0.00
731 0.00
732 0.00
733 0.00
734 0.00
735 0.00
736 0.00
737 0.00
738 0.00
739 0.00
740 0.00
741 0.00
742 0.00
743 0.00
744 0.00
745 0.00
746 0.00
747 0.00
748 0.00
749 0.00
750 0.00
751 0.00
752 0.00
753 0.00
754 0.00
755 0.00
756 0.00
757 0.00
758 0.00
759 0.00
760 0.00
761 0.00
762 0.00
763 0.00
764 0.00
765 0.00
766 0.00
767 0.00
768 0.00
769 0.00
770 0.00
771 0.00
772 0.00
773 0.00
774 0.00
775 0.00
776 0.00
777 0.00
778 0.00
779 0.00
780 0.00
781 0.00
782 0.00
783 0.00
784 0.00
785 0.00
786 0.00
787 0.00
788 0.00
789 0.00
790 0.00
791 0.00
792 0.00
793 0.00
794 0.00
795 0.00
796 0.00
797 0.00
798 0.00
799 0.00
800 0.00
801 0.00
802 0.00
803 0.00
804 0.00
805 0.00
806 0.00
807 0.00
808 0.00
809 0.00
810 0.00
811 0.00
812 0.00
813 0.00
814 0.00
815 0.00
816 0.00
817 0.00
818 0.00
819 0.00
820 0.00
821 0.00
822 0.00
823 0.00
824 0.00
825 0.00
826 0.00
827 0.00
828 0.00
829 0.00
830 0.00
831 0.00
832 0.00
833 0.00
834 0.00
835 0.00
836 0.00
837 0.00
838 0.00
839 0.00
840 0.00
841 0.00
842 0.00
843 0.00
844 0.00
845 0.00
846 0.00
847 0.00
848 0.00
849 0.00
850 0.00
851 0.00
852 0.00
853 0.00
854 0.00
855 0.00
856 0.00
857 0.00
858 0.00
859 0.00
860 0.00
861 0.00
862 0.00
863 0.00
864 0.00
865 0.00
866 0.00
867 0.00
868 0.00
869 0.00
870 0.00
871 0.00
872 0.00
873 0.00
874 0.00
875 0.00
876 0.00
877 0.00
878 0.00
879 0.00
880 0.00
881 0.00
882 0.00
883 0.00
884 0.00
885 0.00
886 0.00
887 0.00
888 0.00
889 0.00
890 0.00
891 0.00
892 0.00
893 0.00
894 0.00
895 0.00
896 0.00
897 0.00
898 0.00
899 0.00
900 0.00
901 0.00
902 0.00
903 0.00
904 0.00
905 0.00
906 0.00
907 0.00
908 0.00
909 0.00
910 0.00
911 0.00
912 0.00
913 0.00
914 0.00
915 0.00
916 0.00
917 0.00
918 0.00
919 0.00
920 0.00
921 0.00
922 0.00
923 0.00
924 0.00
925 0.00
926 0.00
927 0.00
928 0.00
929 0.00
930 0.00
931 0.00
932 0.00
933 0.00
934 0.00
935 0.00
936 0.00
937 0.00
938 0.00
939 0.00
940 0.00
941 0.00
942 0.00
943 0.00
944 0.00
945 0.00
946 0.00
947 0.00
948 0.00
949 0.00
950 0.00
951 0.00
952 0.00
953 0.00
954 0.00
955 0.00
956 0.00
957 0.00
958 0.00
959 0.00
960 0.00
961 0.00
962 0.00
963 0.00
964 0.00
965 0.00
966 0.00
967 0.00
968 0.00
969 0.00
970 0.00
971 0.00
972 0.00
973 0.00
974 0.00
975 0.00
976 0.00
977 0.00
978 0.00
979 0.00
980 0.00
981 0.00
982 0.00
983 0.00
984 0.00
985 0.00
986 0.00
987 0.00
988 0.00
989 0.00
990 0.00
991 0.00
992 0.00
993 0.00
994 0.00
995 0.00
996 0.00
997 0.00
998 0.00
999 0.00
1000 0.00
1001 0.00
1002 0.00
1003 0.00
1004 0.00
1005 0.00
1006 0.00
1007 0.00
1008 0.00
1009 0.00
1010 0.00
1011 0.00
1012 0.00
1013 0.00
1014 0.00
1015 0.00
1016 0.00
1017 0.00
1018 0.00
1019 0.00
1020 0.00
1021 0.00
1022 0.00
1023 0.00
1024 0.00
1025 0.00
1026 0.00
1027 0.00
1028 0.00
1029 0.00
1030 0.00
1031 0.00
1032 0.00
1033 0.00
1034 0.00
1035 0.00
1036 0.00
1037 0.00
1038 0.00
1039 0.00
1040 0.00
1041 0.00
1042 0.00
1043 0.00
1044 0.00
1045 0.00
1046 0.00
1047 0.00
1048 0.00
1049 0.00
1050 0.00
1051 0.00
1052 0.00
1053 0.00
1054 0.00
1055 0.00
1056 0.00
1057 0.00
1058 0.00
1059 0.00
1060 0.00
1061 0.00
1062 0.00
1063 0.00
1064 0.00
1065 0.00
1066 0.00
1067 0.00
1068 0.00
1069 0.00
1070 0.00
1071 0.00
1072 0.00
1073 0.00
1074 0.00
1075 0.00
1076 0.00
1077 0.00
1078 0.00
1079 0.00
1080 0.00
1081 0.00
1082 0.00
1083 0.00
1084 0.00
1085 0.00
1086 0.00
1087 0.00
1088 0.00
1089 0.00
1090 0.00
1091 0.00
1092 0.00
1093 0.00
1094 0.00
1095 0.00
1096 0.00
1097 0.00
1098 0.00
1099 0.00
1100 0.00
1101 0.00
1102 0.00
1103 0.00
1104 0.00
1105 0.00
1106 0.00
1107 0.00
1108 0.00
1109 0.00
1110 0.00
1111 0.00
1112 0.00
1113 0.00
1114 0.00
1115 0.00
1116 0.00
1117 0.00
1118 0.00
1119 0.00
1120 0.00
1121 0.00
1122 0.00
1123 0.00
1124 0.00
1125 0.00
1126 0.00
1127 0.00
1128 0.00
1129 0.00
1130 0.00
1131 0.00
1132 0.00
1133 0.00
1134 0.00
1135 0.00
1136 0.00
1137 0.00
1138 0.00
1139 0.00
1140 0.00
1141 0.00
1142 0.00
1143 0.00
1144 0.00
1145 0.00
1146 0.00
1147 0.00
1148 0.00
1149 0.00
1150 0.00
1151 0.00
1152 0.00
1153 0.00
1154 0.00
1155 0.00
1156 0.00
1157 0.00
1158 0.00
1159 0.00
1160 0.00
1161 0.00
1162 0.00
1163 0.00
1164 0.00
1165 0.00
1166 0.00
1167 0.00
1168 0.00
1169 0.00
1170 0.00
1171 0.00
1172 0.00
1173 0.00
1174 0.00
1175 0.00
1176 0.00
1177 0.00
1178 0.00
1179 0.00
1180 0.00
1181 0.00
1182 0.00
1183 0.00
1184 0.00
1185 0.00
1186 0.00
1187 0.00
1188 0.00
1189 0.00
1190 0.00
1191 0.00
1192 0.00
1193 0.00
1194 0.00
1195 0.00
1196 0.00
1197 0.00
1198 26.16
1199 26.16
1200 26.16
1201 26.34
Here in the 0s, I want the 25.32 value. The point is, initially my data starts at 0.
How can we do this in Python? I am able to do this in R. The following is the command I used,
library(zoo)
is.na(DF$a) <- DF$a == 0
DF$a <- na.locf(DF$a,na.rm=FALSE)
Any help would be appreciated.
Thanks
It sounds like you're looking for the method ffill.
Assuming you don't have NaN (or also want to fill in NaN in the same way as 0):
df = df.replace(0, np.NaN).ffill()
If you do have NaN and want keep NaN as is:
df.loc[~df.x.isnull(), 'x'] = df.loc[~df.x.isnull(), 'x'].replace(0, np.NaN).ffill()
This should work, provided you are willing to use numpy:
First, identify the indices which meet your condition using the where
function. (Assuming that you already have your data in an array, called x
in this case)
indices = np.where(x == 0.0)[0]
The use of [0]
in the previous line is necessary because calling where
without the optional replacement values returns a tuple with one element: an array of the indices where the condition is met (see the documentation for more info).
To replace the the zeroes with the previous non-zero:
for index in indices:
c[index] = c[index - 1]
This will automatically leave leading zeroes in place, and only correct the gaps after there have been non-zero values.
Version 1: you only want to substitute the first row with a 0.00 value:
a = range(1, 10)
a = a + [0,0,0,0,0]
# a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0]
out = [a[0]] + [x if x !=0 else y for x,y in zip(a[1:], a[:-1])]
# out = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 0, 0, 0, 0]
So now we have a list with two 9 values. The first zero has been replaced.
NOTE: I am assuming you cannot replace a 0 value in the first row.
Version 2: you want to substitute all 0.0 values, even if the previous row was substituted.
a = range(1, 10)
a = a + [0,0,0,0,0]
# a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0]
out = [a[0]]
for x in a[1:]:
if x != 0:
out.append(x)
else:
out.append(out[-1])
# out = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 9, 9]
Now all the 0 values have taken the last non-zero value.
Version 3: based on the op's feedback:
import itertools
a = [0,0,1,2,5,4,0,0,8,9,0,5,0,6]
# accept all leading zeros
out = [x for x in itertools.takewhile(lambda x: x == 0, a[1:])]
for x in a[len(out):]:
if x != 0:
out.append(x)
else:
out.append(out[-1]))
Really though, the best solution in my opnion, especially if you are familiar with R, is to use the Python Pandas package . See the "Fill gaps forward or backward" section. The solution is much more elegant than trying to do it all with native packages.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.