I am new to Python and pandas.
I have the value_counts from three different DataFrame columns, which I converted into DataFrames as follows:
df1 = pd.DataFrame()
df1 = first_count.rename_axis('PredictedFeature').reset_index(name='counts')
In the same way, I got three DataFrames:
df1 =
predictedFeature counts
100 100
200 300
2200 150
0 11
10 15
df2 =
predictedFeature counts
100 200
200 310
2100 150
2200 123
160 4
0 100
df3 =
predictedFeature counts
100 112
200 190
3600 89
156 2
2200 180
0 10
To merge these DataFrames, I tried:
df_final = [df1, df2, df3]
df_final_percentage = reduce(lambda left, right: pd.merge(left, right, on='PredictedFeature'), df_final)
This creates a DataFrame, but it keeps only the predictedFeature values that are common to all three.
So the final DataFrame looks like this:
predictedFeature counts_x counts_y counts
100 100 200 112
200 300 310 190
2200 150 123 180
How can I keep all the values from these three DataFrames, with a 0 wherever a predictedFeature is not present in one of them?
The expected output would be:
PredictedFeature counts_x counts_y counts
100 100 200 112
200 300 310 190
2200 150 123 180
2100 0 150 0
160 0 4 0
3600 0 0 89
156 0 0 2
Can anyone help me with this?
One more thing: when I compute the percentages by dividing,
df["counts_y"] = df["counts_y"] * 100 / df["counts_x"]
df["counts_per"] = df["counts"] * 100 / df["counts_x"]
will the 0 values affect the percentage calculation?
cols = ["PredictedFeature", "counts_per", "counts_y"]
df_percentage.to_csv('data.csv', columns=cols)
I use the two lines above to create the percentage CSV.
I think you can use an outer join and replace the missing values with 0:
from functools import reduce

df_final = [df1, df2, df3]
df_final_percentage = (reduce(lambda left, right: pd.merge(left,
                                                           right,
                                                           on='predictedFeature',
                                                           how='outer'), df_final)
                       .fillna(0)
                       .astype(int))
print (df_final_percentage)
predictedFeature counts_x counts_y counts
0 100 100 200 112
1 200 300 310 190
2 2200 150 123 180
3 2100 0 150 0
4 160 0 4 0
5 3600 0 0 89
6 156 0 0 2
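For reference, here is a self-contained sketch of the same outer merge, built from the sample values in the question. The variable name `merged` is only illustrative. Note that with the data exactly as posted, the rows for 0 and 10 are kept too, because nothing filters them out here; the row order may also differ between pandas versions.

```python
from functools import reduce

import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'predictedFeature': [100, 200, 2200, 0, 10],
                    'counts': [100, 300, 150, 11, 15]})
df2 = pd.DataFrame({'predictedFeature': [100, 200, 2100, 2200, 160, 0],
                    'counts': [200, 310, 150, 123, 4, 100]})
df3 = pd.DataFrame({'predictedFeature': [100, 200, 3600, 156, 2200, 0],
                    'counts': [112, 190, 89, 2, 180, 10]})

# how='outer' keeps every key that appears in any frame;
# keys missing from a frame produce NaN, which fillna(0) replaces
merged = (reduce(lambda left, right: pd.merge(left, right,
                                              on='predictedFeature',
                                              how='outer'),
                 [df1, df2, df3])
          .fillna(0)
          .astype(int))
print(merged)
```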
Another solution uses concat:
dfs = [x.set_index('predictedFeature') for x in df_final]
df_final_percentage = pd.concat(dfs, axis=1).fillna(0).reset_index().astype(int)
print (df_final_percentage)
predictedFeature counts counts counts
0 100 100 200 112
1 156 0 0 2
2 160 0 4 0
3 200 300 310 190
4 2100 0 150 0
5 2200 150 123 180
6 3600 0 0 89
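The concat result above has three columns that are all named counts, which makes it awkward to select one of them later. One possible way around this, sketched here with a subset of the question's values, is to rename each counts column before concatenating. The labels `counts_1`, `counts_2`, `counts_3` and the name `combined` are assumptions chosen for illustration, not names from the original post.

```python
import pandas as pd

# A subset of the sample data from the question
df1 = pd.DataFrame({'predictedFeature': [100, 200, 2200],
                    'counts': [100, 300, 150]})
df2 = pd.DataFrame({'predictedFeature': [100, 200, 2100],
                    'counts': [200, 310, 150]})
df3 = pd.DataFrame({'predictedFeature': [100, 3600, 2200],
                    'counts': [112, 89, 180]})

frames = [df1, df2, df3]
labels = ['counts_1', 'counts_2', 'counts_3']  # hypothetical column names

# Turn each frame into a Series indexed by predictedFeature,
# named after its label, so the concatenated columns are distinct
series = [frame.set_index('predictedFeature')['counts'].rename(label)
          for frame, label in zip(frames, labels)]

combined = (pd.concat(series, axis=1)
            .fillna(0)
            .astype(int)
            .rename_axis('predictedFeature')
            .reset_index())
print(combined)
```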
EDIT1:
To filter out the 0 and 10 values, use:
df_final = [df1, df2, df3]
df_final = [x[~x['predictedFeature'].isin([0,10])] for x in df_final]
df_final_percentage = (reduce(lambda left, right: pd.merge(left,
                                                           right,
                                                           on='predictedFeature',
                                                           how='outer'), df_final)
                       .fillna(0)
                       .astype(int))
print (df_final_percentage)
predictedFeature counts_x counts_y counts
0 100 100 200 112
1 200 300 310 190
2 2200 150 123 180
3 2100 0 150 0
4 160 0 4 0
5 3600 0 0 89
6 156 0 0 2
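On the percentage part of the question: once missing values are filled with 0, some rows have 0 in counts_x, and dividing by them does not raise an error in pandas but yields inf (or NaN for 0 / 0). The sketch below shows one possible way to handle this, by treating a zero denominator as missing. The frame `merged` and the column name `counts_y_per` are assumptions made for illustration, and whether NaN is the right result for those rows depends on what the percentages are meant to express.

```python
import numpy as np
import pandas as pd

# A few rows in the shape of the merged result (values from the question)
merged = pd.DataFrame({'predictedFeature': [100, 2100, 3600],
                       'counts_x': [100, 0, 0],
                       'counts_y': [200, 150, 0],
                       'counts': [112, 0, 89]})

# Plain division: a zero denominator gives inf, and 0 / 0 gives NaN
naive = merged['counts_y'] * 100 / merged['counts_x']

# Replace zero denominators with NaN so the affected rows become NaN
denominator = merged['counts_x'].replace(0, np.nan)
merged['counts_y_per'] = merged['counts_y'] * 100 / denominator
merged['counts_per'] = merged['counts'] * 100 / denominator
print(merged)
```

The NaN rows are written as empty fields by to_csv by default, so they stay distinguishable from a genuine 0 percent.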