How do I save the .csv file after converting the string data of specific columns into numerical data using pandas?
I wrote this program, in which I converted the string data of the given columns into numerical data. The actual csv file is here.
> df.Sex[df.Sex == 'M'] = 1
> df.Sex[df.Sex == 'F'] = 0
> # changing ChestPainType of TA, ATA, NAP and ASY into 1, 2, 3 and 4
> df.ChestPainType[df.ChestPainType == 'TA'] = 1
> df.ChestPainType[df.ChestPainType == 'ATA'] = 2
> df.ChestPainType[df.ChestPainType == 'NAP'] = 3
> df.ChestPainType[df.ChestPainType == 'ASY'] = 4
> # changing ExerciseAngina of N = 0 and Y = 1
> df.ExerciseAngina[df.ExerciseAngina == 'N'] = 0
> df.ExerciseAngina[df.ExerciseAngina == 'Y'] = 1
> # changing RestingECG of Normal, ST and LVH into 1, 2 and 3
> df.RestingECG[df.RestingECG == 'Normal'] = 1
> df.RestingECG[df.RestingECG == 'ST'] = 2
> df.RestingECG[df.RestingECG == 'LVH'] = 3
>
> df.ST_Slope[df.ST_Slope == 'Up'] = 1
> df.ST_Slope[df.ST_Slope == 'Flat'] = 2
> df.ST_Slope[df.ST_Slope == 'Down'] = 3
> df.head()
It shows the first 5 rows of the file.
But afterwards, when I try to print the correlations using this program:
pearsoncorr = df.corr(method='pearson')  # df is the DataFrame read from the .csv file I am working with
pearsoncorr
the output it shows me is this:
Here, I want to see the correlations of the new csv file containing the changes I made earlier, which should be this file, and the expected correlation output should look roughly like this, showing all the columns. But this correlation table is showing me the correlations of this csv file.
The question is: how can I save the modified .csv file?
PS: I am new to this site, so I apologize for any mistakes I have made, and I would be glad if you let me know how I can fix them.
You're most probably trying to set the values on a copy of a slice from the DataFrame:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See view-versus-a-copy and SettingWithCopyWarning for why this may be a problem.
When using df.Sex[df.Sex == 'M'] = 1, the original DataFrame may not have been altered. You can check this with df.info() by inspecting the Dtype column:
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 918 non-null int64
1 Sex 918 non-null object
2 ChestPainType 918 non-null object
3 RestingBP 918 non-null int64
...
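Instead of reading through the whole df.info() table, you can list the columns that pandas does not store as numbers. These are the columns df.corr leaves out. The following is a minimal sketch that uses a small hypothetical DataFrame in place of the real csv. select_dtypes(exclude="number") catches both the plain object dtype and newer string dtypes.

```python
import pandas as pd

# Hypothetical stand-in for the csv (only a few of its columns)
df = pd.DataFrame({
    "Age": [40, 49, 37],
    "Sex": ["M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA"],
    "RestingBP": [140, 160, 130],
})

# Columns that are not stored with a numeric dtype (e.g. strings/objects)
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['Sex', 'ChestPainType']
```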
In the Pandas documentation linked above, the recommended access method is .loc, using a mask for multiple items (df.Sex == 'M' in your case) and a fixed index for a single item:
df.loc[df.Sex == 'M', 'Sex'] = 1
df.loc[df.Sex == 'F', 'Sex'] = 0
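Replacing values through .loc changes the values in place, but it does not change the dtype of an object column. Sex can therefore still report "object" after every string has been replaced by an integer. An explicit cast afterwards is a cautious extra step. The following is a minimal sketch on hypothetical sample data. The dtype=object argument is an assumption meant to mirror the "object" Dtype that df.info() reports for the csv.

```python
import pandas as pd

# Hypothetical sample; dtype=object mirrors the "object" Dtype shown by df.info()
df = pd.DataFrame({
    "Age": [40, 49, 37],
    "Sex": pd.Series(["M", "F", "M"], dtype=object),
})

# .loc with a boolean mask (rows) and a column label, as recommended above
df.loc[df.Sex == "M", "Sex"] = 1
df.loc[df.Sex == "F", "Sex"] = 0

# The values are ints now, but the column may still be stored as object,
# so cast it explicitly before using df.corr or df.to_csv
df["Sex"] = df["Sex"].astype("int64")
print(df.dtypes)
```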
Another option is to use Pandas map, which, in my opinion, better expresses the intent of the code.
df.Sex = df.Sex.map({'M':1, 'F':0})
df.ChestPainType = df.ChestPainType.map({'TA': 1, 'ATA': 2, 'NAP': 3, 'ASY': 4})
df.ExerciseAngina = df.ExerciseAngina.map({'N': 0, 'Y': 1})
...
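The same map approach can be driven by a single dictionary that covers every categorical column from the question. The codes below are the ones chosen in the question. The two-row DataFrame is hypothetical sample data. map builds a new Series, so on an object column the result gets a numeric dtype (int64 here). Any value missing from a mapping becomes NaN, and that column turns into float64, which is a useful signal that a category was overlooked.

```python
import pandas as pd

# Hypothetical two-row sample with the categorical columns from the question
df = pd.DataFrame({
    "Age": [40, 49],
    "Sex": ["M", "F"],
    "ChestPainType": ["ATA", "NAP"],
    "RestingECG": ["Normal", "ST"],
    "ExerciseAngina": ["N", "Y"],
    "ST_Slope": ["Up", "Flat"],
})

# One mapping per column, using the codes from the question
mappings = {
    "Sex": {"M": 1, "F": 0},
    "ChestPainType": {"TA": 1, "ATA": 2, "NAP": 3, "ASY": 4},
    "ExerciseAngina": {"N": 0, "Y": 1},
    "RestingECG": {"Normal": 1, "ST": 2, "LVH": 3},
    "ST_Slope": {"Up": 1, "Flat": 2, "Down": 3},
}

for column, codes in mappings.items():
    # map returns a new Series; values not in the dict become NaN
    df[column] = df[column].map(codes)

print(df.dtypes)
```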
Checking the Dtype column again, we can confirm that the values are now of integer type, which allows you to use df.corr (or to save the DataFrame with the new values to a csv file with to_csv):
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 918 non-null int64
1 Sex 918 non-null int64
2 ChestPainType 918 non-null int64
3 RestingBP 918 non-null int64
...
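To answer the question of saving the modified file directly, you can write the converted DataFrame with to_csv and load it back for the correlation step. The following is a minimal sketch on hypothetical, already-converted data. The file name heart_numeric.csv is an assumption, and a temporary directory is used only so that the example leaves nothing behind. Passing index=False stops pandas from writing the row index as an extra unnamed column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data that has already been converted to numeric codes
df = pd.DataFrame({
    "Age": [40, 49, 37, 54],
    "Sex": [1, 0, 1, 0],
    "ChestPainType": [2, 3, 2, 4],
})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output path; use any location you like
    path = os.path.join(tmp, "heart_numeric.csv")

    # index=False avoids writing the row index as an extra column
    df.to_csv(path, index=False)

    # Reading the file back: the columns are parsed as integers again
    saved = pd.read_csv(path)
    print(saved.dtypes)

    # All columns are numeric, so all of them appear in the correlation table
    pearsoncorr = saved.corr(method="pearson")
    print(pearsoncorr)
```

In your own script, df.to_csv('some_name.csv', index=False) right after the conversion is the only line you need. The file can then be opened later with pd.read_csv.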