简体   繁体   English

Pandas DataFrame 列命名约定

[英]Pandas DataFrame column naming conventions

Is there any commonly used Pandas DataFrame column naming convention?是否有任何常用的 Pandas DataFrame 列命名约定? Is PEP8 recommended here (ex. instance variables)?此处是否推荐PEP8 (例如实例变量)?

Concious that lots of data is loaded from external sources with headers but I'm curious what is the correct approach when I have to name/rename the columns on my own?意识到大量数据是从带有标题的外部源加载的,但我很好奇当我必须自己命名/重命名列时,正确的方法是什么?

Some people tend to use snake_case (lower case with underscores) so that they can access the column using period like this df.my_column有些人倾向于使用snake_case (带下划线的小写),以便他们可以使用像df.my_column这样的df.my_column访问列

I tend to always access colums using the df['my_column'] syntax because it avoids confusion with DataFrame methods and properties, and it easier to extend to slices and fancy indexing, so the snake case is not necessary.我倾向于总是使用df['my_column']语法访问列,因为它避免了与 DataFrame 方法和属性的混淆,并且更容易扩展到切片和花哨的索引,因此不需要蛇的情况。

In short, I think you should use whatever is clearest to a potential reader.简而言之,我认为您应该使用对潜在读者来说最清楚的内容。

要记住的另一件事是,如果您的应用程序也使用关系数据库 - 我建议您将 Pandas 命名约定与关系数据库表的列名保持一致。

There is no clear guidance from pandas founding fathers and the choice is really between already mentioned snake vs Pascal (camel) case , or df[my_column] vs df[MyColumn] , and is a matter of preference. pandas创始人没有明确的指导,选择实际上是在已经提到的帕斯卡(骆驼)案例之间,或者df[my_column]df[MyColumn] ,这是一个偏好问题。 A lot of R packages use snake case for dataframes.许多R包对数据帧使用蛇形大小写 I'd say snake case is more readable while Pascal case require fewer characters.我会说蛇形案例更具可读性,而Pascal 案例需要更少的字符。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM