Pandas adds an extra comma to the last column of a DataFrame read from a CSV file

Question

How can I remove the extra comma that pandas adds to the last column of my dataset?

Here's a sample of how the data looks in the CSV file: [image: sample of the data in the CSV file]

Here's the code that I'm using to import it:

import pandas as pd
df = pd.read_csv(r'*path*', sep='|')
print(df)

And here's how it appears in Spyder: [image: sample of the output in Spyder's console]

Adding a lambda function as follows produced an error in Spyder's console:

import pandas as pd
df = pd.read_csv(r'C:\Users\mohamed.a.eshra\Downloads\us_cities_states_counties.csv', sep='|')
df["City alias,"] = df["City alias,"].apply(lambda x: x if x[-1]!="," else x[:-1])
df = df.rename(columns={"City alias,": "City alias"})
print(df)

Error after adding the previous code:

Traceback (most recent call last):
  File "C:\Users\mohamed.a.eshra.spyder-py3\Scripts\Opening a CSV file using Pandas.py", line 9, in <module>
    df["City alias,"] = df["City alias,"].apply(lambda x: x if x[-1]!="," else x[:-1])
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "C:\Users\mohamed.a.eshra.spyder-py3\Scripts\Opening a CSV file using Pandas.py", line 9, in <lambda>
    df["City alias,"] = df["City alias,"].apply(lambda x: x if x[-1]!="," else x[:-1])
TypeError: 'float' object is not subscriptable

I would appreciate your help in solving the issue.

Thank you!

Answer 1

You can apply a function to the last column to remove the commas, and then rename the column:

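Edited to handle missing values (pandas reads empty cells as NaN, which is a float, and that is why indexing with x[-1] raised the TypeError):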

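import numpy as np
import pandas as pd

def clean_city_name(x):
    # Missing values arrive as float NaN and cannot be indexed, so pass them through
    if pd.isnull(x):
        return np.nan
    # Drop a single trailing comma, if present
    if x.endswith(','):
        return x[:-1]
    return x

# df is the DataFrame loaded with pd.read_csv(..., sep='|') in the question
df["City alias,"] = df["City alias,"].apply(clean_city_name)

# The header carries the same trailing comma, so rename the column too
df = df.rename(columns={"City alias,": "City alias"})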

This way you will get rid of the commas.
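If you prefer to avoid a Python-level function, the same cleanup can probably be done with pandas' vectorized string methods, which skip missing values on their own. The snippet below is only a sketch: the sample rows are made up to mimic the layout described in the question, and the column is picked by position rather than by name.

```python
import io
import pandas as pd

# Hypothetical sample data imitating the file from the question
raw = (
    "City|State|City alias,\n"
    "Holtsville|NY|Internal Revenue Service,\n"
    "Adjuntas|PR|\n"
    "Aguada|PR|Aguada,\n"
)
df = pd.read_csv(io.StringIO(raw), sep="|")

last = df.columns[-1]  # "City alias," including the trailing comma
# Remove one trailing comma per value; NaN entries are left as NaN
df[last] = df[last].str.replace(r",$", "", regex=True)
# Strip the trailing comma from the header as well
df = df.rename(columns={last: last.rstrip(",")})

print(df)
```

Note that `last.rstrip(",")` removes every trailing comma from the header, whereas the regular expression removes only one per value; adjust whichever does not match your data.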

It would, however, be interesting to investigate why these commas appear, so that you can avoid running into this again.

Usually the comma is the default separator in CSV files, so this trailing comma could be a sign that the original CSV had an empty column at the end before the separator was changed to "|" (if that happened at some point; I don't know how your dataset was created, so this remains hypothetical!).
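One way to check whether the commas are really in the file, rather than added by pandas, is to look at the raw lines before any parsing. Here is a minimal sketch; the helper name `trailing_comma_lines` and the sample text are my own, not something from your dataset.

```python
import io

def trailing_comma_lines(handle, limit=5):
    """Return up to `limit` (line_number, text) pairs whose text ends with a comma."""
    hits = []
    for number, line in enumerate(handle, start=1):
        text = line.rstrip("\r\n")  # ignore the line terminator itself
        if text.endswith(","):
            hits.append((number, text))
            if len(hits) == limit:
                break
    return hits

# Hypothetical sample standing in for the real file
sample = io.StringIO(
    "City|State|City alias,\n"
    "Holtsville|NY|Holtsville,\n"
    "Adjuntas|PR|\n"
)
hits = trailing_comma_lines(sample)
print(hits)
```

With a real file you would pass an open file object instead of the `io.StringIO` sample. If lines come back from this check, the commas are part of the data on disk and pandas is just reporting what it read.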

Answer 2

So I deleted several preceding columns in the source CSV file using Excel, and the data frame loaded correctly.

[image: DataFrame printed in Spyder]

It seems that there was an empty column in the underlying CSV. However, I'm not sure how the dataset was loading in the first place, as mentioned by "A Co".

Thank you.
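For anyone who would rather not edit the file by hand, a possible alternative is to strip the trailing commas from each line in memory and then parse the result. This is only a sketch under the assumption that a comma at the very end of a line is never legitimate data; the function name and the sample rows are hypothetical.

```python
import io
import pandas as pd

def read_pipe_csv_without_trailing_commas(handle):
    """Parse '|'-separated text after dropping trailing commas from every line."""
    cleaned = "".join(
        line.rstrip("\r\n").rstrip(",") + "\n"  # removes all trailing commas on the line
        for line in handle
    )
    return pd.read_csv(io.StringIO(cleaned), sep="|")

# Hypothetical sample standing in for the real file
sample = io.StringIO(
    "City|State|City alias,\n"
    "Holtsville|NY|Holtsville,\n"
    "Adjuntas|PR|\n"
)
df = read_pipe_csv_without_trailing_commas(sample)
print(df)
```

This reads the whole input into memory, which should be fine for a file of this kind but may not suit very large datasets.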

Answer 1: 1 · 2020-04-22 10:32:51
Answer 2: 1 · accepted · 2020-04-22 11:59:07
