Using a Pandas DataFrame, how can I split the strings in a specific column and then replace each string with the first element of the split?
I am trying to clean the location data of a data set, and some of the locations have multiple cities separated by commas. I want to split the strings that contain commas on the comma and then replace each string with the first element of the split (i.e., "Mumbai, Delhi, Calcutta" should become just "Mumbai"). This is the code I wrote to try to do it. Can someone tell me what I am doing wrong?
df_train = pd.read_csv("Final_Train_Dataset.csv", index_col= None)
for cell in df_train["location"]:
    new = df_train["location"].str.split(",")
    df_train["new_location"] = new[0]
df_train["new_location"].head()
Any help is much appreciated. I don't think this is too hard to figure out, but I am new to pandas and we are using it for a project in a class.
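This will solve your problem: use .str.split(",", expand=True) and take column 0. In your code, new[0] selects the split result for the first row (label 0), not the first element of each row's list, and the for loop is unnecessary because .str.split already operates on the whole column. Note that the "," separator must be passed explicitly; without it, .str.split splits on whitespace and the first piece keeps its trailing comma.
df_train["new_location"] = df_train["location"].str.split(",", expand=True)[0]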
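For reference, here is a minimal sketch of the same idea on a small made-up DataFrame (the sample rows are an assumption standing in for Final_Train_Dataset.csv, which is not shown in the question). It uses the .str[0] accessor instead of expand=True, which avoids building the extra columns, and adds .str.strip() in case a value has surrounding whitespace.

```python
import pandas as pd

# Hypothetical sample data; the real file's contents are not shown in the question.
df_train = pd.DataFrame(
    {"location": ["Mumbai, Delhi, Calcutta", "Pune", "Chennai,Bangalore"]}
)

# Split each string on the first comma only, keep the first piece,
# and trim any surrounding whitespace.
df_train["new_location"] = (
    df_train["location"].str.split(",", n=1).str[0].str.strip()
)

print(df_train["new_location"].tolist())
```

Rows without a comma (such as "Pune" here) pass through unchanged, since splitting them yields a single-element list.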