Using a pandas DataFrame, how can I split the strings in a specific column and then replace each string with the first element of the split?

Question

I am trying to clean the location data of a dataset, and some of the locations list multiple cities separated by commas. I want to split those strings on the comma and then replace each string with the first element of the split (e.g., "Mumbai, Delhi, Calcutta" should become just "Mumbai"). This is the code I wrote to try to do it. Can someone tell me what I am doing wrong?

df_train = pd.read_csv("Final_Train_Dataset.csv", index_col= None)

for cell in df_train["location"]:
  new = df_train["location"].str.split(",")
df_train["new_location"] = new[0]
df_train["new_location"].head()

Any help is much appreciated. I don't think this is too hard to figure out, but I am new to pandas and we are using it for a project in a class.
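A small check on hypothetical sample data illustrates what goes wrong in the loop above. The call df_train["location"].str.split(",") already works on the whole column at once, so the for loop only recomputes the same result for every row. Then new[0] selects the row whose index label is 0 (one complete list of cities) rather than the first city of every row. Assigning that single list to a column either raises a length-mismatch ValueError or, if the lengths happen to match, spreads one row's cities across different rows. The .str[0] accessor takes the first element of each row's list instead. The DataFrame below is made up for illustration and stands in for Final_Train_Dataset.csv.

```python
import pandas as pd

# Hypothetical sample rows standing in for Final_Train_Dataset.csv
df = pd.DataFrame(
    {"location": ["Mumbai, Delhi, Calcutta", "Pune", "Chennai, Bangalore", "Delhi"]}
)

# str.split(",") returns a Series whose values are Python lists
new = df["location"].str.split(",")

# new[0] selects the row labelled 0: one whole list, not the first city of each row
print(new[0])

# .str[0] takes the first element of every row's list
print(new.str[0].tolist())
```

Running this prints ['Mumbai', ' Delhi', ' Calcutta'] for new[0] and ['Mumbai', 'Pune', 'Chennai', 'Delhi'] for new.str[0].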

Answer 1

This will solve your problem, using .str.split() with expand=True:

# Split on commas, take the first piece (column 0), and trim surrounding spaces
df_train["new_location"] = df_train["location"].str.split(",", expand=True)[0].str.strip()
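The separator matters here. With no pattern, str.split() breaks strings on whitespace, so "Mumbai, Delhi, Calcutta" yields "Mumbai," (comma still attached) and a multi-word name such as "New Delhi" yields only "New". The sketch below compares the two on a few hypothetical rows (not taken from the real dataset); expand=True turns the pieces into DataFrame columns, so [0] picks the first piece of every row, and str.strip() removes the space that follows each comma.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from Final_Train_Dataset.csv
df_train = pd.DataFrame(
    {"location": ["Mumbai, Delhi, Calcutta", "New Delhi", "Chennai, Bangalore", "Pune"]}
)

# No pattern: split on whitespace, so commas stay attached and "New Delhi" is cut short
whitespace_first = df_train["location"].str.split(expand=True)[0]

# Split on "," instead, keep column 0, and trim leading/trailing spaces
df_train["new_location"] = (
    df_train["location"].str.split(",", expand=True)[0].str.strip()
)

print(whitespace_first.tolist())
print(df_train["new_location"].tolist())
```

The first print gives ['Mumbai,', 'New', 'Chennai,', 'Pune'] and the second gives ['Mumbai', 'New Delhi', 'Chennai', 'Pune']. An equivalent form without expand=True is df_train["location"].str.split(",").str[0].str.strip(); passing n=1 to str.split also stops after the first comma, which avoids building columns that are thrown away.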

solution1
0 2021-11-13 19:42:17