Pandas: add a column to a new dataframe at the associated string values?

Question

I am trying to add a column from one dataframe to another:

df.head()

street_map2[["PRE_DIR","ST_NAME","ST_TYPE","STREET_ID"]].head()

PRE_DIR is just the prefix of the street name. What I want to do is add the STREET_ID column to df at the associated street. I have tried a few approaches, but my inexperience with pandas and with string comparison is getting in the way:

street_map2['STREET'] = df["STREET"]
street_map2['STREET'] = np.where(street_map2['STREET'] == street_map2["ST_NAME"])

The above code raises "ValueError: Length of values does not match length of index". I've also tried using street_map2['STREET'].str in street_map2["ST_NAME"].str. Can anyone think of a good way to do this? (Note: it doesn't need to be 100% accurate, it just needs to match most streets, and it can be completely different from the approach tried above.)

EDIT: Thank you to all who have tried so far. I have not resolved the issue yet. Here is some more data:

street_map2["ST_NAME"]

I have tried this approach as suggested but still have some indexing problems:

def get_street_id(street_name):
     return street_map2[street_map2['ST_NAME'].isin(df["STREET"])].iloc[0].ST_NAME

df["STREET_ID"] = df["STREET"].map(get_street_id)
df["STREET_ID"]

This throws an error.

If it helps, the dataframes are not the same length. Any more ideas, or a way to fix the above, would be greatly appreciated.

Answer 1

To do this, you need to merge the dataframes. One way to do it is:

df.merge(street_map2, left_on='STREET', right_on='ST_NAME')

This looks for equal values in the STREET and ST_NAME columns and, for each match, builds a row containing the columns from both dataframes. By default merge performs an inner join, so rows of df without a match are dropped.

Check this link for more information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

Also, the strings in the columns you merge on have to match exactly (case included).
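Because the match must be exact, one option is to normalize both key columns before merging. The following is a minimal sketch under assumptions: the sample data, the street_key column, and the lookup frame are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical sample data; the real frames come from the question.
df = pd.DataFrame({"STREET": ["Main ", "oak", "Elm"]})
street_map2 = pd.DataFrame({
    "ST_NAME": ["MAIN", "OAK", "PINE"],
    "STREET_ID": [101, 102, 103],
})

# Normalize both key columns so case and stray whitespace do not block a match.
df["street_key"] = df["STREET"].str.strip().str.upper()
lookup = street_map2[["ST_NAME", "STREET_ID"]].copy()
lookup["street_key"] = lookup["ST_NAME"].str.strip().str.upper()
# Keep one row per key so the left merge cannot duplicate rows of df.
lookup = lookup.drop_duplicates("street_key")

# how="left" keeps every row of df; indicator=True adds a _merge column
# showing which rows found a match.
merged = df.merge(lookup[["street_key", "STREET_ID"]], on="street_key",
                  how="left", indicator=True)
print(merged)
```

Rows whose _merge value is "left_only" are the streets that found no STREET_ID, which makes it easy to see how much of the data still needs attention.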

Answer 2

You can do something like this with map:

df["STREET_ID"] = df["STREET"].map(get_street_id)

Here get_street_id is a function that, given a value from df["STREET"], returns the value to insert into the new column:

(Disclaimer: currently untested.)

def get_street_id(street_name):
    # Rows whose ST_NAME equals the given name; return the first ID, or None if there is no match
    matches = street_map2[street_map2["ST_NAME"] == street_name]
    return matches.iloc[0].STREET_ID if not matches.empty else None

We filter street_map2 down to the rows where the ST_NAME column equals the street name:

street_map2[street_map2["ST_NAME"] == street_name]

Then we take the first row of that with iloc[0] and return its STREET_ID value. If no row matches, the function returns None, because iloc[0] on an empty dataframe would raise an IndexError.
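A per-row function filters street_map2 again for every street in df. As a sketch of an alternative for the exact-match case, Series.map also accepts a Series and looks each value up in its index. The sample data and the id_by_name name are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the frames in the question.
df = pd.DataFrame({"STREET": ["MAIN", "ELM", "OAK"]})
street_map2 = pd.DataFrame({
    "ST_NAME": ["MAIN", "OAK", "OAK"],
    "STREET_ID": [101, 102, 999],
})

# Build a Series indexed by ST_NAME; keep the first ID for repeated names,
# mirroring the iloc[0] choice above.
id_by_name = (
    street_map2.drop_duplicates("ST_NAME")
    .set_index("ST_NAME")["STREET_ID"]
)

# Series.map looks each street up in the index; unmatched names become NaN.
df["STREET_ID"] = df["STREET"].map(id_by_name)
print(df)
```

The drop_duplicates step matters here: mapping against a Series with a repeated index value would fail, so the lookup keeps only the first ID per name.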

We can then add the error tolerance you mentioned in your question by changing the filter:

...
street_map2[street_map2["ST_NAME"].str.contains(street_name)]
...

or perhaps:

...
street_map2[street_map2["ST_NAME"].str.startswith(street_name)]
...

Or, more flexibly:

...
street_map2[
    street_map2["ST_NAME"].str.lower().str.replace("street", "st", regex=False).str.startswith(street_name.lower().replace("street", "st"))
]
...

...which lowercases both values, converts, for example, "street" to "st" (so the two spellings are more likely to agree) and then checks whether the ST_NAME value starts with the street name.

If this is still not working for you, you may unfortunately need to come up with a more accurate mapping dataset between your street names. It is very possible that the street names are just too different to match easily with string comparisons.

(If you're able to provide some examples of street names and where they should overlap, we may be able to help you better develop a "fuzzy" match!)
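As one possible starting point for such a fuzzy match, Python's standard library offers difflib.get_close_matches. This is only a sketch: the closest_street helper, the sample names, and the 0.8 cutoff are assumptions that would need tuning against the real data.

```python
import difflib

# Hypothetical street names; real values would come from
# df["STREET"] and street_map2["ST_NAME"].
known_names = ["MAIN", "OAKWOOD", "PINE HILL"]

def closest_street(name, candidates, cutoff=0.8):
    """Return the candidate most similar to name, or None if none reach cutoff."""
    matches = difflib.get_close_matches(name.upper(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(closest_street("Oakwod", known_names))
print(closest_street("Broadway", known_names))
```

The helper could be applied with df["STREET"].map(...) to produce a cleaned name column, which can then be matched exactly against ST_NAME. A lower cutoff matches more streets but also produces more wrong pairings.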

Answer 3

Alright, I managed to figure it out, but the solution probably won't be too helpful if you aren't in the exact same situation with the same data. Bernardo Alencar's answer was essentially correct, except that I was unable to apply an operation on the strings while doing the merge (I am still not sure if there is a way to do it). I found another dataset that had the street names formatted similarly to the first. I then merged the first dataframe with this third one. After this, both the first and the second had a STREET_ID column. Finally, I merged the second with the combined one using:

temp = combined["STREET_ID"]
CrimesToMapDF = street_maps.merge(temp, left_on='STREET_ID', right_on='STREET_ID')

This gives the desired final dataframe with the associated street IDs.
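On the open point about applying string operations during the merge: merge compares the key values as they are, so one workaround is to compute a normalized key column on each dataframe first and merge on that. The sketch below assumes, hypothetically, that df["STREET"] holds the full street name while street_map2 stores the prefix, name, and type separately. The sample data and the street_key column are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; the assumption is that df["STREET"] holds the
# full name (prefix, name, type) while street_map2 stores the parts separately.
df = pd.DataFrame({"STREET": ["n main st", "Oak  Ave"]})
street_map2 = pd.DataFrame({
    "PRE_DIR": ["N", None],
    "ST_NAME": ["MAIN", "OAK"],
    "ST_TYPE": ["ST", "AVE"],
    "STREET_ID": [101, 102],
})

# Join the parts into one key per row; missing parts become empty strings,
# and split/join collapses any leftover extra spaces.
parts = street_map2[["PRE_DIR", "ST_NAME", "ST_TYPE"]].fillna("")
street_map2["street_key"] = (
    parts.apply(" ".join, axis=1).str.split().str.join(" ").str.upper()
)

# Apply the same normalization to the other side of the merge.
df["street_key"] = df["STREET"].str.split().str.join(" ").str.upper()

result = df.merge(street_map2[["street_key", "STREET_ID"]],
                  on="street_key", how="left")
print(result[["STREET", "STREET_ID"]])
```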

Answer 1: score 2, posted 2019-11-15 00:47:06

Answer 2: score 1, posted 2019-11-15 00:54:19

Answer 3: score 0, accepted, posted 2019-11-16 21:58:12
